Installing and Configuring Hadoop on Four Ubuntu Virtual Machines (Part 2)
2013-02-22 10:18:28 | Source: 开发界 | Category: Ubuntu / Software
[Lead] After completing the configuration in "Installing and Configuring Hadoop on Four Ubuntu Virtual Machines (Part 1)", you can proceed with the steps below.

1. Download and install Hadoop

Download the current stable release, version 1.0.4, and place it in the /home/hadoop directory on the NameNode01 virtual machine.
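If the tarball is not already on the machine, it can also be fetched directly on NameNode01. A minimal sketch, assuming the Apache archive mirror below still carries the 1.0.4 release (any mirror hosting hadoop-1.0.4 works):

  # run as the hadoop user on NameNode01
  cd /home/hadoop
  wget http://archive.apache.org/dist/hadoop/core/hadoop-1.0.4/hadoop-1.0.4.tar.gz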

Run the extraction command:

  tar -zxvf hadoop-1.0.4.tar.gz

For convenience, rename the extracted hadoop-1.0.4 directory to hadoop104:

  mv hadoop-1.0.4 hadoop104

2. Configure Hadoop

Change into the hadoop104 directory and make the configuration changes below.

(1) Edit the hadoop-env.sh file in the /home/hadoop/hadoop104/conf directory:

  sudo gedit hadoop-env.sh

Add the Java environment variable:

  export JAVA_HOME=/opt/jdk1.6.0_37
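Before going further it is worth confirming that this path is valid. A quick sanity check, assuming the JDK was installed under /opt/jdk1.6.0_37 as described in Part 1:

  # run the JVM from the exact path hadoop-env.sh will use; it should print 1.6.0_37
  /opt/jdk1.6.0_37/bin/java -version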

(2) Edit the slaves file in the /home/hadoop/hadoop104/conf directory:

  sudo gedit slaves

This file lists all of the DataNodes, one hostname per line. In this article the DataNodes are DataNode01, NN02, and DN02, so slaves should contain:

  DataNode01
  NN02
  DN02
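Because these are plain hostnames, every node must be able to resolve them (the /etc/hosts entries set up in Part 1). A quick check from NameNode01, sketched under that assumption:

  # confirm each DataNode hostname resolves and is reachable
  for h in DataNode01 NN02 DN02; do
      ping -c 1 "$h" > /dev/null && echo "$h ok" || echo "$h unreachable"
  done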

(3) Edit the masters file in the /home/hadoop/hadoop104/conf directory.

Open the masters file. It specifies the SecondaryNameNode; in production deployments the NameNode and SecondaryNameNode are generally not placed on the same server. The content is:

  NN02

(4) Edit the core-site.xml file in the /home/hadoop/hadoop104/conf directory. core-site.xml is Hadoop's core configuration file; it sets the HDFS address and port. Add the following inside the file's <configuration> element:

  <property>
    <name>fs.default.name</name>
    <value>hdfs://NameNode01:9000</value>
  </property>

(5) Copy the /home/hadoop/hadoop104/src/hdfs/hdfs-default.xml file into the /home/hadoop/hadoop104/conf directory, renaming it so that it overwrites the existing hdfs-site.xml:

  cp /home/hadoop/hadoop104/src/hdfs/hdfs-default.xml /home/hadoop/hadoop104/conf/hdfs-site.xml

In this file the dfs.name.dir directory defaults to a location under /tmp, and files under temporary directories can be lost when Linux restarts, so make the following two changes.

  <property>
    <name>dfs.name.dir</name>
    <value>${hadoop.tmp.dir}/dfs/name</value>
    <description>Determines where on the local filesystem the DFS name node
    should store the name table(fsimage). If this is a comma-delimited list
    of directories then the name table is replicated in all of the
    directories, for redundancy. </description>
  </property>

Change it to:

  <property>
    <name>dfs.name.dir</name>
    <value>/hadoopdata/dfs/name</value>
    <description>Determines where on the local filesystem the DFS name node
    should store the name table(fsimage). If this is a comma-delimited list
    of directories then the name table is replicated in all of the
    directories, for redundancy. </description>
  </property>

Likewise, change:

  <property>
    <name>dfs.data.dir</name>
    <value>${hadoop.tmp.dir}/dfs/data</value>
    <description>Determines where on the local filesystem an DFS data node
    should store its blocks. If this is a comma-delimited
    list of directories, then data will be stored in all named
    directories, typically on different devices.
    Directories that do not exist are ignored.
    </description>
  </property>

to:

  <property>
    <name>dfs.data.dir</name>
    <value>/hadoopdata/dfs/data</value>
    <description>Determines where on the local filesystem an DFS data node
    should store its blocks. If this is a comma-delimited
    list of directories, then data will be stored in all named
    directories, typically on different devices.
    Directories that do not exist are ignored.
    </description>
  </property>
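Instead of editing the copied file by hand in gedit, the two values can also be switched with sed. A convenience sketch, assuming hdfs-site.xml still contains the stock text copied from hdfs-default.xml above:

  cd /home/hadoop/hadoop104/conf
  # rewrite the default ${hadoop.tmp.dir} locations to the /hadoopdata paths
  sed -i 's#${hadoop.tmp.dir}/dfs/name#/hadoopdata/dfs/name#' hdfs-site.xml
  sed -i 's#${hadoop.tmp.dir}/dfs/data#/hadoopdata/dfs/data#' hdfs-site.xml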

Note 1: Because the path was changed to the /hadoopdata/dfs/name directory, that directory has to be created before it can be used. Run the following commands to create the corresponding directories:

  sudo mkdir /hadoopdata              # create the /hadoopdata directory
  sudo chown hadoop /hadoopdata/      # change the owner of /hadoopdata so the hadoop user can write to it
  mkdir /hadoopdata/dfs               # create the /hadoopdata/dfs directory
  mkdir /hadoopdata/dfs/name          # create the /hadoopdata/dfs/name directory

Perform the steps above on each of the four virtual machines.
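Because passwordless ssh from NameNode01 to every node was set up in Part 1, the same directories can also be created in one pass from NameNode01 instead of logging in to each VM separately. A sketch under that assumption (ssh -t lets sudo prompt for a password on each host; the data directory is created here as well since dfs.data.dir points at it):

  for h in NameNode01 DataNode01 NN02 DN02; do
      ssh -t "$h" 'sudo mkdir -p /hadoopdata/dfs/name /hadoopdata/dfs/data && sudo chown -R hadoop /hadoopdata'
  done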

Note 2: In addition, the dfs.replication value in the configuration file (the number of replicas kept for each file block) defaults to 3 in Hadoop. Since this article also uses 3 data nodes, it does not need to be changed.

(6) Edit mapred-site.xml

mapred-site.xml is the MapReduce configuration file; it sets the JobTracker's address and port:

  <configuration>
    <property>
      <name>mapred.job.tracker</name>
      <value>NameNode01:9001</value>
    </property>
  </configuration>

(7) Deploy Hadoop to the other machines, keeping the directory structure identical:

  scp -r /home/hadoop/hadoop104 DataNode01:/home/hadoop
  scp -r /home/hadoop/hadoop104 NN02:/home/hadoop
  scp -r /home/hadoop/hadoop104 DN02:/home/hadoop

Once all of the transfers have finished, the basic Hadoop deployment and configuration is complete.
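As an optional sanity check, you can confirm from NameNode01 that the copy landed on each target and roughly compare sizes:

  # list the copied install directory and its total size on every slave node
  for h in DataNode01 NN02 DN02; do
      echo "== $h =="
      ssh "$h" 'ls -d /home/hadoop/hadoop104 && du -sh /home/hadoop/hadoop104'
  done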

(8) Before starting Hadoop, the NameNode must be formatted. Change into the Hadoop installation directory, /home/hadoop/hadoop104/, and run the following command:

  bin/hadoop namenode -format

Output similar to the following indicates that formatting finished:

  13/02/21 22:10:00 INFO util.GSet: VM type = 64-bit
  13/02/21 22:10:00 INFO util.GSet: 2% max memory = 19.33375 MB
  13/02/21 22:10:00 INFO util.GSet: capacity = 2^21 = 2097152 entries
  13/02/21 22:10:00 INFO util.GSet: recommended=2097152, actual=2097152
  13/02/21 22:10:01 INFO namenode.FSNamesystem: fsOwner=hadoop
  13/02/21 22:10:01 INFO namenode.FSNamesystem: supergroup=supergroup
  13/02/21 22:10:01 INFO namenode.FSNamesystem: isPermissionEnabled=true
  13/02/21 22:10:01 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
  13/02/21 22:10:01 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
  13/02/21 22:10:01 INFO namenode.NameNode: Caching file names occuring more than 10 times
  13/02/21 22:10:01 INFO common.Storage: Image file of size 112 saved in 0 seconds.
  13/02/21 22:10:01 INFO common.Storage: Storage directory /hadoopdata/dfs/name has been successfully formatted.
  13/02/21 22:10:01 INFO namenode.NameNode: SHUTDOWN_MSG:
  /************************************************************
  SHUTDOWN_MSG: Shutting down NameNode at NameNode01/192.168.0.111
  ************************************************************/

If the preceding configuration contains no mistakes, formatting should succeed. If it fails for whatever reason, check the log files under hadoop104/logs. If the NameNode has been formatted before and cannot be formatted again, you may need to delete the files under /tmp and under the data directories before it can be reformatted.
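For example, with the directory layout used in this article, something along these lines clears the old state on each node before reformatting (a sketch only; double-check the paths before deleting anything):

  # remove stale HDFS metadata and blocks from the configured directories
  rm -rf /hadoopdata/dfs/name/* /hadoopdata/dfs/data/*
  # remove leftover temporary files from earlier runs (the default hadoop.tmp.dir is /tmp/hadoop-<user>)
  rm -rf /tmp/hadoop-hadoop*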

3. Testing

With the preparation above complete, Hadoop can be started. The /home/hadoop/hadoop104/bin directory contains several startup scripts, briefly described below:

  start-all.sh     starts all of the Hadoop daemons: namenode, datanode, jobtracker, and tasktracker
  stop-all.sh      stops all of the Hadoop daemons
  start-mapred.sh  starts the Map/Reduce daemons: jobtracker and tasktracker
  stop-mapred.sh   stops the Map/Reduce daemons
  start-dfs.sh     starts the HDFS daemons: namenode and datanode
  stop-dfs.sh      stops the HDFS daemons
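Equivalently, the cluster can be brought up in two stages, HDFS first and then MapReduce, which makes it easier to see which layer a failure belongs to (the same scripts as above, just run separately):

  bin/start-dfs.sh      # namenode, datanodes, secondarynamenode
  bin/start-mapred.sh   # jobtracker and tasktrackers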

Now start all of the daemons:

  hadoop@NameNode01:~/hadoop104$ bin/start-all.sh

The output is as follows:

  starting namenode, logging to /home/hadoop/hadoop104/libexec/../logs/hadoop-hadoop-namenode-NameNode01.out
  NN02: starting datanode, logging to /home/hadoop/hadoop104/libexec/../logs/hadoop-hadoop-datanode-NN02.out
  DN02: starting datanode, logging to /home/hadoop/hadoop104/libexec/../logs/hadoop-hadoop-datanode-DN02.out
  DataNode01: starting datanode, logging to /home/hadoop/hadoop104/libexec/../logs/hadoop-hadoop-datanode-DataNode01.out
  NN02: starting secondarynamenode, logging to /home/hadoop/hadoop104/libexec/../logs/hadoop-hadoop-secondarynamenode-NN02.out
  starting jobtracker, logging to /home/hadoop/hadoop104/libexec/../logs/hadoop-hadoop-jobtracker-NameNode01.out
  DataNode01: starting tasktracker, logging to /home/hadoop/hadoop104/libexec/../logs/hadoop-hadoop-tasktracker-DataNode01.out
  DN02: starting tasktracker, logging to /home/hadoop/hadoop104/libexec/../logs/hadoop-hadoop-tasktracker-DN02.out
  NN02: starting tasktracker, logging to /home/hadoop/hadoop104/libexec/../logs/hadoop-hadoop-tasktracker-NN02.out

From the output you can see which processes each node starts: NameNode01 itself runs the namenode and jobtracker processes; NN02 runs three processes, datanode, secondarynamenode, and tasktracker; DN02 and DataNode01 each run two processes, datanode and tasktracker. You can also verify the processes on each Ubuntu node by logging in to that machine and running the jps command.
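For instance, the daemons on every node can be listed in one pass from NameNode01 over ssh (a sketch assuming jps from the JDK installed in Part 1; adjust the path if jps is already on each node's PATH):

  for h in NameNode01 DataNode01 NN02 DN02; do
      echo "== $h =="
      ssh "$h" '/opt/jdk1.6.0_37/bin/jps'
  done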

To check the status of the cluster, run:

  bin/hadoop dfsadmin -report

The output looks like this:

  Configured Capacity: 155743215616 (145.05 GB)
  Present Capacity: 134797897728 (125.54 GB)
  DFS Remaining: 134797811712 (125.54 GB)
  DFS Used: 86016 (84 KB)
  DFS Used%: 0%
  Under replicated blocks: 0
  Blocks with corrupt replicas: 0
  Missing blocks: 0

  -------------------------------------------------
  Datanodes available: 3 (3 total, 0 dead)

  Name: 192.168.0.112:50010
  Decommission Status : Normal
  Configured Capacity: 52506136576 (48.9 GB)
  DFS Used: 28672 (28 KB)
  Non DFS Used: 6880530432 (6.41 GB)
  DFS Remaining: 45625577472(42.49 GB)
  DFS Used%: 0%
  DFS Remaining%: 86.9%
  Last contact: Thu Feb 21 23:06:57 CST 2013

  Name: 192.168.0.114:50010
  Decommission Status : Normal
  Configured Capacity: 52506136576 (48.9 GB)
  DFS Used: 28672 (28 KB)
  Non DFS Used: 6881538048 (6.41 GB)
  DFS Remaining: 45624569856(42.49 GB)
  DFS Used%: 0%
  DFS Remaining%: 86.89%
  Last contact: Thu Feb 21 23:06:57 CST 2013

  Name: 192.168.0.113:50010
  Decommission Status : Normal
  Configured Capacity: 50730942464 (47.25 GB)
  DFS Used: 28672 (28 KB)
  Non DFS Used: 7183249408 (6.69 GB)
  DFS Remaining: 43547664384(40.56 GB)
  DFS Used%: 0%
  DFS Remaining%: 85.84%
  Last contact: Thu Feb 21 23:06:57 CST 2013

You can also open http://192.168.0.111:50070 or http://NameNode01:50070 in a browser to view the cluster status on the web page.
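As a final functional check, you can push a small file into HDFS and read it back, and optionally run the bundled wordcount example end to end. A minimal smoke test, assuming the examples jar shipped with the 1.0.4 release sits in the installation directory; the HDFS paths are arbitrary:

  # copy a local file into HDFS and read it back
  bin/hadoop fs -mkdir /test
  bin/hadoop fs -put conf/core-site.xml /test/
  bin/hadoop fs -ls /test
  bin/hadoop fs -cat /test/core-site.xml

  # run the sample MapReduce job over it and print the result
  bin/hadoop jar hadoop-examples-1.0.4.jar wordcount /test /test-out
  bin/hadoop fs -cat /test-out/part*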

To stop Hadoop, run:

  bin/stop-all.sh

This concludes the initial Hadoop installation, configuration, and startup walkthrough.

