Posted by kaiser_cn on 2018-10-31 08:53:25

The first detailed Chinese tutorial on hadoop2 automatic HA + Federation + Yarn (Part 2)

11. Sync the NameNode metadata from hadoop103 to hadoop104
  Run the following command on hadoop104: /usr/local/hadoop/bin/hdfs namenode -bootstrapStandby
  Command output:
  # /usr/local/hadoop/bin/hdfs namenode -bootstrapStandby
  14/02/12 08:28:30 INFO namenode.NameNode: STARTUP_MSG:
  /************************************************************
  STARTUP_MSG: Starting NameNode
  STARTUP_MSG:   host = hadoop104/192.168.80.104
  STARTUP_MSG:   args = [-bootstrapStandby]
  STARTUP_MSG:   version = 2.2.0

  STARTUP_MSG:   build = Unknown -r Unknown; compiled by 'root' on 2013-12-26T08:50Z
  STARTUP_MSG:   java = 1.7.0_45
  ************************************************************/
  14/02/12 08:28:35 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
  =====================================================
  About to bootstrap Standby ID hadoop104 from:
             Nameservice ID: cluster2
          Other Namenode ID: hadoop103
    Other NN's HTTP address: hadoop103:50070
    Other NN's IPC  address: hadoop103/192.168.80.103:9000
               Namespace ID: 698609742
              Block pool ID:
                 Cluster ID: c2
             Layout version: -47
  =====================================================
  14/02/12 08:28:39 INFO common.Storage: Storage directory /usr/local/hadoop/tmp/dfs/name has been successfully formatted.
  14/02/12 08:28:39 INFO namenode.TransferFsImage: Opening connection to http://hadoop103:50070/getimage?getimage=1&txid=0&storageInfo=-47:698609742:0:c2
  14/02/12 08:28:40 INFO namenode.TransferFsImage: Transfer took 0.67s at 0.00 KB/s

  14/02/12 08:28:40 INFO namenode.TransferFsImage: Downloaded file fsimage.ckpt_0000000000000000000
  14/02/12 08:28:40 INFO util.ExitUtil: Exiting with status 0
  14/02/12 08:28:40 INFO namenode.NameNode: SHUTDOWN_MSG:
  /************************************************************
  SHUTDOWN_MSG: Shutting down NameNode at hadoop104/192.168.80.104
  ************************************************************/
  Verification:
  # pwd
  /usr/local/hadoop
  # ls tmp/
  dfs
  # ls tmp/dfs/
  name
  #
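  For an extra sanity check beyond the directory listing, you can compare the storage metadata of the two NameNodes (a minimal sketch; the path assumes the dfs.namenode.name.dir used throughout this tutorial, /usr/local/hadoop/tmp/dfs/name):
  # cat /usr/local/hadoop/tmp/dfs/name/current/VERSION
  The namespaceID, clusterID, and blockpoolID printed on hadoop104 should match hadoop103's exactly; if they differ, the bootstrap did not copy the active NameNode's metadata.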
12. Start the other NameNode of c2
  Run the following command on hadoop104: /usr/local/hadoop/sbin/hadoop-daemon.sh start namenode
  Command output:
  # /usr/local/hadoop/sbin/hadoop-daemon.sh start namenode
  starting namenode, logging to /usr/local/hadoop/logs/hadoop-root-namenode-hadoop104.out
  #
  Verification:
  # jps
  8822 NameNode
  8975 Jps
  #
  You can also open http://hadoop104:50070 in a browser and see the same kind of page shown earlier; the screenshot is omitted here. A command-line check is sketched below.
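  The HA state can also be queried from the shell (a sketch assuming the nameservice ID cluster2 and the NameNode IDs hadoop103 and hadoop104 configured in part 1 of this tutorial):
  # /usr/local/hadoop/bin/hdfs haadmin -ns cluster2 -getServiceState hadoop104
  At this point both NameNodes of c2 should report standby, because with automatic failover enabled the active NameNode is only elected once the ZKFCs are started in step 15.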
13. Start all the DataNodes
  Run the following command on hadoop101: /usr/local/hadoop/sbin/hadoop-daemons.sh start datanode
  Command output:
  # /usr/local/hadoop/sbin/hadoop-daemons.sh start datanode
  hadoop101: starting datanode, logging to /usr/local/hadoop/logs/hadoop-root-datanode-hadoop101.out
  hadoop103: starting datanode, logging to /usr/local/hadoop/logs/hadoop-root-datanode-hadoop103.out
  hadoop102: starting datanode, logging to /usr/local/hadoop/logs/hadoop-root-datanode-hadoop102.out
  hadoop104: starting datanode, logging to /usr/local/hadoop/logs/hadoop-root-datanode-hadoop104.out
  #
  [The command above starts a DataNode process on each of the four nodes.]
  Verification (using hadoop101 as an example):
  # jps
  23396 JournalNode
  24302 Jps
  24232 DataNode
  23558 NameNode
  22491 QuorumPeerMain
  #
  [The DataNode Java process is now present.]
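  Instead of running jps on every machine, you can also ask a NameNode for a cluster-wide report (a sketch; which nameservice answers depends on the fs.defaultFS of the node you run it on):
  # /usr/local/hadoop/bin/hdfs dfsadmin -report
  Since the DataNodes of a federation register with every nameservice, the summary should list four live DataNodes once they have all checked in.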
14. Start Yarn
  Run the following command on hadoop101: /usr/local/hadoop/sbin/start-yarn.sh
  Command output:
  # /usr/local/hadoop/sbin/start-yarn.sh
  starting yarn daemons
  starting resourcemanager, logging to /usr/local/hadoop/logs/yarn-root-resourcemanager-hadoop101.out
  hadoop104: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-root-nodemanager-hadoop104.out
  hadoop103: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-root-nodemanager-hadoop103.out
  hadoop102: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-root-nodemanager-hadoop102.out
  hadoop101: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-root-nodemanager-hadoop101.out
  #
  Verification:
  # jps
  23396 JournalNode
  25154 ResourceManager
  25247 NodeManager
  24232 DataNode
  23558 NameNode
  22491 QuorumPeerMain
  25281 Jps
  #
  [The ResourceManager and NodeManager Java processes appear.]
  You can also check through a browser, as shown below; a command-line check follows the screenshot.
http://www.superwu.cn/wp-content/uploads/2014/02/image_thumb2.png
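  The registered NodeManagers can also be listed from the shell (a sketch; a NodeManager only shows up here after it has registered with the ResourceManager, which can take a few seconds):
  # /usr/local/hadoop/bin/yarn node -list
  The output should report a total of four nodes, one per NodeManager started above.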
15. Start the ZooKeeperFailoverController
  Run the following command on each of hadoop101, hadoop102, hadoop103, and hadoop104: /usr/local/hadoop/sbin/hadoop-daemon.sh start zkfc
  Command output (using hadoop101 as an example):
  # /usr/local/hadoop/sbin/hadoop-daemon.sh start zkfc
  starting zkfc, logging to /usr/local/hadoop/logs/hadoop-root-zkfc-hadoop101.out
  #
  Verification (using hadoop101 as an example):
  # jps
  24599 DFSZKFailoverController
  23396 JournalNode
  24232 DataNode
  23558 NameNode
  22491 QuorumPeerMain
  24654 Jps
  #
  [The DFSZKFailoverController Java process appears.]
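  You can also confirm that the failover controllers have registered with ZooKeeper (a sketch; the zkCli.sh path assumes the ZooKeeper installation directory from part 1, and /hadoop-ha is the default value of ha.zookeeper.parent-znode):
  # /usr/local/zookeeper/bin/zkCli.sh -server hadoop101:2181
  [zk: hadoop101:2181(CONNECTED) 0] ls /hadoop-ha
  You should see one child znode per nameservice; under each, an ActiveStandbyElectorLock node records which NameNode currently holds the active lock.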
16. Verify that HDFS works
  Run the following commands on any node (hadoop101 is used here) to upload a file to the HDFS cluster:
  # pwd
  /usr/local/hadoop/etc/hadoop
  # ls
  capacity-scheduler.xml      hadoop-metrics.properties   httpfs-site.xml             ssl-server.xml.example
  configuration.xsl           hadoop-policy.xml           log4j.properties            startall.sh
  container-executor.cfg      hdfs2-site.xml              mapred-env.sh               yarn-env.sh
  core-site.xml               hdfs-site.xml               mapred-queues.xml.template  yarn-site.xml
  fairscheduler.xml           httpfs-env.sh               mapred-site.xml             zookeeper.out
  hadoop-env.sh               httpfs-log4j.properties     slaves
  hadoop-metrics2.properties  httpfs-signature.secret     ssl-client.xml.example
  # hadoop fs -put core-site.xml /
  [This uploads the file into the federation; by default it goes to the c1 cluster.]
  Verification:
  # hadoop fs -ls /
  Found 1 items
  -rw-r--r--   2 root supergroup      446 2014-02-12 09:00 /core-site.xml
  #
  You can also check through a browser; by default the data lands in the first cluster. A way to address the second cluster explicitly is sketched after the screenshot.
http://www.superwu.cn/wp-content/uploads/2014/02/image_thumb3.png
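  To target the second cluster of the federation explicitly, use its nameservice URI instead of the default (a sketch assuming the nameservice IDs cluster1 and cluster2 from part 1):
  # hadoop fs -put core-site.xml hdfs://cluster2/
  # hadoop fs -ls hdfs://cluster2/
  Because both nameservices are HA-enabled, the logical URI resolves to whichever of the pair of NameNodes is currently active, so no hostname has to be hard-coded.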
17. Verify that Yarn works
  Run the following command on hadoop101: hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount /core-site.xml /out
  Command output:
  # hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount /core-site.xml /out
  14/02/12 11:43:55 INFO client.RMProxy: Connecting to ResourceManager at hadoop101/192.168.80.101:8032
  14/02/12 11:43:59 INFO input.FileInputFormat: Total input paths to process : 1
  14/02/12 11:43:59 INFO mapreduce.JobSubmitter: number of splits:1
  14/02/12 11:43:59 INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name
  14/02/12 11:43:59 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
  14/02/12 11:43:59 INFO Configuration.deprecation: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
  14/02/12 11:43:59 INFO Configuration.deprecation: mapreduce.combine.class is deprecated. Instead, use mapreduce.job.combine.class
  14/02/12 11:43:59 INFO Configuration.deprecation: mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class
  14/02/12 11:43:59 INFO Configuration.deprecation: mapred.job.name is deprecated. Instead, use mapreduce.job.name
  14/02/12 11:43:59 INFO Configuration.deprecation: mapreduce.reduce.class is deprecated. Instead, use mapreduce.job.reduce.class
  14/02/12 11:43:59 INFO Configuration.deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
  14/02/12 11:43:59 INFO Configuration.deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
  14/02/12 11:43:59 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
  14/02/12 11:43:59 INFO Configuration.deprecation: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
  14/02/12 11:43:59 INFO Configuration.deprecation: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
  14/02/12 11:44:01 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1392169506119_0002
  14/02/12 11:44:04 INFO impl.YarnClientImpl: Submitted application application_1392169506119_0002 to ResourceManager at hadoop101/192.168.80.101:8032
  14/02/12 11:44:05 INFO mapreduce.Job: The url to track the job: http://hadoop101:8088/proxy/application_1392169506119_0002/
  14/02/12 11:44:05 INFO mapreduce.Job: Running job: job_1392169506119_0002
  14/02/12 11:44:41 INFO mapreduce.Job: Job job_1392169506119_0002 running in uber mode : false
  14/02/12 11:44:41 INFO mapreduce.Job:  map 0% reduce 0%
  14/02/12 11:45:37 INFO mapreduce.Job:  map 100% reduce 0%
  14/02/12 11:46:54 INFO mapreduce.Job:  map 100% reduce 100%
  14/02/12 11:47:01 INFO mapreduce.Job: Job job_1392169506119_0002 completed successfully
  14/02/12 11:47:02 INFO mapreduce.Job: Counters: 43
  File System Counters
  FILE: Number of bytes read=472
  FILE: Number of bytes written=164983
  FILE: Number of read operations=0
  FILE: Number of large read operations=0
  FILE: Number of write operations=0
  HDFS: Number of bytes read=540
  HDFS: Number of bytes written=402
  HDFS: Number of read operations=6
  HDFS: Number of large read operations=0
  HDFS: Number of write operations=2
  Job Counters
  Launched map tasks=1
  Launched reduce tasks=1
  Data-local map tasks=1
  Total time spent by all maps in occupied slots (ms)=63094
  Total time spent by all reduces in occupied slots (ms)=57228
  Map-Reduce Framework
  Map input records=17
  Map output records=20
  Map output bytes=496
  Map output materialized bytes=472
  Input split bytes=94
  Combine input records=20
  Combine output records=16
  Reduce input groups=16
  Reduce shuffle bytes=472
  Reduce input records=16
  Reduce output records=16
  Spilled Records=32
  Shuffled Maps =1
  Failed Shuffles=0
  Merged Map outputs=1
  GC time elapsed (ms)=632
  CPU time spent (ms)=3010
  Physical memory (bytes) snapshot=255528960
  Virtual memory (bytes) snapshot=1678471168
  Total committed heap usage (bytes)=126660608
  Shuffle Errors
  BAD_ID=0
  CONNECTION=0
  IO_ERROR=0
  WRONG_LENGTH=0
  WRONG_MAP=0
  WRONG_REDUCE=0
  File Input Format Counters
  Bytes Read=446
  File Output Format Counters
  Bytes Written=402
  #
  Verification:
  # hadoop fs -ls /out
  Found 2 items
  -rw-r--r--   2 root supergroup          0 2014-02-12 11:46 /out/_SUCCESS
  -rw-r--r--   2 root supergroup      402 2014-02-12 11:46 /out/part-r-00000
  # hadoop fs -text /out/part-r-00000
        1
     3
        1
  type="text/xsl" 1
  version="1.0"   1
  #
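  Note that MapReduce refuses to write into an existing output directory, so to rerun the job you must delete /out first (standard HDFS shell usage):
  # hadoop fs -rm -r /out
  The wordcount command from the start of this step can then be executed again unchanged.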
18. Verify that automatic HA failover works
  Observe the state of cluster1's two NameNodes: hadoop101 is standby and hadoop102 is active, as shown in the two screenshots below.
http://www.superwu.cn/wp-content/uploads/2014/02/image_thumb4.png
http://www.superwu.cn/wp-content/uploads/2014/02/image_thumb5.png
  Now we kill the NameNode process on hadoop102 and watch whether hadoop101 automatically switches to active.
  Run the following commands on hadoop102:
  # jps
  13389 DFSZKFailoverController
  12355 JournalNode
  13056 DataNode
  15660 Jps
  14496 NodeManager
  12573 NameNode
  12081 QuorumPeerMain
  # kill -9 12573
  # jps
  13389 DFSZKFailoverController
  12355 JournalNode
  13056 DataNode
  14496 NodeManager
  15671 Jps
  12081 QuorumPeerMain
  #
  Checking the web pages again shows the following:
http://www.superwu.cn/wp-content/uploads/2014/02/image_thumb6.png
http://www.superwu.cn/wp-content/uploads/2014/02/image_thumb7.png
  This proves that HDFS automatic failover (high availability) works.
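  To restore full redundancy after the test, restart the killed NameNode on hadoop102 and confirm the final states from the shell (a sketch assuming the nameservice ID cluster1 and the NameNode IDs hadoop101 and hadoop102 from part 1):
  # /usr/local/hadoop/sbin/hadoop-daemon.sh start namenode
  # /usr/local/hadoop/bin/hdfs haadmin -ns cluster1 -getServiceState hadoop101
  # /usr/local/hadoop/bin/hdfs haadmin -ns cluster1 -getServiceState hadoop102
  hadoop101 should now report active, and the restarted hadoop102 should come back as standby instead of taking the active role back.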

Conclusion
  The above is the complete procedure for configuring HDFS HA with automatic failover, HDFS federation, and Yarn on hadoop 2.2.0; you can build your own cluster by following these steps. During setup, pay close attention to the order in which the commands are executed and to the verification work at every step.
  This installation procedure was distilled by the author after trying many different configuration approaches; it keeps the configuration parameters to a minimum, and each parameter is annotated in detail to help you understand how the setup works.
  For how to prepare the Linux servers, how to compile the hadoop 2.2.0 source code for a 64-bit platform, and related operations, please follow 吴超沉思录 (Wu Chao's blog); more detailed, more complete, and more practical video tutorials will be released soon.
  This tutorial was produced by executing the steps, recording the scripts, taking screenshots, and writing the document all at once. It took a great deal of effort, about six hours in total. Please respect the author's work and cite the source when reposting.
