Posted by xiaochuan on 2018-10-29 06:04:38

Hadoop Cluster (Part 4): Hadoop Upgrade

  The cluster installed in the earlier posts runs Hadoop 2.6; this post upgrades it to 2.7.
  Note that HBase runs on this cluster, so HBase must be stopped before the upgrade and started again afterwards.
  For more installation steps, see:
  Hadoop Cluster (Part 1): Zookeeper Setup
  Hadoop Cluster (Part 2): HDFS Setup
  Hadoop Cluster (Part 3): HBase Setup
  The upgrade steps are as follows.
  Cluster IP list:
Namenode:
192.168.143.46
192.168.143.103

Journalnode:
192.168.143.101
192.168.143.102
192.168.143.103

Datanode & HBase regionserver:
192.168.143.196
192.168.143.231
192.168.143.182
192.168.143.235
192.168.143.41
192.168.143.127

HBase master:
192.168.143.103
192.168.143.101

Zookeeper:
192.168.143.101
192.168.143.102
192.168.143.103
  1. First, determine the path where Hadoop is installed, distribute the new release to that path on every node, and unpack it.
# ll /usr/local/hadoop/
total 493244
drwxrwxr-x 9 root root      4096 Mar 21  2017 hadoop-release -> hadoop-2.6.0-EDH-0u1-SNAPSHOT-HA-SECURITY
drwxr-xr-x 9 root root      4096 Oct 11 11:06 hadoop-2.7.1
-rw-r--r-- 1 root root 194690531 Oct  9 10:55 hadoop-2.7.1.tar.gz
drwxrwxr-x 7 root root      4096 May 21  2016 hbase-1.1.3
-rw-r--r-- 1 root root 128975247 Apr 10  2017 hbase-1.1.3.tar.gz
lrwxrwxrwx 1 root root        29 Apr 10  2017 hbase-release -> /usr/local/hadoop/hbase-1.1.3
  Because this is an upgrade, the configuration is completely unchanged: copy the etc/hadoop directory from the old hadoop-2.6.0 tree into hadoop-2.7.1, replacing the shipped defaults.
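  For a single node, the prep boils down to something like this sketch (run as root; paths match the listing above, and the tarball is assumed to already be in place):
cd /usr/local/hadoop
tar xzf hadoop-2.7.1.tar.gz
# hadoop-release still points at 2.6.0 here; replace the shipped 2.7.1 config with the existing one
rm -rf hadoop-2.7.1/etc/hadoop
cp -r hadoop-release/etc/hadoop hadoop-2.7.1/etc/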
  With that, the pre-upgrade preparation is complete.
  The upgrade procedure follows. Every command is run from a single jump host through shell scripts, which avoids repeatedly logging in to each node over ssh.
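  As an illustration of the pattern used throughout (a sketch, not the original script; the run_as helper name is made up for the example):
# run one command on one host as a given service user
run_as() {            # usage: run_as <user> <host> <command>
  local user=$1 host=$2; shift 2
  ssh -t -q "$host" sudo su -l "$user" -c "$*"
}
# example: check the Java daemons on every datanode
for h in 192.168.143.196 192.168.143.231 192.168.143.182 \
         192.168.143.235 192.168.143.41 192.168.143.127; do
  run_as hdfs "$h" jps
done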
  ## Stop HBase (run as the hbase user)
  2. Stop the HBase masters (run as the hbase user).
  Check the status page to confirm which master is active; stop the standby master first:
http://192.168.143.101:16010/master-status
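  The same check can be done from the jump host with curl (just a convenience grep over the status page HTML; the exact page layout varies by HBase version):
curl -s http://192.168.143.101:16010/master-status | grep -i "backup master"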
ssh -t -q 192.168.143.103 sudo su -l hbase -c "/usr/local/hadoop/hbase-release/bin/hbase-daemon.sh\ stop\ master"
ssh -t -q 192.168.143.103 sudo su -l hbase -c "jps"
ssh -t -q 192.168.143.101 sudo su -l hbase -c "/usr/local/hadoop/hbase-release/bin/hbase-daemon.sh\ stop\ master"
ssh -t -q 192.168.143.101 sudo su -l hbase -c "jps"
  3. Stop the HBase regionservers (run as the hbase user).
ssh -t -q 192.168.143.196 sudo su -l hbase -c "/usr/local/hadoop/hbase-release/bin/hbase-daemon.sh\ stop\ regionserver"
ssh -t -q 192.168.143.231 sudo su -l hbase -c "/usr/local/hadoop/hbase-release/bin/hbase-daemon.sh\ stop\ regionserver"
ssh -t -q 192.168.143.182 sudo su -l hbase -c "/usr/local/hadoop/hbase-release/bin/hbase-daemon.sh\ stop\ regionserver"
ssh -t -q 192.168.143.235 sudo su -l hbase -c "/usr/local/hadoop/hbase-release/bin/hbase-daemon.sh\ stop\ regionserver"
ssh -t -q 192.168.143.41  sudo su -l hbase -c "/usr/local/hadoop/hbase-release/bin/hbase-daemon.sh\ stop\ regionserver"
ssh -t -q 192.168.143.127 sudo su -l hbase -c "/usr/local/hadoop/hbase-release/bin/hbase-daemon.sh\ stop\ regionserver"
  Check that they have stopped:
ssh -t -q 192.168.143.196 sudo su -l hbase -c "jps"
ssh -t -q 192.168.143.231 sudo su -l hbase -c "jps"
ssh -t -q 192.168.143.182 sudo su -l hbase -c "jps"
ssh -t -q 192.168.143.235 sudo su -l hbase -c "jps"
ssh -t -q 192.168.143.41  sudo su -l hbase -c "jps"
ssh -t -q 192.168.143.127 sudo su -l hbase -c "jps"
  ## Stop HDFS services
  4. First confirm on the web UI which namenode is active; that namenode must be started first later on:
https://192.168.143.46:50470/dfshealth.html#tab-overview
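  An equivalent command-line check (run as hdfs; nn1/nn2 are placeholders for the namenode IDs defined by dfs.ha.namenodes.* in hdfs-site.xml):
ssh -t -q 192.168.143.46 sudo su -l hdfs -c "hdfs\ haadmin\ -getServiceState\ nn1"
ssh -t -q 192.168.143.46 sudo su -l hdfs -c "hdfs\ haadmin\ -getServiceState\ nn2"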
  5. Stop the NameNodes (run as the hdfs user). Stop the standby namenode first:
ssh -t -q 192.168.143.103 sudo su -l hdfs -c "/usr/local/hadoop/hadoop-release/sbin/hadoop-daemon.sh\ stop\ namenode"
ssh -t -q 192.168.143.46  sudo su -l hdfs -c "/usr/local/hadoop/hadoop-release/sbin/hadoop-daemon.sh\ stop\ namenode"
Check status:
ssh -t -q 192.168.143.103 sudo su -l hdfs -c "jps"
ssh -t -q 192.168.143.46  sudo su -l hdfs -c "jps"
  6. Stop the DataNodes (run as the hdfs user).
ssh -t -q 192.168.143.196 sudo su -l hdfs -c "/usr/local/hadoop/hadoop-release/sbin/hadoop-daemon.sh\ stop\ datanode"
ssh -t -q 192.168.143.231 sudo su -l hdfs -c "/usr/local/hadoop/hadoop-release/sbin/hadoop-daemon.sh\ stop\ datanode"
ssh -t -q 192.168.143.182 sudo su -l hdfs -c "/usr/local/hadoop/hadoop-release/sbin/hadoop-daemon.sh\ stop\ datanode"
ssh -t -q 192.168.143.235 sudo su -l hdfs -c "/usr/local/hadoop/hadoop-release/sbin/hadoop-daemon.sh\ stop\ datanode"
ssh -t -q 192.168.143.41  sudo su -l hdfs -c "/usr/local/hadoop/hadoop-release/sbin/hadoop-daemon.sh\ stop\ datanode"
ssh -t -q 192.168.143.127 sudo su -l hdfs -c "/usr/local/hadoop/hadoop-release/sbin/hadoop-daemon.sh\ stop\ datanode"
  7. Stop the ZKFCs (run as the hdfs user).
ssh -t -q 192.168.143.46  sudo su -l hdfs -c "/usr/local/hadoop/hadoop-release/sbin/hadoop-daemon.sh\ stop\ zkfc"
ssh -t -q 192.168.143.103 sudo su -l hdfs -c "/usr/local/hadoop/hadoop-release/sbin/hadoop-daemon.sh\ stop\ zkfc"
  8. Stop the JournalNodes (run as the hdfs user).
JN:
ssh -t -q 192.168.143.101 sudo su -l hdfs -c "/usr/local/hadoop/hadoop-release/sbin/hadoop-daemon.sh\ stop\ journalnode"
ssh -t -q 192.168.143.102 sudo su -l hdfs -c "/usr/local/hadoop/hadoop-release/sbin/hadoop-daemon.sh\ stop\ journalnode"
ssh -t -q 192.168.143.103 sudo su -l hdfs -c "/usr/local/hadoop/hadoop-release/sbin/hadoop-daemon.sh\ stop\ journalnode"
  ### Back up the NameNode data. This is a production environment, so the existing metadata must be preserved in case the upgrade fails and has to be rolled back.
  9. Back up namenode1
ssh -t -q 192.168.143.46 "cp -r /data1/dfs/name /data1/dfs/name.bak.20171011-2; ls -al /data1/dfs/; du -sm /data1/dfs/*"
ssh -t -q 192.168.143.46 "cp -r /data2/dfs/name /data2/dfs/name.bak.20171011-2; ls -al /data2/dfs/; du -sm /data2/dfs/*"
  10. Back up namenode2
ssh -t -q 192.168.143.103 "cp -r /data1/dfs/name /data1/dfs/name.bak.20171011-2; ls -al /data1/dfs/; du -sm /data1/dfs/*"
  11. Back up the journalnode data
ssh -t -q 192.168.143.101 "cp -r /data1/journalnode /data1/journalnode.bak.20171011; ls -al /data1/; du -sm /data1/*"
ssh -t -q 192.168.143.102 "cp -r /data1/journalnode /data1/journalnode.bak.20171011; ls -al /data1/; du -sm /data1/*"
ssh -t -q 192.168.143.103 "cp -r /data1/journalnode /data1/journalnode.bak.20171011; ls -al /data1/; du -sm /data1/*"
  The journalnode path is defined in hdfs-site.xml:
dfs.journalnode.edits.dir: /data1/journalnode
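  To confirm the value on any node, a quick grep works (assuming the config lives under the release symlink as shown in step 1):
grep -A1 "dfs.journalnode.edits.dir" /usr/local/hadoop/hadoop-release/etc/hadoop/hdfs-site.xml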
  ### The upgrade itself
  12. Copy the files (already handled in advance; see step 1).
  13. Switch the symlink to the 2.7.1 release on every node (run as root). The generic form is:
ssh -t -q $h "cd /usr/local/hadoop; rm hadoop-release; ln -s hadoop-2.7.1 hadoop-release"
ssh -t -q 192.168.143.46  "cd /usr/local/hadoop; rm hadoop-release; ln -s hadoop-2.7.1 hadoop-release"
ssh -t -q 192.168.143.103 "cd /usr/local/hadoop; rm hadoop-release; ln -s hadoop-2.7.1 hadoop-release"
ssh -t -q 192.168.143.101 "cd /usr/local/hadoop; rm hadoop-release; ln -s hadoop-2.7.1 hadoop-release"
ssh -t -q 192.168.143.102 "cd /usr/local/hadoop; rm hadoop-release; ln -s hadoop-2.7.1 hadoop-release"
ssh -t -q 192.168.143.196 "cd /usr/local/hadoop; rm hadoop-release; ln -s hadoop-2.7.1 hadoop-release"
ssh -t -q 192.168.143.231 "cd /usr/local/hadoop; rm hadoop-release; ln -s hadoop-2.7.1 hadoop-release"
ssh -t -q 192.168.143.182 "cd /usr/local/hadoop; rm hadoop-release; ln -s hadoop-2.7.1 hadoop-release"
ssh -t -q 192.168.143.235 "cd /usr/local/hadoop; rm hadoop-release; ln -s hadoop-2.7.1 hadoop-release"
ssh -t -q 192.168.143.41  "cd /usr/local/hadoop; rm hadoop-release; ln -s hadoop-2.7.1 hadoop-release"
ssh -t -q 192.168.143.127 "cd /usr/local/hadoop; rm hadoop-release; ln -s hadoop-2.7.1 hadoop-release"
  Verify:
ssh -t -q 192.168.143.46  "cd /usr/local/hadoop; ls -al"
ssh -t -q 192.168.143.103 "cd /usr/local/hadoop; ls -al"
ssh -t -q 192.168.143.101 "cd /usr/local/hadoop; ls -al"
ssh -t -q 192.168.143.102 "cd /usr/local/hadoop; ls -al"
ssh -t -q 192.168.143.196 "cd /usr/local/hadoop; ls -al"
ssh -t -q 192.168.143.231 "cd /usr/local/hadoop; ls -al"
ssh -t -q 192.168.143.182 "cd /usr/local/hadoop; ls -al"
ssh -t -q 192.168.143.235 "cd /usr/local/hadoop; ls -al"
ssh -t -q 192.168.143.41  "cd /usr/local/hadoop; ls -al"
ssh -t -q 192.168.143.127 "cd /usr/local/hadoop; ls -al"
  ### Start HDFS (run as the hdfs user)
  14. Start the JournalNodes
JN:
ssh -t -q 192.168.143.101 sudo su -l hdfs -c "/usr/local/hadoop/hadoop-release/sbin/hadoop-daemon.sh\ start\ journalnode"
ssh -t -q 192.168.143.102 sudo su -l hdfs -c "/usr/local/hadoop/hadoop-release/sbin/hadoop-daemon.sh\ start\ journalnode"
ssh -t -q 192.168.143.103 sudo su -l hdfs -c "/usr/local/hadoop/hadoop-release/sbin/hadoop-daemon.sh\ start\ journalnode"
ssh -t -q 192.168.143.101 sudo su -l hdfs -c "jps"
ssh -t -q 192.168.143.102 sudo su -l hdfs -c "jps"
ssh -t -q 192.168.143.103 sudo su -l hdfs -c "jps"
  15. Start the first NameNode (the active one identified in step 4), with the -upgrade flag
ssh 192.168.143.46
su - hdfs
/usr/local/hadoop/hadoop-release/sbin/hadoop-daemon.sh start namenode -upgrade
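  For context: the -upgrade flag makes the namenode migrate its metadata to the new layout while keeping the old copy in a "previous" directory for rollback. If the process does not stay up, check its log first (the path below assumes the default log location under the release directory):
tail -100 /usr/local/hadoop/hadoop-release/logs/hadoop-hdfs-namenode-*.log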
  16. Confirm its status on the web UI; only after everything is fully OK may the second namenode be started:
https://192.168.143.46:50470/dfshealth.html#tab-overview
  17. Start the first ZKFC (on 192.168.143.46)
ssh 192.168.143.46
su - hdfs
/usr/local/hadoop/hadoop-release/sbin/hadoop-daemon.sh start zkfc
  18. Start the second NameNode
ssh 192.168.143.103
su - hdfs
/usr/local/hadoop/hadoop-release/bin/hdfs namenode -bootstrapStandby
/usr/local/hadoop/hadoop-release/sbin/hadoop-daemon.sh start namenode
  19. Start the second ZKFC
ssh 192.168.143.103
su - hdfs
/usr/local/hadoop/hadoop-release/sbin/hadoop-daemon.sh start zkfc
  20. Start the DataNodes
ssh -t -q 192.168.143.196 sudo su -l hdfs -c "/usr/local/hadoop/hadoop-release/sbin/hadoop-daemon.sh\ start\ datanode"
ssh -t -q 192.168.143.231 sudo su -l hdfs -c "/usr/local/hadoop/hadoop-release/sbin/hadoop-daemon.sh\ start\ datanode"
ssh -t -q 192.168.143.182 sudo su -l hdfs -c "/usr/local/hadoop/hadoop-release/sbin/hadoop-daemon.sh\ start\ datanode"
ssh -t -q 192.168.143.235 sudo su -l hdfs -c "/usr/local/hadoop/hadoop-release/sbin/hadoop-daemon.sh\ start\ datanode"
ssh -t -q 192.168.143.41  sudo su -l hdfs -c "/usr/local/hadoop/hadoop-release/sbin/hadoop-daemon.sh\ start\ datanode"
ssh -t -q 192.168.143.127 sudo su -l hdfs -c "/usr/local/hadoop/hadoop-release/sbin/hadoop-daemon.sh\ start\ datanode"
  Verify:
ssh -t -q 192.168.143.196 sudo su -l hdfs -c "jps"
ssh -t -q 192.168.143.231 sudo su -l hdfs -c "jps"
ssh -t -q 192.168.143.182 sudo su -l hdfs -c "jps"
ssh -t -q 192.168.143.235 sudo su -l hdfs -c "jps"
ssh -t -q 192.168.143.41  sudo su -l hdfs -c "jps"
ssh -t -q 192.168.143.127 sudo su -l hdfs -c "jps"
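  Beyond jps, a cluster-wide view (run as hdfs) should list all six datanodes as live:
ssh -t -q 192.168.143.46 sudo su -l hdfs -c "hdfs\ dfsadmin\ -report"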
  21. Once everything is healthy, start HBase (run as the hbase user).
  Start the HBase masters; ideally start the previously active master first.
ssh -t -q 192.168.143.101 sudo su -l hbase -c "/usr/local/hadoop/hbase-release/bin/hbase-daemon.sh\ start\ master"
ssh -t -q 192.168.143.103 sudo su -l hbase -c "/usr/local/hadoop/hbase-release/bin/hbase-daemon.sh\ start\ master"
  Start the HBase regionservers:
ssh -t -q 192.168.143.196 sudo su -l hbase -c "/usr/local/hadoop/hbase-release/bin/hbase-daemon.sh\ start\ regionserver"
ssh -t -q 192.168.143.231 sudo su -l hbase -c "/usr/local/hadoop/hbase-release/bin/hbase-daemon.sh\ start\ regionserver"
ssh -t -q 192.168.143.182 sudo su -l hbase -c "/usr/local/hadoop/hbase-release/bin/hbase-daemon.sh\ start\ regionserver"
ssh -t -q 192.168.143.235 sudo su -l hbase -c "/usr/local/hadoop/hbase-release/bin/hbase-daemon.sh\ start\ regionserver"
ssh -t -q 192.168.143.41  sudo su -l hbase -c "/usr/local/hadoop/hbase-release/bin/hbase-daemon.sh\ start\ regionserver"
ssh -t -q 192.168.143.127 sudo su -l hbase -c "/usr/local/hadoop/hbase-release/bin/hbase-daemon.sh\ start\ regionserver"
  22. HBase region balancing must be switched on and off manually.
  Log in to the HBase shell and run the following commands.
  Enable:
  balance_switch true
  Disable:
  balance_switch false
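  The toggle can also be scripted non-interactively, for example to re-enable balancing once all regionservers are back (run as hbase):
echo "balance_switch true" | /usr/local/hadoop/hbase-release/bin/hbase shell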
  23. Do not finalize in this run. Let the cluster operate for a week to make sure it is stable, then finalize the upgrade.
  Note: during this period disk usage can grow quickly; some of that space is released once finalize has run.
  Finalize the upgrade: hdfs dfsadmin -finalizeUpgrade
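  Following the command pattern above, finalizing would look like this (run as hdfs; space is freed because the datanodes can then delete their pre-upgrade "previous" directories):
ssh -t -q 192.168.143.46 sudo su -l hdfs -c "hdfs\ dfsadmin\ -finalizeUpgrade"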

