Posted by 木一 on 2018-10-31 09:11:10

Compiling and Installing Hadoop 2.2.0

  OS version: CentOS 6.5, kernel 2.6.32-431.el6.x86_64
  1. Install the JDK
  This is a 64-bit machine, so download the matching 64-bit JDK from http://www.oracle.com/technetwork/cn/java/javase/downloads/jdk7-downloads-1880260-zhs.html. Pick the appropriate JDK version, extract it, then configure the environment variables:
  vi /etc/profile
  export JAVA_HOME=/opt/jdk1.7
  export PATH=$PATH:$JAVA_HOME/bin
  source /etc/profile
  Check that the JDK installed correctly: java -version
  java version "1.7.0_45"
  Java(TM) SE Runtime Environment (build 1.7.0_45-b18)
  Java HotSpot(TM) 64-Bit Server VM (build 24.45-b08, mixed mode)
  2. Pre-build preparation (Maven)
  Download Maven from the official download page. You could build it from source, but the prebuilt binary is fine:
  wget http://mirror.bit.edu.cn/apache/maven/maven-3/3.1.1/binaries/apache-maven-3.1.1-bin.zip
  After unpacking, configure the environment variable in /etc/profile the same way:
  export PATH=/opt/maven3.1.1/bin:$PATH
  Verify the setup: mvn --version
  Apache Maven 3.1.1 (0728685237757ffbf44136acec0402957f723d9a; 2013-09-17 23:22:22+0800)
  Maven home: /opt/maven3.1.1
  Java version: 1.7.0_45, vendor: Oracle Corporation
  Java home: /opt/jdk1.7/jre
  Default locale: en_US, platform encoding: UTF-8
  OS name: "linux", version: "2.6.32-358.el6.x86_64", arch: "amd64", family: "unix"
  3. Compile Hadoop (note: I hit read timeouts many times during this step, and every failed build had to be cleaned and restarted; maddening...)
  wget http://mirrors.cnnic.cn/apache/hadoop/common/hadoop-2.2.0/hadoop-2.2.0-src.tar.gz
  If your machine is 32-bit you can just download the officially prebuilt package; the prebuilt package will not run on a 64-bit machine.
  Since Maven's overseas servers may be unreachable, first point Maven at a Chinese mirror. In the Maven directory, edit conf/settings.xml and add the following inside <mirrors>, leaving the existing entries untouched:
  <mirror>
    <id>nexus-osc</id>
    <mirrorOf>*</mirrorOf>
    <name>Nexusosc</name>
    <url>http://maven.oschina.net/content/groups/public/</url>
  </mirror>
  Likewise, add a new profile inside <profiles>:
  <profile>
    <id>jdk-1.7</id>
    <activation>
      <jdk>1.7</jdk>
    </activation>
    <repositories>
      <repository>
        <id>nexus</id>
        <name>local private nexus</name>
        <url>http://maven.oschina.net/content/groups/public/</url>
        <releases>
          <enabled>true</enabled>
        </releases>
        <snapshots>
          <enabled>false</enabled>
        </snapshots>
      </repository>
    </repositories>
    <pluginRepositories>
      <pluginRepository>
        <id>nexus</id>
        <name>local private nexus</name>
        <url>http://maven.oschina.net/content/groups/public/</url>
        <releases>
          <enabled>true</enabled>
        </releases>
        <snapshots>
          <enabled>false</enabled>
        </snapshots>
      </pluginRepository>
    </pluginRepositories>
  </profile>
  Run a clean build:
  cd hadoop-2.2.0-src
  mvn clean install -DskipTests
  This fails with an error:
Failed to execute goal org.apache.hadoop:hadoop-maven-plugins:2.2.0:protoc (compile-protoc) on project hadoop-common: org.apache.maven.plugin.MojoExecutionException: 'protoc --version' did not return a version ->   
   To see the full stack trace of the errors, re-run Maven with the -e switch.
   Re-run Maven using the -X switch to enable full debug logging.
  
   For more information about the errors and possible solutions, please read the following articles:
   http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
  
   After correcting the problems, you can resume the build with the command
     mvn <goals> -rf :hadoop-common
  Building Hadoop 2.2.0 requires protoc 2.5.0, so download protoc from https://code.google.com/p/protobuf/downloads/list; make sure you grab version 2.5.0.
  Before building and installing protoc, install a few dependencies: gcc, gcc-c++ and make (skip any that are already installed):
  yum install gcc
  yum install gcc-c++
  yum install make
  Install protoc:
  tar -xvf protobuf-2.5.0.tar.bz2
  cd protobuf-2.5.0
  ./configure --prefix=/opt/protoc/
  make && make install
  After installing, add protoc to your environment variables; the process is the same as above, so I won't repeat the details.
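  A minimal sketch, given the --prefix=/opt/protoc used above (adjust the path if yours differs):
  vi /etc/profile
  export PATH=$PATH:/opt/protoc/bin
  source /etc/profile
  protoc --version    # should print: libprotoc 2.5.0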
  But hold on, don't start the build just yet or you'll hit more errors: you still need the cmake, openssl-devel and ncurses-devel dependencies:
  yum install cmake
  yum install openssl-devel
  yum install ncurses-devel
  If you see the error below, apply the patch from https://issues.apache.org/jira/browse/HADOOP-10110:
Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.5.1:testCompile (default-testCompile) on project hadoop-auth: Compilation failure: Compilation failure:  
/home/chuan/trunk/hadoop-common-project/hadoop-auth/src/test/java/org/apache/hadoop/security/authentication/client/AuthenticatorTestCase.java: cannot access org.mortbay.component.AbstractLifeCycle
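  The fix on that JIRA boils down to declaring jetty-util as a test dependency of hadoop-auth; a sketch of the change per HADOOP-10110, applied in hadoop-common-project/hadoop-auth/pom.xml:
  <dependency>
    <groupId>org.mortbay.jetty</groupId>
    <artifactId>jetty-util</artifactId>
    <scope>test</scope>
  </dependency>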

  
  OK, now we can run the build:
  mvn package -Pdist,native -DskipTests -Dtar
   ------------------------------------------------------------------------
   Reactor Summary:
  
   Apache Hadoop Main ................................ SUCCESS
   Apache Hadoop Project POM ......................... SUCCESS
   Apache Hadoop Annotations ......................... SUCCESS
   Apache Hadoop Assemblies .......................... SUCCESS
   Apache Hadoop Project Dist POM .................... SUCCESS
   Apache Hadoop Maven Plugins ....................... SUCCESS
   Apache Hadoop Auth ................................ SUCCESS
   Apache Hadoop Auth Examples ....................... SUCCESS
   Apache Hadoop Common .............................. SUCCESS
   Apache Hadoop NFS ................................. SUCCESS
   Apache Hadoop Common Project ...................... SUCCESS
   Apache Hadoop HDFS ................................ SUCCESS
   Apache Hadoop HttpFS .............................. SUCCESS
   Apache Hadoop HDFS BookKeeper Journal ............. SUCCESS
   Apache Hadoop HDFS-NFS ............................ SUCCESS
   Apache Hadoop HDFS Project ........................ SUCCESS
   hadoop-yarn ....................................... SUCCESS
   hadoop-yarn-api ................................... SUCCESS
   hadoop-yarn-common ................................ SUCCESS
   hadoop-yarn-server ................................ SUCCESS
   hadoop-yarn-server-common ......................... SUCCESS
   hadoop-yarn-server-nodemanager .................... SUCCESS
   hadoop-yarn-server-web-proxy ...................... SUCCESS
   hadoop-yarn-server-resourcemanager ................ SUCCESS
   hadoop-yarn-server-tests .......................... SUCCESS
   hadoop-yarn-client ................................ SUCCESS
   hadoop-yarn-applications .......................... SUCCESS
   hadoop-yarn-applications-distributedshell ......... SUCCESS
   hadoop-mapreduce-client ........................... SUCCESS
   hadoop-mapreduce-client-core ...................... SUCCESS
   hadoop-yarn-applications-unmanaged-am-launcher .... SUCCESS
   hadoop-yarn-site .................................. SUCCESS
   hadoop-yarn-project ............................... SUCCESS
   hadoop-mapreduce-client-common .................... SUCCESS
   hadoop-mapreduce-client-shuffle ................... SUCCESS
   hadoop-mapreduce-client-app ....................... SUCCESS
   hadoop-mapreduce-client-hs ........................ SUCCESS
   hadoop-mapreduce-client-jobclient ................. SUCCESS
   hadoop-mapreduce-client-hs-plugins ................ SUCCESS
   Apache Hadoop MapReduce Examples .................. SUCCESS
   hadoop-mapreduce .................................. SUCCESS
   Apache Hadoop MapReduce Streaming ................. SUCCESS
   Apache Hadoop Distributed Copy .................... SUCCESS
   Apache Hadoop Archives ............................ SUCCESS
   Apache Hadoop Rumen ............................... SUCCESS
   Apache Hadoop Gridmix ............................. SUCCESS
   Apache Hadoop Data Join ........................... SUCCESS
   Apache Hadoop Extras .............................. SUCCESS
   Apache Hadoop Pipes ............................... SUCCESS
   Apache Hadoop Tools Dist .......................... SUCCESS
   Apache Hadoop Tools ............................... SUCCESS
   Apache Hadoop Distribution ........................ SUCCESS
   Apache Hadoop Client .............................. SUCCESS
   Apache Hadoop Mini-Cluster ........................ SUCCESS
   ------------------------------------------------------------------------
   BUILD SUCCESS
   ------------------------------------------------------------------------
   Total time: 11:53.144s
   Finished at: Fri Nov 22 16:58:32 CST 2013
   Final Memory: 70M/239M
   ------------------------------------------------------------------------
  When you see the output above, the build is done.
  The build output lives at: hadoop-2.2.0-src/hadoop-dist/target/hadoop-2.2.0
  # ./hadoop version
  Hadoop 2.2.0
  Subversion Unknown -r Unknown
  Compiled by root on 2013-11-22T08:47Z
  Compiled with protoc 2.5.0
  From source with checksum 79e53ce7994d1628b240f09af91e1af4
  This command was run using /data/hadoop-2.2.0-src/hadoop-dist/target/hadoop-2.2.0/share/hadoop/common/hadoop-common-2.2.0.jar
  This shows the Hadoop version. Now check the native libraries:
  # file lib/native/*
  lib/native/libhadoop.a:        current ar archive
  lib/native/libhadooppipes.a:   current ar archive
  lib/native/libhadoop.so:       symbolic link to `libhadoop.so.1.0.0'
  lib/native/libhadoop.so.1.0.0: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, not stripped
  lib/native/libhadooputils.a:   current ar archive
  lib/native/libhdfs.a:          current ar archive
  lib/native/libhdfs.so:         symbolic link to `libhdfs.so.0.0.0'
  lib/native/libhdfs.so.0.0.0:   ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, not stripped
  Note the "64-bit" in the file output above; if you download the prebuilt package from the official site, this shows 32-bit instead.
  5. Preparing to deploy the cluster
  You need two or more machines; set the hostnames, set up passwordless SSH, disable the firewalls, and so on.
  5.1 Create a new user
  useradd hadoop
  su hadoop
  Give the account sudo privileges.
  (Switch to root, edit /etc/sudoers, and add: hadoop ALL=(ALL) ALL)
  5.2 Change the hostname
  vi /etc/sysconfig/network
  hostname master
  Log out of the session and back in; the prompt now shows master, so the change has taken effect.
  5.3 Edit the hosts file
  vi /etc/hosts
  Add your hosts' IPs and hostnames:
  192.168.1.110 master
  192.168.1.111 slave1
  5.4 Passwordless SSH
  Check the installed ssh packages:
  # rpm -qa|grep ssh
  libssh2-1.4.2-1.el6.x86_64
  openssh-5.3p1-84.1.el6.x86_64
  openssh-server-5.3p1-84.1.el6.x86_64
  openssh-clients is missing:
  yum install openssh-clients
  Configure passwordless login:
  $ cd /home/hadoop/
  $ ssh-keygen -t rsa
  Press Enter through all the prompts.
  $ cd .ssh/
  $ cp id_rsa.pub authorized_keys
  $ chmod 600 authorized_keys
  Copy authorized_keys to every machine that should accept passwordless login:
  $ scp authorized_keys root@192.168.1.111:/home/hadoop/.ssh/
  Remember to scp as root here, otherwise you'll get a permission error.
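  As an alternative sketch (assuming openssh-clients is installed and the hadoop user exists on the remote machine), ssh-copy-id does the append and chmod in one step:
  $ ssh-copy-id hadoop@192.168.1.111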
  $ ssh slave1
  Last login: Mon Nov 25 14:49:25 2013 from master
  $
  The prompt has changed to slave1, so passwordless login works.
  6. Cluster configuration
  Before configuring, create three directories under the home directory to hold the Hadoop files and log data:
  $ mkdir -p dfs/name
  $ mkdir -p dfs/data
  $ mkdir -p temp
  Move the build you compiled earlier into the hadoop directory, and watch the directory permissions.
  Configure the Hadoop environment variables:
  export HADOOP_DEV_HOME=/home/hadoop/hadoop-2.2.0
  export PATH=$PATH:$HADOOP_DEV_HOME/bin
  export PATH=$PATH:$HADOOP_DEV_HOME/sbin
  export HADOOP_MAPRED_HOME=${HADOOP_DEV_HOME}
  export HADOOP_COMMON_HOME=${HADOOP_DEV_HOME}
  export HADOOP_HDFS_HOME=${HADOOP_DEV_HOME}
  export YARN_HOME=${HADOOP_DEV_HOME}
  export HADOOP_CONF_DIR=${HADOOP_DEV_HOME}/etc/hadoop
  export HDFS_CONF_DIR=${HADOOP_DEV_HOME}/etc/hadoop
  export YARN_CONF_DIR=${HADOOP_DEV_HOME}/etc/hadoop
  export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_DEV_HOME}/lib/native
  export HADOOP_OPTS="-Djava.library.path=${HADOOP_DEV_HOME}/lib"
  Now for the configuration files themselves.
  6.1 hadoop-env.sh
  Find JAVA_HOME and set it to the real path.
  6.2 yarn-env.sh
  Find JAVA_HOME and set it to the real path.
  6.3 slaves
  This file lists all the slave nodes; add each datanode's hostname to it.
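  For the two-machine layout used here, etc/hadoop/slaves would contain a single line:
  slave1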
  6.4 core-site.xml
  <configuration>
    <property>
      <name>fs.defaultFS</name>
      <value>hdfs://master:9000</value>    <!-- the cluster's filesystem URI -->
    </property>
    <property>
      <name>io.file.buffer.size</name>
      <value>131072</value>
    </property>
    <property>
      <name>hadoop.tmp.dir</name>
      <value>file:/home/hadoop/temp</value>
    </property>
    <property>
      <name>hadoop.proxyuser.hadoop.hosts</name>
      <value>*</value>
    </property>
    <property>
      <name>hadoop.proxyuser.hadoop.groups</name>
      <value>*</value>
    </property>
  </configuration>
  Note that fs.defaultFS is the new property name in 2.2.0, replacing the old fs.default.name.
  6.5 hdfs-site.xml
  Configure the namenode and datanode local directories:
  <configuration>
    <property>
      <name>dfs.namenode.secondary.http-address</name>
      <value>master:9001</value>
    </property>
    <property>
      <name>dfs.namenode.name.dir</name>
      <value>/usr/app/dfs/name</value>
    </property>
    <property>
      <name>dfs.datanode.data.dir</name>
      <value>/usr/app/dfs/data</value>
    </property>
    <property>
      <name>dfs.replication</name>
      <value>1</value>
    </property>
    <property>
      <name>dfs.webhdfs.enabled</name>
      <value>true</value>
    </property>
  </configuration>
  New name: dfs.namenode.name.dir (old: dfs.name.dir); new: dfs.datanode.data.dir (old: dfs.data.dir).
  dfs.replication sets the number of replicas per data block. With rack awareness, Hadoop replicates each block 3 times by default (two copies on one rack, one on another; the copy actually read is chosen by shortest distance, and cross-rack reads are rare unless a rack goes down).
  6.6 mapred-site.xml
  Configure map-reduce jobs to run on the YARN framework.
  Here you first need to copy mapred-site.xml.template and rename it mapred-site.xml, as below.
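  For example, assuming you are in $HADOOP_DEV_HOME/etc/hadoop:
  $ cp mapred-site.xml.template mapred-site.xml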
  <configuration>
    <property>
      <name>mapreduce.framework.name</name>
      <value>yarn</value>
    </property>
    <property>
      <name>mapreduce.jobhistory.address</name>
      <value>master:10020</value>
    </property>
    <property>
      <name>mapreduce.jobhistory.webapp.address</name>
      <value>master:19888</value>
    </property>
  </configuration>
  The new framework drops the standalone jobtracker, so there is no mapreduce.jobtracker.address to set; instead you name a framework, here yarn. Note that Hadoop 2.2 also supports third-party compute frameworks, though I haven't looked into them.
  Once everything is configured, copy all the files under $HADOOP_HOME, including the hadoop directory itself, to the other 3 nodes.
  6.7 yarn-site.xml
  Configure the ResourceManager and NodeManager communication ports, web monitoring ports, etc.:
  <configuration>
    <property>
      <name>yarn.nodemanager.aux-services</name>
      <value>mapreduce_shuffle</value>
    </property>
    <property>
      <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
      <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
      <name>yarn.resourcemanager.address</name>
      <value>master:8032</value>
    </property>
    <property>
      <name>yarn.resourcemanager.scheduler.address</name>
      <value>master:8030</value>
    </property>
    <property>
      <name>yarn.resourcemanager.resource-tracker.address</name>
      <value>master:8031</value>
    </property>
    <property>
      <name>yarn.resourcemanager.admin.address</name>
      <value>master:8033</value>
    </property>
    <property>
      <name>yarn.resourcemanager.webapp.address</name>
      <value>master:8088</value>
    </property>
    <property>
      <name>yarn.nodemanager.resource.memory-mb</name>    <!-- NodeManager memory in MB -->
      <value></value>
    </property>
  </configuration>
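  The memory value depends on your hardware; as a purely illustrative figure, a node that gives 8 GB to containers would use:
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>8192</value>
  </property>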
  Copy all the configuration files to the other slave nodes.
  7. Start Hadoop
  You can set environment variables for this step too; I won't give an example.
  7.1 Format the namenode
  $ cd /home/hadoop/hadoop-2.2.0/bin/
  $ ./hdfs namenode -format
  7.2 Start HDFS
  $ cd ../sbin/
  $ ./start-dfs.sh
  At this point jps on master should show the namenode and secondarynamenode processes, and on the slaves the datanode process.
  7.3 Start YARN
  $ ./start-yarn.sh
  master should now have a ResourceManager process, and each slave a nodemanager process.
  Check the cluster status: ./bin/hdfs dfsadmin -report
  Check the file block layout: ./bin/hdfs fsck / -files -blocks
  Check node status: http://192.168.10.10:50070
  Check the cluster's running state on the resourcemanager: http://192.168.10.11:8088
  8. Things to watch out for during installation
  8.1 Mind the versions, and whether the machine is 32-bit or 64-bit.
  8.2 Mind the dependency packages.
  8.3 Watch for stray whitespace in the configuration files, especially when copying from somewhere else.
  8.4 Disable the firewall on all nodes.
  If you see an exception like "no route to host", it almost always means the firewall is still up.
  Remember to switch to the root account before turning it off.
  (1) Permanent, survives reboot:
  enable: chkconfig iptables on
  disable: chkconfig iptables off
  (2) Immediate, lost after reboot:
  enable: service iptables start
  disable: service iptables stop
  8.5 If a datanode shuts itself down after starting, or won't start at all, see:
  http://xiaofengge315.blog.51cto.com/405835/1392841
  8.6 "no datanode to stop"
  The hadoop-daemon.sh script stops Hadoop services via pid files, and with the default configuration those pid files live under /tmp. Comparing the process id recorded in the /tmp pid file with the one reported by ps ax showed the two didn't match; that turned out to be the root of the problem. Delete the stale pid files under /tmp,
  then go update your Hadoop configuration:
  set HADOOP_PID_DIR in hadoop-env.sh to a directory under the Hadoop install path.
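  A minimal sketch (the pids directory name is my own illustration):
  # in etc/hadoop/hadoop-env.sh
  export HADOOP_PID_DIR=/home/hadoop/hadoop-2.2.0/pids
  $ mkdir -p /home/hadoop/hadoop-2.2.0/pids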
  9. Run the test examples
  $ ./yarn jar ../share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar randomwriter /home/hadoop/dfs/input/
  Note: don't use -jar here, or you'll hit: Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/util/ProgramDriver
  $ ./yarn jar ../share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount /home/hadoop/dfs/input/ /home/hadoop/dfs/output/
  Create two files under input:
  $ mkdir -p dfs/input
  $ echo 'hello,world' >> input/file1.in
  $ echo 'hello, ruby' >> input/file2.in
  ./bin/hadoop fs -mkdir -p /home/hadoop/dfs/input
  ./bin/hadoop fs -put /home/hadoop/dfs/input /home/hadoop/test/test_wordcount/in
  View the wordcount results:
  $ bin/hadoop fs -cat /home/hadoop/test/test_wordcount/out/*
  hadoop 1
  hello 1
  ruby

