Compiling and Installing Hadoop 2.2.0
System version: CentOS 6.5, kernel 2.6.32-431.el6.x86_64
1. Install the JDK
This is a 64-bit machine, so download the matching 64-bit JDK from http://www.oracle.com/technetwork/cn/java/javase/downloads/jdk7-downloads-1880260-zhs.html. Pick the appropriate JDK version, unpack it, then configure the environment variables:
vi /etc/profile
export JAVA_HOME=/opt/jdk1.7
export PATH=$PATH:$JAVA_HOME/bin
source /etc/profile
Test that the JDK is installed correctly:
java -version
java version "1.7.0_45"
Java(TM) SE Runtime Environment (build 1.7.0_45-b18)
Java HotSpot(TM) 64-Bit Server VM (build 24.45-b08, mixed mode)
2. Pre-build preparation (Maven)
Maven can be built from source from the official download page, but here we just grab the pre-built binary:
wget http://mirror.bit.edu.cn/apache/maven/maven-3/3.1.1/binaries/apache-maven-3.1.1-bin.zip
After unpacking, configure the environment variable in /etc/profile as before:
export PATH=/usr/local/maven/bin:$PATH
Verify the setup:
mvn --version
Apache Maven 3.1.1 (0728685237757ffbf44136acec0402957f723d9a; 2013-09-17 23:22:22+0800)
Maven home: /opt/maven3.1.1
Java version: 1.7.0_45, vendor: Oracle Corporation
Java home: /opt/jdk1.7/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "2.6.32-358.el6.x86_64", arch: "amd64", family: "unix"
3. Compile Hadoop (PS: I hit read timeouts many times along the way; every failure meant a clean and a full rebuild, which was maddening.)
wget http://mirrors.cnnic.cn/apache/hadoop/common/hadoop-2.2.0/hadoop-2.2.0-src.tar.gz
If your machine is 32-bit you can simply download the official pre-built package; on a 64-bit machine the pre-built package will not run properly.
Maven's overseas repositories may be unreachable, so first point Maven at a domestic mirror. Edit conf/settings.xml in the Maven directory and add the following inside <mirrors> (leave the existing entries alone):
<mirror>
  <id>nexus-osc</id>
  <mirrorOf>*</mirrorOf>
  <name>Nexusosc</name>
  <url>http://maven.oschina.net/content/groups/public/</url>
</mirror>
Likewise, add a new profile inside <profiles>:
<profile>
  <id>jdk-1.7</id>
  <activation>
    <jdk>1.7</jdk>
  </activation>
  <repositories>
    <repository>
      <id>nexus</id>
      <name>local private nexus</name>
      <url>http://maven.oschina.net/content/groups/public/</url>
      <releases>
        <enabled>true</enabled>
      </releases>
      <snapshots>
        <enabled>false</enabled>
      </snapshots>
    </repository>
  </repositories>
  <pluginRepositories>
    <pluginRepository>
      <id>nexus</id>
      <name>local private nexus</name>
      <url>http://maven.oschina.net/content/groups/public/</url>
      <releases>
        <enabled>true</enabled>
      </releases>
      <snapshots>
        <enabled>false</enabled>
      </snapshots>
    </pluginRepository>
  </pluginRepositories>
</profile>
Clean and build:
cd hadoop-2.2.0-src
mvn clean install -DskipTests
You may hit this error:
Failed to execute goal org.apache.hadoop:hadoop-maven-plugins:2.2.0:protoc (compile-protoc) on project hadoop-common: org.apache.maven.plugin.MojoExecutionException: 'protoc --version' did not return a version ->
To see the full stack trace of the errors, re-run Maven with the -e switch.
Re-run Maven using the -X switch to enable full debug logging.
For more information about the errors and possible solutions, please read the following articles:
http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
After correcting the problems, you can resume the build with the command
mvn <goals> -rf :hadoop-common
Building Hadoop 2.2.0 requires protoc 2.5.0, so download protoc from https://code.google.com/p/protobuf/downloads/list (make sure you get version 2.5.0).
Before building and installing protoc, install a few dependencies: gcc, gcc-c++ and make (skip any that are already present):
yum install gcc
yum install gcc-c++
yum install make
Install protoc:
tar -xvf protobuf-2.5.0.tar.bz2
cd protobuf-2.5.0
./configure --prefix=/opt/protoc/
make && make install
After installation, configure the environment variable; it's the same process as above, so I won't repeat it.
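A minimal sketch of the addition to /etc/profile, assuming protoc was installed under /opt/protoc as in the configure command above:
# add protoc 2.5.0 to the PATH so the Hadoop build can find it
export PROTOC_HOME=/opt/protoc
export PATH=$PATH:$PROTOC_HOME/bin
Afterwards, protoc --version should report libprotoc 2.5.0.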
Hold on, don't start the build just yet or you'll hit more errors; first install the cmake, openssl-devel and ncurses-devel dependencies:
yum install cmake
yum install openssl-devel
yum install ncurses-devel
If you see the error below, apply the patch from https://issues.apache.org/jira/browse/HADOOP-10110:
Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.5.1:testCompile (default-testCompile) on project hadoop-auth: Compilation failure: Compilation failure:
/home/chuan/trunk/hadoop-common-project/hadoop-auth/src/test/java/org/apache/hadoop/security/authentication/client/AuthenticatorTestCase.java: cannot access org.mortbay.component.AbstractLifeCycle
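The HADOOP-10110 patch adds a jetty-util test dependency to hadoop-common-project/hadoop-auth/pom.xml; roughly the following, placed next to the existing jetty test dependency (check the JIRA for the exact change):
<dependency>
  <groupId>org.mortbay.jetty</groupId>
  <artifactId>jetty-util</artifactId>
  <scope>test</scope>
</dependency>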
OK, now you can run the build:
mvn package -Pdist,native -DskipTests -Dtar
------------------------------------------------------------------------
Reactor Summary:
Apache Hadoop Main ................................ SUCCESS
Apache Hadoop Project POM ......................... SUCCESS
Apache Hadoop Annotations ......................... SUCCESS
Apache Hadoop Assemblies .......................... SUCCESS
Apache Hadoop Project Dist POM .................... SUCCESS
Apache Hadoop Maven Plugins ....................... SUCCESS
Apache Hadoop Auth ................................ SUCCESS
Apache Hadoop Auth Examples ....................... SUCCESS
Apache Hadoop Common .............................. SUCCESS
Apache Hadoop NFS ................................. SUCCESS
Apache Hadoop Common Project ...................... SUCCESS
Apache Hadoop HDFS ................................ SUCCESS
Apache Hadoop HttpFS .............................. SUCCESS
Apache Hadoop HDFS BookKeeper Journal ............. SUCCESS
Apache Hadoop HDFS-NFS ............................ SUCCESS
Apache Hadoop HDFS Project ........................ SUCCESS
hadoop-yarn ....................................... SUCCESS
hadoop-yarn-api ................................... SUCCESS
hadoop-yarn-common ................................ SUCCESS
hadoop-yarn-server ................................ SUCCESS
hadoop-yarn-server-common ......................... SUCCESS
hadoop-yarn-server-nodemanager .................... SUCCESS
hadoop-yarn-server-web-proxy ...................... SUCCESS
hadoop-yarn-server-resourcemanager ................ SUCCESS
hadoop-yarn-server-tests .......................... SUCCESS
hadoop-yarn-client ................................ SUCCESS
hadoop-yarn-applications .......................... SUCCESS
hadoop-yarn-applications-distributedshell ......... SUCCESS
hadoop-mapreduce-client ........................... SUCCESS
hadoop-mapreduce-client-core ...................... SUCCESS
hadoop-yarn-applications-unmanaged-am-launcher .... SUCCESS
hadoop-yarn-site .................................. SUCCESS
hadoop-yarn-project ............................... SUCCESS
hadoop-mapreduce-client-common .................... SUCCESS
hadoop-mapreduce-client-shuffle ................... SUCCESS
hadoop-mapreduce-client-app ....................... SUCCESS
hadoop-mapreduce-client-hs ........................ SUCCESS
hadoop-mapreduce-client-jobclient ................. SUCCESS
hadoop-mapreduce-client-hs-plugins ................ SUCCESS
Apache Hadoop MapReduce Examples .................. SUCCESS
hadoop-mapreduce .................................. SUCCESS
Apache Hadoop MapReduce Streaming ................. SUCCESS
Apache Hadoop Distributed Copy .................... SUCCESS
Apache Hadoop Archives ............................ SUCCESS
Apache Hadoop Rumen ............................... SUCCESS
Apache Hadoop Gridmix ............................. SUCCESS
Apache Hadoop Data Join ........................... SUCCESS
Apache Hadoop Extras .............................. SUCCESS
Apache Hadoop Pipes ............................... SUCCESS
Apache Hadoop Tools Dist .......................... SUCCESS
Apache Hadoop Tools ............................... SUCCESS
Apache Hadoop Distribution ........................ SUCCESS
Apache Hadoop Client .............................. SUCCESS
Apache Hadoop Mini-Cluster ........................ SUCCESS
------------------------------------------------------------------------
BUILD SUCCESS
------------------------------------------------------------------------
Total time: 11:53.144s
Finished at: Fri Nov 22 16:58:32 CST 2013
Final Memory: 70M/239M
------------------------------------------------------------------------
Once you see the output above, the build is complete.
The built distribution is at: hadoop-2.2.0-src/hadoop-dist/target/hadoop-2.2.0
# ./hadoop version
Hadoop 2.2.0
Subversion Unknown -r Unknown
Compiled by root on 2013-11-22T08:47Z
Compiled with protoc 2.5.0
From source with checksum 79e53ce7994d1628b240f09af91e1af4
This command was run using /data/hadoop-2.2.0-src/hadoop-dist/target/hadoop-2.2.0/share/hadoop/common/hadoop-common-2.2.0.jar
This shows the Hadoop version.
# file lib//native/*
lib//native/libhadoop.a: current ar archive
lib//native/libhadooppipes.a: current ar archive
lib//native/libhadoop.so: symbolic link to `libhadoop.so.1.0.0'
lib//native/libhadoop.so.1.0.0: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, not stripped
lib//native/libhadooputils.a: current ar archive
lib//native/libhdfs.a: current ar archive
lib//native/libhdfs.so: symbolic link to `libhdfs.so.0.0.0'
lib//native/libhdfs.so.0.0.0: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, not stripped
Note the "ELF 64-bit" entries above; if you download the pre-built package from the official site, these show 32-bit instead.
5. Cluster deployment preparation
Two or more machines; set the hostnames, set up passwordless SSH, disable the firewalls, and so on.
5.1. Create a new user
useradd hadoop
su hadoop
Give the account sudo privileges.
(Switch to root and edit /etc/sudoers, adding: hadoop ALL=(ALL) ALL)
5.2. Change the hostname
vi /etc/sysconfig/network
hostname master
Log out and back in; the prompt now shows master, so the change has taken effect.
5.3. Edit the hosts file
vi /etc/hosts
Add your hosts' IPs and hostnames:
192.168.1.110 master
192.168.1.111 slave1
5.4. Passwordless SSH
Check the installed SSH packages:
# rpm -qa | grep ssh
libssh2-1.4.2-1.el6.x86_64
openssh-5.3p1-84.1.el6.x86_64
openssh-server-5.3p1-84.1.el6.x86_64
openssh-clients is missing, so install it:
yum install openssh-clients
Set up passwordless login:
$ cd /home/hadoop/
$ ssh-keygen -t rsa
Press Enter through all the prompts.
$ cd .ssh/
$ cp id_rsa.pub authorized_keys
$ chmod 600 authorized_keys
Copy authorized_keys to the other machines that need passwordless access:
$ scp authorized_keys root@192.168.1.111:/home/hadoop/.ssh/
Remember to copy as root here, otherwise you'll get a permission error.
$ ssh slave1
Last login: Mon Nov 25 14:49:25 2013 from master
$
The prompt has switched to slave1, so it works.
6. Cluster configuration
Before configuring, create three directories under the home directory to hold the Hadoop name, data, and temp files:
$ mkdir -p dfs/name
$ mkdir -p dfs/data
$ mkdir -p temp
Move the distribution built earlier into the hadoop directory, and watch out for directory permissions.
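For instance, to make sure the hadoop user owns the install tree (the path assumes the layout used in the environment variables below; adjust to yours):
# give the hadoop user ownership of the unpacked distribution
chown -R hadoop:hadoop /home/hadoop-2.2.0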
Configure the Hadoop environment variables (in /etc/profile, as before):
export HADOOP_DEV_HOME=/home/hadoop-2.2.0
export PATH=$PATH:$HADOOP_DEV_HOME/bin
export PATH=$PATH:$HADOOP_DEV_HOME/sbin
export HADOOP_MAPARED_HOME=${HADOOP_DEV_HOME}
export HADOOP_COMMON_HOME=${HADOOP_DEV_HOME}
export HADOOP_HDFS_HOME=${HADOOP_DEV_HOME}
export YARN_HOME=${HADOOP_DEV_HOME}
export HADOOP_CONF_DIR=${HADOOP_DEV_HOME}/etc/hadoop
export HDFS_CONF_DIR=${HADOOP_DEV_HOME}/etc/hadoop
export YARN_CONF_DIR=${HADOOP_DEV_HOME}/etc/hadoop
export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_DEV_HOME}/lib/native
export HADOOP_OPTS="-Djava.library.path=${HADOOP_DEV_HOME}/lib"
Now on to the configuration files.
6.1. hadoop-env.sh
Find JAVA_HOME and set it to the actual JDK path.
6.2. yarn-env.sh
Likewise, find JAVA_HOME and set it to the actual JDK path.
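In both files the line ends up looking like this (the path assumes the JDK location from step 1; use your own):
# point Hadoop and YARN at the JDK installed earlier
export JAVA_HOME=/opt/jdk1.7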
6.3. slaves
List all slave nodes: add each DataNode's hostname to this file, as shown below.
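With the hosts configured in section 5.3, the slaves file would contain just the one DataNode:
slave1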
6.4 core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>   <!-- the cluster's filesystem URI -->
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/home/hadoop/temp</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hadoop.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hadoop.groups</name>
    <value>*</value>
  </property>
</configuration>
Note that fs.defaultFS is the new property name in 2.2.0, replacing the old fs.default.name.
6.5. hdfs-site.xml
Configure the NameNode and DataNode local directories:
<configuration>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>master:9001</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/usr/app/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/usr/app/dfs/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
</configuration>
The new property names are dfs.namenode.name.dir (formerly dfs.name.dir) and dfs.datanode.data.dir (formerly dfs.data.dir).
dfs.replication sets the number of replicas per data block. With rack awareness Hadoop keeps 3 replicas by default (two on one rack and one on another, with the actual blocks chosen by shortest distance); blocks rarely span more racks unless a rack goes down.
6.6. mapred-site.xml
Configure MapReduce jobs to run on the YARN framework.
There is no mapred-site.xml by default; you have to copy mapred-site.xml.template and rename it.
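For example, from the configuration directory (the path assumes the layout used above):
cd /home/hadoop-2.2.0/etc/hadoop
cp mapred-site.xml.template mapred-site.xml
Then fill it in with the following properties: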
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>master:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>master:19888</value>
  </property>
</configuration>
The new framework drops the standalone JobTracker, so there is no mapreduce.jobtracker.address to set; instead you choose an execution framework, here yarn. Note: Hadoop 2.2 also supports third-party compute frameworks, but I haven't looked into them.
Once everything is configured, copy all the files under $HADOOP_HOME, including the hadoop directory itself, to the other nodes.
6.7. yarn-site.xml
Configure the ResourceManager and NodeManager communication ports, web monitoring ports, and so on:
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>master:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>master:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>master:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>master:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>master:8088</value>
  </property>
</configuration>
You can also set yarn.nodemanager.resource.memory-mb here to control how much memory the NodeManager makes available to containers.
Copy all the configuration files to the other slave nodes.
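For instance, to push the config directory to slave1 (paths assume the layout used above):
# copy the whole Hadoop config directory to the slave
scp -r /home/hadoop-2.2.0/etc/hadoop hadoop@slave1:/home/hadoop-2.2.0/etc/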
7. Start Hadoop
You could also set up environment variables for this; I won't give an example.
7.1. Format the NameNode
$ cd /home/hadoop/hadoop-2.2.0/bin/
$ ./hdfs namenode -format
7.2. Start HDFS
$ cd ../sbin/
$ ./start-dfs.sh
At this point, running jps on the master should show the NameNode and SecondaryNameNode processes, and on the slaves the DataNode process.
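For reference, jps output on the master might look roughly like this (the process ids are purely illustrative):
$ jps
3245 NameNode
3398 SecondaryNameNode
3487 Jps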
7.3. Start YARN
$ ./start-yarn.sh
The master should now have a ResourceManager process and the slaves a NodeManager process.
Check the cluster status: ./bin/hdfs dfsadmin -report
Check the file block composition: ./bin/hdfs fsck / -files -blocks
View each node's status: http://192.168.10.10:50070
View the running cluster on the ResourceManager: http://192.168.10.11:8088
8. Things to watch out for during installation
8.1. Mind the versions and whether the machine is 32-bit or 64-bit.
8.2. Make sure the dependency packages are installed.
8.3. Watch out for stray spaces in the configuration files, especially when copying from somewhere else.
8.4. Disable the firewall on all nodes.
If you see an exception like "no route to host", the firewall almost certainly hasn't been turned off.
Remember to switch to the root account before disabling it.
(1) Permanent, takes effect after a reboot:
Enable: chkconfig iptables on
Disable: chkconfig iptables off
(2) Immediate, lost after a reboot:
Enable: service iptables start
Disable: service iptables stop
8.5. If the DataNode shuts itself down right after starting, or won't start at all, see:
http://xiaofengge315.blog.51cto.com/405835/1392841
8.6. "no datanode to stop"
Delete the stale Hadoop pid files under /tmp.
Looking at the hadoop-daemon.sh script, it stops the Hadoop services via pid files. With the default configuration those pid files live under /tmp; comparing the process ids recorded in them against the ids reported by ps ax showed the two no longer matched, which is the root cause of the problem.
So go and update the Hadoop configuration:
change HADOOP_PID_DIR in hadoop-env.sh to a directory under the Hadoop install path.
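A minimal sketch of the change in hadoop-env.sh (the pids directory name is my own choice; any path that survives reboots and /tmp cleanup works):
# keep pid files inside the Hadoop install instead of /tmp
export HADOOP_PID_DIR=/home/hadoop-2.2.0/pids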
9. Run a test example
$ ./yarn jar ../share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar randomwriter /home/hadoop/dfs/input/
Note: don't use -jar here, or you'll get the exception "Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/util/ProgramDriver".
$ ./yarn jar ../share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount /home/hadoop/dfs/input/ /home/hadoop/dfs/output/
Create two files under the input directory:
$ mkdir -p dfs/input
$ echo 'hello,world' >> dfs/input/file1.in
$ echo 'hello,ruby' >> dfs/input/file2.in
./bin/hadoop fs -mkdir -p /home/hadoop/dfs/input
./bin/hadoop fs -put /home/hadoop/dfs/input /home/hadoop/test/test_wordcount/in
Check the wordcount results:
$bin/hadoop fs -cat /home/hadoop/test/test_wordcount/out/*
hadoop 1
hello 1
ruby