Posted by hyytaojunming on 2016-12-3 10:58:51

Hadoop Initial Configuration

  Configure NFS (as the root user)
  (1) On the master, check whether the NFS packages are already installed
  # rpm -qa|grep nfs
  nfs-utils-1.0.6-46
  system-config-nfs-1.2.8-1
  # rpm -qa|grep portmap
  portmap-4.0-63
  If they are missing, download the RPM packages and install them with rpm -ivh ***.rpm
  (2) Share the /home directory on the master
  # vi /etc/exports  // * means any host may connect
  /home *(rw,no_root_squash,sync)  // sync forces synchronous writes, slower but safer; async would be faster but risks data loss on a crash
  (3) Start the NFS server on the master
  # service portmap
  Usage: /etc/init.d/portmap {start|stop|status|restart|reload|condrestart}
  # service portmap start
  
  # service nfs
  Usage: nfs {start|stop|status|restart|reload|condrestart}
  # service nfs start
  or # /etc/init.d/nfs start
  
  # service nfslock start
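  With portmap and nfs running, it is worth confirming that the export is actually visible before touching the slaves. A minimal check (standard RHEL/CentOS commands of this era, not part of the original post):
  # exportfs -ra            // re-read /etc/exports and apply any changes
  # exportfs -v             // list the current exports with their options
  # showmount -e localhost  // ask the server which exports it advertises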
  (4) On the master, set NFS to start automatically at boot
  # chkconfig --level 2345 nfs on
  // Think of chkconfig as a bank of switches: it sets whether each service is started in run levels 0-6 when the system boots.
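  To confirm the setting took effect, list the service's per-run-level state (a quick check, not in the original post; levels 2-5 should show "on"):
  # chkconfig --list nfs
  nfs   0:off  1:off  2:on  3:on  4:on  5:on  6:off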
  (5) On the slaves (hadoop02~hadoop08), as root, set /home to mount automatically at boot
  Edit /etc/fstab and add:
  hadoop01:/home  /home  nfs  defaults  0 0
  or put the mount command in /etc/rc.d/rc.local
  On the command line: # mount -t nfs hadoop01:/home /home
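  After mounting, a quick sanity check on each slave (not in the original post; hadoop01 is the master hostname used above):
  # mount | grep nfs   // should list hadoop01:/home on /home
  # df -h /home        // the Filesystem column should show hadoop01:/home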
  -------------------------------------------------------------
  (6) Troubleshooting
  If you see "mount: mount to NFS server 'node1' failed: System Error: No route to host.", the firewall is the likely culprit.
  Turning the firewall off via setup should fix it.
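  Equivalently, from the command line (a sketch for the RHEL/CentOS iptables service this guide assumes; on a real cluster you would open the portmap/NFS ports rather than disable the firewall outright):
  # service iptables stop    // stop the firewall for the current session
  # chkconfig iptables off   // keep it off across reboots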
  SSH login without password
  For the root user

[*] vi /etc/hosts
  # map hostnames to IPs on both machines
  192.168.1.X A
  192.168.1.Y B

[*] generate authentication keys and distribute them
  # ssh-keygen -t rsa
  # cp ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys
  # scp ~/.ssh/* root@B:~/.ssh/

[*]check
  # ssh B
  # ssh A
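  With many slaves, copying keys one scp at a time gets tedious; a loop over ssh-copy-id does the same job (a sketch, assuming ssh-copy-id from openssh-clients is installed; node15 and node16 are the slave hostnames from the cluster plan below):
  # for h in node15 node16; do ssh-copy-id -i ~/.ssh/id_rsa.pub root@$h; done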
  For non-root users

[*] the same steps as for the root user, just run as that user
  Hadoop Cluster Installation (1)
  1. Machine configuration
  (1) Machine plan
  master (NameNode, JobTracker)   192.168.100.123  node14
  slave1 (DataNode, TaskTracker)  192.168.100.124  node15
  slave2 (DataNode, TaskTracker)  192.168.100.125  node16
  (2) Add the hadoop user
  On each of the three machines, run groupadd hadoop and then useradd -g hadoop hadoop to add the hadoop user, as shown below
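  Concretely, on each node as root (a minimal sketch; setting a password is our addition, adjust to local policy):
  # groupadd hadoop            // create the hadoop group
  # useradd -g hadoop hadoop   // create user hadoop with hadoop as its primary group
  # passwd hadoop              // set a password (interactive prompt)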
  (3) NFS setup
  As root, configure the NFS server on the master and share the /home directory;
  on the slaves, mount the master's /home onto the local /home
  (4) Passwordless SSH (for the hadoop user on node14)
  $ ssh-keygen -t rsa
  $ cp ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys

  (5) Directory structure
  ~/soft
  ~/program
  ~/study
  2. Install the JDK (on the master, as the hadoop user)
  (1) Unpack the archive
  (2) Configure environment variables
  $ vi .bashrc
  export JAVA_HOME=/home/hadoop/program/jdk1.6.0_22
  export PATH=$JAVA_HOME/bin:$PATH
  export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$CLASSPATH
  $ source .bashrc
  $ which java
  ~/program/jdk1.6.0_22/bin/java
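  A final check that the shell now picks up the intended JVM (not in the original post):
  $ java -version   // should report build 1.6.0_22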

  3. Install Hadoop 0.21 (on the master, as the hadoop user)
  (1) Unpack under ~/program
  $ cp soft/hadoop-0.21.0.tar.gz program/
  $ cd program && tar -zxvf hadoop-0.21.0.tar.gz
  (2) Configure environment variables
  $ vi .bashrc
  export HADOOP_HOME=/home/hadoop/program/hadoop-0.21.0
  export HADOOP_LOG_DIR=$HADOOP_HOME/logs
  $ source .bashrc
  (3) Configure the runtime environment
  $ vi conf/hadoop-env.sh
  export JAVA_HOME=/home/hadoop/program/jdk1.6.0_22
  export HADOOP_LOG_DIR=${HADOOP_HOME}/logs
  (4) Edit the masters and slaves files
  $ cat conf/masters
  node14
  $ cat conf/slaves
  node15
  node16
  Configure conf/core-site.xml
  1. Configure the NameNode address
  $ cat core-site.xml
  <configuration>
    <property>
      <name>fs.default.name</name>
      <value>hdfs://node14:9000</value>
    </property>
  </configuration>

  Configure conf/hdfs-site.xml
  1. Configure the NameNode and DataNode directories
  Note: set dfs.namenode.name.dir and dfs.datanode.data.dir (the 0.21 names for the older dfs.name.dir and dfs.data.dir)
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file://${hadoop.tmp.dir}/dfs/name</value>
    <description>Determines where on the local filesystem the DFS name node
    should store the name table (fsimage). If this is a comma-delimited list
    of directories then the name table is replicated in all of the
    directories, for redundancy.</description>
  </property>

  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file://${hadoop.tmp.dir}/dfs/data</value>
    <description>Determines where on the local filesystem a DFS data node
    should store its blocks. If this is a comma-delimited
    list of directories, then data will be stored in all named
    directories, typically on different devices.
    Directories that do not exist are ignored.
    </description>
  </property>

  
  2. Configure the replication factor
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
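  Once the cluster is up and some data is written, the effective replication can be verified with fsck (a check for later, not one of the original steps):
  $ bin/hadoop fsck / -files -blocks   // healthy block lines should show repl=2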

  Configure the JobTracker: conf/mapred-site.xml
  Summary: this file mainly configures the JobTracker's address, scheduler, queues, and so on.
  1. Configure the JobTracker address (required)
  <configuration>
    <property>
      <name>mapreduce.jobtracker.address</name>
      <value>node14:9001</value>
      <description>jobtracker's address</description>
    </property>
  </configuration>

  2. Other configurable items
  See hadoop-0.21.0/mapred/src/java/mapred-default.xml for the full list, for example:
  (1) Set the task scheduler
  <property>
    <name>mapreduce.jobtracker.taskscheduler</name>
    <value>org.apache.hadoop.mapred.JobQueueTaskScheduler</value>
    <description>The class responsible for scheduling the tasks.</description>
  </property>

  (2) Job queues
  <property>
    <name>mapreduce.job.queuename</name>
    <value>Queue-A:Queue-B:Queue-C</value>
  </property>
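  A job can then be pointed at one of these queues at submission time via a generic option (a sketch; the examples jar name and the input/output paths are placeholders):
  $ bin/hadoop jar hadoop-mapred-examples-0.21.0.jar wordcount \
      -D mapreduce.job.queuename=Queue-A input output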

  Other related keys worth a look: mapreduce.jobtracker.system.dir and mapreduce.cluster.local.dir.
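  With all three files configured, the usual next steps (standard Hadoop 0.2x practice, not spelled out in the original post) are to format the NameNode once and start the daemons from the master:
  $ bin/hdfs namenode -format   // one-time; bin/hadoop namenode -format also works but is deprecated in 0.21
  $ bin/start-dfs.sh            // NameNode on node14, DataNodes on node15/node16
  $ bin/start-mapred.sh         // JobTracker on node14, TaskTrackers on the slaves
  $ jps                         // verify the expected Java daemons on each node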