[Experience Notes] Hadoop Study Log (1)

Hadoop Installation
0. Install the Java environment (JDK).
1. Download Hadoop, version 1.2.1.
2. Extract Hadoop into the target directory with tar zxvf (a command sketch follows this list).
3. The files that need to be configured: hadoop-env.sh, core-site.xml, hdfs-site.xml, mapred-site.xml, masters, slaves.
4. What each file is for:
hadoop-env.sh: environment settings used when Hadoop starts
core-site.xml: core configuration (filesystem URI, working directory)
hdfs-site.xml: configuration for the distributed file system (HDFS)
mapred-site.xml: configuration for MapReduce jobs
masters: lists the master machine(s); the master runs the NameNode, the slaves run DataNodes
slaves: lists the slave machines
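A minimal command sketch for steps 1 and 2. The mirror URL and the install path /usr/local/hadoop are assumptions; adjust them to your environment:
$ cd /usr/local/hadoop
$ wget http://archive.apache.org/dist/hadoop/core/hadoop-1.2.1/hadoop-1.2.1.tar.gz
$ tar zxvf hadoop-1.2.1.tar.gz          # unpacks into hadoop-1.2.1/
$ mv hadoop-1.2.1 hadoop                # matches the HADOOP_HOME path used below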
5. hadoop-env.sh in detail:
 
# Set Hadoop-specific environment variables here.
 
# The only required environment variable is JAVA_HOME.  All others are
# optional.  When running a distributed configuration it is best to
# set JAVA_HOME in this file, so that it is correctly defined on
# remote nodes.
 
# The java implementation to use.  Required. Set this to the directory where the JDK is installed.
export JAVA_HOME=/usr/local/jdk/jdk1.6.0_45
 
# Extra Java CLASSPATH elements.  Optional. Extra CLASSPATH entries if you need them.
# export HADOOP_CLASSPATH=
 
# The maximum amount of heap to use, in MB. Default is 1000. JVM heap size for the Hadoop daemons.
# export HADOOP_HEAPSIZE=2000
 
# Extra Java runtime options.  Empty by default. Use -server here: the server JVM mode is meant for long-running daemons and chooses GC and JIT settings accordingly, while the client mode targets small-memory machines.
# export HADOOP_OPTS=-server
 
# Command specific options appended to HADOOP_OPTS when specified. Enables JMX management for each daemon.
export HADOOP_NAMENODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_NAMENODE_OPTS"
export HADOOP_SECONDARYNAMENODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_SECONDARYNAMENODE_OPTS"
export HADOOP_DATANODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_DATANODE_OPTS"
export HADOOP_BALANCER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_BALANCER_OPTS"
export HADOOP_JOBTRACKER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_JOBTRACKER_OPTS"
# export HADOOP_TASKTRACKER_OPTS=
# The following applies to multiple commands (fs, dfs, fsck, distcp etc)
# export HADOOP_CLIENT_OPTS
 
# Extra ssh options.  Empty by default.
# export HADOOP_SSH_OPTS="-o ConnectTimeout=1 -o SendEnv=HADOOP_CONF_DIR"
 
# Where log files are stored.  $HADOOP_HOME/logs by default.
# export HADOOP_LOG_DIR=${HADOOP_HOME}/logs
 
# File naming remote slave hosts.  $HADOOP_HOME/conf/slaves by default.
# export HADOOP_SLAVES=${HADOOP_HOME}/conf/slaves
 
# host:path where hadoop code should be rsync'd from.  Unset by default.
# export HADOOP_MASTER=master:/home/$USER/src/hadoop
 
# Seconds to sleep between slave commands.  Unset by default.  This
# can be useful in large clusters, where, e.g., slave rsyncs can
# otherwise arrive faster than the master can service them.
# export HADOOP_SLAVE_SLEEP=0.1
 
# The directory where pid files are stored. /tmp by default.
# NOTE: this should be set to a directory that can only be written to by
#       the users that are going to run the hadoop daemons.  Otherwise there is
#       the potential for a symlink attack.
# export HADOOP_PID_DIR=/var/hadoop/pids
 
# A string representing this instance of hadoop. $USER by default.
# export HADOOP_IDENT_STRING=$USER
 
# The scheduling priority for daemon processes.  See 'man nice'.
# export HADOOP_NICENESS=10
 
 
Because the settings above refer to ${HADOOP_HOME}, it also has to be defined in /etc/profile:
export HADOOP_HOME=/usr/local/hadoop/hadoop   (i.e. the directory where Hadoop is installed)
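A minimal /etc/profile addition using the same path; putting the bin directory on the PATH is optional but convenient, and the profile has to be re-read afterwards:
export HADOOP_HOME=/usr/local/hadoop/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
$ source /etc/profile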
 
 
6. core-site.xml, the core configuration:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
 
<!-- Put site-specific property overrides in this file. -->
<!-- Filesystem URI and port. "hadoop1" is this machine's hostname and must be mapped in /etc/hosts; an IP address works too. -->
<configuration>
  <property>
   <name>fs.default.name</name>
    <value>hdfs://hadoop1:9000</value>
  </property>
<!-- Directory where Hadoop keeps its working files (hadoop.tmp.dir) -->
  <property> 
  <name>hadoop.tmp.dir</name> 
  <value>/usr/local/hadoop/hadoop/tmp</value>
  </property> 
</configuration>
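The hostname hadoop1 used above must resolve on every node. A sketch of the /etc/hosts entries, assuming the two addresses from the masters/slaves examples below belong to hadoop1 and hadoop2 (add them to /etc/hosts on every machine in the cluster):
192.168.100.101   hadoop1
192.168.100.102   hadoop2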
 
7. Configure hdfs-site.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
 
<!-- Put site-specific property overrides in this file. -->
 
<configuration>
<!-- Whether HDFS permission checking is enabled -->
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
<!-- Number of replicas kept for each block -->
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
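With dfs.replication set to 1 every block exists in a single copy, which is fine for a test cluster but gives no redundancy. Once HDFS is running you can check or raise the replication of existing data; /test below is just a placeholder path:
$ hadoop fsck / -files -blocks       # reports block count and replication for every file
$ hadoop fs -setrep -w 2 /test       # raise the replication of an existing path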
 
8. Configure mapred-site.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
 
<!-- Put site-specific property overrides in this file. -->
 
<configuration>
<!-- JobTracker address and port; the TaskTrackers on the slave nodes connect to it -->
<property>
<name>mapred.job.tracker</name>
<value>hdfs://hadoop1:9001</value>
</property>
</configuration>
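Once the daemons are up (step 12 below), a quick way to confirm that clients can reach the JobTracker, assuming $HADOOP_HOME/bin is on the PATH:
$ hadoop job -list        # contacts the JobTracker and lists running jobs (none on a fresh cluster)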
9. masters configuration:
List which machines are masters, by hostname or by IP address (note that in Hadoop 1.x the conf/masters file actually names the host(s) that run the SecondaryNameNode).
For example:
192.168.100.101
192.168.100.102
10. slaves configuration:
List which machines are slaves:
192.168.100.101
192.168.100.102
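Both masters and slaves are plain text files under $HADOOP_HOME/conf with one host per line; hostnames and IP addresses are interchangeable. For instance, with hadoop1 as the master and hadoop2 as the only slave (which matches the jps output in step 13):
$ cat conf/masters
hadoop1
$ cat conf/slaves
hadoop2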
11. Set up passwordless SSH login
Passwordless ssh setup (getting the permissions right matters).
Log in as the hadoop user and create the ssh directory:    mkdir .ssh
Now check whether you can already ssh into this machine without entering a password:
$ ssh namenode
If you cannot log in to namenode over ssh without a password, run:
$ ssh-keygen -t rsa -f ~/.ssh/id_rsa
Press Enter at the prompts; you may give the key a passphrase or leave it empty. For safety this log uses the passphrase hadoop and relies on ssh-agent (see the sketch at the end of this step) for passphrase-free logins to the other machines in the cluster.
The private key is written to the file given by the -f option, e.g. ~/.ssh/id_rsa. The public key is written to the same name with a .pub suffix, here ~/.ssh/id_rsa.pub.
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
The ssh public keys live in ~/.ssh/authorized_keys on the namenode machine.
Then use scp to distribute authorized_keys / id_rsa.pub to the same directory on the other machines. For example, to send the public key to datanode1's .ssh directory:
$ scp ~/.ssh/id_rsa.pub  hadoop@datanode1:/home/hadoop/.ssh
Then on datanode1:   cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
In principle you distribute the public key to every machine and build authorized_keys on each of them; in practice simply distributing a finished authorized_keys file also works.
Finally set the permissions on every machine: .ssh should be 711 and authorized_keys 644. Permissions that are too open will make ssh-add (and sshd) reject the files; permissions that are too tight break passwordless login.
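A short sketch of the two loose ends above: setting the permissions, and loading the passphrase-protected key into ssh-agent so the Hadoop start scripts can log in without prompting:
$ chmod 711 ~/.ssh
$ chmod 644 ~/.ssh/authorized_keys
$ eval $(ssh-agent)           # start an agent for this shell
$ ssh-add ~/.ssh/id_rsa       # enter the passphrase once; the agent caches it
$ ssh datanode1               # should now log in without any prompt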
 
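Before the very first start on a fresh cluster, the HDFS metadata has to be initialized once on the master (run as the hadoop user), otherwise the NameNode will not start:
$ hadoop namenode -format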
12. Run start-all.sh and the whole Hadoop cluster starts. The slave machines are brought up automatically over passwordless ssh: the NameNode host launches the DataNode (and TaskTracker) daemons on the child nodes.
13. Check with jps
On the master:
[iyunv@hadoop1 conf]# jps
32365 JobTracker
32090 NameNode
25900 Jps
3017 Bootstrap
32269 SecondaryNameNode
On the slave:
[iyunv@hadoop2 ~]# jps
10009 TaskTracker
9901 DataNode
27852 Jps
Congratulations, the cluster is up and running.
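A quick smoke test once jps looks right; /test is an arbitrary example path. The NameNode web UI at http://hadoop1:50070 and the JobTracker UI at http://hadoop1:50030 should also respond:
$ hadoop fs -mkdir /test
$ hadoop fs -put /etc/hosts /test/       # upload a small file
$ hadoop fs -ls /test                    # the uploaded file should be listed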
 
 
 
HBase Installation
HBase and Hadoop versions have to match; with Hadoop 1.2.1 the HBase release used here is hbase-0.94.23.
1. Extract hbase-0.94.23.tar.gz with tar zxvf
2. Edit the configuration files hbase-env.sh and hbase-site.xml
3. hbase-env.sh:
#
#/**
# * Copyright 2007 The Apache Software Foundation
# *
# * Licensed to the Apache Software Foundation (ASF) under one
# * or more contributor license agreements.  See the NOTICE file
# * distributed with this work for additional information
# * regarding copyright ownership.  The ASF licenses this file
# * to you under the Apache License, Version 2.0 (the
# * "License"); you may not use this file except in compliance
# * with the License.  You may obtain a copy of the License at
# *
# *     http://www.apache.org/licenses/LICENSE-2.0
# *
# * Unless required by applicable law or agreed to in writing, software
# * distributed under the License is distributed on an "AS IS" BASIS,
# * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# * See the License for the specific language governing permissions and
# * limitations under the License.
# */
 
# Set environment variables here.
 
# This script sets variables multiple times over the course of starting an hbase process,
# so try to keep things idempotent unless you want to take an even deeper look
# into the startup scripts (bin/hbase, etc.)
 
# The java implementation to use.  Java 1.6 required. Point this at the JDK directory.
export JAVA_HOME=/usr/local/jdk/jdk1.6.0_45/
 
# Extra Java CLASSPATH elements.  Optional.
# export HBASE_CLASSPATH=
 
# The maximum amount of heap to use, in MB. Default is 1000.
# export HBASE_HEAPSIZE=1000
 
# Extra Java runtime options.
# Below are what we set by default.  May only work with SUN JVM.
# For more on why as well as other possible settings,
# see http://wiki.apache.org/hadoop/PerformanceTuning
export HBASE_OPTS="-XX:+UseConcMarkSweepGC"
 
# Uncomment one of the below three options to enable java garbage collection logging for the server-side processes.
 
# This enables basic gc logging to the .out file.
# export SERVER_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps"
 
# This enables basic gc logging to its own file.
# If FILE-PATH is not replaced, the log file(.gc) would still be generated in the HBASE_LOG_DIR .
# export SERVER_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:<FILE-PATH>"
 
# This enables basic GC logging to its own file with automatic log rolling. Only applies to jdk 1.6.0_34+ and 1.7.0_2+.
# If FILE-PATH is not replaced, the log file(.gc) would still be generated in the HBASE_LOG_DIR .
# export SERVER_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:<FILE-PATH> -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=1 -XX:GCLogFileSize=512M"
 
# Uncomment one of the below three options to enable java garbage collection logging for the client processes.
 
# This enables basic gc logging to the .out file.
# export CLIENT_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps"
 
# This enables basic gc logging to its own file.
# If FILE-PATH is not replaced, the log file(.gc) would still be generated in the HBASE_LOG_DIR .
# export CLIENT_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:<FILE-PATH>"
 
# This enables basic GC logging to its own file with automatic log rolling. Only applies to jdk 1.6.0_34+ and 1.7.0_2+.
# If FILE-PATH is not replaced, the log file(.gc) would still be generated in the HBASE_LOG_DIR .
# export CLIENT_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:<FILE-PATH> -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=1 -XX:GCLogFileSize=512M"
 
# Uncomment below if you intend to use the EXPERIMENTAL off heap cache.
# export HBASE_OPTS="$HBASE_OPTS -XX:MaxDirectMemorySize="
# Set hbase.offheapcache.percentage in hbase-site.xml to a nonzero value.
 
 
# Uncomment and adjust to enable JMX exporting
# See jmxremote.password and jmxremote.access in $JRE_HOME/lib/management to configure remote password access.
# More details at: http://java.sun.com/javase/6/docs/technotes/guides/management/agent.html
#
# export HBASE_JMX_BASE="-Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false"
# export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10101"
# export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10102"
# export HBASE_THRIFT_OPTS="$HBASE_THRIFT_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10103"
# export HBASE_ZOOKEEPER_OPTS="$HBASE_ZOOKEEPER_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10104"
 
# File naming hosts on which HRegionServers will run.  $HBASE_HOME/conf/regionservers by default.
# export HBASE_REGIONSERVERS=${HBASE_HOME}/conf/regionservers
 
# File naming hosts on which backup HMaster will run.  $HBASE_HOME/conf/backup-masters by default.
# export HBASE_BACKUP_MASTERS=${HBASE_HOME}/conf/backup-masters
 
# Extra ssh options.  Empty by default.
# export HBASE_SSH_OPTS="-o ConnectTimeout=1 -o SendEnv=HBASE_CONF_DIR"
 
# Where log files are stored.  $HBASE_HOME/logs by default.
# export HBASE_LOG_DIR=${HBASE_HOME}/logs
 
# Enable remote JDWP debugging of major HBase processes. Meant for Core Developers
# export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8070"
# export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8071"
# export HBASE_THRIFT_OPTS="$HBASE_THRIFT_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8072"
# export HBASE_ZOOKEEPER_OPTS="$HBASE_ZOOKEEPER_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8073"
 
# A string representing this instance of hbase. $USER by default.
# export HBASE_IDENT_STRING=$USER
 
# The scheduling priority for daemon processes.  See 'man nice'.
# export HBASE_NICENESS=10
 
# The directory where pid files are stored. /tmp by default.
# export HBASE_PID_DIR=/var/hadoop/pids
 
# Seconds to sleep between slave commands.  Unset by default.  This
# can be useful in large clusters, where, e.g., slave rsyncs can
# otherwise arrive faster than the master can service them.
# export HBASE_SLAVE_SLEEP=0.1
 
# Tell HBase whether it should manage it's own instance of Zookeeper or not.
# export HBASE_MANAGES_ZK=true
 
4. Configure hbase-site.xml:
<configuration>
 
<!-- Where HBase stores its data. For this distributed setup it must live on HDFS,
     and the host/port must match fs.default.name in core-site.xml.
     (A standalone install would instead use a local path such as
     file:///usr/local/hadoop/hbase/hbase-0.94.23/data; do not declare
     hbase.rootdir twice, or the later value silently wins.) -->
<property>
  <name>hbase.rootdir</name>
  <value>hdfs://hadoop1:9000/hbase</value>
</property>
 
<!-- Run HBase in fully distributed mode -->
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
 
 
<!-- HBase master address (host:port, no hdfs:// scheme) -->
<property>
<name>hbase.master</name>
<value>hadoop1:60000</value>
</property>
 
<property>
<name>hbase.zookeeper.quorum</name>
<value>hadoop1,hadoop2</value>
</property>
 
</configuration>
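A minimal sketch of the usual next steps, assuming the same hostnames as above and HBase managing its own ZooKeeper (the default): list the region server hosts in conf/regionservers (one per line, like the Hadoop slaves file), then start HBase on top of the running Hadoop cluster and check the daemons.
$ echo hadoop2 >> conf/regionservers
$ bin/start-hbase.sh
$ jps                     # the master should now also show HMaster (and HQuorumPeer); slaves show HRegionServer
$ bin/hbase shell         # running 'status' in the shell should report the region servers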