vvvvvvvvv config vvvvvvvv
set domain aliases on all nodes (required):
/etc/hosts
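a minimal sketch of the alias entries (the IPs and hostnames below are placeholders, use your own LAN addresses):
192.168.0.1    master
192.168.0.2    slave1
192.168.0.3    slave2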
#let the master access all the slaves without passwords:
#method 1:
ssh-copy-id -i $HOME/.ssh/id_rsa.pub hadoop@slave
#method 2:
copy $HOME/.ssh/id_rsa.pub to each slave,
then log in to the slave and run: cat id_rsa.pub >> $HOME/.ssh/authorized_keys
the first ssh connection from the master will also save each slave's host key fingerprint into hadoop@master's "known_hosts" file
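a quick check from the master (the hostname is a placeholder): the first connection asks you to accept the host key, after that it should log in with no password prompt:
ssh hadoop@slave1 'hostname'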
#set the conf/masters & conf/slaves files, *do this only on the master machine* (a command sketch follows the list below)
#Note that the machine on which bin/start-dfs.sh is run becomes the primary namenode.
#so you MUST start hadoop from the namenode ONLY! **but you can submit/run a job from any node**
conf/masters:
master
conf/slaves:
slave1
slave2
...
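a minimal sketch of creating the two files on the master ($HADOOP_HOME is an assumption for wherever hadoop is installed):
echo "master" > $HADOOP_HOME/conf/masters
printf "slave1\nslave2\n" > $HADOOP_HOME/conf/slaves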
## do the following on all nodes (including the master); a sketch for pushing the conf from the master to the slaves follows the three property blocks
#set conf/core-site.xml
<property>
<name>fs.default.name</name>
<value>hdfs://master:54310</value>
<description>The name of the default file system. A URI whose
scheme and authority determine the FileSystem implementation. The
uri's scheme determines the config property (fs.SCHEME.impl) naming
the FileSystem implementation class. The uri's authority is used to
determine the host, port, etc. for a filesystem.</description>
</property>
#conf/hdfs-site.xml (optional)
<property>
<name>dfs.replication</name>
<value>2</value>
<description>Default block replication.
The actual number of replications can be specified when the file is created.
The default is used if replication is not specified at create time.
==It defines how many machines a single file should be replicated to before it becomes available.
If you set this to a value higher than the number of slave nodes (more precisely, the number of datanodes)
that you have available, you will start seeing a lot of (Zero targets found, forbidden1.size=1) type errors in the log files.
==
</description>
</property>
#conf/mapred-site.xml
<property>
<name>mapred.job.tracker</name>
<value>master:54311</value>
<description>The host and port that the MapReduce job tracker runs
at. If "local", then jobs are run in-process as a single map
and reduce task.
</description>
</property>
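since every node needs the same configuration, one way (a sketch; $HADOOP_HOME and the hostnames are assumptions) to push it out from the master:
for h in slave1 slave2; do
  rsync -av $HADOOP_HOME/conf/ hadoop@$h:$HADOOP_HOME/conf/
done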
set /etc/hosts:
by default the hostname (e.g. c1.domain) is mapped to 127.0.1.1, which causes errors during start-dfs.sh.
resolution: add the hostname after "desktop" on the line with the machine's real IP (same IP, new alias); leave the other entries unchanged.
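a before/after sketch (the IP and the "desktop"/"c1.domain" names are placeholders from this setup):
#before (remove or comment this mapping):
127.0.1.1    c1.domain
#after (hostname appended as an alias on the real-IP line):
192.168.0.1    desktop c1.domain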
format the namenode on the master
done
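a minimal sketch of the format-and-start sequence, run on the master ($HADOOP_HOME is an assumption; formatting wipes any existing HDFS data):
cd $HADOOP_HOME
bin/hadoop namenode -format
bin/start-dfs.sh      #namenode on this machine + datanodes on the slaves
bin/start-mapred.sh   #jobtracker + tasktrackers
jps                   #on each node, verify the expected daemons are running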
FAQ: java.io.IOException: Incompatible namespaceIDs
resolution:
a.(delete the datanode's data directly and reformat the namenode; deprecated. see the sketch after step 4)
1. stop the cluster
2. delete the data directory on the problematic datanode: the directory is specified by dfs.data.dir in conf/hdfs-site.xml; if you followed this tutorial, the relevant directory is /usr/local/hadoop-datastore/hadoop-hadoop/dfs/data
3. reformat the namenode (NOTE: all HDFS data is lost during this process!)
4. restart the cluster
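a hedged command sketch of resolution a (the data directory follows this tutorial's layout and may differ on your setup; this destroys all HDFS data):
bin/stop-all.sh
rm -rf /usr/local/hadoop-datastore/hadoop-hadoop/dfs/data
bin/hadoop namenode -format
bin/start-all.sh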
b.(copy the namenode's namespaceID into the datanode's VERSION file; see the sketch after step 3)
1. stop the datanode
2. edit the value of namespaceID in <dfs.data.dir>/current/VERSION to match the value of the current namenode
3. restart the datanode
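a hedged command sketch of resolution b, run on the problematic datanode (the VERSION paths are assumptions based on dfs.name.dir / dfs.data.dir, check your own config):
bin/hadoop-daemon.sh stop datanode
#read the namenode's current namespaceID (on the master):
grep namespaceID /usr/local/hadoop-datastore/hadoop-hadoop/dfs/name/current/VERSION
#edit the datanode's VERSION so its namespaceID matches, then restart:
vi /usr/local/hadoop-datastore/hadoop-hadoop/dfs/data/current/VERSION
bin/hadoop-daemon.sh start datanode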