hadoop 2.x-HDFS federation

zp7412 posted on 2016-12-6 11:11:42

Note: this article focuses on hadoop-2.5.1; details may differ slightly from hadoop-0.23.x.
　　Agenda:
　　I: HDFS federation overview
　　II: implementation
　　III: other distributed namespaces
　　IV: advantages vs shortcomings
　　-------------------
　　I: HDFS federation overview
　　In hadoop-1.x and earlier (0.23.x excepted), the namenode has several limitations:
　　1. Single point of failure (SPOF, addressed by HA).
　　2. Too many files/blocks make the metadata outgrow the namenode's memory.
　　3. A single master (the namenode) undertakes many tasks: metadata management, resource assignment, datanode heartbeats, file locations, etc. Together these reduce the namenode's throughput.
　　4. Only one namespace, so there is no real separation/isolation between tenants, e.g. development apps and production apps interfere with each other.
　　....
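To make limitation 2 concrete, here is a rough back-of-the-envelope sketch. It assumes the commonly quoted rule of thumb of about 150 bytes of namenode heap per namespace object (file, directory, or block); that figure is an assumption rather than a measurement, and the function name is made up for illustration.

```python
# Assumption: ~150 bytes of namenode heap per namespace object.
# This is a widely quoted rule of thumb, not a measured value.
BYTES_PER_OBJECT = 150


def estimate_namenode_heap_bytes(num_files, blocks_per_file=1, num_dirs=0):
    """Rough heap estimate for one namenode holding the given namespace."""
    # Every file, every block and every directory is one in-memory object.
    objects = num_files + num_files * blocks_per_file + num_dirs
    return objects * BYTES_PER_OBJECT


if __name__ == "__main__":
    single = estimate_namenode_heap_bytes(100_000_000)
    # With two federated namespaces of equal size, each namenode
    # only has to hold half of the files.
    per_namenode = estimate_namenode_heap_bytes(50_000_000)
    print(f"one namenode:      {single / 2**30:.1f} GiB")
    print(f"each of two (fed): {per_namenode / 2**30:.1f} GiB")
```

The point is only the proportionality: splitting the namespace across namenodes divides the metadata each one must keep in memory.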
　　Multiple namespaces were introduced to address all of the issues above except the first. With multiple namespaces, which can also be described as 'multi-hierarchy name management', the cluster gains scalability along several dimensions. HDFS federation is Hadoop's implementation of this idea:

feature      resolves       scaling      summary
federation   scalability    horizontal   unites multiple namespaces from multiple clusters
ha           availability   horizontal   adds standby namenodes

　　Combining federation with HA gives the following architecture:

[Figure: HDFS federation + HA architecture]
　　II. implementation
　　Much like mounting disks on Linux, this scheme is easy to achieve: a Client Mount Table gives clients a global view of all the federated namenodes. It is simple, and it introduces no single point of failure for client access, since the table is part of each client's own configuration. Only a few properties need to change:
　　core-site.xml

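<configuration xmlns:xi="http://www.w3.org/2001/XInclude">
  <!-- pull in the client mount table defined in cmt.xml; other properties omitted -->
  <xi:include href="cmt.xml"/>
  <property>
    <name>fs.defaultFS</name>
    <value>viewfs://nsX</value>
    <description>default file system: the client mount table named nsX</description>
  </property>
</configuration>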
Note: the URI scheme here is 'viewfs' rather than 'hdfs', and 'nsX' is the name of the mount table.

cmt.xml, which holds the mount table:

<configuration>
  <property>
    <name>fs.viewfs.mounttable.nsX.link./share</name>
    <value>hdfs://ns1/real_share</value>
  </property>
  <property>
    <name>fs.viewfs.mounttable.nsX.link./user</name>
    <value>hdfs://ns2/real_user</value>
  </property>
</configuration>
　　This means that on the client, the path '/share' maps to the real directory '/real_share' in namespace 'ns1', and '/user' maps to '/real_user' in 'ns2' in the same way. The real directories must therefore be created first:

hdfs dfs -mkdir hdfs://ns1/real_share

hdfs dfs -mkdir hdfs://ns2/real_user
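The mapping that the mount table expresses can be sketched in a few lines of Python. This is only an illustration of the idea, not Hadoop's ViewFileSystem implementation: the `resolve` function and the `MOUNT_TABLE` dict are hypothetical names, and the entries simply mirror the two links from cmt.xml.

```python
# Hypothetical in-memory copy of the two links defined in cmt.xml.
MOUNT_TABLE = {
    "/share": "hdfs://ns1/real_share",
    "/user": "hdfs://ns2/real_user",
}


def resolve(path, mount_table=MOUNT_TABLE):
    """Map a client-side viewfs path to the hdfs:// URI that backs it."""
    # Checking longer mount points first is a defensive choice of this
    # sketch; with the two links above the order makes no difference.
    for mount_point in sorted(mount_table, key=len, reverse=True):
        # Match on whole path components, so "/shared" does not hit "/share".
        if path == mount_point or path.startswith(mount_point + "/"):
            return mount_table[mount_point] + path[len(mount_point):]
    raise FileNotFoundError(f"no mount point covers {path}")


if __name__ == "__main__":
    print(resolve("/user/alice/data.txt"))
    print(resolve("/share"))
```

Because the lookup happens entirely on the client, no extra server sits between the client and the namenodes.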
　　The hdfs-site.xml settings differ slightly from the HA-only setup; see 'Hadoop 2.0 NameNode HA和Federation实践' (Hadoop 2.0 NameNode HA and Federation in Practice).
　　III. other distributed namespaces
　　1. dynamic subtree partitioning (Ceph)
　　2. hash-based partitioning (Lustre)
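For contrast with the static mount table used by federation, the general idea of hash-based partitioning can be sketched as below. This is a toy illustration under stated assumptions, not Lustre's actual algorithm: the `pick_server` function and the server names are made up, and hashing the full path is just one possible choice of key.

```python
import hashlib


def pick_server(path, servers):
    """Choose a metadata server for a path by hashing the path string."""
    digest = hashlib.sha256(path.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer and wrap it
    # around the number of servers.
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


if __name__ == "__main__":
    servers = ["mds1", "mds2", "mds3"]  # hypothetical server names
    for p in ("/user/alice/a.txt", "/user/alice/b.txt", "/share/c.txt"):
        print(p, "->", pick_server(p, servers))
```

With a scheme like this, files in the same directory may land on different servers, whereas a mount table keeps each mounted subtree on a single namenode.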
　　IV. advantages vs shortcomings
　　Besides the advantages mentioned in section I, the ability to share across multiple data centers is an important reason to use federation. The shortcomings are also clear:
　　a. Data may be stored unevenly among the different namespaces.
　　ref:
　　hadoop 2.x-HDFS HA --Part I: abstraction
　　hortonworks hdfs federation
　　HDFS scalability with multiple namenodes
　　Hadoop 2.0 NameNode HA和Federation实践 (Hadoop 2.0 NameNode HA and Federation in Practice)
　　Scaling HDFS Namenode using Multiple Namespace (Namenodes) and Block Pools
