hadoop 2.x-HDFS snapshot

kaola4549 · posted on 2016-12-6 08:46:48

　　I don't want to reinvent the wheels of open source; rather, I want to understand the implied features and use cases as far as possible. So I will write some notes here as a summary or memo.
　　Agenda
　　1.what is
　　2.how to
　　3.hadoop snapshot vs hbase snapshot
　　4.demos to use snapshot
　　1.what is
　　A long time ago, the term 'snapshot' was introduced to describe 'the state of something at a point in time', e.g. a memory snapshot, a db snapshot, or even Google's page snapshot. They all have the same or a very close meaning: a particular view/image of one thing at some moment in its history.
　　Likewise, with hadoop's snapshot we want to use this 'view' to capture the files as they were at a point in time. So its usages look like this:
　　a. periodic backups
　　b. restoring key data after mistaken deletions
　　c. isolating important data from production for testing, comparing etc.
　　The snapshot also has these features:
　　- no data is moved or copied, so the network bandwidth is not affected
　　- it does not create many extra tasks for the namenode or datanodes to deal with, so reliability is preserved
　　2.how to
　　Hadoop snapshot relies on the write-once, read-many characteristic of hdfs files. A dir must first be marked as snapshottable in the namenode; once a snapshot is created on it, the snapshotted state is protected: all later operations, including deletion, move or creation of files and dirs, only change the 'metadata' in the namenode, and the blocks still referenced by the snapshot are not removed from the datanodes. So after a while, if you want to restore some files/dirs, you can copy the snapshotted files or dirs from the '.snapshot' dir to anywhere you want. When you delete the snapshot created before, the data that only the snapshot was still holding on to can finally be cleaned up.
　　For a deeper study of the 'linked data structure' behind this, you can check out 'Making Data Structures Persistent'.
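　　To make the behaviour described above concrete, here is a minimal toy model in Python. It is only a sketch under my own assumptions, not HDFS code: the class `SnapshottableDir` and all its methods are hypothetical names, and it copies the whole name-to-block mapping per snapshot for simplicity, whereas a real namenode would record changes more economically. What it does show is that taking a snapshot touches metadata only, and that block data is reclaimed only when neither the current namespace nor any snapshot refers to it.

```python
class SnapshottableDir:
    """Toy model of one snapshottable directory (illustration only, not HDFS code)."""

    def __init__(self):
        self.current = {}      # file name -> tuple of block ids ("namenode" metadata)
        self.snapshots = {}    # snapshot name -> frozen copy of that metadata
        self.blocks = {}       # block id -> bytes (stands in for the datanodes)
        self._next_block = 0

    def write(self, name, data):
        """Create or replace a file; files are written once as a whole."""
        block_id = self._next_block
        self._next_block += 1
        self.blocks[block_id] = data
        self.current[name] = (block_id,)
        self._reclaim()

    def delete(self, name):
        """Remove a file from the current namespace only."""
        del self.current[name]
        self._reclaim()

    def create_snapshot(self, snap):
        """Record the current metadata; no block data is moved or copied."""
        self.snapshots[snap] = dict(self.current)

    def delete_snapshot(self, snap):
        """Drop a snapshot; blocks only it referenced become reclaimable."""
        del self.snapshots[snap]
        self._reclaim()

    def read(self, name, snap=None):
        """Read from the current namespace, or from a snapshot if given."""
        view = self.current if snap is None else self.snapshots[snap]
        return b"".join(self.blocks[b] for b in view[name])

    def _reclaim(self):
        """Delete blocks that no namespace view refers to any more."""
        live = {b for ids in self.current.values() for b in ids}
        for view in self.snapshots.values():
            live.update(b for ids in view.values() for b in ids)
        for block_id in list(self.blocks):
            if block_id not in live:
                del self.blocks[block_id]


if __name__ == "__main__":
    d = SnapshottableDir()
    d.write("a.txt", b"v1")
    d.create_snapshot("s1")
    d.delete("a.txt")                   # a mistaken deletion
    print(d.read("a.txt", snap="s1"))   # still readable through the snapshot
    d.delete_snapshot("s1")
    print(len(d.blocks))                # the block is reclaimed only now
```

　　The point of the design is that `create_snapshot` costs nothing on the data side, which matches the 'no data moved or copied' feature listed in section 1.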
　　 3.hadoop snapshot vs hbase snapshot
　　Judging by the release history of hadoop and hbase, I think hadoop's snapshot was inspired by hbase's one :), so their underlying implementations are similar. Here are some differences between the two snapshots:

feature	hadoop	hbase	supplement
copy/move data	n	n
generate new files referring to original files	n	y	hbase will generate many temp files to point to the real hdfs files

　　So for an hbase cluster, I think it's unnecessary to back up (snapshot) the underlying hdfs again if you already use hbase snapshots; otherwise it should be done. The reason is that the two kinds of snapshot overlap for the most part.
　　 4.demos to use snapshot
　　There are some usage demos on the official Apache site [2]. I want to point out that this snapshot is 'read-only' (RO) rather than RW, so trying to make changes under the '.snapshot' dir will cause errors. In addition, if you want to check out how the commands really work, see the details in 'NameNodeRpcServer.java'.
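　　As a quick memo of the usual workflow, here is a small Python sketch that only assembles the `hdfs` command lines and, by default, prints them instead of running them. The subcommands (`dfsadmin -allowSnapshot`, `dfs -createSnapshot`, `dfs -cp`, `dfs -deleteSnapshot`) are the standard HDFS ones; the helper function names, the path `/data/important` and the snapshot name `s20161206` are my own assumptions for illustration.

```python
import shlex
import subprocess


def snapshot_commands(directory, name):
    """Commands to enable snapshots on a dir, take one, and list its content.

    `directory` is assumed to be an absolute HDFS path without a trailing slash.
    """
    return [
        ["hdfs", "dfsadmin", "-allowSnapshot", directory],   # needs admin rights
        ["hdfs", "dfs", "-createSnapshot", directory, name],
        ["hdfs", "dfs", "-ls", f"{directory}/.snapshot/{name}"],
    ]


def restore_command(directory, name, relative_path):
    """Copy one file back out of the read-only '.snapshot' dir."""
    src = f"{directory}/.snapshot/{name}/{relative_path}"
    dst = f"{directory}/{relative_path}"
    return ["hdfs", "dfs", "-cp", src, dst]


def delete_command(directory, name):
    """Drop a snapshot that is no longer needed."""
    return ["hdfs", "dfs", "-deleteSnapshot", directory, name]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical path and snapshot name, printed only (dry run).
    run(snapshot_commands("/data/important", "s20161206"))
    run([restore_command("/data/important", "s20161206", "a.txt")])
    run([delete_command("/data/important", "s20161206")])
```

　　Because the snapshot is RO, the restore step copies out of '.snapshot' rather than moving; calling `run(..., dry_run=False)` requires a working `hdfs` client on the PATH.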
　　ref:
　　jira: Support for RW/RO snapshots in HDFS
　　[2] HDFS Snapshots
　　hbase - tables replication/snapshot/backup within/cross clusters
　　hadoop-2.x - new features
