I don't want to reinvent the wheels of open source projects; instead, I just want to explore their implied features and use cases as far as possible. So I will write some notes here as a summary or memo.

Agenda
1. What is it
2. How it works
3. Hadoop snapshot vs HBase snapshot
4. Demos of using snapshots

1. What is it
A long time ago, the term 'snapshot' was introduced to describe the state of something at a point in time, e.g. a memory snapshot, a database snapshot, or even Google's page snapshot. They all have the same or a similar meaning: a certain view/image of one thing in its history.
Hadoop's snapshot is similar: we want to use this 'view' to capture the files as they were at a point in time. Its typical usages look like this:
a. periodic backups
b. restoring key data after mistaken deletions
c. isolating important data from production for testing, comparison, etc.
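As a rough illustration of usages (a) and (b), here is a minimal Python sketch that only builds names, paths and a command line; it does not talk to a cluster. The helper names, the 'backup-YYYYMMDD' naming scheme and the '/data/warehouse' directory are assumptions made up for this example. The '<dir>/.snapshot/<name>/...' layout and 'hdfs dfs -cp' are standard HDFS.

```python
from datetime import date
import posixpath


def snapshot_name(day, prefix="backup"):
    """Name for a periodic snapshot, e.g. backup-20140301 (hypothetical scheme)."""
    return f"{prefix}-{day:%Y%m%d}"


def snapshot_path(snap_dir, name, relative=""):
    """Where a file shows up inside a snapshot: <dir>/.snapshot/<name>/<relative>."""
    return posixpath.join(snap_dir, ".snapshot", name, relative).rstrip("/")


def restore_cmd(snap_dir, name, relative):
    """Command line that copies one file out of a snapshot back to its old place."""
    src = snapshot_path(snap_dir, name, relative)
    dst = posixpath.join(snap_dir, relative)
    return ["hdfs", "dfs", "-cp", src, dst]


if __name__ == "__main__":
    name = snapshot_name(date(2014, 3, 1))
    # Print the restore command instead of running it (no cluster needed).
    print(" ".join(restore_cmd("/data/warehouse", name, "orders/part-0")))
```

The restore is a plain copy out of the '.snapshot' dir, so the snapshot itself stays untouched and can be reused for later restores.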
This snapshot also has some notable features:
- no data is moved or copied, so network bandwidth is not affected
- it does not create much extra work for the namenode or datanodes, so reliability is preserved

2. How it works
Hadoop snapshots build on the write-once, read-many model of HDFS files. A directory must first be registered with the namenode as a snapshottable dir (an administrator step); then a snapshot can be created on it. From that point the namenode protects the snapshotted state: all later operations, including deletion, move, or creation of files and dirs, only change the 'metadata' in the namenode, so the data referenced by the snapshot is not removed right away. Later, if you want to restore some files/dirs, you can copy the snapshotted files or dirs from the '.snapshot' dir to anywhere you want. When you delete the snapshot, the deferred changes take effect: data that is no longer referenced by the current tree or by another snapshot can then be reclaimed.
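The behaviour described above can be sketched with a toy model in Python. This is not the real NameNode logic: the class and method names are hypothetical, and for simplicity the toy copies the whole name-to-block mapping on each snapshot, whereas HDFS, as far as I know, records only the changes made after a snapshot. It only shows the idea that a snapshot is pure metadata and that blocks are reclaimed once nothing references them.

```python
class ToyNamespace:
    """Toy model of one snapshottable directory (illustration only)."""

    def __init__(self):
        self.files = {}       # current view: file name -> block id
        self.snapshots = {}   # snapshot name -> frozen {file name: block id}
        self.blocks = {}      # block id -> bytes (stands in for datanode storage)
        self._next_block = 0

    def create(self, name, data):
        # Write-once: each file gets a fresh block that is never edited in place.
        block_id = self._next_block
        self._next_block += 1
        self.blocks[block_id] = data
        self.files[name] = block_id

    def create_snapshot(self, snap):
        # Metadata only: remember which blocks each name points to, copy no data.
        self.snapshots[snap] = dict(self.files)

    def delete(self, name):
        # Drop the name from the current view, then reclaim unreferenced blocks.
        del self.files[name]
        self._collect()

    def delete_snapshot(self, snap):
        del self.snapshots[snap]
        self._collect()

    def read_snapshot(self, snap, name):
        return self.blocks[self.snapshots[snap][name]]

    def _collect(self):
        # A block stays alive while the current view or any snapshot points to it.
        live = set(self.files.values())
        for view in self.snapshots.values():
            live.update(view.values())
        for block_id in list(self.blocks):
            if block_id not in live:
                del self.blocks[block_id]


if __name__ == "__main__":
    ns = ToyNamespace()
    ns.create("a.txt", b"hello")
    ns.create_snapshot("s1")
    ns.delete("a.txt")                       # gone from the current view only
    print(ns.read_snapshot("s1", "a.txt"))   # still readable via the snapshot
    ns.delete_snapshot("s1")                 # now the block can be reclaimed
    print(len(ns.blocks))
```

Deleting the file while snapshot 's1' exists changes only the current view; the block disappears only after the snapshot is deleted as well.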
For a deeper study of the 'linked data structure' behind this, you can check out 'Making Data Structures Persistent'.

3. Hadoop snapshot vs HBase snapshot
Judging by the release history of Hadoop and HBase, I think Hadoop's snapshot was borrowed from HBase's :), so their underlying implementations are similar. Here are some differences between the two:
feature                                   | hadoop | hbase | supplement
copy/move data                            | n      | n     |
gen new files referring to original files | n      | y     | hbase will gen many temp files that point to the real hdfs files
So for an HBase cluster, I think it is unnecessary to back up (snapshot) the underlying HDFS again if HBase snapshots are already in use; otherwise it should be done. The reason is that the two kinds of snapshot largely overlap.

4. Demos of using snapshots
There are some usage demos on the official Apache site [2]. I want to point out that this snapshot is read-only (RO) rather than read-write (RW), so trying to make changes under the '.snapshot' dir will cause errors. In addition, if you want to see how the commands really work, see the details in 'NameNodeRpcServer.java'.
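To keep the commands from [2] in one place, here is a small Python sketch that builds them as argument lists and, by default, only prints them (dry run). The function names, the 'run' helper and the '/data/warehouse' path are assumptions for this memo; the 'hdfs' subcommands themselves are the ones documented for HDFS snapshots.

```python
import subprocess


def allow_snapshot(path):
    # Admin step: mark a directory as snapshottable.
    return ["hdfs", "dfsadmin", "-allowSnapshot", path]


def create_snapshot(path, name):
    return ["hdfs", "dfs", "-createSnapshot", path, name]


def rename_snapshot(path, old, new):
    return ["hdfs", "dfs", "-renameSnapshot", path, old, new]


def delete_snapshot(path, name):
    return ["hdfs", "dfs", "-deleteSnapshot", path, name]


def snapshot_diff(path, from_snap, to_snap):
    # Lists what changed between two snapshots of the same directory.
    return ["hdfs", "snapshotDiff", path, from_snap, to_snap]


def list_snapshottable():
    return ["hdfs", "lsSnapshottableDir"]


def run(argv, dry_run=True):
    """Print the command (dry run) or execute it against a real cluster."""
    if dry_run:
        print(" ".join(argv))
        return None
    return subprocess.run(argv, check=True, capture_output=True, text=True).stdout


if __name__ == "__main__":
    d = "/data/warehouse"  # hypothetical directory
    for cmd in (allow_snapshot(d), create_snapshot(d, "s1"),
                create_snapshot(d, "s2"), snapshot_diff(d, "s1", "s2"),
                delete_snapshot(d, "s1"), list_snapshottable()):
        run(cmd)
```

Since the snapshot is read-only, there is deliberately no helper here that writes under '.snapshot'; changes go to the live directory and a new snapshot is taken afterwards.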
ref:
[1] JIRA: Support for RW/RO snapshots in HDFS
[2] HDFS Snapshots
[3] HBase: tables replication/snapshot/backup within/cross clusters
[4] Hadoop 2.x: new features