
[Experience sharing] "Hadoop: The Definitive Guide" ch04 Hadoop I/O

Posted on 2016-12-6 08:50:11
1. Hadoop comes with a set of primitives for data I/O. Some of these are techniques that are more general than Hadoop, such as data integrity and compression, but deserve special consideration when dealing with multiterabyte datasets. Others are Hadoop tools or APIs that form the building blocks for developing distributed systems, such as serialization frameworks and on-disk data structures.
  2. Compression

[ate: /local/nomad2/hadoop/tomwhite-hadoop-book-32dae01 ]
>> echo "text" | hadoop StreamCompressor org.apache.hadoop.io.compress.GzipCodec | gunzip -
12/07/02 00:21:12 INFO util.NativeCodeLoader: Loaded the native-hadoop library
12/07/02 00:21:12 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
text

[ate: /local/nomad2/hadoop/tomwhite-hadoop-book-32dae01 ]
>> echo "text" | hadoop PooledStreamCompressor org.apache.hadoop.io.compress.GzipCodec | gunzip -
12/07/02 00:24:45 INFO util.NativeCodeLoader: Loaded the native-hadoop library
12/07/02 00:24:45 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
12/07/02 00:24:45 INFO compress.CodecPool: Got brand-new compressor
text
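
The StreamCompressor run above pipes standard input through a codec and lets gunzip undo it. As a rough stand-in that needs no Hadoop on the classpath, the sketch below does the same round trip with the JDK's java.util.zip classes. It is not the book's StreamCompressor and does not use Hadoop's CompressionCodec API; the class name GzipRoundTrip and its two methods are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper: a JDK-only analogue of "compress stdin, then gunzip it".
final class GzipRoundTrip {

    /** Compresses raw bytes into the gzip format (the format gunzip reads). */
    static byte[] compress(byte[] raw) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        // Closing the GZIPOutputStream writes the gzip trailer.
        try (GZIPOutputStream gzip = new GZIPOutputStream(sink)) {
            gzip.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    /** Decompresses gzip bytes back into the original bytes. */
    static byte[] decompress(byte[] gz) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (GZIPInputStream gunzip = new GZIPInputStream(new ByteArrayInputStream(gz))) {
            byte[] buffer = new byte[4096];
            int n;
            while ((n = gunzip.read(buffer)) != -1) {
                sink.write(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        byte[] raw = "text\n".getBytes(StandardCharsets.UTF_8);
        byte[] packed = compress(raw);
        // Prints the original input again, like the pipeline above.
        System.out.print(new String(decompress(packed), StandardCharsets.UTF_8));
    }
}
```

The PooledStreamCompressor run differs only in where the compressor object comes from (the CodecPool line in its log); the bytes on the wire are still gzip, which is why gunzip accepts both.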


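  Using compression in MapReduce
  3. Serialization
  Serialization is the process of turning structured objects into a byte stream so they can be sent over a network or written to persistent storage. Deserialization is the reverse process of turning a byte stream back into a series of structured objects.
  Serialization is used in two quite distinct areas of distributed data processing: interprocess communication and persistent storage.
  In Hadoop, interprocess communication between nodes is implemented with remote procedure calls (RPC).
  Several serialization frameworks: Apache Thrift, Google's Protocol Buffers, and Avro.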
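
To make the object-to-bytes round trip concrete, here is a minimal sketch using only java.io. It follows the shape of Hadoop's Writable interface (a write(DataOutput) method and a readFields(DataInput) method) but does not import Hadoop, so the class IntPairSketch and its serialize/deserialize helpers are assumptions made for illustration, not part of any library.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical type: two ints that know how to write and read themselves.
final class IntPairSketch {
    int first;
    int second;

    IntPairSketch(int first, int second) {
        this.first = first;
        this.second = second;
    }

    /** Serialization: structured object to byte stream (two big-endian ints). */
    void write(DataOutput out) throws IOException {
        out.writeInt(first);
        out.writeInt(second);
    }

    /** Deserialization: byte stream back into the object's fields. */
    void readFields(DataInput in) throws IOException {
        first = in.readInt();
        second = in.readInt();
    }

    static byte[] serialize(IntPairSketch pair) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            pair.write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static IntPairSketch deserialize(byte[] data) {
        IntPairSketch pair = new IntPairSketch(0, 0);
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            pair.readFields(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return pair;
    }

    public static void main(String[] args) {
        byte[] wire = serialize(new IntPairSketch(163, 100));
        System.out.println(wire.length + " bytes on the wire");
        IntPairSketch copy = deserialize(wire);
        System.out.println(copy.first + ", " + copy.second);
    }
}
```

The same byte array could be handed to a socket (the RPC case) or to a file (the persistent storage case); the two uses differ in what they demand of the format, not in the mechanics shown here.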
  4. File-based data structures
  4.1 SequenceFileDemo

[ate: /local/nomad2/hadoop/tomwhite-hadoop-book-32dae01 ]
>> hadoop SequenceFileWriteDemo numbers.seq
12/07/02 01:11:00 INFO util.NativeCodeLoader: Loaded the native-hadoop library
12/07/02 01:11:00 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
12/07/02 01:11:00 INFO compress.CodecPool: Got brand-new compressor
[128]   100     One, two, buckle my shoe
[173]   99      Three, four, shut the door
[220]   98      Five, six, pick up sticks
[264]   97      Seven, eight, lay them straight
[314]   96      Nine, ten, a big fat hen
[359]   95      One, two, buckle my shoe
[404]   94      Three, four, shut the door
[451]   93      Five, six, pick up sticks
[495]   92      Seven, eight, lay them straight
[545]   91      Nine, ten, a big fat hen
[590]   90      One, two, buckle my shoe
[635]   89      Three, four, shut the door
[682]   88      Five, six, pick up sticks
[726]   87      Seven, eight, lay them straight
[776]   86      Nine, ten, a big fat hen
[821]   85      One, two, buckle my shoe
[866]   84      Three, four, shut the door
[913]   83      Five, six, pick up sticks
[957]   82      Seven, eight, lay them straight
[1007]  81      Nine, ten, a big fat hen
[1052]  80      One, two, buckle my shoe
[1097]  79      Three, four, shut the door
[1144]  78      Five, six, pick up sticks
[1188]  77      Seven, eight, lay them straight
[1238]  76      Nine, ten, a big fat hen
[1283]  75      One, two, buckle my shoe
[1328]  74      Three, four, shut the door
[1375]  73      Five, six, pick up sticks
[1419]  72      Seven, eight, lay them straight
[1469]  71      Nine, ten, a big fat hen
[1514]  70      One, two, buckle my shoe
[1559]  69      Three, four, shut the door
[1606]  68      Five, six, pick up sticks
[1650]  67      Seven, eight, lay them straight
[1700]  66      Nine, ten, a big fat hen
[1745]  65      One, two, buckle my shoe
[1790]  64      Three, four, shut the door
[1837]  63      Five, six, pick up sticks
[1881]  62      Seven, eight, lay them straight
[1931]  61      Nine, ten, a big fat hen
[1976]  60      One, two, buckle my shoe
[2021]  59      Three, four, shut the door
[2088]  58      Five, six, pick up sticks
[2132]  57      Seven, eight, lay them straight
[2182]  56      Nine, ten, a big fat hen
[2227]  55      One, two, buckle my shoe
[2272]  54      Three, four, shut the door
[2319]  53      Five, six, pick up sticks
[2363]  52      Seven, eight, lay them straight
[2413]  51      Nine, ten, a big fat hen
[2458]  50      One, two, buckle my shoe
[2503]  49      Three, four, shut the door
[2550]  48      Five, six, pick up sticks
[2594]  47      Seven, eight, lay them straight
[2644]  46      Nine, ten, a big fat hen
[2689]  45      One, two, buckle my shoe
[2734]  44      Three, four, shut the door
[2781]  43      Five, six, pick up sticks
[2825]  42      Seven, eight, lay them straight
[2875]  41      Nine, ten, a big fat hen
[2920]  40      One, two, buckle my shoe
[2965]  39      Three, four, shut the door
[3012]  38      Five, six, pick up sticks
[3056]  37      Seven, eight, lay them straight
[3106]  36      Nine, ten, a big fat hen
[3151]  35      One, two, buckle my shoe
[3196]  34      Three, four, shut the door
[3243]  33      Five, six, pick up sticks
[3287]  32      Seven, eight, lay them straight
[3337]  31      Nine, ten, a big fat hen
[3382]  30      One, two, buckle my shoe
[3427]  29      Three, four, shut the door
[3474]  28      Five, six, pick up sticks
[3518]  27      Seven, eight, lay them straight
[3568]  26      Nine, ten, a big fat hen
[3613]  25      One, two, buckle my shoe
[3658]  24      Three, four, shut the door
[3705]  23      Five, six, pick up sticks
[3749]  22      Seven, eight, lay them straight
[3799]  21      Nine, ten, a big fat hen
[3844]  20      One, two, buckle my shoe
[3889]  19      Three, four, shut the door
[3936]  18      Five, six, pick up sticks
[3980]  17      Seven, eight, lay them straight
[4030]  16      Nine, ten, a big fat hen
[4075]  15      One, two, buckle my shoe
[4140]  14      Three, four, shut the door
[4187]  13      Five, six, pick up sticks
[4231]  12      Seven, eight, lay them straight
[4281]  11      Nine, ten, a big fat hen
[4326]  10      One, two, buckle my shoe
[4371]  9       Three, four, shut the door
[4418]  8       Five, six, pick up sticks
[4462]  7       Seven, eight, lay them straight
[4512]  6       Nine, ten, a big fat hen
[4557]  5       One, two, buckle my shoe
[4602]  4       Three, four, shut the door
[4649]  3       Five, six, pick up sticks
[4693]  2       Seven, eight, lay them straight
[4743]  1       Nine, ten, a big fat hen

[ate: /local/nomad2/hadoop/tomwhite-hadoop-book-32dae01 ]
>> hadoop SequenceFileReadDemo numbers.seq
12/07/02 01:15:49 INFO util.NativeCodeLoader: Loaded the native-hadoop library
12/07/02 01:15:49 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
12/07/02 01:15:49 INFO compress.CodecPool: Got brand-new decompressor
[128]   100     One, two, buckle my shoe
[173]   99      Three, four, shut the door
[220]   98      Five, six, pick up sticks
[264]   97      Seven, eight, lay them straight
[314]   96      Nine, ten, a big fat hen
[359]   95      One, two, buckle my shoe
[404]   94      Three, four, shut the door
[451]   93      Five, six, pick up sticks
[495]   92      Seven, eight, lay them straight
[545]   91      Nine, ten, a big fat hen
[590]   90      One, two, buckle my shoe
[635]   89      Three, four, shut the door
[682]   88      Five, six, pick up sticks
[726]   87      Seven, eight, lay them straight
[776]   86      Nine, ten, a big fat hen
[821]   85      One, two, buckle my shoe
[866]   84      Three, four, shut the door
[913]   83      Five, six, pick up sticks
[957]   82      Seven, eight, lay them straight
[1007]  81      Nine, ten, a big fat hen
[1052]  80      One, two, buckle my shoe
[1097]  79      Three, four, shut the door
[1144]  78      Five, six, pick up sticks
[1188]  77      Seven, eight, lay them straight
[1238]  76      Nine, ten, a big fat hen
[1283]  75      One, two, buckle my shoe
[1328]  74      Three, four, shut the door
[1375]  73      Five, six, pick up sticks
[1419]  72      Seven, eight, lay them straight
[1469]  71      Nine, ten, a big fat hen
[1514]  70      One, two, buckle my shoe
[1559]  69      Three, four, shut the door
[1606]  68      Five, six, pick up sticks
[1650]  67      Seven, eight, lay them straight
[1700]  66      Nine, ten, a big fat hen
[1745]  65      One, two, buckle my shoe
[1790]  64      Three, four, shut the door
[1837]  63      Five, six, pick up sticks
[1881]  62      Seven, eight, lay them straight
[1931]  61      Nine, ten, a big fat hen
[1976]  60      One, two, buckle my shoe
[2021*] 59      Three, four, shut the door
[2088]  58      Five, six, pick up sticks
[2132]  57      Seven, eight, lay them straight
[2182]  56      Nine, ten, a big fat hen
[2227]  55      One, two, buckle my shoe
[2272]  54      Three, four, shut the door
[2319]  53      Five, six, pick up sticks
[2363]  52      Seven, eight, lay them straight
[2413]  51      Nine, ten, a big fat hen
[2458]  50      One, two, buckle my shoe
[2503]  49      Three, four, shut the door
[2550]  48      Five, six, pick up sticks
[2594]  47      Seven, eight, lay them straight
[2644]  46      Nine, ten, a big fat hen
[2689]  45      One, two, buckle my shoe
[2734]  44      Three, four, shut the door
[2781]  43      Five, six, pick up sticks
[2825]  42      Seven, eight, lay them straight
[2875]  41      Nine, ten, a big fat hen
[2920]  40      One, two, buckle my shoe
[2965]  39      Three, four, shut the door
[3012]  38      Five, six, pick up sticks
[3056]  37      Seven, eight, lay them straight
[3106]  36      Nine, ten, a big fat hen
[3151]  35      One, two, buckle my shoe
[3196]  34      Three, four, shut the door
[3243]  33      Five, six, pick up sticks
[3287]  32      Seven, eight, lay them straight
[3337]  31      Nine, ten, a big fat hen
[3382]  30      One, two, buckle my shoe
[3427]  29      Three, four, shut the door
[3474]  28      Five, six, pick up sticks
[3518]  27      Seven, eight, lay them straight
[3568]  26      Nine, ten, a big fat hen
[3613]  25      One, two, buckle my shoe
[3658]  24      Three, four, shut the door
[3705]  23      Five, six, pick up sticks
[3749]  22      Seven, eight, lay them straight
[3799]  21      Nine, ten, a big fat hen
[3844]  20      One, two, buckle my shoe
[3889]  19      Three, four, shut the door
[3936]  18      Five, six, pick up sticks
[3980]  17      Seven, eight, lay them straight
[4030]  16      Nine, ten, a big fat hen
[4075*] 15      One, two, buckle my shoe
[4140]  14      Three, four, shut the door
[4187]  13      Five, six, pick up sticks
[4231]  12      Seven, eight, lay them straight
[4281]  11      Nine, ten, a big fat hen
[4326]  10      One, two, buckle my shoe
[4371]  9       Three, four, shut the door
[4418]  8       Five, six, pick up sticks
[4462]  7       Seven, eight, lay them straight
[4512]  6       Nine, ten, a big fat hen
[4557]  5       One, two, buckle my shoe
[4602]  4       Three, four, shut the door
[4649]  3       Five, six, pick up sticks
[4693]  2       Seven, eight, lay them straight
[4743]  1       Nine, ten, a big fat hen
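
The asterisks in the read listing ([2021*] and [4075*]) mark positions where the reader saw a sync point. The same thing can be found from the positions alone: the rhyme repeats every five records and every key has the same serialized size, so a span should be as long as the span five records earlier unless something extra was written. The sketch below checks this on an excerpt of the listing. The class SyncGapFinder is hypothetical, and reading the extra bytes as a sync marker is my interpretation of the output.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: finds spans that are longer than the span one rhyme cycle earlier.
final class SyncGapFinder {

    // Positions copied from the listing above (keys 65 down to 55).
    static final long[] POSITIONS = {
        1745, 1790, 1837, 1881, 1931, 1976, 2021, 2088, 2132, 2182, 2227
    };

    // The five rhyme lines repeat, so records five apart carry the same text.
    static final int CYCLE = 5;

    /** Returns {startPosition, extraBytes} for every span that differs from its reference span. */
    static List<long[]> findExtraBytes(long[] positions, int cycle) {
        List<long[]> hits = new ArrayList<>();
        for (int i = cycle; i + 1 < positions.length; i++) {
            long span = positions[i + 1] - positions[i];
            long reference = positions[i + 1 - cycle] - positions[i - cycle];
            if (span != reference) {
                hits.add(new long[] {positions[i], span - reference});
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        for (long[] hit : findExtraBytes(POSITIONS, CYCLE)) {
            System.out.println("position " + hit[0] + ": " + hit[1] + " extra bytes");
        }
    }
}
```

On this excerpt it prints "position 2021: 20 extra bytes": the span from 2021 to 2088 is 67 bytes, while the same text five records earlier (1790 to 1837) takes 47. The 4075 to 4140 span shows the same 20-byte surplus (65 versus 45).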



View the written contents:
[ate: /local/nomad2/hadoop/tomwhite-hadoop-book-32dae01 ]
>> hadoop fs -text numbers.seq |less



Sorting and merging sequence files
>> hadoop jar /local/nomad2/hadoop/hadoop-0.20.203.0/hadoop-examples-0.20.203.0.jar sort -r 1 \
more?> -inFormat org.apache.hadoop.mapred.SequenceFileInputFormat \
more?> -outFormat org.apache.hadoop.mapred.SequenceFileOutputFormat \
more?> -outKey org.apache.hadoop.io.IntWritable \
more?> -outValue org.apache.hadoop.io.Text \
more?> numbers.seq sorted
Running on 1 nodes to sort from hdfs://localhost/user/nomad2/numbers.seq into hdfs://localhost/user/nomad2/sorted with 1 reduces.
Job started: Mon Jul 02 01:22:26 CST 2012
12/07/02 01:22:26 INFO mapred.FileInputFormat: Total input paths to process : 1
12/07/02 01:22:26 INFO mapred.JobClient: Running job: job_201207012246_0008
12/07/02 01:22:27 INFO mapred.JobClient:  map 0% reduce 0%
12/07/02 01:22:40 INFO mapred.JobClient:  map 100% reduce 0%
12/07/02 01:22:52 INFO mapred.JobClient:  map 100% reduce 100%
12/07/02 01:22:57 INFO mapred.JobClient: Job complete: job_201207012246_0008
12/07/02 01:22:57 INFO mapred.JobClient: Counters: 26
12/07/02 01:22:57 INFO mapred.JobClient:   Job Counters
12/07/02 01:22:57 INFO mapred.JobClient:     Launched reduce tasks=1
12/07/02 01:22:57 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=16289
12/07/02 01:22:57 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
12/07/02 01:22:57 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
12/07/02 01:22:57 INFO mapred.JobClient:     Launched map tasks=2
12/07/02 01:22:57 INFO mapred.JobClient:     Data-local map tasks=2
12/07/02 01:22:57 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=10069
12/07/02 01:22:57 INFO mapred.JobClient:   File Input Format Counters
12/07/02 01:22:57 INFO mapred.JobClient:     Bytes Read=6613
12/07/02 01:22:57 INFO mapred.JobClient:   File Output Format Counters
12/07/02 01:22:57 INFO mapred.JobClient:     Bytes Written=4005
12/07/02 01:22:57 INFO mapred.JobClient:   FileSystemCounters
12/07/02 01:22:57 INFO mapred.JobClient:     FILE_BYTES_READ=3306
12/07/02 01:22:57 INFO mapred.JobClient:     HDFS_BYTES_READ=6868
12/07/02 01:22:57 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=70016
12/07/02 01:22:57 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=4005
12/07/02 01:22:57 INFO mapred.JobClient:   Map-Reduce Framework
12/07/02 01:22:57 INFO mapred.JobClient:     Map output materialized bytes=3312
12/07/02 01:22:57 INFO mapred.JobClient:     Map input records=100
12/07/02 01:22:57 INFO mapred.JobClient:     Reduce shuffle bytes=2811
12/07/02 01:22:57 INFO mapred.JobClient:     Spilled Records=200
12/07/02 01:22:57 INFO mapred.JobClient:     Map output bytes=3100
12/07/02 01:22:57 INFO mapred.JobClient:     Map input bytes=4660
12/07/02 01:22:57 INFO mapred.JobClient:     Combine input records=0
12/07/02 01:22:57 INFO mapred.JobClient:     SPLIT_RAW_BYTES=190
12/07/02 01:22:57 INFO mapred.JobClient:     Reduce input records=100
12/07/02 01:22:57 INFO mapred.JobClient:     Reduce input groups=100
12/07/02 01:22:57 INFO mapred.JobClient:     Combine output records=0
12/07/02 01:22:57 INFO mapred.JobClient:     Reduce output records=100
12/07/02 01:22:57 INFO mapred.JobClient:     Map output records=100
Job ended: Mon Jul 02 01:22:57 CST 2012
The job took 31 seconds.


4.2 MapFile
[ate: /local/nomad2/hadoop/tomwhite-hadoop-book-32dae01 ]
>> hadoop MapFileWriteDemo numbers.map
12/07/02 01:27:49 INFO util.NativeCodeLoader: Loaded the native-hadoop library
12/07/02 01:27:49 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
12/07/02 01:27:49 INFO compress.CodecPool: Got brand-new compressor
12/07/02 01:27:49 INFO compress.CodecPool: Got brand-new compressor

  


Converting a SequenceFile to a MapFile
[ate: /local/nomad2/hadoop/tomwhite-hadoop-book-32dae01 ]
>> hadoop jar /local/nomad2/hadoop/hadoop-0.20.203.0/hadoop-examples-0.20.203.0.jar sort -r 1 \^J-inFormat>
Running on 1 nodes to sort from hdfs://localhost/user/nomad2/numbers.seq into hdfs://localhost/user/nomad2/numbers.map with 1 reduces.
Job started: Mon Jul 02 01:31:58 CST 2012
12/07/02 01:31:58 INFO mapred.FileInputFormat: Total input paths to process : 1
12/07/02 01:31:58 INFO mapred.JobClient: Running job: job_201207012246_0010
12/07/02 01:31:59 INFO mapred.JobClient:  map 0% reduce 0%
12/07/02 01:32:13 INFO mapred.JobClient:  map 100% reduce 0%
12/07/02 01:32:25 INFO mapred.JobClient:  map 100% reduce 100%
12/07/02 01:32:30 INFO mapred.JobClient: Job complete: job_201207012246_0010
12/07/02 01:32:30 INFO mapred.JobClient: Counters: 26
12/07/02 01:32:30 INFO mapred.JobClient:   Job Counters
12/07/02 01:32:30 INFO mapred.JobClient:     Launched reduce tasks=1
12/07/02 01:32:30 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=16355
12/07/02 01:32:30 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
12/07/02 01:32:30 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
12/07/02 01:32:30 INFO mapred.JobClient:     Launched map tasks=2
12/07/02 01:32:30 INFO mapred.JobClient:     Data-local map tasks=2
12/07/02 01:32:30 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=10036
12/07/02 01:32:30 INFO mapred.JobClient:   File Input Format Counters
12/07/02 01:32:30 INFO mapred.JobClient:     Bytes Read=6613
12/07/02 01:32:30 INFO mapred.JobClient:   File Output Format Counters
12/07/02 01:32:30 INFO mapred.JobClient:     Bytes Written=4005
12/07/02 01:32:30 INFO mapred.JobClient:   FileSystemCounters
12/07/02 01:32:30 INFO mapred.JobClient:     FILE_BYTES_READ=3306
12/07/02 01:32:30 INFO mapred.JobClient:     HDFS_BYTES_READ=6868
12/07/02 01:32:30 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=70031
12/07/02 01:32:30 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=4005
12/07/02 01:32:30 INFO mapred.JobClient:   Map-Reduce Framework
12/07/02 01:32:30 INFO mapred.JobClient:     Map output materialized bytes=3312
12/07/02 01:32:30 INFO mapred.JobClient:     Map input records=100
12/07/02 01:32:30 INFO mapred.JobClient:     Reduce shuffle bytes=3312
12/07/02 01:32:30 INFO mapred.JobClient:     Spilled Records=200
12/07/02 01:32:30 INFO mapred.JobClient:     Map output bytes=3100
12/07/02 01:32:30 INFO mapred.JobClient:     Map input bytes=4660
12/07/02 01:32:30 INFO mapred.JobClient:     Combine input records=0
12/07/02 01:32:30 INFO mapred.JobClient:     SPLIT_RAW_BYTES=190
12/07/02 01:32:30 INFO mapred.JobClient:     Reduce input records=100
12/07/02 01:32:30 INFO mapred.JobClient:     Reduce input groups=100
12/07/02 01:32:30 INFO mapred.JobClient:     Combine output records=0
12/07/02 01:32:30 INFO mapred.JobClient:     Reduce output records=100
12/07/02 01:32:30 INFO mapred.JobClient:     Map output records=100
Job ended: Mon Jul 02 01:32:30 CST 2012
The job took 32 seconds.

[ate: /local/nomad2/hadoop/tomwhite-hadoop-book-32dae01 ]
>> hadoop fs -mv numbers.map/part-00000 numbers.map/data


[ate: /local/nomad2/hadoop/tomwhite-hadoop-book-32dae01 ]
>> hadoop MapFileFixer numbers.map
12/07/02 01:33:31 INFO util.NativeCodeLoader: Loaded the native-hadoop library
12/07/02 01:33:31 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
12/07/02 01:33:31 INFO compress.CodecPool: Got brand-new compressor
Created MapFile numbers.map with 100 entries
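
The three steps above (sort, rename the part file to data, rebuild the index) work because a MapFile is a sorted SequenceFile plus an index that allows lookups by key. As a rough in-memory analogy, not Hadoop's MapFile implementation, the sketch below keeps a sparse index holding every Nth key and scans forward from the nearest indexed entry. The class SparseIndexSketch, its methods, and the interval values are assumptions for illustration; 128 is used on the assumption that it is the default index interval.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory analogue of "sorted data file + sparse index".
final class SparseIndexSketch {
    static final String[] DATA = {
        "One, two, buckle my shoe",
        "Three, four, shut the door",
        "Five, six, pick up sticks",
        "Seven, eight, lay them straight",
        "Nine, ten, a big fat hen"
    };

    final int[] keys;       // stands in for the sorted "data" file
    final String[] values;
    final TreeMap<Integer, Integer> index = new TreeMap<>(); // key -> slot in keys[]

    SparseIndexSketch(int[] sortedKeys, String[] values, int interval) {
        this.keys = sortedKeys;
        this.values = values;
        // Only every interval-th key goes into the index, so it stays small.
        for (int slot = 0; slot < sortedKeys.length; slot += interval) {
            index.put(sortedKeys[slot], slot);
        }
    }

    /** Looks up a key: jump to the nearest indexed key at or below it, then scan forward. */
    String get(int key) {
        Map.Entry<Integer, Integer> start = index.floorEntry(key);
        if (start == null) {
            return null; // key is smaller than every stored key
        }
        for (int slot = start.getValue(); slot < keys.length && keys[slot] <= key; slot++) {
            if (keys[slot] == key) {
                return values[slot];
            }
        }
        return null;
    }

    /** Rebuilds the 100 records of numbers.seq in sorted key order (1..100). */
    static SparseIndexSketch fromListing(int interval) {
        int[] keys = new int[100];
        String[] values = new String[100];
        for (int k = 1; k <= 100; k++) {
            keys[k - 1] = k;
            values[k - 1] = DATA[(100 - k) % DATA.length]; // key 100 was written first
        }
        return new SparseIndexSketch(keys, values, interval);
    }

    public static void main(String[] args) {
        SparseIndexSketch map = fromListing(128);
        System.out.println("index entries: " + map.index.size());
        System.out.println("key 59 -> " + map.get(59));
    }
}
```

With 100 entries and an interval of 128 the index holds a single entry, so every lookup scans from the first record; a smaller interval trades a larger index for shorter scans.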
