[Experience Sharing] Benchmarking a Hadoop Cluster with TeraSort, NNBench and MRBench

  In this blog post I introduce some of the benchmarking and testing tools in the Apache Hadoop distribution, namely TeraSort, NNBench and MRBench. These are popular choices for benchmarking a Hadoop cluster.
  Before we start, let me show you the cluster on which the tests will run:


  • Three VMware virtual machines (nodes) running on OS X Mountain Lion
  • Node1: 2 processors, 2GB memory, used as NameNode as well as DataNode
  • Node2: 1 processor, 1GB memory, used as Secondary NameNode as well as DataNode
  • Node3: 1 processor, 1GB memory, used as DataNode
  Now let's start the benchmark tests.
  TeraSort benchmark test
  A full TeraSort benchmark run consists of the following three steps:


  • Generating the input data via TeraGen.
  • Running the actual TeraSort on the input data.
  • Validating the sorted output data via TeraValidate.
  Now let's generate the input data with TeraGen:

[iyunv@n1 lib]# hadoop jar hadoop-examples.jar teragen 1000 /user/root/terasort-input
13/07/12 21:37:00 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
Generating 1000 using 2 maps with step of 500
13/07/12 21:37:09 INFO mapred.JobClient: Running job: job_201307122107_0001
13/07/12 21:37:10 INFO mapred.JobClient:  map 0% reduce 0%
13/07/12 21:37:35 INFO mapred.JobClient:  map 50% reduce 0%
13/07/12 21:38:28 INFO mapred.JobClient:  map 100% reduce 0%
13/07/12 21:39:03 INFO mapred.JobClient: Job complete: job_201307122107_0001
13/07/12 21:39:05 INFO mapred.JobClient: Counters: 24
13/07/12 21:39:06 INFO mapred.JobClient:   File System Counters
13/07/12 21:39:06 INFO mapred.JobClient:     FILE: Number of bytes read=0
13/07/12 21:39:06 INFO mapred.JobClient:     FILE: Number of bytes written=309768
13/07/12 21:39:06 INFO mapred.JobClient:     FILE: Number of read operations=0
13/07/12 21:39:06 INFO mapred.JobClient:     FILE: Number of large read operations=0
13/07/12 21:39:06 INFO mapred.JobClient:     FILE: Number of write operations=0
13/07/12 21:39:06 INFO mapred.JobClient:     HDFS: Number of bytes read=164
13/07/12 21:39:06 INFO mapred.JobClient:     HDFS: Number of bytes written=100000
13/07/12 21:39:06 INFO mapred.JobClient:     HDFS: Number of read operations=3
13/07/12 21:39:06 INFO mapred.JobClient:     HDFS: Number of large read operations=0
13/07/12 21:39:06 INFO mapred.JobClient:     HDFS: Number of write operations=2
13/07/12 21:39:06 INFO mapred.JobClient:   Job Counters
13/07/12 21:39:06 INFO mapred.JobClient:     Launched map tasks=2
13/07/12 21:39:06 INFO mapred.JobClient:     Total time spent by all maps in occupied slots (ms)=93872
13/07/12 21:39:06 INFO mapred.JobClient:     Total time spent by all reduces in occupied slots (ms)=0
13/07/12 21:39:06 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/07/12 21:39:06 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/07/12 21:39:06 INFO mapred.JobClient:   Map-Reduce Framework
13/07/12 21:39:06 INFO mapred.JobClient:     Map input records=1000
13/07/12 21:39:06 INFO mapred.JobClient:     Map output records=1000
13/07/12 21:39:06 INFO mapred.JobClient:     Input split bytes=164
13/07/12 21:39:06 INFO mapred.JobClient:     Spilled Records=0
13/07/12 21:39:06 INFO mapred.JobClient:     CPU time spent (ms)=1360
13/07/12 21:39:06 INFO mapred.JobClient:     Physical memory (bytes) snapshot=178167808
13/07/12 21:39:06 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=2249502720
13/07/12 21:39:06 INFO mapred.JobClient:     Total committed heap usage (bytes)=48758784
13/07/12 21:39:06 INFO mapred.JobClient:   org.apache.hadoop.mapreduce.lib.input.FileInputFormatCounter
13/07/12 21:39:06 INFO mapred.JobClient:     BYTES_READ=1000
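  Note that teragen's first argument is the number of 100-byte rows to generate, which is why the counters above report 100,000 HDFS bytes written for 1,000 rows. That is only a smoke test on these small VMs; as a rough sketch, a terabyte-scale run on real hardware would simply use more rows (the output path here is illustrative):

hadoop jar hadoop-examples.jar teragen 10000000000 /user/root/terasort-input-1tb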
  Check the data generated:

[iyunv@n1 lib]# hadoop fs -ls ./terasort-input
Found 4 items
-rw-r--r--   3 root supergroup          0 2013-07-12 21:38 terasort-input/_SUCCESS
drwxr-xr-x   - root supergroup          0 2013-07-12 21:37 terasort-input/_logs
-rw-r--r--   3 root supergroup      50000 2013-07-12 21:37 terasort-input/part-00000
-rw-r--r--   3 root supergroup      50000 2013-07-12 21:38 terasort-input/part-00001
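  To confirm the total input size rather than eyeballing the listing, hadoop fs -du sums up each entry (the exact output format varies by Hadoop version):

hadoop fs -du ./terasort-input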
  Run the terasort test:

[iyunv@n1 lib]# hadoop jar hadoop-examples.jar terasort terasort-input terasort-output
13/07/12 21:53:19 INFO terasort.TeraSort: starting
13/07/12 21:53:21 INFO mapred.FileInputFormat: Total input paths to process : 2
13/07/12 21:53:21 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
13/07/12 21:53:21 INFO compress.CodecPool: Got brand-new compressor [.deflate]
Making 1 from 1000 records
Step size is 1000.0
13/07/12 21:53:22 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/07/12 21:53:26 INFO mapred.JobClient: Running job: job_201307122107_0002
13/07/12 21:53:27 INFO mapred.JobClient:  map 0% reduce 0%
13/07/12 21:53:46 INFO mapred.JobClient:  map 100% reduce 0%
13/07/12 21:53:57 INFO mapred.JobClient:  map 100% reduce 100%
13/07/12 21:54:01 INFO mapred.JobClient: Job complete: job_201307122107_0002
13/07/12 21:54:01 INFO mapred.JobClient: Counters: 33
13/07/12 21:54:01 INFO mapred.JobClient:   File System Counters
13/07/12 21:54:01 INFO mapred.JobClient:     FILE: Number of bytes read=23088
13/07/12 21:54:01 INFO mapred.JobClient:     FILE: Number of bytes written=520103
13/07/12 21:54:01 INFO mapred.JobClient:     FILE: Number of read operations=0
13/07/12 21:54:01 INFO mapred.JobClient:     FILE: Number of large read operations=0
13/07/12 21:54:01 INFO mapred.JobClient:     FILE: Number of write operations=0
13/07/12 21:54:01 INFO mapred.JobClient:     HDFS: Number of bytes read=100230
13/07/12 21:54:01 INFO mapred.JobClient:     HDFS: Number of bytes written=100000
13/07/12 21:54:01 INFO mapred.JobClient:     HDFS: Number of read operations=4
13/07/12 21:54:01 INFO mapred.JobClient:     HDFS: Number of large read operations=0
13/07/12 21:54:01 INFO mapred.JobClient:     HDFS: Number of write operations=1
13/07/12 21:54:01 INFO mapred.JobClient:   Job Counters
13/07/12 21:54:01 INFO mapred.JobClient:     Launched map tasks=2
13/07/12 21:54:01 INFO mapred.JobClient:     Launched reduce tasks=1
13/07/12 21:54:01 INFO mapred.JobClient:     Data-local map tasks=2
13/07/12 21:54:01 INFO mapred.JobClient:     Total time spent by all maps in occupied slots (ms)=26310
13/07/12 21:54:01 INFO mapred.JobClient:     Total time spent by all reduces in occupied slots (ms)=8722
13/07/12 21:54:01 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/07/12 21:54:01 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/07/12 21:54:01 INFO mapred.JobClient:   Map-Reduce Framework
13/07/12 21:54:01 INFO mapred.JobClient:     Map input records=1000
13/07/12 21:54:01 INFO mapred.JobClient:     Map output records=1000
13/07/12 21:54:01 INFO mapred.JobClient:     Map output bytes=100000
13/07/12 21:54:01 INFO mapred.JobClient:     Input split bytes=230
13/07/12 21:54:01 INFO mapred.JobClient:     Combine input records=0
13/07/12 21:54:01 INFO mapred.JobClient:     Combine output records=0
13/07/12 21:54:01 INFO mapred.JobClient:     Reduce input groups=1000
13/07/12 21:54:01 INFO mapred.JobClient:     Reduce shuffle bytes=22876
13/07/12 21:54:01 INFO mapred.JobClient:     Reduce input records=1000
13/07/12 21:54:01 INFO mapred.JobClient:     Reduce output records=1000
13/07/12 21:54:01 INFO mapred.JobClient:     Spilled Records=2000
13/07/12 21:54:01 INFO mapred.JobClient:     CPU time spent (ms)=3780
13/07/12 21:54:01 INFO mapred.JobClient:     Physical memory (bytes) snapshot=408850432
13/07/12 21:54:01 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=1962823680
13/07/12 21:54:01 INFO mapred.JobClient:     Total committed heap usage (bytes)=147070976
13/07/12 21:54:01 INFO mapred.JobClient:   org.apache.hadoop.mapreduce.lib.input.FileInputFormatCounter
13/07/12 21:54:01 INFO mapred.JobClient:     BYTES_READ=100000
13/07/12 21:54:01 INFO terasort.TeraSort: done
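  A note on scaling this step: TeraSort takes its reducer count from mapred.reduce.tasks, and the sampler line above ("Making 1 from 1000 records") shows this run used a single reduce partition. On a bigger cluster you would raise it, roughly like this (a sketch; if your examples jar does not accept generic -D options, set the property in mapred-site.xml instead):

hadoop jar hadoop-examples.jar terasort -Dmapred.reduce.tasks=8 terasort-input terasort-output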
  Validate the job output with teravalidate:

[iyunv@n1 lib]# hadoop jar hadoop-examples.jar teravalidate terasort-output terasort-validate
13/07/12 21:56:02 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/07/12 21:56:04 INFO mapred.FileInputFormat: Total input paths to process : 1
13/07/12 21:56:10 INFO mapred.JobClient: Running job: job_201307122107_0003
13/07/12 21:56:11 INFO mapred.JobClient:  map 0% reduce 0%
13/07/12 21:56:23 INFO mapred.JobClient:  map 100% reduce 0%
13/07/12 21:56:31 INFO mapred.JobClient:  map 100% reduce 100%
13/07/12 21:56:34 INFO mapred.JobClient: Job complete: job_201307122107_0003
13/07/12 21:56:34 INFO mapred.JobClient: Counters: 33
13/07/12 21:56:34 INFO mapred.JobClient:   File System Counters
13/07/12 21:56:34 INFO mapred.JobClient:     FILE: Number of bytes read=69
13/07/12 21:56:34 INFO mapred.JobClient:     FILE: Number of bytes written=310607
13/07/12 21:56:34 INFO mapred.JobClient:     FILE: Number of read operations=0
13/07/12 21:56:34 INFO mapred.JobClient:     FILE: Number of large read operations=0
13/07/12 21:56:34 INFO mapred.JobClient:     FILE: Number of write operations=0
13/07/12 21:56:34 INFO mapred.JobClient:     HDFS: Number of bytes read=100116
13/07/12 21:56:34 INFO mapred.JobClient:     HDFS: Number of bytes written=0
13/07/12 21:56:34 INFO mapred.JobClient:     HDFS: Number of read operations=3
13/07/12 21:56:34 INFO mapred.JobClient:     HDFS: Number of large read operations=0
13/07/12 21:56:34 INFO mapred.JobClient:     HDFS: Number of write operations=2
13/07/12 21:56:34 INFO mapred.JobClient:   Job Counters
13/07/12 21:56:34 INFO mapred.JobClient:     Launched map tasks=1
13/07/12 21:56:34 INFO mapred.JobClient:     Launched reduce tasks=1
13/07/12 21:56:34 INFO mapred.JobClient:     Data-local map tasks=1
13/07/12 21:56:34 INFO mapred.JobClient:     Total time spent by all maps in occupied slots (ms)=14493
13/07/12 21:56:34 INFO mapred.JobClient:     Total time spent by all reduces in occupied slots (ms)=6647
13/07/12 21:56:34 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/07/12 21:56:34 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/07/12 21:56:34 INFO mapred.JobClient:   Map-Reduce Framework
13/07/12 21:56:34 INFO mapred.JobClient:     Map input records=1000
13/07/12 21:56:34 INFO mapred.JobClient:     Map output records=2
13/07/12 21:56:34 INFO mapred.JobClient:     Map output bytes=54
13/07/12 21:56:34 INFO mapred.JobClient:     Input split bytes=116
13/07/12 21:56:34 INFO mapred.JobClient:     Combine input records=0
13/07/12 21:56:34 INFO mapred.JobClient:     Combine output records=0
13/07/12 21:56:34 INFO mapred.JobClient:     Reduce input groups=2
13/07/12 21:56:34 INFO mapred.JobClient:     Reduce shuffle bytes=65
13/07/12 21:56:34 INFO mapred.JobClient:     Reduce input records=2
13/07/12 21:56:34 INFO mapred.JobClient:     Reduce output records=0
13/07/12 21:56:34 INFO mapred.JobClient:     Spilled Records=4
13/07/12 21:56:34 INFO mapred.JobClient:     CPU time spent (ms)=1640
13/07/12 21:56:34 INFO mapred.JobClient:     Physical memory (bytes) snapshot=250499072
13/07/12 21:56:34 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=1310330880
13/07/12 21:56:34 INFO mapred.JobClient:     Total committed heap usage (bytes)=81399808
13/07/12 21:56:34 INFO mapred.JobClient:   org.apache.hadoop.mapreduce.lib.input.FileInputFormatCounter
13/07/12 21:56:34 INFO mapred.JobClient:     BYTES_READ=100000
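  TeraValidate only emits records when it finds keys out of order, so Reduce output records=0 above, together with an empty output file, means the data was sorted correctly. You can confirm with:

hadoop fs -cat terasort-validate/part-*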
  Hadoop provides a very convenient way to access statistics about a job from the command line:

$ hadoop job -history all terasort-output
  You can also see the detailed results via the Hadoop JobTracker web UI.
  NameNode benchmark (nnbench)
  NNBench is useful for load testing the NameNode hardware and configuration. It generates a lot of HDFS-related requests with normally very small "payloads" for the sole purpose of putting a high HDFS management stress on the NameNode. The benchmark can simulate requests for creating, reading, renaming and deleting files on HDFS.
  The syntax of NNBench is as follows:

[iyunv@n1 lib]# hadoop jar /opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop-0.20-mapreduce/hadoop-test.jar nnbench
NameNode Benchmark 0.4
Usage: nnbench <options>
Options:
-operation <Available operations are create_write open_read rename delete. This option is mandatory>
* NOTE: The open_read, rename and delete operations assume that the files they operate on, are already available. The create_write operation must be run before running the other operations.
-maps <number of maps. default is 1. This is not mandatory>
-reduces <number of reduces. default is 1. This is not mandatory>
-startTime <time to start, given in seconds from the epoch. Make sure this is far enough into the future, so all maps (operations) will start at the same time>. default is launch time + 2 mins. This is not mandatory
-blockSize <Block size in bytes. default is 1. This is not mandatory>
-bytesToWrite <Bytes to write. default is 0. This is not mandatory>
-bytesPerChecksum <Bytes per checksum for the files. default is 1. This is not mandatory>
-numberOfFiles <number of files to create. default is 1. This is not mandatory>
-replicationFactorPerFile <Replication factor for the files. default is 1. This is not mandatory>
-baseDir <base DFS path. default is /becnhmarks/NNBench. This is not mandatory>
-readFileAfterOpen <true or false. if true, it reads the file and reports the average time to read. This is valid with the open_read operation. default is false. This is not mandatory>
-help: Display the help statement
  To run the NameNode benchmark with 6 mappers and 3 reducers:

[iyunv@n1 lib]# hadoop jar /opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop-0.20-mapreduce/hadoop-test.jar nnbench -operation create_write -maps 6 -reduces 3 -blockSize 1 -bytesToWrite 0 -numberOfFiles 100 -replicationFactorPerFile 3 -readFileAfterOpen true -baseDir /benchmarks/NNBench-`hostname -s`
NameNode Benchmark 0.4
13/07/12 22:13:42 INFO hdfs.NNBench: Test Inputs:
13/07/12 22:13:42 INFO hdfs.NNBench:            Test Operation: create_write
13/07/12 22:13:42 INFO hdfs.NNBench:                Start time: 2013-07-12 22:15:42,26
13/07/12 22:13:42 INFO hdfs.NNBench:            Number of maps: 6
13/07/12 22:13:42 INFO hdfs.NNBench:         Number of reduces: 3
13/07/12 22:13:42 INFO hdfs.NNBench:                Block Size: 1
13/07/12 22:13:42 INFO hdfs.NNBench:            Bytes to write: 0
13/07/12 22:13:42 INFO hdfs.NNBench:        Bytes per checksum: 1
13/07/12 22:13:42 INFO hdfs.NNBench:           Number of files: 100
13/07/12 22:13:42 INFO hdfs.NNBench:        Replication factor: 3
13/07/12 22:13:42 INFO hdfs.NNBench:                  Base dir: /benchmarks/NNBench-n1
13/07/12 22:13:42 INFO hdfs.NNBench:      Read file after open: true
13/07/12 22:13:43 INFO hdfs.NNBench: Deleting data directory
13/07/12 22:13:43 INFO hdfs.NNBench: Creating 6 control files
13/07/12 22:13:43 WARN conf.Configuration: io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
13/07/12 22:13:44 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/07/12 22:13:44 INFO mapred.FileInputFormat: Total input paths to process : 6
13/07/12 22:13:44 WARN conf.Configuration: io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
13/07/12 22:13:44 INFO mapred.JobClient: Running job: job_201307122107_0005
13/07/12 22:13:45 INFO mapred.JobClient:  map 0% reduce 0%
13/07/12 22:14:03 INFO mapred.JobClient:  map 33% reduce 0%
13/07/12 22:14:05 INFO mapred.JobClient:  map 67% reduce 0%
13/07/12 22:15:57 INFO mapred.JobClient:  map 83% reduce 0%
13/07/12 22:15:58 INFO mapred.JobClient:  map 100% reduce 0%
13/07/12 22:16:07 INFO mapred.JobClient:  map 100% reduce 67%
13/07/12 22:16:09 INFO mapred.JobClient:  map 100% reduce 100%
13/07/12 22:16:11 INFO mapred.JobClient: Job complete: job_201307122107_0005
13/07/12 22:16:11 INFO mapred.JobClient: Counters: 33
13/07/12 22:16:11 INFO mapred.JobClient:   File System Counters
13/07/12 22:16:11 INFO mapred.JobClient:     FILE: Number of bytes read=359
13/07/12 22:16:11 INFO mapred.JobClient:     FILE: Number of bytes written=1448711
13/07/12 22:16:11 INFO mapred.JobClient:     FILE: Number of read operations=0
13/07/12 22:16:11 INFO mapred.JobClient:     FILE: Number of large read operations=0
13/07/12 22:16:11 INFO mapred.JobClient:     FILE: Number of write operations=0
13/07/12 22:16:11 INFO mapred.JobClient:     HDFS: Number of bytes read=1530
13/07/12 22:16:11 INFO mapred.JobClient:     HDFS: Number of bytes written=182
13/07/12 22:16:11 INFO mapred.JobClient:     HDFS: Number of read operations=21
13/07/12 22:16:11 INFO mapred.JobClient:     HDFS: Number of large read operations=0
13/07/12 22:16:11 INFO mapred.JobClient:     HDFS: Number of write operations=4006
13/07/12 22:16:11 INFO mapred.JobClient:   Job Counters
13/07/12 22:16:11 INFO mapred.JobClient:     Launched map tasks=6
13/07/12 22:16:11 INFO mapred.JobClient:     Launched reduce tasks=3
13/07/12 22:16:11 INFO mapred.JobClient:     Data-local map tasks=6
13/07/12 22:16:11 INFO mapred.JobClient:     Total time spent by all maps in occupied slots (ms)=498450
13/07/12 22:16:11 INFO mapred.JobClient:     Total time spent by all reduces in occupied slots (ms)=24054
13/07/12 22:16:11 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/07/12 22:16:11 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/07/12 22:16:11 INFO mapred.JobClient:   Map-Reduce Framework
13/07/12 22:16:11 INFO mapred.JobClient:     Map input records=6
13/07/12 22:16:11 INFO mapred.JobClient:     Map output records=44
13/07/12 22:16:11 INFO mapred.JobClient:     Map output bytes=974
13/07/12 22:16:11 INFO mapred.JobClient:     Input split bytes=786
13/07/12 22:16:11 INFO mapred.JobClient:     Combine input records=0
13/07/12 22:16:11 INFO mapred.JobClient:     Combine output records=0
13/07/12 22:16:11 INFO mapred.JobClient:     Reduce input groups=8
13/07/12 22:16:11 INFO mapred.JobClient:     Reduce shuffle bytes=1227
13/07/12 22:16:11 INFO mapred.JobClient:     Reduce input records=44
13/07/12 22:16:11 INFO mapred.JobClient:     Reduce output records=8
13/07/12 22:16:11 INFO mapred.JobClient:     Spilled Records=88
13/07/12 22:16:11 INFO mapred.JobClient:     CPU time spent (ms)=16050
13/07/12 22:16:11 INFO mapred.JobClient:     Physical memory (bytes) snapshot=1233637376
13/07/12 22:16:11 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=8789716992
13/07/12 22:16:11 INFO mapred.JobClient:     Total committed heap usage (bytes)=525942784
13/07/12 22:16:11 INFO mapred.JobClient:   org.apache.hadoop.mapreduce.lib.input.FileInputFormatCounter
13/07/12 22:16:11 INFO mapred.JobClient:     BYTES_READ=228
13/07/12 22:16:11 INFO hdfs.NNBench: -------------- NNBench -------------- :
13/07/12 22:16:11 INFO hdfs.NNBench:                                Version: NameNode Benchmark 0.4
13/07/12 22:16:11 INFO hdfs.NNBench:                            Date & time: 2013-07-12 22:16:11,562
13/07/12 22:16:11 INFO hdfs.NNBench:
13/07/12 22:16:11 INFO hdfs.NNBench:                         Test Operation: create_write
13/07/12 22:16:11 INFO hdfs.NNBench:                             Start time: 2013-07-12 22:15:42,26
13/07/12 22:16:11 INFO hdfs.NNBench:                            Maps to run: 6
13/07/12 22:16:11 INFO hdfs.NNBench:                         Reduces to run: 3
13/07/12 22:16:11 INFO hdfs.NNBench:                     Block Size (bytes): 1
13/07/12 22:16:11 INFO hdfs.NNBench:                         Bytes to write: 0
13/07/12 22:16:11 INFO hdfs.NNBench:                     Bytes per checksum: 1
13/07/12 22:16:11 INFO hdfs.NNBench:                        Number of files: 100
13/07/12 22:16:11 INFO hdfs.NNBench:                     Replication factor: 3
13/07/12 22:16:11 INFO hdfs.NNBench:             Successful file operations: 0
13/07/12 22:16:11 INFO hdfs.NNBench:
13/07/12 22:16:11 INFO hdfs.NNBench:         # maps that missed the barrier: 0
13/07/12 22:16:11 INFO hdfs.NNBench:                           # exceptions: 0
13/07/12 22:16:11 INFO hdfs.NNBench:
13/07/12 22:16:11 INFO hdfs.NNBench:                TPS: Create/Write/Close: 0
13/07/12 22:16:11 INFO hdfs.NNBench: Avg exec time (ms): Create/Write/Close: 0.0
13/07/12 22:16:11 INFO hdfs.NNBench:             Avg Lat (ms): Create/Write: NaN
13/07/12 22:16:11 INFO hdfs.NNBench:                    Avg Lat (ms): Close: NaN
13/07/12 22:16:11 INFO hdfs.NNBench:
13/07/12 22:16:11 INFO hdfs.NNBench:                  RAW DATA: AL Total #1: 0
13/07/12 22:16:11 INFO hdfs.NNBench:                  RAW DATA: AL Total #2: 0
13/07/12 22:16:11 INFO hdfs.NNBench:               RAW DATA: TPS Total (ms): 0
13/07/12 22:16:11 INFO hdfs.NNBench:        RAW DATA: Longest Map Time (ms): 0.0
13/07/12 22:16:11 INFO hdfs.NNBench:                    RAW DATA: Late maps: 0
13/07/12 22:16:11 INFO hdfs.NNBench:              RAW DATA: # of exceptions: 0
13/07/12 22:16:11 INFO hdfs.NNBench:

  Note the trick used here: the output directory is suffixed with the machine's short hostname (`hostname -s`). This simple trick ensures that one box does not accidentally write into the same output directory as another machine running nnbench at the same time.
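  To put real pressure on the NameNode you would launch nnbench from several machines at once; the hostname suffix keeps their output directories apart. A rough sketch, assuming passwordless ssh and the same CDH parcel path on every node:

for h in n1 n2 n3; do
  ssh $h 'hadoop jar /opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop-0.20-mapreduce/hadoop-test.jar nnbench -operation create_write -maps 6 -reduces 3 -numberOfFiles 100 -baseDir /benchmarks/NNBench-$(hostname -s)' &
done
wait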
  MapReduce benchmark (mrbench)
  MRBench loops a small job a number of times. As such it is a complementary benchmark to the "large-scale" TeraSort benchmark suite, because MRBench checks whether small job runs are responsive and complete efficiently on your cluster. It focuses on the MapReduce layer, as its impact on the HDFS layer is very limited.
  The default parameters of mrbench are:

-baseDir: /benchmarks/MRBench  [*** see my note above ***]
-numRuns: 1
-maps: 2
-reduces: 1
-inputLines: 1
-inputType: ascending
  Run mrbench with default parameters:

[iyunv@n1 lib]# hadoop jar /opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop-0.20-mapreduce/hadoop-test.jar mrbench
MRBenchmark.0.0.2
13/07/12 22:04:42 INFO mapred.MRBench: creating control file: 1 numLines, ASCENDING sortOrder
13/07/12 22:04:42 INFO mapred.MRBench: created control file: /benchmarks/MRBench/mr_input/input_-1751865361.txt
13/07/12 22:04:43 INFO mapred.MRBench: Running job 0: input=hdfs://n1.example.com:8020/benchmarks/MRBench/mr_input output=hdfs://n1.example.com:8020/benchmarks/MRBench/mr_output/output_-1484101927
13/07/12 22:04:43 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/07/12 22:04:44 INFO mapred.FileInputFormat: Total input paths to process : 1
13/07/12 22:04:47 INFO mapred.JobClient: Running job: job_201307122107_0004
13/07/12 22:04:49 INFO mapred.JobClient:  map 0% reduce 0%
13/07/12 22:05:41 INFO mapred.JobClient:  map 50% reduce 0%
13/07/12 22:05:48 INFO mapred.JobClient:  map 100% reduce 0%
13/07/12 22:05:58 INFO mapred.JobClient:  map 100% reduce 100%
13/07/12 22:06:00 INFO mapred.JobClient: Job complete: job_201307122107_0004
13/07/12 22:06:00 INFO mapred.JobClient: Counters: 33
13/07/12 22:06:00 INFO mapred.JobClient:   File System Counters
13/07/12 22:06:00 INFO mapred.JobClient:     FILE: Number of bytes read=27
13/07/12 22:06:00 INFO mapred.JobClient:     FILE: Number of bytes written=468313
13/07/12 22:06:00 INFO mapred.JobClient:     FILE: Number of read operations=0
13/07/12 22:06:00 INFO mapred.JobClient:     FILE: Number of large read operations=0
13/07/12 22:06:00 INFO mapred.JobClient:     FILE: Number of write operations=0
13/07/12 22:06:00 INFO mapred.JobClient:     HDFS: Number of bytes read=261
13/07/12 22:06:00 INFO mapred.JobClient:     HDFS: Number of bytes written=3
13/07/12 22:06:00 INFO mapred.JobClient:     HDFS: Number of read operations=5
13/07/12 22:06:00 INFO mapred.JobClient:     HDFS: Number of large read operations=0
13/07/12 22:06:00 INFO mapred.JobClient:     HDFS: Number of write operations=2
13/07/12 22:06:00 INFO mapred.JobClient:   Job Counters
13/07/12 22:06:00 INFO mapred.JobClient:     Launched map tasks=2
13/07/12 22:06:00 INFO mapred.JobClient:     Launched reduce tasks=1
13/07/12 22:06:00 INFO mapred.JobClient:     Data-local map tasks=2
13/07/12 22:06:00 INFO mapred.JobClient:     Total time spent by all maps in occupied slots (ms)=50958
13/07/12 22:06:00 INFO mapred.JobClient:     Total time spent by all reduces in occupied slots (ms)=7753
13/07/12 22:06:00 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/07/12 22:06:00 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/07/12 22:06:00 INFO mapred.JobClient:   Map-Reduce Framework
13/07/12 22:06:00 INFO mapred.JobClient:     Map input records=1
13/07/12 22:06:00 INFO mapred.JobClient:     Map output records=1
13/07/12 22:06:00 INFO mapred.JobClient:     Map output bytes=5
13/07/12 22:06:00 INFO mapred.JobClient:     Input split bytes=258
13/07/12 22:06:00 INFO mapred.JobClient:     Combine input records=0
13/07/12 22:06:00 INFO mapred.JobClient:     Combine output records=0
13/07/12 22:06:00 INFO mapred.JobClient:     Reduce input groups=1
13/07/12 22:06:00 INFO mapred.JobClient:     Reduce shuffle bytes=39
13/07/12 22:06:00 INFO mapred.JobClient:     Reduce input records=1
13/07/12 22:06:00 INFO mapred.JobClient:     Reduce output records=1
13/07/12 22:06:00 INFO mapred.JobClient:     Spilled Records=2
13/07/12 22:06:00 INFO mapred.JobClient:     CPU time spent (ms)=2920
13/07/12 22:06:00 INFO mapred.JobClient:     Physical memory (bytes) snapshot=398467072
13/07/12 22:06:00 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=3889000448
13/07/12 22:06:00 INFO mapred.JobClient:     Total committed heap usage (bytes)=204607488
13/07/12 22:06:00 INFO mapred.JobClient:   org.apache.hadoop.mapreduce.lib.input.FileInputFormatCounter
13/07/12 22:06:00 INFO mapred.JobClient:     BYTES_READ=2
DataLines       Maps    Reduces AvgTime (milliseconds)
1               2       1       77797
  This means that the average finish time of the executed job was about 78 seconds (77,797 ms).
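  A single iteration is not very meaningful on its own; to average over many runs, pass the flags from the defaults list above explicitly, e.g. (the run count here is illustrative):

hadoop jar /opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop-0.20-mapreduce/hadoop-test.jar mrbench -numRuns 50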
