zrong posted on 2016-12-7 10:13:43

"Hadoop: The Definitive Guide", ch11 Pig

  1. Pig
  Pig is a scripting language for exploring large datasets, designed specifically for batch processing of data.
  

  2. Installation and startup


export HADOOP_INSTALL=/local/nomad2/hadoop/hadoop-0.20.203.0
export PATH=$PATH:$HADOOP_INSTALL/bin
export JAVA_HOME=/usr/lib/jvm/java-6-sun/
export PIG_INSTALL=/local/nomad2/pig/pig-0.10.0
export PATH=$PATH:$PIG_INSTALL/bin
export PIG_HADOOP_VERSION=20
export PIG_CLASSPATH=$HADOOP_INSTALL/conf/
  



>> pig                        
2012-07-06 05:54:13,371 INFO org.apache.pig.Main - Apache Pig version 0.10.0 (r1328203) compiled Apr 19 2012, 22:54:12
2012-07-06 05:54:13,372 INFO org.apache.pig.Main - Logging error messages to: /local/nomad2/hadoop/pig_1341525253367.log
2012-07-06 05:54:13,539 INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://localhost/
2012-07-06 05:54:13,743 INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: localhost:8021
grunt> records = LOAD 'input/ncdc/micro-tab/sample.txt'
AS (year:chararray, temperature:int, quality:int);
grunt>

  3. Grunt
  3.1 LOAD

grunt> records = LOAD 'input/ncdc/micro-tab/sample.txt'
AS (year:chararray, temperature:int, quality:int);
grunt> dump records;
2012-07-06 05:58:16,661 INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
2012-07-06 05:58:16,851 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2012-07-06 05:58:16,873 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2012-07-06 05:58:16,873 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2012-07-06 05:58:16,936 INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2012-07-06 05:58:16,952 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2012-07-06 05:58:16,956 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - creating jar file Job7839038626859326850.jar
2012-07-06 05:58:19,936 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - jar file Job7839038626859326850.jar created
2012-07-06 05:58:19,947 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2012-07-06 05:58:19,974 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2012-07-06 05:58:20,209 INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2012-07-06 05:58:20,210 INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2012-07-06 05:58:20,218 INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2012-07-06 05:58:20,475 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2012-07-06 05:58:21,245 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_201207030133_0009
2012-07-06 05:58:21,245 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at: http://localhost:50030/jobdetails.jsp?jobid=job_201207030133_0009
2012-07-06 05:58:34,295 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
2012-07-06 05:58:40,839 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2012-07-06 05:58:40,842 INFO org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:
HadoopVersion   PigVersion      UserId  StartedAt       FinishedAt      Features
0.20.203.0      0.10.0  nomad2  2012-07-06 05:58:16     2012-07-06 05:58:40     UNKNOWN
Success!
Job Stats (time in seconds):
JobId   Maps    Reduces MaxMapTime      MinMapTIme      AvgMapTime      MaxReduceTime   MinReduceTime   AvgReduceTime   Alias   Feature Outputs
job_201207030133_0009   1       0       6       6       6       0       0       0       records MAP_ONLY      hdfs://localhost/tmp/temp279304135/tmp-1218319262,
Input(s):
Successfully read 5 records (432 bytes) from: "hdfs://localhost/user/nomad2/input/ncdc/micro-tab/sample.txt"
Output(s):
Successfully stored 5 records (74 bytes) in: "hdfs://localhost/tmp/temp279304135/tmp-1218319262"
Counters:
Total records written : 5
Total bytes written : 74
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_201207030133_0009

2012-07-06 05:58:40,850 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
2012-07-06 05:58:40,859 INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2012-07-06 05:58:40,859 INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
(1950,0,1)
(1950,22,1)
(1950,-11,1)
(1949,111,1)
(1949,78,1)

grunt> describe records;
records: {year: chararray,temperature: int,quality: int}
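
A rough Python analogy (not Pig code) of what LOAD ... AS (...) does here: split each tab-delimited line into fields and cast them to the declared types. The name `load_records` and the inline `SAMPLE` string are assumptions made for illustration; the five rows are the ones dumped above. Unlike Pig, which yields null for a field that fails to cast, this sketch simply raises ValueError.

```python
from typing import List, Tuple

# (year: chararray, temperature: int, quality: int)
Record = Tuple[str, int, int]

# Hypothetical inline stand-in for sample.txt; rows match the DUMP output above.
SAMPLE = "1950\t0\t1\n1950\t22\t1\n1950\t-11\t1\n1949\t111\t1\n1949\t78\t1\n"


def load_records(text: str) -> List[Record]:
    """Split tab-delimited lines and cast each field to the declared schema."""
    records = []
    for line in text.splitlines():
        if not line:
            continue  # skip blank lines
        year, temperature, quality = line.split("\t")
        records.append((year, int(temperature), int(quality)))
    return records


records = load_records(SAMPLE)
for record in records:
    print(record)
```

The year stays a string because the schema declares it as chararray, which is why the grouping key later is textual.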

  


  3.2 FILTER
grunt> filtered_records = FILTER records BY temperature != 9999 AND (quality == 0 OR quality == 1 OR quality == 4 OR quality == 5 OR quality == 9);
grunt> dump filtered_records
2012-07-06 06:01:21,090 INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: FILTER
2012-07-06 06:01:21,124 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2012-07-06 06:01:21,127 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2012-07-06 06:01:21,127 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2012-07-06 06:01:21,129 INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2012-07-06 06:01:21,131 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2012-07-06 06:01:21,131 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - creating jar file Job5684834367376328176.jar
2012-07-06 06:01:23,823 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - jar file Job5684834367376328176.jar created
2012-07-06 06:01:23,829 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2012-07-06 06:01:23,847 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2012-07-06 06:01:24,021 INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2012-07-06 06:01:24,022 INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2012-07-06 06:01:24,023 INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2012-07-06 06:01:24,347 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_201207030133_0010
2012-07-06 06:01:24,347 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at: http://localhost:50030/jobdetails.jsp?jobid=job_201207030133_0010
2012-07-06 06:01:24,350 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2012-07-06 06:01:37,381 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
2012-07-06 06:01:44,408 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2012-07-06 06:01:44,409 INFO org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:
HadoopVersion   PigVersion      UserId  StartedAt       FinishedAt      Features
0.20.203.0      0.10.0  nomad2  2012-07-06 06:01:21     2012-07-06 06:01:44     FILTER
Success!
Job Stats (time in seconds):
JobId   Maps    Reduces MaxMapTime      MinMapTIme      AvgMapTime      MaxReduceTime   MinReduceTime   AvgReduceTime   Alias   Feature Outputs
job_201207030133_0010   1       0       6       6       6       0       0       0       filtered_records,records        MAP_ONLY        hdfs://localhost/tmp/temp279304135/tmp-767579563,
Input(s):
Successfully read 5 records (432 bytes) from: "hdfs://localhost/user/nomad2/input/ncdc/micro-tab/sample.txt"
Output(s):
Successfully stored 5 records (74 bytes) in: "hdfs://localhost/tmp/temp279304135/tmp-767579563"
Counters:
Total records written : 5
Total bytes written : 74
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_201207030133_0010

2012-07-06 06:01:44,414 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
2012-07-06 06:01:44,419 INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2012-07-06 06:01:44,419 INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
(1950,0,1)
(1950,22,1)
(1950,-11,1)
(1949,111,1)
(1949,78,1)
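
The FILTER predicate can be sketched in plain Python as follows. This is only an analogy, not Pig code; the names `is_valid`, `MISSING` and `GOOD_QUALITY` are made up for illustration. All five sample rows pass because none has temperature 9999 and every quality code is 1, which is why the dump above is identical to the unfiltered one.

```python
# Rows taken from the DUMP output above: (year, temperature, quality).
records = [
    ("1950", 0, 1),
    ("1950", 22, 1),
    ("1950", -11, 1),
    ("1949", 111, 1),
    ("1949", 78, 1),
]

MISSING = 9999                 # temperature value rejected by the FILTER
GOOD_QUALITY = {0, 1, 4, 5, 9}  # quality codes accepted by the FILTER


def is_valid(record):
    """Mirror of: temperature != 9999 AND quality IN (0, 1, 4, 5, 9)."""
    _year, temperature, quality = record
    return temperature != MISSING and quality in GOOD_QUALITY


filtered_records = [r for r in records if is_valid(r)]
print(filtered_records)
```

A row such as ("1950", 9999, 1) or ("1950", 10, 2), both hypothetical, would be dropped by this predicate.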
  


  3.3 GROUP
grunt> grouped_records = GROUP filtered_records BY year;
grunt> dump grouped_records;
2012-07-06 06:02:40,489 INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: GROUP_BY,FILTER
2012-07-06 06:02:40,519 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2012-07-06 06:02:40,528 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2012-07-06 06:02:40,528 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2012-07-06 06:02:40,532 INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2012-07-06 06:02:40,534 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2012-07-06 06:02:40,535 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - creating jar file Job5702139319332202005.jar
2012-07-06 06:02:43,152 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - jar file Job5702139319332202005.jar created
2012-07-06 06:02:43,157 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2012-07-06 06:02:43,162 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=51
2012-07-06 06:02:43,162 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Neither PARALLEL nor default parallelism is set for this job. Setting number of reducers to 1
2012-07-06 06:02:43,185 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2012-07-06 06:02:43,340 INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2012-07-06 06:02:43,340 INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2012-07-06 06:02:43,341 INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2012-07-06 06:02:43,686 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_201207030133_0011
2012-07-06 06:02:43,686 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at: http://localhost:50030/jobdetails.jsp?jobid=job_201207030133_0011
2012-07-06 06:02:43,687 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2012-07-06 06:02:58,225 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
2012-07-06 06:03:18,786 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2012-07-06 06:03:18,786 INFO org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:
HadoopVersion   PigVersion      UserId  StartedAt       FinishedAt      Features
0.20.203.0      0.10.0  nomad2  2012-07-06 06:02:40     2012-07-06 06:03:18     GROUP_BY,FILTER
Success!
Job Stats (time in seconds):
JobId   Maps    Reduces MaxMapTime      MinMapTIme      AvgMapTime      MaxReduceTime   MinReduceTime   AvgReduceTime   Alias   Feature Outputs
job_201207030133_0011   1       1       6       6       6       12      12      12      filtered_records,grouped_records,records        GROUP_BY        hdfs://localhost/tmp/temp279304135/tmp-1975892788,
Input(s):
Successfully read 5 records (432 bytes) from: "hdfs://localhost/user/nomad2/input/ncdc/micro-tab/sample.txt"
Output(s):
Successfully stored 2 records (87 bytes) in: "hdfs://localhost/tmp/temp279304135/tmp-1975892788"
Counters:
Total records written : 2
Total bytes written : 87
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_201207030133_0011

2012-07-06 06:03:18,793 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
2012-07-06 06:03:18,797 INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2012-07-06 06:03:18,797 INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
(1949,{(1949,111,1),(1949,78,1)})
(1950,{(1950,0,1),(1950,22,1),(1950,-11,1)})
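
The shape of the GROUP output, one (group key, bag of whole tuples) pair per year, can be mimicked in Python as below. This is an illustrative sketch only; `group_by_year` is a made-up helper, and a Python list stands in for a Pig bag, which is unordered. The keys are sorted here just to get a stable listing.

```python
from collections import defaultdict

# Rows taken from the DUMP output of filtered_records above.
filtered_records = [
    ("1950", 0, 1),
    ("1950", 22, 1),
    ("1950", -11, 1),
    ("1949", 111, 1),
    ("1949", 78, 1),
]


def group_by_year(records):
    """Collect every record sharing the same year into one list (the 'bag')."""
    groups = defaultdict(list)
    for record in records:
        groups[record[0]].append(record)
    # One (key, bag) pair per distinct year, ordered by key.
    return sorted(groups.items())


grouped_records = group_by_year(filtered_records)
for group, bag in grouped_records:
    print(group, bag)
```

Five input rows collapse into two output rows, which matches "Successfully stored 2 records" in the job statistics above.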
  


  3.4 FOREACH and GENERATE
grunt> max_temp = FOREACH grouped_records GENERATE group, MAX(filtered_records.temperature);
grunt> dump max_temp;
2012-07-06 06:04:38,622 INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: GROUP_BY,FILTER
2012-07-06 06:04:38,650 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2012-07-06 06:04:38,654 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.CombinerOptimizer - Choosing to move algebraic foreach to combiner
2012-07-06 06:04:38,663 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2012-07-06 06:04:38,663 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2012-07-06 06:04:38,667 INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2012-07-06 06:04:38,669 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2012-07-06 06:04:38,670 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - creating jar file Job7466964472073663140.jar
2012-07-06 06:04:41,549 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - jar file Job7466964472073663140.jar created
2012-07-06 06:04:41,553 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2012-07-06 06:04:41,561 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=51
2012-07-06 06:04:41,561 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Neither PARALLEL nor default parallelism is set for this job. Setting number of reducers to 1
2012-07-06 06:04:41,575 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2012-07-06 06:04:41,743 INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2012-07-06 06:04:41,743 INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2012-07-06 06:04:41,744 INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2012-07-06 06:04:42,076 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_201207030133_0012
2012-07-06 06:04:42,076 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at: http://localhost:50030/jobdetails.jsp?jobid=job_201207030133_0012
2012-07-06 06:04:42,080 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2012-07-06 06:04:55,115 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
2012-07-06 06:05:17,190 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2012-07-06 06:05:17,190 INFO org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:
HadoopVersion   PigVersion      UserId  StartedAt       FinishedAt      Features
0.20.203.0      0.10.0  nomad2  2012-07-06 06:04:38     2012-07-06 06:05:17     GROUP_BY,FILTER
Success!
Job Stats (time in seconds):
JobId   Maps    Reduces MaxMapTime      MinMapTIme      AvgMapTime      MaxReduceTime   MinReduceTime   AvgReduceTime   Alias   Feature Outputs
job_201207030133_0012   1       1       6       6       6       12      12      12      filtered_records,grouped_records,max_temp,records GROUP_BY,COMBINER       hdfs://localhost/tmp/temp279304135/tmp-954146705,
Input(s):
Successfully read 5 records (432 bytes) from: "hdfs://localhost/user/nomad2/input/ncdc/micro-tab/sample.txt"
Output(s):
Successfully stored 2 records (28 bytes) in: "hdfs://localhost/tmp/temp279304135/tmp-954146705"
Counters:
Total records written : 2
Total bytes written : 28
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_201207030133_0012

2012-07-06 06:05:17,196 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
2012-07-06 06:05:17,201 INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2012-07-06 06:05:17,201 INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
(1949,111)
(1950,22)
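
In Python terms, FOREACH ... GENERATE group, MAX(filtered_records.temperature) walks over each (group, bag) pair, projects the temperature field out of the bag and takes its maximum. The sketch below is an analogy under that reading, not Pig code; `partial_max` is a hypothetical helper added to hint at why the log reports the work being moved to a combiner: a maximum can be computed from the maxima of partial chunks.

```python
# (group, bag) pairs taken from the DUMP output of grouped_records above.
grouped_records = [
    ("1949", [("1949", 111, 1), ("1949", 78, 1)]),
    ("1950", [("1950", 0, 1), ("1950", 22, 1), ("1950", -11, 1)]),
]

# FOREACH grouped_records GENERATE group, MAX(filtered_records.temperature)
max_temp = [
    (group, max(record[1] for record in bag))  # record[1] is the temperature
    for group, bag in grouped_records
]
print(max_temp)


def partial_max(chunks):
    """Max of per-chunk maxima; equals the max over all values combined."""
    return max(max(chunk) for chunk in chunks if chunk)


# Splitting the 1950 temperatures into two chunks gives the same answer.
combined = partial_max([[0, 22], [-11]])
print(combined)
```

The result has one (year, maximum temperature) pair per group, matching the two tuples dumped above.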
  


  3.5 ILLUSTRATE
grunt> illustrate max_temp;
2012-07-06 06:06:08,418 INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://localhost/
2012-07-06 06:06:08,419 INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: localhost:8021
2012-07-06 06:06:08,449 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2012-07-06 06:06:08,450 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2012-07-06 06:06:08,450 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2012-07-06 06:06:08,450 INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2012-07-06 06:06:08,451 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2012-07-06 06:06:08,486 INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2012-07-06 06:06:08,486 INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2012-07-06 06:06:08,495 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2012-07-06 06:06:08,497 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2012-07-06 06:06:08,497 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2012-07-06 06:06:08,498 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.AccumulatorOptimizer - Reducer is to run in accumulative mode.
2012-07-06 06:06:08,498 INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2012-07-06 06:06:08,499 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2012-07-06 06:06:08,505 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=51
2012-07-06 06:06:08,505 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Neither PARALLEL nor default parallelism is set for this job. Setting number of reducers to 1
2012-07-06 06:06:08,533 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2012-07-06 06:06:08,535 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2012-07-06 06:06:08,535 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2012-07-06 06:06:08,536 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.AccumulatorOptimizer - Reducer is to run in accumulative mode.
2012-07-06 06:06:08,536 INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2012-07-06 06:06:08,537 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2012-07-06 06:06:08,542 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=51
2012-07-06 06:06:08,542 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Neither PARALLEL nor default parallelism is set for this job. Setting number of reducers to 1
2012-07-06 06:06:08,558 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2012-07-06 06:06:08,560 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2012-07-06 06:06:08,560 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2012-07-06 06:06:08,561 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.AccumulatorOptimizer - Reducer is to run in accumulative mode.
2012-07-06 06:06:08,561 INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2012-07-06 06:06:08,562 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2012-07-06 06:06:08,566 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=51
2012-07-06 06:06:08,566 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Neither PARALLEL nor default parallelism is set for this job. Setting number of reducers to 1
2012-07-06 06:06:08,580 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2012-07-06 06:06:08,582 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2012-07-06 06:06:08,582 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2012-07-06 06:06:08,583 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.AccumulatorOptimizer - Reducer is to run in accumulative mode.
2012-07-06 06:06:08,583 INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2012-07-06 06:06:08,584 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2012-07-06 06:06:08,588 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=51
2012-07-06 06:06:08,588 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Neither PARALLEL nor default parallelism is set for this job. Setting number of reducers to 1
2012-07-06 06:06:08,600 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2012-07-06 06:06:08,602 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2012-07-06 06:06:08,602 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2012-07-06 06:06:08,602 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.AccumulatorOptimizer - Reducer is to run in accumulative mode.
2012-07-06 06:06:08,603 INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2012-07-06 06:06:08,603 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2012-07-06 06:06:08,607 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=51
2012-07-06 06:06:08,607 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Neither PARALLEL nor default parallelism is set for this job. Setting number of reducers to 1
2012-07-06 06:06:08,621 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2012-07-06 06:06:08,622 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2012-07-06 06:06:08,623 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2012-07-06 06:06:08,623 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.AccumulatorOptimizer - Reducer is to run in accumulative mode.
2012-07-06 06:06:08,623 INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2012-07-06 06:06:08,624 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2012-07-06 06:06:08,628 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=51
2012-07-06 06:06:08,628 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Neither PARALLEL nor default parallelism is set for this job. Setting number of reducers to 1
2012-07-06 06:06:08,644 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2012-07-06 06:06:08,646 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2012-07-06 06:06:08,646 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2012-07-06 06:06:08,646 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.AccumulatorOptimizer - Reducer is to run in accumulative mode.
2012-07-06 06:06:08,646 INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2012-07-06 06:06:08,647 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2012-07-06 06:06:08,651 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=51
2012-07-06 06:06:08,651 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Neither PARALLEL nor default parallelism is set for this job. Setting number of reducers to 1
2012-07-06 06:06:08,662 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2012-07-06 06:06:08,664 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2012-07-06 06:06:08,664 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2012-07-06 06:06:08,664 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.AccumulatorOptimizer - Reducer is to run in accumulative mode.
2012-07-06 06:06:08,664 INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2012-07-06 06:06:08,665 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2012-07-06 06:06:08,669 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=51
2012-07-06 06:06:08,669 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Neither PARALLEL nor default parallelism is set for this job. Setting number of reducers to 1
(1949,111,1)
2012-07-06 06:06:08,683 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2012-07-06 06:06:08,684 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2012-07-06 06:06:08,684 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2012-07-06 06:06:08,685 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.AccumulatorOptimizer - Reducer is to run in accumulative mode.
2012-07-06 06:06:08,685 INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2012-07-06 06:06:08,686 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2012-07-06 06:06:08,689 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=51
2012-07-06 06:06:08,689 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Neither PARALLEL nor default parallelism is set for this job. Setting number of reducers to 1
2012-07-06 06:06:08,699 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2012-07-06 06:06:08,713 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2012-07-06 06:06:08,714 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2012-07-06 06:06:08,716 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.AccumulatorOptimizer - Reducer is to run in accumulative mode.
2012-07-06 06:06:08,717 INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2012-07-06 06:06:08,717 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2012-07-06 06:06:08,720 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=51
2012-07-06 06:06:08,720 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Neither PARALLEL nor default parallelism is set for this job. Setting number of reducers to 1
2012-07-06 06:06:08,738 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2012-07-06 06:06:08,739 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2012-07-06 06:06:08,739 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2012-07-06 06:06:08,740 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.AccumulatorOptimizer - Reducer is to run in accumulative mode.
2012-07-06 06:06:08,740 INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2012-07-06 06:06:08,741 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2012-07-06 06:06:08,744 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=51
2012-07-06 06:06:08,744 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Neither PARALLEL nor default parallelism is set for this job. Setting number of reducers to 1
2012-07-06 06:06:08,755 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2012-07-06 06:06:08,757 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2012-07-06 06:06:08,757 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2012-07-06 06:06:08,757 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.AccumulatorOptimizer - Reducer is to run in accumulative mode.
2012-07-06 06:06:08,758 INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2012-07-06 06:06:08,759 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2012-07-06 06:06:08,762 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=51
2012-07-06 06:06:08,762 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Neither PARALLEL nor default parallelism is set for this job. Setting number of reducers to 1
2012-07-06 06:06:08,772 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2012-07-06 06:06:08,773 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2012-07-06 06:06:08,774 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2012-07-06 06:06:08,774 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.AccumulatorOptimizer - Reducer is to run in accumulative mode.
2012-07-06 06:06:08,774 INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2012-07-06 06:06:08,775 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2012-07-06 06:06:08,778 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=51
2012-07-06 06:06:08,778 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Neither PARALLEL nor default parallelism is set for this job. Setting number of reducers to 1
2012-07-06 06:06:08,788 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2012-07-06 06:06:08,789 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2012-07-06 06:06:08,789 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2012-07-06 06:06:08,789 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.AccumulatorOptimizer - Reducer is to run in accumulative mode.
2012-07-06 06:06:08,789 INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2012-07-06 06:06:08,790 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2012-07-06 06:06:08,793 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=51
2012-07-06 06:06:08,793 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Neither PARALLEL nor default parallelism is set for this job. Setting number of reducers to 1
2012-07-06 06:06:08,804 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2012-07-06 06:06:08,805 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2012-07-06 06:06:08,805 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2012-07-06 06:06:08,805 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.AccumulatorOptimizer - Reducer is to run in accumulative mode.
2012-07-06 06:06:08,805 INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2012-07-06 06:06:08,806 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2012-07-06 06:06:08,808 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=51
2012-07-06 06:06:08,808 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Neither PARALLEL nor default parallelism is set for this job. Setting number of reducers to 1
2012-07-06 06:06:08,818 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2012-07-06 06:06:08,819 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2012-07-06 06:06:08,819 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2012-07-06 06:06:08,819 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.AccumulatorOptimizer - Reducer is to run in accumulative mode.
2012-07-06 06:06:08,819 INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2012-07-06 06:06:08,819 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2012-07-06 06:06:08,822 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=51
2012-07-06 06:06:08,822 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Neither PARALLEL nor default parallelism is set for this job. Setting number of reducers to 1
2012-07-06 06:06:08,832 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2012-07-06 06:06:08,833 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2012-07-06 06:06:08,833 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2012-07-06 06:06:08,833 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.AccumulatorOptimizer - Reducer is to run in accumulative mode.
2012-07-06 06:06:08,834 INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2012-07-06 06:06:08,834 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2012-07-06 06:06:08,836 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=51
2012-07-06 06:06:08,837 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Neither PARALLEL nor default parallelism is set for this job. Setting number of reducers to 1
2012-07-06 06:06:08,846 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2012-07-06 06:06:08,846 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2012-07-06 06:06:08,846 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2012-07-06 06:06:08,847 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.AccumulatorOptimizer - Reducer is to run in accumulative mode.
2012-07-06 06:06:08,847 INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2012-07-06 06:06:08,847 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2012-07-06 06:06:08,850 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=51
2012-07-06 06:06:08,850 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Neither PARALLEL nor default parallelism is set for this job. Setting number of reducers to 1
------------------------------------------------------------
| records | year:chararray | temperature:int | quality:int |
------------------------------------------------------------
|         | 1949           | 111             | 1           |
|         | 1949           | 78              | 1           |
|         | 1949           | 9999            | 1           |
------------------------------------------------------------
---------------------------------------------------------------------
| filtered_records | year:chararray | temperature:int | quality:int |
---------------------------------------------------------------------
|                  | 1949           | 111             | 1           |
|                  | 1949           | 78              | 1           |
---------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------------
| grouped_records | group:chararray | filtered_records:bag{:tuple(year:chararray,temperature:int,quality:int)} |
----------------------------------------------------------------------------------------------------------------
|                 | 1949            | {(1949, 111, 1), (1949, 78, 1)}                                          |
----------------------------------------------------------------------------------------------------------------
-------------------------------------
| max_temp | group:chararray | :int |
-------------------------------------
|          | 1949            | 111  |
-------------------------------------
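
A side note on the repeated "BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=51 ... Setting number of reducers to 1" lines in the log above: when neither PARALLEL nor a default parallelism is set, the reducer count is estimated from the input size. The Python sketch below is only my reading of that heuristic (input size divided by bytes per reducer, rounded up, clamped between 1 and the maximum); the function name `estimate_reducers` is made up for illustration and nothing here is copied from Pig's source.

```python
import math


def estimate_reducers(total_input_bytes,
                      bytes_per_reducer=1_000_000_000,
                      max_reducers=999):
    """Hypothetical helper: input-size based reducer estimate.

    Assumption: one reducer per `bytes_per_reducer` bytes of input,
    rounded up, never fewer than 1 and never more than `max_reducers`.
    The defaults are the values printed in the log.
    """
    wanted = math.ceil(total_input_bytes / bytes_per_reducer)
    return max(1, min(max_reducers, wanted))


# The sample file in this post is 51 bytes (totalInputFileSize=51).
print(estimate_reducers(51))
```

With a 51-byte input the estimate is a single reducer, which agrees with what the log reports. Adding a PARALLEL clause to the GROUP statement is the usual way to override the estimate.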
  

  3.6 STORE

grunt> store max_temp into 'out' using PigStorage(';');
2012-07-06 06:42:16,785 INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: GROUP_BY,FILTER
2012-07-06 06:42:16,806 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2012-07-06 06:42:16,807 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.CombinerOptimizer - Choosing to move algebraic foreach to combiner
2012-07-06 06:42:16,809 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2012-07-06 06:42:16,809 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2012-07-06 06:42:16,811 INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2012-07-06 06:42:16,811 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2012-07-06 06:42:16,811 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - creating jar file Job5028501529287696495.jar
2012-07-06 06:42:19,342 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - jar file Job5028501529287696495.jar created
2012-07-06 06:42:19,344 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2012-07-06 06:42:19,347 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=51
2012-07-06 06:42:19,347 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Neither PARALLEL nor default parallelism is set for this job. Setting number of reducers to 1
2012-07-06 06:42:19,357 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2012-07-06 06:42:19,436 INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2012-07-06 06:42:19,436 INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2012-07-06 06:42:19,437 INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2012-07-06 06:42:19,858 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_201207030133_0015
2012-07-06 06:42:19,858 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at: http://localhost:50030/jobdetails.jsp?jobid=job_201207030133_0015
2012-07-06 06:42:19,859 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2012-07-06 06:42:31,884 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
2012-07-06 06:42:54,937 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2012-07-06 06:42:54,938 INFO org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:
HadoopVersion   PigVersion      UserId  StartedAt       FinishedAt      Features
0.20.203.0      0.10.0  nomad2  2012-07-06 06:42:16     2012-07-06 06:42:54     GROUP_BY,FILTER
Success!
Job Stats (time in seconds):
JobId   Maps    Reduces MaxMapTime      MinMapTIme      AvgMapTime      MaxReduceTime   MinReduceTime   AvgReduceTime   Alias   Feature Outputs
job_201207030133_0015   1       1       6       6       6       12      12      12      filtered_records,grouped_records,max_temp,records      GROUP_BY,COMBINER       hdfs://localhost/user/nomad2/out,
Input(s):
Successfully read 5 records (432 bytes) from: "hdfs://localhost/user/nomad2/input/ncdc/micro-tab/sample.txt"
Output(s):
Successfully stored 2 records (17 bytes) in: "hdfs://localhost/user/nomad2/out"
Counters:
Total records written : 2
Total bytes written : 17
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_201207030133_0015

2012-07-06 06:42:54,943 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
grunt> cat out
1949;111
1950;22
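
For readers without a cluster at hand, here is a rough Python emulation of what the FILTER, GROUP, MAX and STORE steps compute. It is only a sketch under assumptions: the five rows are what I take sample.txt to contain (they agree with the job output shown in this post), the filter is assumed to do nothing more than drop the 9999 "missing" reading, and the helper names `max_temp_by_year` and `to_pig_storage` are invented for illustration, not Pig APIs.

```python
from collections import defaultdict

# Assumed contents of sample.txt as (year, temperature, quality) tuples.
records = [
    ("1950", 0, 1),
    ("1950", 22, 1),
    ("1950", -11, 1),
    ("1949", 111, 1),
    ("1949", 78, 1),
]


def max_temp_by_year(rows, missing=9999):
    """Emulate FILTER (drop the missing marker), GROUP BY year, MAX(temperature)."""
    grouped = defaultdict(list)
    for year, temperature, _quality in rows:
        if temperature != missing:  # assumed filter condition
            grouped[year].append(temperature)
    return {year: max(temps) for year, temps in grouped.items()}


def to_pig_storage(result, delimiter=";"):
    """Emulate STORE ... USING PigStorage(';'): one delimited line per tuple."""
    return [f"{year}{delimiter}{temp}" for year, temp in sorted(result.items())]


lines = to_pig_storage(max_temp_by_year(records))
print("\n".join(lines))
```

The two lines it prints are the same as the `cat out` output above. With a newline after each line they come to 17 bytes, which lines up with "Successfully stored 2 records (17 bytes)" in the job statistics.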
  


  3.7 STREAM

grunt> c = stream records through `cut -f 2`;
grunt> dump c
2012-07-06 06:46:07,018 INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: STREAMING
2012-07-06 06:46:07,039 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2012-07-06 06:46:07,040 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2012-07-06 06:46:07,040 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2012-07-06 06:46:07,042 INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2012-07-06 06:46:07,042 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2012-07-06 06:46:07,043 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - creating jar file Job2350549270034741889.jar
2012-07-06 06:46:09,525 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - jar file Job2350549270034741889.jar created
2012-07-06 06:46:09,526 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2012-07-06 06:46:09,537 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2012-07-06 06:46:09,641 INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2012-07-06 06:46:09,641 INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2012-07-06 06:46:09,642 INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2012-07-06 06:46:10,037 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_201207030133_0016
2012-07-06 06:46:10,037 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at: http://localhost:50030/jobdetails.jsp?jobid=job_201207030133_0016
2012-07-06 06:46:10,039 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2012-07-06 06:46:23,084 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
2012-07-06 06:46:30,106 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2012-07-06 06:46:30,107 INFO org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:
HadoopVersion   PigVersion      UserId  StartedAt       FinishedAt      Features
0.20.203.0      0.10.0  nomad2  2012-07-06 06:46:07     2012-07-06 06:46:30     STREAMING
Success!
Job Stats (time in seconds):
JobId   Maps    Reduces MaxMapTime      MinMapTIme      AvgMapTime      MaxReduceTime   MinReduceTime   AvgReduceTime   Alias   Feature Outputs
job_201207030133_0016   1       0       6       6       6       0       0       0       c,records       STREAMING,MAP_ONLY   hdfs://localhost/tmp/temp-1533317312/tmp1954401074,
Input(s):
Successfully read 5 records (432 bytes) from: "hdfs://localhost/user/nomad2/input/ncdc/micro-tab/sample.txt"
Output(s):
Successfully stored 5 records (46 bytes) in: "hdfs://localhost/tmp/temp-1533317312/tmp1954401074"
Counters:
Total records written : 5
Total bytes written : 46
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_201207030133_0016

2012-07-06 06:46:30,112 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
2012-07-06 06:46:30,118 INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2012-07-06 06:46:30,118 INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
(0)
(22)
(-11)
(111)
(78)
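
What STREAM does here can be pictured in plain Python: each tuple of `records` is written out as one tab-separated line, the external command (`cut -f 2`) reads those lines, and every line it prints comes back as a tuple. The sketch below emulates `cut -f 2` in Python instead of launching the real command. The input rows are my assumption about sample.txt (they match the dump above), and `to_stream_line` and `cut_field` are hypothetical helper names.

```python
# Assumed contents of sample.txt as (year, temperature, quality) tuples.
records = [
    ("1950", 0, 1),
    ("1950", 22, 1),
    ("1950", -11, 1),
    ("1949", 111, 1),
    ("1949", 78, 1),
]


def to_stream_line(record, delimiter="\t"):
    """Serialize a tuple the way it is handed to the streaming command."""
    return delimiter.join(str(field) for field in record)


def cut_field(line, field=2, delimiter="\t"):
    """Rough stand-in for `cut -f N`: return the N-th delimited field."""
    parts = line.split(delimiter)
    if len(parts) == 1:
        return line  # cut passes lines without a delimiter through unchanged
    return parts[field - 1] if field <= len(parts) else ""


# Each output line of the command becomes a one-field tuple in relation c.
c = [(cut_field(to_stream_line(record)),) for record in records]
for row in c:
    print("(" + ",".join(row) + ")")
```

The printed tuples are the same five values as the `dump c` output above. Note that the values come back as text here, since no schema is declared on the streamed relation.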