zcl_ccc posted on 2018-11-01 09:16:31

Hadoop grep problem

  Today, at the request of the business team, I needed to count how many records in the raw HDFS logs contain a given URL. For convenience, I just used the grep job shipped in hadoop-examples-*.jar.
  
    Submit the job:
  


>hadoop jar $HADOOP_HOME/hadoop-examples-*.jar grep -Dmapred.job.queue.name=cp_normal_job_queue /group/*****/2011-08-12/00 /group/*****/grep/2011-08-12/00 'www.****.cn'
11/08/31 17:12:39 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 140330 for yinjie
11/08/31 17:12:39 INFO security.TokenCache: Got dt for hdfs://*****.com/home/hdfs/cluster-data/tmp/mapred/staging/yinjie/.staging/job_201108241351_24681;uri=****:8020;t.service=****:8020
11/08/31 17:12:39 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library
11/08/31 17:12:39 INFO lzo.LzoCodec: Successfully loaded & initialized native-lzo library
11/08/31 17:12:39 INFO mapred.FileInputFormat: Total input paths to process : 22
11/08/31 17:12:40 INFO mapred.JobClient: Running job: job_201108241351_24681
11/08/31 17:12:41 INFO mapred.JobClient:  map 0% reduce 0%
11/08/31 17:12:50 INFO mapred.JobClient:  map 4% reduce 0%
11/08/31 17:12:51 INFO mapred.JobClient:  map 52% reduce 0%
11/08/31 17:12:52 INFO mapred.JobClient:  map 60% reduce 0%
11/08/31 17:12:53 INFO mapred.JobClient:  map 69% reduce 0%
11/08/31 17:12:54 INFO mapred.JobClient:  map 79% reduce 0%
11/08/31 17:12:55 INFO mapred.JobClient:  map 84% reduce 0%
11/08/31 17:12:56 INFO mapred.JobClient:  map 90% reduce 0%
11/08/31 17:12:57 INFO mapred.JobClient:  map 93% reduce 0%
11/08/31 17:12:58 INFO mapred.JobClient:  map 95% reduce 27%
11/08/31 17:12:59 INFO mapred.JobClient:  map 97% reduce 27%
11/08/31 17:13:01 INFO mapred.JobClient:  map 98% reduce 27%
11/08/31 17:13:05 INFO mapred.JobClient:  map 99% reduce 27%
11/08/31 17:13:07 INFO mapred.JobClient:  map 99% reduce 32%
11/08/31 17:13:09 INFO mapred.JobClient:  map 100% reduce 32%
11/08/31 17:13:14 INFO mapred.JobClient:  map 100% reduce 100%
11/08/31 17:13:15 INFO mapred.JobClient: Job complete: job_201108241351_24681
11/08/31 17:13:15 INFO mapred.JobClient: Counters: 24
11/08/31 17:13:15 INFO mapred.JobClient:   Job Counters
11/08/31 17:13:15 INFO mapred.JobClient:   Launched reduce tasks=1
11/08/31 17:13:15 INFO mapred.JobClient:   SLOTS_MILLIS_MAPS=1542961
11/08/31 17:13:15 INFO mapred.JobClient:   Total time spent by all reduces waiting after reserving slots (ms)=0
11/08/31 17:13:15 INFO mapred.JobClient:   Total time spent by all maps waiting after reserving slots (ms)=0
11/08/31 17:13:15 INFO mapred.JobClient:   Rack-local map tasks=44
11/08/31 17:13:15 INFO mapred.JobClient:   Launched map tasks=242
11/08/31 17:13:15 INFO mapred.JobClient:   Data-local map tasks=198
11/08/31 17:13:15 INFO mapred.JobClient:   SLOTS_MILLIS_REDUCES=23291
11/08/31 17:13:15 INFO mapred.JobClient:   FileSystemCounters
11/08/31 17:13:15 INFO mapred.JobClient:   FILE_BYTES_READ=3724
11/08/31 17:13:15 INFO mapred.JobClient:   HDFS_BYTES_READ=32281139322
11/08/31 17:13:15 INFO mapred.JobClient:   FILE_BYTES_WRITTEN=14502646
11/08/31 17:13:15 INFO mapred.JobClient:   HDFS_BYTES_WRITTEN=118
11/08/31 17:13:15 INFO mapred.JobClient:   Map-Reduce Framework
11/08/31 17:13:15 INFO mapred.JobClient:   Reduce input groups=1
11/08/31 17:13:15 INFO mapred.JobClient:   Combine output records=143
11/08/31 17:13:15 INFO mapred.JobClient:   Map input records=37526374
11/08/31 17:13:15 INFO mapred.JobClient:   Reduce shuffle bytes=5164
11/08/31 17:13:15 INFO mapred.JobClient:   Reduce output records=1
11/08/31 17:13:15 INFO mapred.JobClient:   Spilled Records=286
11/08/31 17:13:15 INFO mapred.JobClient:   Map output bytes=786984
11/08/31 17:13:15 INFO mapred.JobClient:   Map input bytes=32280203347
11/08/31 17:13:15 INFO mapred.JobClient:   Combine input records=32791
11/08/31 17:13:15 INFO mapred.JobClient:   Map output records=32791
11/08/31 17:13:15 INFO mapred.JobClient:   SPLIT_RAW_BYTES=38731
11/08/31 17:13:15 INFO mapred.JobClient:   Reduce input records=143
11/08/31 17:13:15 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
11/08/31 17:13:15 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 140331 for yinjie
11/08/31 17:13:15 INFO security.TokenCache: Got dt for hdfs://****.com/home/hdfs/cluster-data/tmp/mapred/staging/yinjie/.staging/job_201108241351_24682;uri=****:8020;t.service=****:8020
11/08/31 17:13:15 INFO mapred.FileInputFormat: Total input paths to process : 1
11/08/31 17:13:15 INFO mapred.JobClient: Cleaning up the staging area hdfs://****.com/home/hdfs/cluster-data/tmp/mapred/staging/yinjie/.staging/job_201108241351_24682
org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.NullPointerException
      at org.apache.hadoop.mapred.QueueManager.getQueueACL(QueueManager.java:382)
      at org.apache.hadoop.mapred.JobTracker.getQueueAdmins(JobTracker.java:4422)
      at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      at java.lang.reflect.Method.invoke(Method.java:597)
      at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:557)
      at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1434)
      at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1430)
      at java.security.AccessController.doPrivileged(Native Method)
      at javax.security.auth.Subject.doAs(Subject.java:396)
      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
      at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1428)

      at org.apache.hadoop.ipc.Client.call(Client.java:1107)
      at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:226)
      at org.apache.hadoop.mapred.$Proxy6.getQueueAdmins(Unknown Source)
      at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:886)
      at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:833)
      at java.security.AccessController.doPrivileged(Native Method)
      at javax.security.auth.Subject.doAs(Subject.java:396)
      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
      at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:833)
      at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:807)
      at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1242)
      at org.apache.hadoop.examples.Grep.run(Grep.java:84)
      at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
      at org.apache.hadoop.examples.Grep.main(Grep.java:93)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      at java.lang.reflect.Method.invoke(Method.java:597)
      at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
      at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
      at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:64)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      at java.lang.reflect.Method.invoke(Method.java:597)
      at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
>
  

  An error. Odd: the first job ran successfully, but the second one failed, and judging from the exception it looks like a queue access-control problem. The submission specified the -Dmapred.job.queue.name=cp_normal_job_queue parameter, so my suspicion was that the first job picked up this parameter while the later job did not, causing the failure.
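  For context, the examples' grep program submits two chained MapReduce jobs: a "grep-search" job that counts regex matches, and a "grep-sort" job that sorts those counts, which is why a second job appears right after the first one completes. The rough shape is sketched below; this is an illustration, not the distribution's exact Grep.java (the class name TwoJobGrep and the temp-dir naming are mine). If the second JobConf were ever built without the configuration that carried the parsed -D options, it would fall back to whatever the *-site.xml files on the classpath declare, which is consistent with what happened here.

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.SequenceFileInputFormat;
import org.apache.hadoop.mapred.SequenceFileOutputFormat;
import org.apache.hadoop.mapred.lib.InverseMapper;
import org.apache.hadoop.mapred.lib.LongSumReducer;
import org.apache.hadoop.mapred.lib.RegexMapper;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Illustrative two-job grep driver, old mapred API. Args: <input> <output> <regex>.
public class TwoJobGrep extends Configured implements Tool {
  public int run(String[] args) throws Exception {
    Path tempDir = new Path("grep-temp-" + System.currentTimeMillis());

    // Job 1 ("grep-search"): count the lines matching the regex.
    JobConf grepJob = new JobConf(getConf(), TwoJobGrep.class);
    grepJob.setJobName("grep-search");
    FileInputFormat.setInputPaths(grepJob, new Path(args[0]));
    grepJob.setMapperClass(RegexMapper.class);
    grepJob.set("mapred.mapper.regex", args[2]);
    grepJob.setCombinerClass(LongSumReducer.class);
    grepJob.setReducerClass(LongSumReducer.class);
    grepJob.setOutputFormat(SequenceFileOutputFormat.class);
    grepJob.setOutputKeyClass(Text.class);
    grepJob.setOutputValueClass(LongWritable.class);
    FileOutputFormat.setOutputPath(grepJob, tempDir);
    JobClient.runJob(grepJob);          // this is the job that ran with the -D queue

    // Job 2 ("grep-sort"): sort the counts in decreasing order.
    // A JobConf built without getConf() would only see the classpath defaults,
    // i.e. mapred.job.queue.name=cp_admin_job_queue in this cluster.
    JobConf sortJob = new JobConf(getConf(), TwoJobGrep.class);
    sortJob.setJobName("grep-sort");
    FileInputFormat.setInputPaths(sortJob, tempDir);
    sortJob.setInputFormat(SequenceFileInputFormat.class);
    sortJob.setMapperClass(InverseMapper.class);
    sortJob.setNumReduceTasks(1);
    sortJob.setOutputKeyComparatorClass(LongWritable.DecreasingComparator.class);
    FileOutputFormat.setOutputPath(sortJob, new Path(args[1]));
    JobClient.runJob(sortJob);
    // (The real driver also deletes tempDir when it is done.)
    return 0;
  }

  public static void main(String[] args) throws Exception {
    System.exit(ToolRunner.run(new TwoJobGrep(), args));
  }
}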
  
So I first had a look at the conf directory under $HADOOP_HOME:
  


>cat $HADOOP_HOME/conf/mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
  <name>mapred.job.queue.name</name>
  <value>cp_admin_job_queue</value>
  <description>Queue to which a job is submitted. This must match one of the
    queues defined in mapred.queue.names for the system. Also, the ACL setup
    for the queue must allow the current user to submit a job to the queue.
    Before specifying a queue, ensure that the system is configured with
    the queue, and access is allowed for submitting jobs to the queue.
  </description>
</property>
....
....
....
</configuration>
  

  The configured value of mapred.job.queue.name is cp_admin_job_queue, not the cp_normal_job_queue specified when submitting the job. Could the second job have picked up cp_admin_job_queue and failed because of it?
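  In other words, the suspicion is that the second job fell back to the queue name declared by the configuration files on the classpath. A minimal sketch of that precedence (QueueNameCheck is a hypothetical helper class, not part of Hadoop):

import org.apache.hadoop.mapred.JobConf;

public class QueueNameCheck {
  public static void main(String[] args) {
    // A JobConf built this way loads its defaults from the *-site.xml files on
    // the classpath. With $HADOOP_HOME/conf there, this prints cp_admin_job_queue;
    // with the edited /home/yinjie/conf it should print cp_normal_job_queue.
    JobConf conf = new JobConf(QueueNameCheck.class);
    System.out.println(conf.get("mapred.job.queue.name", "default"));

    // An explicit set(), e.g. from a parsed -D option, overrides the file value.
    conf.set("mapred.job.queue.name", "cp_normal_job_queue");
    System.out.println(conf.get("mapred.job.queue.name"));
  }
}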
  
Just to try it out, I copied the configuration files under $HADOOP_HOME/conf into my own directory:
  


>cp -rf $HADOOP_HOME/conf ./
....
>ls
allslaves               configuration.xsl  fair-scheduler.xml  hadoop-metrics.properties  hdfs-site.xml     mapred-queue-acls.xml  masters  ssl-client.xml.example
capacity-scheduler.xml  core-site.xml      hadoop-env.sh       hadoop-policy.xml          log4j.properties  mapred-site.xml        slaves   ssl-server.xml.example
>
>vi mapred-site.xml
  

  I edited mapred-site.xml, changed mapred.job.queue.name to cp_normal_job_queue, and saved the file.
  
Then I submitted the job again, this time using the --config option to point at the modified configuration directory:
  


>hadoop --config /home/yinjie/conf jar $HADOOP_HOME/hadoop-examples-*.jar grep -Dmapred.job.queue.name=cp_normal_job_queue /group/*****/2011-08-12/01 /group/*****/grep/2011-08-12/01 'www.****.cn'
11/08/31 17:25:19 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 140356 for yinjie
11/08/31 17:25:19 INFO security.TokenCache: Got dt for hdfs://****.com/home/hdfs/cluster-data/tmp/mapred/staging/yinjie/.staging/job_201108241351_24719;uri=****:8020;t.service=****:8020
11/08/31 17:25:19 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library
11/08/31 17:25:19 INFO lzo.LzoCodec: Successfully loaded & initialized native-lzo library
11/08/31 17:25:19 INFO mapred.FileInputFormat: Total input paths to process : 22
11/08/31 17:25:19 INFO mapred.JobClient: Running job: job_201108241351_24719
11/08/31 17:25:20 INFO mapred.JobClient:  map 0% reduce 0%
11/08/31 17:25:30 INFO mapred.JobClient:  map 4% reduce 0%
11/08/31 17:25:31 INFO mapred.JobClient:  map 14% reduce 0%
11/08/31 17:25:32 INFO mapred.JobClient:  map 51% reduce 0%
11/08/31 17:25:33 INFO mapred.JobClient:  map 63% reduce 0%
11/08/31 17:25:34 INFO mapred.JobClient:  map 68% reduce 0%
11/08/31 17:25:35 INFO mapred.JobClient:  map 77% reduce 0%
11/08/31 17:25:36 INFO mapred.JobClient:  map 87% reduce 0%
11/08/31 17:25:37 INFO mapred.JobClient:  map 93% reduce 0%
11/08/31 17:25:38 INFO mapred.JobClient:  map 96% reduce 0%
11/08/31 17:25:39 INFO mapred.JobClient:  map 97% reduce 0%
11/08/31 17:25:40 INFO mapred.JobClient:  map 98% reduce 0%
11/08/31 17:25:42 INFO mapred.JobClient:  map 99% reduce 31%
11/08/31 17:25:48 INFO mapred.JobClient:  map 100% reduce 31%
11/08/31 17:25:51 INFO mapred.JobClient:  map 100% reduce 33%
11/08/31 17:25:53 INFO mapred.JobClient:  map 100% reduce 100%
11/08/31 17:25:53 INFO mapred.JobClient: Job complete: job_201108241351_24719
11/08/31 17:25:53 INFO mapred.JobClient: Counters: 24
11/08/31 17:25:53 INFO mapred.JobClient:   Job Counters
11/08/31 17:25:53 INFO mapred.JobClient:   Launched reduce tasks=1
11/08/31 17:25:53 INFO mapred.JobClient:   SLOTS_MILLIS_MAPS=1025313
11/08/31 17:25:53 INFO mapred.JobClient:   Total time spent by all reduces waiting after reserving slots (ms)=0
11/08/31 17:25:53 INFO mapred.JobClient:   Total time spent by all maps waiting after reserving slots (ms)=0
11/08/31 17:25:53 INFO mapred.JobClient:   Rack-local map tasks=26
11/08/31 17:25:53 INFO mapred.JobClient:   Launched map tasks=176
11/08/31 17:25:53 INFO mapred.JobClient:   Data-local map tasks=150
11/08/31 17:25:53 INFO mapred.JobClient:   SLOTS_MILLIS_REDUCES=18297
11/08/31 17:25:53 INFO mapred.JobClient:   FileSystemCounters
11/08/31 17:25:53 INFO mapred.JobClient:   FILE_BYTES_READ=2580
11/08/31 17:25:53 INFO mapred.JobClient:   HDFS_BYTES_READ=22352133231
11/08/31 17:25:53 INFO mapred.JobClient:   FILE_BYTES_WRITTEN=10563326
11/08/31 17:25:53 INFO mapred.JobClient:   HDFS_BYTES_WRITTEN=118
11/08/31 17:25:53 INFO mapred.JobClient:   Map-Reduce Framework
11/08/31 17:25:53 INFO mapred.JobClient:   Reduce input groups=1
11/08/31 17:25:53 INFO mapred.JobClient:   Combine output records=99
11/08/31 17:25:53 INFO mapred.JobClient:   Map input records=26525927
11/08/31 17:25:53 INFO mapred.JobClient:   Reduce shuffle bytes=3624
11/08/31 17:25:53 INFO mapred.JobClient:   Reduce output records=1
11/08/31 17:25:53 INFO mapred.JobClient:   Spilled Records=198
11/08/31 17:25:53 INFO mapred.JobClient:   Map output bytes=515064
11/08/31 17:25:53 INFO mapred.JobClient:   Map input bytes=22351478236
11/08/31 17:25:53 INFO mapred.JobClient:   Combine input records=21461
11/08/31 17:25:53 INFO mapred.JobClient:   Map output records=21461
11/08/31 17:25:53 INFO mapred.JobClient:   SPLIT_RAW_BYTES=28153
11/08/31 17:25:53 INFO mapred.JobClient:   Reduce input records=99
11/08/31 17:25:53 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
11/08/31 17:25:53 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 140359 for yinjie
11/08/31 17:25:53 INFO security.TokenCache: Got dt for hdfs://****.com/home/hdfs/cluster-data/tmp/mapred/staging/yinjie/.staging/job_201108241351_24723;uri=****:8020;t.service=****:8020
11/08/31 17:25:53 INFO mapred.FileInputFormat: Total input paths to process : 1
11/08/31 17:25:53 INFO mapred.JobClient: Running job: job_201108241351_24723
11/08/31 17:25:54 INFO mapred.JobClient:  map 0% reduce 0%
11/08/31 17:26:01 INFO mapred.JobClient:  map 100% reduce 0%
11/08/31 17:26:13 INFO mapred.JobClient:  map 100% reduce 100%
11/08/31 17:26:13 INFO mapred.JobClient: Job complete: job_201108241351_24723
11/08/31 17:26:13 INFO mapred.JobClient: Counters: 23
11/08/31 17:26:13 INFO mapred.JobClient:   Job Counters
11/08/31 17:26:13 INFO mapred.JobClient:   Launched reduce tasks=1
11/08/31 17:26:13 INFO mapred.JobClient:   SLOTS_MILLIS_MAPS=3225
11/08/31 17:26:13 INFO mapred.JobClient:   Total time spent by all reduces waiting after reserving slots (ms)=0
11/08/31 17:26:13 INFO mapred.JobClient:   Total time spent by all maps waiting after reserving slots (ms)=0
11/08/31 17:26:13 INFO mapred.JobClient:   Launched map tasks=1
11/08/31 17:26:13 INFO mapred.JobClient:   Data-local map tasks=1
11/08/31 17:26:13 INFO mapred.JobClient:   SLOTS_MILLIS_REDUCES=8191
11/08/31 17:26:13 INFO mapred.JobClient:   FileSystemCounters
11/08/31 17:26:13 INFO mapred.JobClient:   FILE_BYTES_READ=32
11/08/31 17:26:13 INFO mapred.JobClient:   HDFS_BYTES_READ=248
11/08/31 17:26:13 INFO mapred.JobClient:   FILE_BYTES_WRITTEN=117216
11/08/31 17:26:13 INFO mapred.JobClient:   HDFS_BYTES_WRITTEN=22
11/08/31 17:26:13 INFO mapred.JobClient:   Map-Reduce Framework
11/08/31 17:26:13 INFO mapred.JobClient:   Reduce input groups=1
11/08/31 17:26:13 INFO mapred.JobClient:   Combine output records=0
11/08/31 17:26:13 INFO mapred.JobClient:   Map input records=1
11/08/31 17:26:13 INFO mapred.JobClient:   Reduce shuffle bytes=0
11/08/31 17:26:13 INFO mapred.JobClient:   Reduce output records=1
11/08/31 17:26:13 INFO mapred.JobClient:   Spilled Records=2
11/08/31 17:26:13 INFO mapred.JobClient:   Map output bytes=24
11/08/31 17:26:13 INFO mapred.JobClient:   Map input bytes=32
11/08/31 17:26:13 INFO mapred.JobClient:   Combine input records=0
11/08/31 17:26:13 INFO mapred.JobClient:   Map output records=1
11/08/31 17:26:13 INFO mapred.JobClient:   SPLIT_RAW_BYTES=130
11/08/31 17:26:13 INFO mapred.JobClient:   Reduce input records=1
  

  OK, the job succeeded!
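  The count itself lands in the job's output directory (the HDFS_BYTES_WRITTEN=22 above is that one small line) and can be viewed with hadoop fs -cat on the part file. As a sketch, the same read through the Java FileSystem API might look like this; ReadGrepResult is a hypothetical helper, and the output directory is passed in as an argument since the real /group/... paths are elided above:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Reads the grep output (one "count<TAB>matched string" line) back from HDFS.
public class ReadGrepResult {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path outDir = new Path(args[0]);               // the grep output directory
    for (FileStatus st : fs.listStatus(outDir)) {
      if (!st.getPath().getName().startsWith("part-")) {
        continue;                                  // skip _logs and similar entries
      }
      BufferedReader in = new BufferedReader(
          new InputStreamReader(fs.open(st.getPath())));
      String line;
      while ((line = in.readLine()) != null) {
        System.out.println(line);                  // count and matched pattern
      }
      in.close();
    }
  }
}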

