[Experience Sharing] Troubleshooting a Failed Hadoop Job

[复制链接]

尚未签到

发表于 2016-12-6 08:58:02 | 显示全部楼层 |阅读模式
  Symptom: one particular map task fails on every attempt, hanging until it times out; the attempt is retried four times, and the task finally fails.
  The JobTracker shows it is the same task every time. Locate the node that ran the task, open the TaskTracker log there, and grep for the attempt ID.
  For example:

cat hadoop-hadoop-tasktracker-DB1221.log.2012-06-26 | grep attempt_201206081842_0456_m_000392_0

2012-06-26 17:44:23,543 INFO org.apache.hadoop.mapred.TaskTracker: JVM with ID: jvm_201206081842_0456_m_-1061492923 given task: attempt_201206081842_0456_m_000392_0
2012-06-26 17:44:30,385 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201206081842_0456_m_000392_0 0.5560105%
2012-06-26 17:44:33,387 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201206081842_0456_m_000392_0 0.5560105%
2012-06-26 17:54:35,277 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201206081842_0456_m_000392_0: Task attempt_201206081842_0456_m_000392_0 failed to report status for 601 seconds. Killing!
2012-06-26 17:54:35,300 INFO org.apache.hadoop.mapred.TaskTracker: About to purge task: attempt_201206081842_0456_m_000392_0
  Each attempt runs to the same fixed percentage, then stops responding until the 600-second timeout kills it.
  Troubleshooting:
  First attempt: wrap the map logic in try/catch and print an error log; this turned up nothing.
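  Presumably this caught nothing because the hang happens inside the record reader, before map() is ever invoked on the bad record, so there is no exception for user code to see. A minimal sketch of the attempt (the real map body is not shown in the post, so the class and message are illustrative):

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class CatchingMapper extends Mapper<LongWritable, Text, Text, Text> {
  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    try {
      // ... the real map logic would go here ...
    } catch (Exception e) {
      // Printed to the attempt's stderr log on the TaskTracker node.
      System.err.println("bad record at offset " + key + ": " + e);
      throw new IOException(e);
    }
  }
}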
  Next, the following was added to the map method:
// Identify which input file this task is reading.
InputSplit inputSplit = (InputSplit) context.getInputSplit();
String filename = ((FileSplit) inputSplit).getPath().getName();
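  A self-contained version of this instrumentation, assuming the 0.20-era mapreduce API (the class name is illustrative); logging once in setup() is enough, since each map task reads a single split:

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class FileLoggingMapper extends Mapper<LongWritable, Text, Text, Text> {
  @Override
  protected void setup(Context context) {
    InputSplit inputSplit = context.getInputSplit();
    String filename = ((FileSplit) inputSplit).getPath().getName();
    // Appears in the attempt's stdout log under the TaskTracker's userlogs directory.
    System.out.println("input file: " + filename);
  }
}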
  Rerun the job, check stdout on the failing node to get the name of the offending file, and then analyze that file:

cat data | awk -F "\t" '{if(length($0)>100000000) print $0}'

cat data | awk -F "\t" '{d[length($0)]++} END{for(i in d) print i"\t"d[i]}' | sort -k 1,1 -nr | less
  One record turned out to be over 200 MB long, which explains the hang: the record reader buffers the entire line in memory before the mapper ever sees it, so the task stops reporting progress.
  Fix 1:
cat data1 | awk -F "\t" '{if(length($0)<100000000) print $0}' > data2
  Strip the over-long record and rerun the job.
  Fix 2:
  Hadoop: The Definitive Guide writes:

If you are using TextInputFormat (“TextInputFormat” on page 244), then you can set a maximum expected line length to safeguard against corrupted files. Corruption in a file can manifest itself as a very long line, which can cause out of memory errors and then task failure. By setting mapred.linerecordreader.maxlength to a value in bytes that fits in memory (and is comfortably greater than the length of lines in your input data), the record reader will skip the (long) corrupt lines without the task failing.
  Set the mapred.linerecordreader.maxlength parameter, either in the job or cluster-wide, to skip the bad records:
Configuration conf = new Configuration();
// Skip any line longer than 32 KB.
conf.setInt("mapred.linerecordreader.maxlength", 32768);
  For details, see Hadoop: The Definitive Guide, p. 218.
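  As a sanity check on where the setting goes, here is a minimal, self-contained job sketch; SkipLongLinesJob, its pass-through mapper, and the argument handling are illustrative, not from the original post:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SkipLongLinesJob {

  // Pass-through mapper: the skipping happens in the record reader, not here.
  public static class PassThroughMapper
      extends Mapper<LongWritable, Text, LongWritable, Text> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws java.io.IOException, InterruptedException {
      context.write(key, value);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Set before the Job is constructed; Job copies the configuration.
    conf.setInt("mapred.linerecordreader.maxlength", 32768);
    Job job = new Job(conf, "skip-long-lines");
    job.setJarByClass(SkipLongLinesJob.class);
    job.setMapperClass(PassThroughMapper.class);
    job.setNumReduceTasks(0); // map-only
    job.setOutputKeyClass(LongWritable.class);
    job.setOutputValueClass(Text.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}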
  Note: HDG says over-long records are skipped outright, but the code reads as if only the content beyond the limit is discarded. Look at TextInputFormat:
  http://grepcode.com/file/repository.cloudera.com/content/repositories/releases/com.cloudera.hadoop/hadoop-core/0.20.2-737/org/apache/hadoop/mapreduce/lib/input/TextInputFormat.java
  


public class TextInputFormat extends FileInputFormat<LongWritable, Text> {

  @Override
  public RecordReader<LongWritable, Text>
    createRecordReader(InputSplit split,
                       TaskAttemptContext context) {
    return new LineRecordReader();
  }

  @Override
  protected boolean isSplitable(JobContext context, Path file) {
    CompressionCodec codec =
      new CompressionCodecFactory(context.getConfiguration()).getCodec(file);
    return codec == null;
  }
}
   LineRecordReader:
  http://grepcode.com/file/repository.cloudera.com/content/repositories/releases/com.cloudera.hadoop/hadoop-core/0.20.2-737/org/apache/hadoop/mapreduce/lib/input/LineRecordReader.java#LineRecordReader
  


// In LineRecordReader.initialize(): the limit defaults to Integer.MAX_VALUE.
this.maxLineLength = job.getInt("mapred.linerecordreader.maxlength",
                                Integer.MAX_VALUE);

// Also in initialize(), the LineReader that enforces it (compressed-input branch):
in = new LineReader(codec.createInputStream(fileIn), job);
  LineReader:
  http://grepcode.com/file/repository.cloudera.com/content/repositories/releases/com.cloudera.hadoop/hadoop-core/0.20.2-737/org/apache/hadoop/util/LineReader.java#LineReader
  


/**
 * Read one line from the InputStream into the given Text.  A line
 * can be terminated by one of the following: '\n' (LF), '\r' (CR),
 * or '\r\n' (CR+LF).  EOF also terminates an otherwise unterminated line.
 *
 * @param str               the object to store the given line (without newline)
 * @param maxLineLength     the maximum number of bytes to store into str;
 *                          the rest of the line is silently discarded.
 * @param maxBytesToConsume the maximum number of bytes to consume in this call.
 *                          This is only a hint, because if the line crosses
 *                          this threshold, we allow it to happen.  It can
 *                          overshoot potentially by as much as one buffer length.
 * @return the number of bytes read including the (longest) newline found.
 * @throws IOException if the underlying stream throws
 */
public int readLine(Text str, int maxLineLength,
                    int maxBytesToConsume) throws IOException {
  /* We're reading data from in, but the head of the stream may be
   * already buffered in buffer, so we have several cases:
   * 1. No newline characters are in the buffer, so we need to copy
   *    everything and read another buffer from the stream.
   * 2. An unambiguously terminated line is in buffer, so we just
   *    copy to str.
   * 3. Ambiguously terminated line is in buffer, i.e. buffer ends
   *    in CR.  In this case we copy everything up to CR to str, but
   *    we also need to see what follows CR: if it's LF, then we
   *    need consume LF as well, so next call to readLine will read
   *    from after that.
   * We use a flag prevCharCR to signal if previous character was CR
   * and, if it happens to be at the end of the buffer, delay
   * consuming it until we have a chance to look at the char that
   * follows.
   */
  str.clear();
  int txtLength = 0; //tracks str.getLength(), as an optimization
  int newlineLength = 0; //length of terminating newline
  boolean prevCharCR = false; //true if prev char was CR
  long bytesConsumed = 0;
  do {
    int startPosn = bufferPosn; //starting from where we left off the last time
    if (bufferPosn >= bufferLength) {
      startPosn = bufferPosn = 0;
      if (prevCharCR)
        ++bytesConsumed; //account for CR from previous read
      bufferLength = in.read(buffer);
      if (bufferLength <= 0)
        break; // EOF
    }
    for (; bufferPosn < bufferLength; ++bufferPosn) { //search for newline
      if (buffer[bufferPosn] == LF) {
        newlineLength = (prevCharCR) ? 2 : 1;
        ++bufferPosn; // at next invocation proceed from following byte
        break;
      }
      if (prevCharCR) { //CR + notLF, we are at notLF
        newlineLength = 1;
        break;
      }
      prevCharCR = (buffer[bufferPosn] == CR);
    }
    int readLength = bufferPosn - startPosn;
    if (prevCharCR && newlineLength == 0)
      --readLength; //CR at the end of the buffer
    bytesConsumed += readLength;
    int appendLength = readLength - newlineLength;
    if (appendLength > maxLineLength - txtLength) {
      appendLength = maxLineLength - txtLength;
    }
    if (appendLength > 0) {
      str.append(buffer, startPosn, appendLength);
      txtLength += appendLength;
    }
  } while (newlineLength == 0 && bytesConsumed < maxBytesToConsume);

  if (bytesConsumed > (long)Integer.MAX_VALUE)
    throw new IOException("Too many bytes before newline: " + bytesConsumed);
  return (int)bytesConsumed;
}
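  So both statements appear to be true at different layers: readLine() truncates what it stores in str and silently consumes the rest of the line, while in the 0.20 code LineRecordReader.nextKeyValue() loops, logging "Skipped line of size ...", until it gets a record shorter than maxLineLength, so an over-long record never reaches the mapper. A small demo of the truncation layer, using the org.apache.hadoop.util.LineReader API quoted above (the class name is illustrative):

import java.io.ByteArrayInputStream;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.util.LineReader;

// Shows readLine()'s contract: str keeps at most maxLineLength bytes,
// but the return value counts every byte consumed up to the newline.
public class ReadLineTruncationDemo {
  public static void main(String[] args) throws Exception {
    byte[] data = ("0123456789012345678901234567890123456789\n"  // 40-byte line
                 + "short\n").getBytes("UTF-8");
    LineReader reader = new LineReader(new ByteArrayInputStream(data), 4096);
    Text line = new Text();

    int consumed = reader.readLine(line, 10, Integer.MAX_VALUE);
    // Expect stored=10 (truncated), consumed=41 (whole line plus '\n').
    System.out.println("stored=" + line.getLength() + " consumed=" + consumed);

    consumed = reader.readLine(line, 10, Integer.MAX_VALUE);
    // Expect stored=5, consumed=6: the next record is unaffected.
    System.out.println("stored=" + line.getLength() + " consumed=" + consumed);
    reader.close();
  }
}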
 
