Posted by sunage001 on 2016-12-6 10:46:41

Personal Hadoop Error List

Error 1: Too many fetch-failures
  After a reduce task starts, its first phase is the shuffle, in which it fetches data from the map side. Each fetch can fail because of a connect timeout, a read timeout, a checksum error, and so on. The reduce task keeps a counter for each map, recording how many times fetching that map's output has failed. When the failure count reaches a threshold, the reduce task notifies the JobTracker that fetching that map's output has failed too many times, and prints a log like:





  Failed to fetch map-output from attempt_201105261254_102769_m_001802_0 even after MAX_FETCH_RETRIES_PER_MAP retries... reporting to the JobTracker


  The threshold is computed as:

  max(MIN_FETCH_RETRIES_PER_MAP, getClosestPowerOf2((this.maxBackoff * 1000 / BACKOFF_INIT) + 1));



  By default MIN_FETCH_RETRIES_PER_MAP = 2, maxBackoff = 300, and BACKOFF_INIT = 4000, so the default threshold is 6. It can be adjusted through the mapred.reduce.copy.backoff parameter.
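
  To make the arithmetic concrete, here is a minimal Java sketch of that threshold computation. It is not the actual Hadoop source: getClosestPowerOf2 is reimplemented from its observed behavior (it returns the exponent of the power of two closest to its argument), and the class and method names are hypothetical.

    public class FetchRetryThreshold {
        static final int MIN_FETCH_RETRIES_PER_MAP = 2; // default
        static final int BACKOFF_INIT = 4000;           // default, in ms

        // Hypothetical reimplementation: exponent of the power of two closest
        // to n, e.g. 76 -> 6, because 2^6 = 64 is closer to 76 than 2^7 = 128.
        static int getClosestPowerOf2(int n) {
            int exp = 0;
            while ((1 << (exp + 1)) <= n) {
                exp++;
            }
            return (n - (1 << exp) <= (1 << (exp + 1)) - n) ? exp : exp + 1;
        }

        // maxBackoff is the value of mapred.reduce.copy.backoff, in seconds.
        static int fetchRetryThreshold(int maxBackoff) {
            return Math.max(MIN_FETCH_RETRIES_PER_MAP,
                    getClosestPowerOf2((maxBackoff * 1000 / BACKOFF_INIT) + 1));
        }

        public static void main(String[] args) {
            // Default maxBackoff = 300: (300 * 1000 / 4000) + 1 = 76 -> 6
            System.out.println(fetchRetryThreshold(300)); // prints 6
        }
    }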




  Once the threshold is reached, the reduce task reports it to the TaskTracker over the umbilical protocol, and the TaskTracker notifies the JobTracker at its next heartbeat. When the JobTracker sees that more than 50% of the reduces have reported repeated failures fetching a given map's output, it fails that map task, reschedules it, and prints a log like:





  "Too many fetch-failures for output of task: attempt_201105261254_102769_m_001802_0 ... killing it"

Error 2: Task attempt failed to report status for 622 seconds. Killing
  

The description of mapred.task.timeout, which defaults to 600s, says: "The number of milliseconds before a task will be terminated if it neither reads an input, writes an output, nor updates its status string."
Increasing the value of mapred.task.timeout may make the error go away, but first figure out whether the map task genuinely needs more than 600s to process its input data, or whether there is a bug in the code that needs to be debugged.
According to Hadoop best practices, a map task should on average take about a minute to process an InputSplit.
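
Two common remedies, sketched below with the classic mapred API from the JobTracker era; the class names and the 20-minute value are hypothetical, for illustration only. The preferred fix for a genuinely slow task is to call Reporter.progress() periodically so the framework knows the task is still alive; raising mapred.task.timeout is the fallback.

    import java.io.IOException;

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    public class SlowJob {

        // Hypothetical mapper whose per-record work can run for minutes.
        public static class SlowRecordMapper extends MapReduceBase
                implements Mapper<LongWritable, Text, Text, LongWritable> {
            public void map(LongWritable key, Text value,
                            OutputCollector<Text, LongWritable> out,
                            Reporter reporter) throws IOException {
                // ... expensive per-record work here ...
                reporter.progress(); // heartbeat: tells the TaskTracker we are alive
                out.collect(value, key);
            }
        }

        public static void main(String[] args) throws IOException {
            JobConf conf = new JobConf(SlowJob.class);
            conf.setMapperClass(SlowRecordMapper.class);
            conf.setOutputKeyClass(Text.class);
            conf.setOutputValueClass(LongWritable.class);
            // Fallback: raise the timeout to 20 minutes (the value is in ms).
            conf.setLong("mapred.task.timeout", 20 * 60 * 1000L);
            FileInputFormat.setInputPaths(conf, new Path(args[0]));
            FileOutputFormat.setOutputPath(conf, new Path(args[1]));
            JobClient.runJob(conf);
        }
    }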
 


Error 3: Hadoop: Blacklisted tasktracker

Put the following config in conf/hdfs-site.xml:

<property>
<name>dfs.hosts</name>
<value>/full/path/to/whitelisted/node/file</value>
</property>
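
The whitelisted-node file is plain text with one hostname per line; the hosts below are hypothetical:

datanode01.example.com
datanode02.example.com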

Use the following command to ask Hadoop to refresh node status based on the configuration:

./bin/hadoop dfsadmin -refreshNodes
  Source: http://serverfault.com/questions/288440/hadoop-blacklisted-tasktracker