remington_young posted on 2016-12-7 09:56:45

Hadoop's OutputFormat and InputFormat

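A Hadoop MapReduce job reads its input through an InputFormat and writes its output through an OutputFormat (TextInputFormat and TextOutputFormat are used if none is set). These two classes describe how data is read and how it is written, including details such as the data format.
InputFormat: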

public abstract class InputFormat<K, V> {
    public abstract List<InputSplit> getSplits(JobContext context)
            throws IOException, InterruptedException;

    public abstract RecordReader<K, V> createRecordReader(InputSplit split,
            TaskAttemptContext context)
            throws IOException, InterruptedException;
}
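createRecordReader: creates the RecordReader for a given split, i.e. defines how that split is actually read and turned into key/value records (the framework calls RecordReader.initialize before the split is used).
getSplits: logically divides the job's input into InputSplits, the chunks of data to be read; each split is then assigned to one map task.
The InputSplit class declares: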
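To make getSplits less abstract, here is a minimal, Hadoop-free Java sketch of how a file-based InputFormat might cut one file into splits. It assumes the sizing rule used by Hadoop's FileInputFormat (split size = max(minSize, min(maxSize, blockSize)), and the last split may be up to 10% larger than the others so a tiny tail is not split off). The class and method names (SplitSketch, computeSplits) are hypothetical, and the real implementation also handles non-splittable files and block host lookups, which are omitted here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of a file-based getSplits(); not Hadoop's code.
class SplitSketch {

    /** One logical chunk of input: the byte range [start, start + length) of a file. */
    static final class Split {
        final String file;
        final long start;
        final long length;

        Split(String file, long start, long length) {
            this.file = file;
            this.start = start;
            this.length = length;
        }

        @Override
        public String toString() {
            return file + ":" + start + "+" + length;
        }
    }

    // Keep cutting full-size splits only while more than 1.1 splits' worth remains,
    // so a small tail is merged into the last split instead of becoming its own split.
    static final double SPLIT_SLOP = 1.1;

    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    static List<Split> computeSplits(String file, long fileLength,
                                     long blockSize, long minSize, long maxSize) {
        long splitSize = computeSplitSize(blockSize, minSize, maxSize);
        List<Split> splits = new ArrayList<>();
        long remaining = fileLength;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits.add(new Split(file, fileLength - remaining, splitSize));
            remaining -= splitSize;
        }
        if (remaining != 0) {
            // The tail (at most 1.1 * splitSize bytes) becomes the final split.
            splits.add(new Split(file, fileLength - remaining, remaining));
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // 300 MB file, 128 MB blocks, min split size of 1 byte and no max.
        System.out.println(computeSplits("input.txt", 300 * mb, 128 * mb, 1, Long.MAX_VALUE));
    }
}
```

Under these assumptions a 300 MB file yields splits of 128 MB, 128 MB and 44 MB, while a 140 MB file stays a single split because 140/128 ≈ 1.09 is below the 1.1 slop factor.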

public abstract class InputSplit {
    public abstract long getLength() throws IOException, InterruptedException;

    public abstract String[] getLocations()
            throws IOException, InterruptedException;
}
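getLength returns the size of the split in bytes, and getLocations returns the names of the nodes where the split's data is stored (host names, not file paths), which the scheduler uses for data locality.
OutputFormat: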
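Both InputSplit methods mainly serve the framework: per the Hadoop javadoc, the length lets splits be sorted by size (the job submitter puts the biggest first), and the locations are host names used as hints for running each map task near its data. The Hadoop-free Java sketch below models only that; SplitInfo, largestFirst, isLocalTo and the node names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical model of what an InputSplit reports; not Hadoop's classes.
class SplitOrderSketch {

    static final class SplitInfo {
        final String name;
        final long length;        // what getLength() would return, in bytes
        final String[] locations; // what getLocations() would return: host names

        SplitInfo(String name, long length, String... locations) {
            this.name = name;
            this.length = length;
            this.locations = locations;
        }
    }

    /** Order splits largest-first, mirroring how Hadoop sorts splits at job submission. */
    static List<SplitInfo> largestFirst(List<SplitInfo> splits) {
        List<SplitInfo> sorted = new ArrayList<>(splits);
        sorted.sort(Comparator.comparingLong((SplitInfo s) -> s.length).reversed());
        return sorted;
    }

    /** True if a task running on this host could read the split's data locally. */
    static boolean isLocalTo(SplitInfo split, String host) {
        return Arrays.asList(split.locations).contains(host);
    }

    public static void main(String[] args) {
        List<SplitInfo> splits = List.of(
                new SplitInfo("s0", 44, "node3"),
                new SplitInfo("s1", 128, "node1", "node2"),
                new SplitInfo("s2", 100, "node2"));
        for (SplitInfo s : largestFirst(splits)) {
            System.out.println(s.name + " len=" + s.length
                    + " localTo(node2)=" + isLocalTo(s, "node2"));
        }
    }
}
```

Processing the largest splits first is a scheduling heuristic: long-running tasks start early, so the job is less likely to wait on one big straggler at the end.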

public abstract class OutputFormat<K, V> {
    public abstract RecordWriter<K, V> getRecordWriter(TaskAttemptContext context)
            throws IOException, InterruptedException;

    public abstract void checkOutputSpecs(JobContext context)
            throws IOException, InterruptedException;

    public abstract OutputCommitter getOutputCommitter(TaskAttemptContext context)
            throws IOException, InterruptedException;
}
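getRecordWriter: returns the RecordWriter that defines how each output key/value record is written.
checkOutputSpecs: validates the job's output specification when the job is submitted; file-based formats typically fail if the output directory already exists, so existing output is not overwritten.
getOutputCommitter: returns the OutputCommitter, which is responsible for committing the output correctly (setting up, committing or aborting task and job output), not merely flushing writes.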
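The division of labour among the three OutputFormat methods can be illustrated with a small, Hadoop-free Java sketch built on java.nio.file. It is loosely modeled on Hadoop's file-based output (FileOutputFormat rejecting an existing output directory, TextOutputFormat's tab-separated lines, and FileOutputCommitter writing under _temporary before promoting files and creating a _SUCCESS marker), but the directory layout is greatly simplified and every name in it (OutputSketch, writeTaskOutput, commit, the part- file name) is hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Stream;

// Hypothetical, simplified model of file output; not Hadoop's classes.
class OutputSketch {

    /** checkOutputSpecs analogue: fail fast so an existing output is never overwritten. */
    static void checkOutputSpecs(Path outputDir) throws IOException {
        if (Files.exists(outputDir)) {
            throw new FileAlreadyExistsException(outputDir.toString());
        }
    }

    /** RecordWriter analogue: write "key<TAB>value" lines, sorted by key, into a task-private temp file. */
    static Path writeTaskOutput(Path outputDir, String taskId, Map<String, Integer> records)
            throws IOException {
        Path taskDir = outputDir.resolve("_temporary").resolve(taskId);
        Files.createDirectories(taskDir);
        StringBuilder lines = new StringBuilder();
        for (Map.Entry<String, Integer> e : new TreeMap<>(records).entrySet()) {
            lines.append(e.getKey()).append('\t').append(e.getValue()).append('\n');
        }
        Path file = taskDir.resolve("part-" + taskId);
        Files.write(file, lines.toString().getBytes(StandardCharsets.UTF_8));
        return file;
    }

    /** OutputCommitter analogue: promote the task's file, remove temp dirs, mark success. */
    static void commit(Path outputDir, Path taskFile) throws IOException {
        Path taskDir = taskFile.getParent();
        Files.move(taskFile, outputDir.resolve(taskFile.getFileName()));
        Files.delete(taskDir);                          // empty after the move
        Files.delete(outputDir.resolve("_temporary"));  // empty after deleting taskDir
        Files.createFile(outputDir.resolve("_SUCCESS"));
    }

    public static void main(String[] args) throws IOException {
        Path out = Files.createTempDirectory("job").resolve("output");
        checkOutputSpecs(out);  // passes: the output directory does not exist yet
        Path tmp = writeTaskOutput(out, "task0", Map.of("hadoop", 2, "hdfs", 1));
        commit(out, tmp);
        try (Stream<Path> listing = Files.list(out)) {
            listing.forEach(p -> System.out.println(p.getFileName()));
        }
        try {
            checkOutputSpecs(out);  // a second run against the same directory is rejected
        } catch (FileAlreadyExistsException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Keeping writing and committing separate means the writer only produces bytes, while the committer decides when they become visible; in this sketch, output that is never committed stays under _temporary and never appears next to the final files.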