Reading Hadoop file system data through the FileSystem API

zidong posted on 2016-12-9 06:18:19

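Sometimes it is not possible to call

URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory());

in order to read data stored in Hadoop. The JVM allows setURLStreamHandlerFactory to be called only once per process, so this approach fails if some other part of the application has already set a factory. In that case we can use the FileSystem API to open an input stream instead.
A file in a Hadoop file system is represented by a Hadoop Path object (not a java.io.File object). A path can be thought of as a Hadoop file system URI, for example: hdfs://localhost/user/hadoop/map.txt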
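To make the URI view of a path concrete, here is a minimal sketch that splits the example URI into its parts. It uses only java.net.URI from the JDK rather than Hadoop's Path class, and the class name HdfsUriParts and the describe method are made up for illustration. FileSystem.get(URI, conf) picks the file system from the URI's scheme and authority, while the path part locates the file inside that file system.

```java
import java.net.URI;

// Illustration only: uses java.net.URI from the JDK, not Hadoop's Path class.
// HdfsUriParts and describe() are hypothetical names for this sketch.
class HdfsUriParts {
    /** Returns "scheme|host|path" for the given URI string. */
    static String describe(String uri) {
        URI u = URI.create(uri);
        return u.getScheme() + "|" + u.getHost() + "|" + u.getPath();
    }

    public static void main(String[] args) {
        // Prints: hdfs|localhost|/user/hadoop/map.txt
        System.out.println(describe("hdfs://localhost/user/hadoop/map.txt"));
    }
}
```

Here hdfs is the scheme, localhost is the host of the NameNode, and /user/hadoop/map.txt is the location of the file.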

package gucas.xiaoxia;

import java.io.IOException;
import java.io.InputStream;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class CatFileSystem {
    public static void main(String[] args) throws IOException {
        String uri = "hdfs://localhost/user/hadoop/map.txt";
        Configuration conf = new Configuration();
        // Look up the FileSystem instance that matches the URI's scheme and authority.
        FileSystem fs = FileSystem.get(URI.create(uri), conf);
        InputStream input = null;
        try {
            // Open the file and copy it to standard output with a 4096-byte buffer.
            // The final 'false' tells copyBytes not to close the streams itself.
            input = fs.open(new Path(uri));
            IOUtils.copyBytes(input, System.out, 4096, false);
        } finally {
            // Close the input stream even if the copy failed.
            IOUtils.closeStream(input);
        }
    }
}

Output:

hello world:10hello world:10hello world:10hello world:10hello world:10hello world:10hello world:10hello world:10
hello world:10hello world:10hello world:10hello world:10hello world:10hello world:10hello world:10hello world:10
hello world:10hello world:10hello world:10hello world:10hello world:10hello world:10hello world:10hello world:10
hello world:10hello world:10hello world:10hello world:10hello world:10hello world:10hello world:10hello world:10
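
The call IOUtils.copyBytes(input, System.out, 4096, false) in the program above copies the stream through a 4096-byte buffer and, because the last argument is false, leaves both streams open, which is why the finally block closes the input. The following is only a rough JDK-only sketch of that behaviour, not Hadoop's actual implementation. The class name CopyBytesSketch and the copy method are hypothetical, and a ByteArrayInputStream stands in for the HDFS stream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for illustration; not Hadoop's actual implementation.
class CopyBytesSketch {
    /** Copies everything from in to out through a buffSize-byte buffer; closes neither stream. */
    static long copy(InputStream in, OutputStream out, int buffSize) throws IOException {
        byte[] buf = new byte[buffSize];
        long total = 0;
        int n;
        // read() returns -1 once the end of the stream is reached.
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
            total += n;
        }
        out.flush();
        return total;
    }

    public static void main(String[] args) throws IOException {
        // An in-memory stream stands in for the stream returned by fs.open(...).
        InputStream in = new ByteArrayInputStream(
                "hello world:10".getBytes(StandardCharsets.UTF_8));
        try {
            copy(in, System.out, 4096);
            System.out.println();
        } finally {
            // The caller owns the stream, so the caller closes it.
            in.close();
        }
    }
}
```

System.out is deliberately left open here, just as in the original program, since closing it would prevent any later output.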
