Hadoop Serialization: A Summary

Posted by coverl on 2018-10-31 09:49:44

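1. Serialization
1.1 Serialization converts an in-memory object into a byte stream.
1.2 Deserialization is the reverse process: it converts a byte stream back into an in-memory (structured) object.
1.3 In Java, a class is made serializable by implementing the Serializable interface (java.io.Serializable).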
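As a minimal sketch of points 1.1 to 1.3, the following round trip uses only standard java.io classes. The DemoPoint class and the helper method names are hypothetical and exist only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical example class; the name and fields are for illustration only.
class DemoPoint implements Serializable {
    private static final long serialVersionUID = 1L;
    final int x;
    final int y;

    DemoPoint(int x, int y) {
        this.x = x;
        this.y = y;
    }
}

class JavaSerializationDemo {
    // Serialization: in-memory object -> byte stream.
    static byte[] serialize(Serializable obj) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(obj);
        }
        return buffer.toByteArray();
    }

    // Deserialization: byte stream -> in-memory object.
    static Object deserialize(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] bytes = serialize(new DemoPoint(3, 4));
        DemoPoint copy = (DemoPoint) deserialize(bytes);
        System.out.println(copy.x + "," + copy.y); // prints 3,4
    }
}
```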
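2. Serialization in Hadoop
2.1 Hadoop types implement the Writable interface (org.apache.hadoop.io.Writable). Writable is Hadoop's own interface and does not extend java.io.Serializable.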
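2.2 What serialization is for in Hadoop:
1. Compact: makes efficient use of storage space
2. Fast: little extra overhead when reading and writing data
3. Communication between processes
4. Persistent storage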
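The "compact" point can be made concrete with a rough comparison, sketched here under stated assumptions: the TwoLongs class is hypothetical, and the second method writes raw field values through java.io.DataOutputStream, which is the mechanism Writable builds on, not Hadoop itself. This is an illustration, not a benchmark.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class SizeComparisonDemo {
    // Hypothetical value type with two long fields, used only for this comparison.
    static class TwoLongs implements Serializable {
        private static final long serialVersionUID = 1L;
        long first = 1L;
        long second = 2L;
    }

    // Size when written with standard Java object serialization
    // (includes a stream header and a class descriptor).
    static int javaSerializedSize() throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(new TwoLongs());
        }
        return buffer.size();
    }

    // Size when only the field values are written through DataOutput.
    static int fieldOnlySize() throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeLong(1L);
            out.writeLong(2L);
        }
        return buffer.size();
    }

    public static void main(String[] args) throws IOException {
        System.out.println("Java serialization: " + javaSerializedSize() + " bytes");
        System.out.println("Fields only:        " + fieldOnlySize() + " bytes");
    }
}
```

The field-only form is exactly 16 bytes (two 8-byte longs), while the Java-serialized form is larger because it also records class metadata.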
2.3 Communication between Hadoop nodes
Node 1 (message serialized into a binary stream) ---------> Node 2 (binary stream deserialized back into a message)
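The flow in 2.3 can be sketched without any networking. In this sketch a byte array stands in for the wire between the two nodes, and the message layout (an int id followed by a string) is an assumption made for illustration, not Hadoop's actual RPC format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

class NodeMessageDemo {
    // "Node 1": turn a message (hypothetical id + text) into a binary stream.
    static byte[] encode(int id, String text) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(id);
            out.writeUTF(text);
        }
        return buffer.toByteArray();
    }

    // "Node 2": rebuild the message from the binary stream,
    // reading the fields in the same order they were written.
    static String decode(byte[] wire) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(wire))) {
            int id = in.readInt();
            String text = in.readUTF();
            return id + ":" + text;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] wire = encode(7, "heartbeat"); // the byte array stands in for the network
        System.out.println(decode(wire));     // prints 7:heartbeat
    }
}
```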
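2.4 The Writable interface
1. It defines a simple, efficient serialization protocol based on DataInput and DataOutput.
2. Keys and values in MapReduce, including custom data types, must implement Writable.
3. Keys in MapReduce must also implement WritableComparable, because MapReduce sorts by key by default and can only sort by key.
4. Common Writable implementations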
Writable implementation    Java type
Text                       String
BooleanWritable            boolean
ByteWritable               byte
....                       .....
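The requirement that keys be both serializable and comparable can be sketched as follows. To keep the example runnable without Hadoop on the classpath, LocalWritableComparable is a hypothetical stand-in that mirrors the methods of org.apache.hadoop.io.WritableComparable; YearKey and sortedYears are also made up for illustration. In a real job the key would implement Hadoop's interface directly.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.Arrays;

class KeySortDemo {
    // Stand-in for org.apache.hadoop.io.WritableComparable<T> (an assumption
    // made so this sketch compiles without Hadoop).
    interface LocalWritableComparable<T> extends Comparable<T> {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    // Hypothetical key type wrapping a single int.
    static class YearKey implements LocalWritableComparable<YearKey> {
        private int year;

        YearKey() { }                          // no-arg constructor, used when instances are created reflectively
        YearKey(int year) { this.year = year; }
        int getYear() { return year; }

        @Override
        public void write(DataOutput out) throws IOException { out.writeInt(year); }

        @Override
        public void readFields(DataInput in) throws IOException { year = in.readInt(); }

        // Defines the order in which keys are sorted.
        @Override
        public int compareTo(YearKey other) { return Integer.compare(year, other.year); }

        // equals and hashCode kept consistent with compareTo.
        @Override
        public boolean equals(Object o) { return o instanceof YearKey && ((YearKey) o).year == year; }

        @Override
        public int hashCode() { return Integer.hashCode(year); }
    }

    // Sorts keys by their natural order, as a framework sorting by key would.
    static int[] sortedYears(int... years) {
        YearKey[] keys = new YearKey[years.length];
        for (int i = 0; i < years.length; i++) {
            keys[i] = new YearKey(years[i]);
        }
        Arrays.sort(keys);
        int[] result = new int[keys.length];
        for (int i = 0; i < keys.length; i++) {
            result[i] = keys[i].getYear();
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(sortedYears(2018, 2016, 2017))); // prints [2016, 2017, 2018]
    }
}
```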
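2.5 Custom data types
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Writable;

class KpiWritable implements Writable {
    // Primitive long: an unset boxed Long would throw a NullPointerException in write().
    long f1;
    long f2;

    /**
     * Deserializes the fields from the input stream.
     * Fields must be read in exactly the order that write() emits them.
     * @param in the stream to read from
     * @throws IOException if reading fails
     */
    @Override
    public void readFields(DataInput in) throws IOException {
        this.f1 = in.readLong();
        this.f2 = in.readLong();
    }

    /** Serializes the fields to the output stream. */
    @Override
    public void write(DataOutput out) throws IOException {
        out.writeLong(f1);
        out.writeLong(f2);
    }
}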
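A type like KpiWritable can be checked outside a cluster by writing it to a byte array and reading it back. The sketch below assumes a hypothetical LocalWritable interface with the same two method signatures as org.apache.hadoop.io.Writable, so that it runs with the JDK alone; toBytes and fromBytes are illustrative helper names, not Hadoop APIs.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

class KpiRoundTripDemo {
    // Stand-in for org.apache.hadoop.io.Writable (an assumption made so this
    // sketch compiles without Hadoop on the classpath).
    interface LocalWritable {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    static class KpiWritable implements LocalWritable {
        long f1;
        long f2;

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeLong(f1);
            out.writeLong(f2);
        }

        // Reads the fields in the same order that write() emits them.
        @Override
        public void readFields(DataInput in) throws IOException {
            this.f1 = in.readLong();
            this.f2 = in.readLong();
        }
    }

    // Serializes any LocalWritable into a byte array.
    static byte[] toBytes(LocalWritable w) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            w.write(out);
        }
        return buffer.toByteArray();
    }

    // Populates a fresh KpiWritable from previously serialized bytes.
    static KpiWritable fromBytes(byte[] bytes) throws IOException {
        KpiWritable kpi = new KpiWritable();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            kpi.readFields(in);
        }
        return kpi;
    }

    public static void main(String[] args) throws IOException {
        KpiWritable original = new KpiWritable();
        original.f1 = 100L;
        original.f2 = 200L;
        KpiWritable copy = fromBytes(toBytes(original));
        System.out.println(copy.f1 + " " + copy.f2); // prints 100 200
    }
}
```

If readFields read f2 before f1, the two values would come back swapped, which is why the field order in write and readFields has to match.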
