[Experience Sharing] Revisiting Hadoop (3): Serialization

  Design goals of Hadoop's serialization mechanism:


  • Compact: network bandwidth is the scarcest resource in a Hadoop cluster, so a compact serialization format should make the most of it.
  • Fast: MapReduce exercises serialization heavily, so per-record serialization overhead must be kept as low as possible.
  • Extensible: the mechanism needs to be customizable so that protocols can evolve.
  • Interoperable: it should support communication between clients written in different languages.
      Java's built-in serialization writes out the class of the object being serialized, its class signature, the values of all non-transient, non-static fields, and all of its superclasses as well. The serialized object ends up far too heavyweight, potentially tens or even hundreds of times larger than the raw data.
     Given these requirements, Hadoop defines its own serialization mechanism, built around org.apache.hadoop.io.Writable:

/**
* A serializable object which implements a simple, efficient, serialization
* protocol, based on {@link DataInput} and {@link DataOutput}.
*
* <p>Any <code>key</code> or <code>value</code> type in the Hadoop Map-Reduce
* framework implements this interface.</p>
*
* <p>Implementations typically implement a static <code>read(DataInput)</code>
* method which constructs a new instance, calls {@link #readFields(DataInput)}
* and returns the instance.</p>
*
* <p>Example:</p>
* <p><blockquote><pre>
*     public class MyWritable implements Writable {
*       // Some data     
*       private int counter;
*       private long timestamp;
*      
*       public void write(DataOutput out) throws IOException {
*         out.writeInt(counter);
*         out.writeLong(timestamp);
*       }
*      
*       public void readFields(DataInput in) throws IOException {
*         counter = in.readInt();
*         timestamp = in.readLong();
*       }
*      
*       public static MyWritable read(DataInput in) throws IOException {
*         MyWritable w = new MyWritable();
*         w.readFields(in);
*         return w;
*       }
*     }
* </pre></blockquote></p>
*/
public interface Writable {
  /**
   * Serialize the fields of this object to <code>out</code>
   * (i.e. write/serialize the object to the stream).
   * @param out <code>DataOutput</code> to serialize this object into.
   * @throws IOException
   */
  void write(DataOutput out) throws IOException;

  /**
   * Deserialize the fields of this object from <code>in</code>
   * (i.e. read/deserialize the object from the stream).
   * <p>For efficiency, implementations should attempt to re-use storage in the
   * existing object where possible.</p>
   *
   * @param in <code>DataInput</code> to deserialize this object from.
   * @throws IOException
   */
  void readFields(DataInput in) throws IOException;
}
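  To see the round trip in action, here is a minimal sketch (the Record class and field values are illustrative, not from the original post; it assumes hadoop-common on the classpath) that follows the MyWritable pattern from the Javadoc above, serializing to an in-memory buffer and reading back:

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import org.apache.hadoop.io.Writable;

public class WritableRoundTrip {
  // Same shape as the MyWritable example in the Javadoc above.
  static class Record implements Writable {
    int counter;
    long timestamp;

    public void write(DataOutput out) throws IOException {
      // Only the raw field bytes are emitted: 4 + 8 = 12 bytes total,
      // with no class name or signature -- hence "compact".
      out.writeInt(counter);
      out.writeLong(timestamp);
    }

    public void readFields(DataInput in) throws IOException {
      counter = in.readInt();
      timestamp = in.readLong();
    }
  }

  public static void main(String[] args) throws IOException {
    Record before = new Record();
    before.counter = 42;
    before.timestamp = System.currentTimeMillis();

    // Serialize to an in-memory buffer.
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    before.write(new DataOutputStream(buf));

    // Deserialize into a fresh (or, for efficiency, reused) instance.
    Record after = new Record();
    after.readFields(new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
    System.out.println(after.counter + " " + after.timestamp); // 42 <timestamp>
  }
}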
      The Hadoop serialization mechanism also includes several important interfaces: org.apache.hadoop.io.WritableComparable, org.apache.hadoop.io.WritableComparator, and org.apache.hadoop.io.RawComparator.
  org.apache.hadoop.io.WritableComparable: extends java.lang.Comparable to provide comparison of typed (deserialized) objects; MapReduce key types implement it so the framework can sort them.
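  A minimal sketch of such a key type (the IdKey name and field are illustrative, not from the original post):

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.WritableComparable;

// Hypothetical key type: WritableComparable combines Writable's
// write()/readFields() with Comparable's compareTo().
public class IdKey implements WritableComparable<IdKey> {
  private long id;

  public void write(DataOutput out) throws IOException {
    out.writeLong(id);
  }

  public void readFields(DataInput in) throws IOException {
    id = in.readLong();
  }

  // From java.lang.Comparable: defines the sort order of deserialized keys.
  public int compareTo(IdKey other) {
    return Long.compare(id, other.id);
  }
}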

 
  org.apache.hadoop.io.RawComparator: extends java.util.Comparator and allows records to be compared while still in serialized form, avoiding the entire cost of creating objects.
  For the difference between java.util.Comparator and java.lang.Comparable, see http://lavasoft.blog.iyunv.com/62575/68380
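  To make "comparing records that have not been deserialized" concrete, here is a hedged sketch of a raw comparator for the hypothetical IdKey above. It builds on WritableComparator (which implements RawComparator) and its static readLong() and define() helpers:

import org.apache.hadoop.io.WritableComparator;

// Hypothetical raw comparator for IdKey: it decodes the 8-byte serialized
// long directly from the buffers, so sorting never instantiates IdKey objects.
public class IdKeyRawComparator extends WritableComparator {
  public IdKeyRawComparator() {
    super(IdKey.class);
  }

  @Override
  public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
    // readLong is a static WritableComparator helper that decodes a
    // big-endian long straight out of the byte buffer.
    return Long.compare(readLong(b1, s1), readLong(b2, s2));
  }

  static {
    // Register as the default comparator for IdKey; Hadoop's built-in
    // types (e.g. IntWritable) register theirs the same way.
    WritableComparator.define(IdKey.class, new IdKeyRawComparator());
  }
}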
  Every Java primitive type has a corresponding Writable wrapper.
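  For instance (the wrapper classes are real org.apache.hadoop.io types; the demo class is a toy sketch):

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;

public class WrapperDemo {
  public static void main(String[] args) {
    // int -> IntWritable, long -> LongWritable, boolean -> BooleanWritable,
    // and so on; Text plays the analogous role for String.
    IntWritable count = new IntWritable(42);
    count.set(43);            // wrappers are mutable, so one object can be reused
    LongWritable ts = new LongWritable(System.currentTimeMillis());
    Text name = new Text("hadoop");
    System.out.println(count.get() + " " + ts.get() + " " + name);
  }
}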
  ObjectWritable is used when objects of different types must be serialized into a single field; it is also used to serialize and deserialize the arguments of Hadoop remote procedure calls.
  ObjectWritable's implementation:
  It has three member fields: the wrapped object instance (instance), the Class object recording the object's declared type (declaredClass), and a Configuration object.
  ObjectWritable's write method delegates to the static ObjectWritable.writeObject(), which can write many kinds of Java objects to a DataOutput:

/** Write a {@link Writable}, {@link String}, primitive type, or an array of
 * the preceding. */
public static void writeObject(DataOutput out, Object instance,
                               Class declaredClass,
                               Configuration conf) throws IOException {

  if (instance == null) {                         // null
    instance = new NullInstance(declaredClass, conf);
    declaredClass = Writable.class;
  }

  UTF8.writeString(out, declaredClass.getName()); // always write declared

  if (declaredClass.isArray()) {                  // array
    int length = Array.getLength(instance);
    out.writeInt(length);
    for (int i = 0; i < length; i++) {
      writeObject(out, Array.get(instance, i),
                  declaredClass.getComponentType(), conf);
    }
  } else if (declaredClass == String.class) {     // String
    UTF8.writeString(out, (String)instance);
  } else if (declaredClass.isPrimitive()) {       // primitive type
    if (declaredClass == Boolean.TYPE) {          // boolean
      out.writeBoolean(((Boolean)instance).booleanValue());
    } else if (declaredClass == Character.TYPE) { // char
      out.writeChar(((Character)instance).charValue());
    } else if (declaredClass == Byte.TYPE) {      // byte
      out.writeByte(((Byte)instance).byteValue());
    } else if (declaredClass == Short.TYPE) {     // short
      out.writeShort(((Short)instance).shortValue());
    } else if (declaredClass == Integer.TYPE) {   // int
      out.writeInt(((Integer)instance).intValue());
    } else if (declaredClass == Long.TYPE) {      // long
      out.writeLong(((Long)instance).longValue());
    } else if (declaredClass == Float.TYPE) {     // float
      out.writeFloat(((Float)instance).floatValue());
    } else if (declaredClass == Double.TYPE) {    // double
      out.writeDouble(((Double)instance).doubleValue());
    } else if (declaredClass == Void.TYPE) {      // void: nothing to write
    } else {
      throw new IllegalArgumentException("Not a primitive: "+declaredClass);
    }
  } else if (declaredClass.isEnum()) {            // enum
    UTF8.writeString(out, ((Enum)instance).name());
  } else if (Writable.class.isAssignableFrom(declaredClass)) { // Writable subclass
    UTF8.writeString(out, instance.getClass().getName());
    ((Writable)instance).write(out);
  } else {
    throw new IOException("Can't write: "+instance+" as "+declaredClass);
  }
}
  The counterpart on the read side is org.apache.hadoop.io.ObjectWritable.readObject(DataInput in, ObjectWritable objectWritable, Configuration conf):

/** Read a {@link Writable}, {@link String}, primitive type, or an array of
 * the preceding. */
@SuppressWarnings("unchecked")
public static Object readObject(DataInput in, ObjectWritable objectWritable, Configuration conf)
  throws IOException {
  String className = UTF8.readString(in);
  Class<?> declaredClass = PRIMITIVE_NAMES.get(className);
  if (declaredClass == null) {
    try {
      declaredClass = conf.getClassByName(className);
    } catch (ClassNotFoundException e) {
      throw new RuntimeException("readObject can't find class " + className, e);
    }
  }

  Object instance;

  if (declaredClass.isPrimitive()) {               // primitive types
    if (declaredClass == Boolean.TYPE) {           // boolean
      instance = Boolean.valueOf(in.readBoolean());
    } else if (declaredClass == Character.TYPE) {  // char
      instance = Character.valueOf(in.readChar());
    } else if (declaredClass == Byte.TYPE) {       // byte
      instance = Byte.valueOf(in.readByte());
    } else if (declaredClass == Short.TYPE) {      // short
      instance = Short.valueOf(in.readShort());
    } else if (declaredClass == Integer.TYPE) {    // int
      instance = Integer.valueOf(in.readInt());
    } else if (declaredClass == Long.TYPE) {       // long
      instance = Long.valueOf(in.readLong());
    } else if (declaredClass == Float.TYPE) {      // float
      instance = Float.valueOf(in.readFloat());
    } else if (declaredClass == Double.TYPE) {     // double
      instance = Double.valueOf(in.readDouble());
    } else if (declaredClass == Void.TYPE) {       // void
      instance = null;
    } else {
      throw new IllegalArgumentException("Not a primitive: "+declaredClass);
    }
  } else if (declaredClass.isArray()) {            // array
    int length = in.readInt();
    instance = Array.newInstance(declaredClass.getComponentType(), length);
    for (int i = 0; i < length; i++) {
      Array.set(instance, i, readObject(in, conf));
    }
  } else if (declaredClass == String.class) {      // String
    instance = UTF8.readString(in);
  } else if (declaredClass.isEnum()) {             // enum
    instance = Enum.valueOf((Class<? extends Enum>) declaredClass, UTF8.readString(in));
  } else {                                         // Writable
    Class instanceClass = null;
    String str = "";
    try {
      str = UTF8.readString(in);
      instanceClass = conf.getClassByName(str);
    } catch (ClassNotFoundException e) {
      throw new RuntimeException("readObject can't find class " + str, e);
    }

    // Create the Writable from instanceClass via WritableFactories.
    Writable writable = WritableFactories.newInstance(instanceClass, conf);
    writable.readFields(in);
    instance = writable;

    if (instanceClass == NullInstance.class) {     // null
      declaredClass = ((NullInstance)instance).declaredClass;
      instance = null;
    }
  }

  if (objectWritable != null) {                    // store values
    objectWritable.declaredClass = declaredClass;
    objectWritable.instance = instance;
  }

  return instance;
}
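  Putting the pair together, a short round-trip sketch (the demo class name is hypothetical; it assumes hadoop-common on the classpath):

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.ObjectWritable;

public class ObjectWritableDemo {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();

    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(buf);
    // writeObject records the declared class name ("java.lang.String")
    // first, then the payload -- matching the source above.
    ObjectWritable.writeObject(out, "hello", String.class, conf);

    DataInputStream in =
        new DataInputStream(new ByteArrayInputStream(buf.toByteArray()));
    // Passing null for the ObjectWritable holder simply returns the instance.
    Object back = ObjectWritable.readObject(in, null, conf);
    System.out.println(back); // hello
  }
}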

  WritableFactories.newInstance() first looks for a registered factory for the class and only falls back to reflection when none exists:

// Maps a type to its registered WritableFactory.
private static final HashMap<Class, WritableFactory> CLASS_TO_FACTORY =
  new HashMap<Class, WritableFactory>();

/** Create a new instance of a class with a defined factory. */
public static Writable newInstance(Class<? extends Writable> c, Configuration conf) {
  WritableFactory factory = WritableFactories.getFactory(c);
  if (factory != null) {
    Writable result = factory.newInstance();
    if (result instanceof Configurable) {
      ((Configurable) result).setConf(conf);
    }
    return result;
  } else {
    // No factory registered: create the object with plain reflection.
    return ReflectionUtils.newInstance(c, conf);
  }
}
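  A factory is registered via WritableFactories.setFactory(), typically from a static initializer in the Writable class itself (Hadoop's own classes, e.g. the HDFS Block, register themselves this way). A sketch with a hypothetical BlockInfo type:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.io.WritableFactories;
import org.apache.hadoop.io.WritableFactory;

// Hypothetical Writable that supplies its own factory, so newInstance()
// above finds it in CLASS_TO_FACTORY instead of using reflection.
public class BlockInfo implements Writable {
  static {
    WritableFactories.setFactory(BlockInfo.class, new WritableFactory() {
      public Writable newInstance() { return new BlockInfo(); }
    });
  }

  private long blockId;

  public void write(DataOutput out) throws IOException {
    out.writeLong(blockId);
  }

  public void readFields(DataInput in) throws IOException {
    blockId = in.readLong();
  }
}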
