阿杜&eason

wind-cold 发表于 2017-12-16 20:35:13

　　1、配置
　　1）解压到/opt/moduels
　　2）配置HIVE_HOME
　　3)配置HADOOP_HOME和HIVE_CONF_DIR到hive-env.sh
　　4)在HDFS文件系统上创建HIVE元数据存储目录并赋予权限
　　5）bin/hive　　-->使用sql语句
　　2、安装MySQL并配置
　　1）unzip mysql
　　2)rpm -e --nodeps mysql
　　3)rpm -ivh mysql-server
　　4)cat /root/.mysql_secret
　　5)rpm -ivh mysql-client
　　6)mysql -uroot -p
　　7)set password=password('123456');
　　8)update user set Host='%'-> where User='root' and Host = 'localhost';
　　　　9）flush privileges;
　　　　10）tar -zxvf mysql-connector
　　　　11) cp mysql-connector-java /opt/moduels/hive/lib
　　　　12)配置hive-site.xml中URL、DrvierName、UserName、Password　　　　-->端口号3306，DataBase=metastore
　　　　-->https://cwiki.apache.org/confluence/display/Hive/AdminManual+MetastoreAdmin
　　3、hive基本操作
　　1)列分隔符
　　ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
　　2)加载本地数据
　　load data local inpath '/opt/datas/student.txt' (overwrite) into table student;
　　3）desc formated(extended) student;
　　4)show functions;　　-->desc function(extended) substring;
　　5)数据的清除　　truncate table table_name ;
　　4、一些配置
　　1）配置client.header和client.currentdb来显示当前数据库
　　2）日志文件配置
　　3）set;　　-->查看配置信息　　-->set hive.root.logger=INFO,console;设置日志信息打印在控制台
　　4）常用交互式命令 bin/hive -help(-i,-f,-e)
　　5、创建表的三种方式
　　1）create table test01(ip string comment '...',user string)
　　comment 'access log'
　　row format delimited fields terminated by ' '
　　stored as textfile
　　location '/user/hive/warehouse/logs'
　　2)create table test02
　　as select ip,user from test01;
　　3)create table test03
　　like test01;
　　6、Hive的数据类型
　　1）table ,load　　　E
　　2)select,python　　T
　　3)sub table　　　　L
　　7、Hive中表的类型
　　1）管理表　　　　-->　　删除表时，会删除表数据以及元数据
　　2）托管表（外部表，external）　　-->　　删除表时，只会删除元数据而不会删除表数据
　　3）分区表（partitioned tables）　-->　　查询时可以通过where子句来指定分区　
　　create table dept_partition(deptno int,dname string,loc string)
　　partitioned by(event_month string[,event_day string])　　-->二级分区
　　row format delimited fields terminated by '\t';
　　加载数据：
　　load data local inpath '/opt/datas/emp.txt' into table emp_partition partition (mouth='201509');
　　查询：
　　where mouth = '201509';
　　注意事项：
　　a.自己手动创建分区表文件夹并put数据，并没有将分区元数据写入元数据库，所以无法读取数据，可以手动修复：
　　msck repair table dept_partition;
　　或者　　alter table dept_part add partition(day='20150913');
　　b.查看表的分区数：show partitions dept_partition;
　　8、导出表的方式
　　1）insert overwrite local directory '/opt/datas/hive_exp_emp'　　-->去掉local导出到Hdfs文件系统上
　　row format delimited fields terminated by '\t'
　　collection items terminated by '\n'
　　select * from db_hive.emp;
　　2)bin/hive -e "select * from db_hive.emp" > /opt/datas/hive_exp_exp.txt　　-->没有跑MapReduce任务
　　3）scoop　　hdfs/hive->rdbms　　or　　rdbms->hdfs/hive/hbase
　　9、Hive中常见的查询　
(Note: Only available starting with Hive 0.13.0)SELECT select_expr, select_expr, ...　　-->全部和查重FROM table_reference ] 　　1）select t.empno,t.ename,t.deptno from emp t;　　-->t
　　2)between　　-->
　　select t.empno,t.ename,t.deptno from emp t wheret.sal between 800 and 1500;
　　3)is null/is not null
　　select t.empno,t.ename,t.deptno from emp t where comm is null;
　　4)group by/having　
　　查询每个部门的平均工资
　　select avg(sal) avg_sal from emp
　　group by deptno;
　　查询每个部门中每个岗位的最高薪水
　　select t.deptno,t.job,max(t.sal) max_sal from emp t group by t.deptno,t.job;　　-->双重分组
　　having与where区别
　　where 针对单条记录进行筛选
　　having针对分组结果进行筛选　
　　10、Export/Import
　　1)Export　　-->将Hive表中的数据导出
　　　　EXPORT TABLE tablename )] 　　　　 TO 'export_target_path' [ FOR replication('eventid') ]　　　　-->path指的为HDFS上的路径　　2）Import　　　　　　　IMPORT [ TABLE new_or_original_tablename )]] 　　　　 FROM 'source_path' 　　　　 11、sort　　1）order by　　-->全局排序，一个Reduce　　2)sort by　　-->每个reduce内部进行排序，全局不是排序　　3)distribute by　　-->类似MR中的partition，进行分区，结合sort by使用　　　　insert overwrite local directory '/opt/datas/dist_emp' 　　　　select * from emp 　　　　distribute by deptno　　　　sort by empno asc;　　4)cluster by　　-->当distribute 和sort字段相同时，使用此方式12、Hive自带function及udf编程　　-->user defination function　　1）https://cwiki.apache.org/confluence/display/Hive/HivePlugins　　2)编程步骤：　　　　a.继承org.apache.hadoop.hive.ql.UDF　　　　b.需要实现evaluate函数　　　　ps.必须有返回类型，常用Text/LongWritable等类型，不推荐使用java类型　　3）配置pom.xml文件中的hive-jdbc和hive-exec　　4)使用：　　add jar /opt/datas/udf-tolower.jar;　　　　　　　　create temporary function my_lower as "com.cnblog.hive.udf.LowerUDF";　　-->类名　　　　　　或者 create function myfunc as 'myclass' using jar 'hdfs://hostname/path/to/jar';　　-->文件必须在hdfs文件系统上

页: [1]

运维网's Archiver

阿杜&eason