After finishing the Hive installation and configuration, the natural next step is to start using it. This post demonstrates the basic operations:
- DDL operations
- DML operations
ps: Hive depends on Hadoop, so make sure the relevant Hadoop services are up and running before trying any of the operations below.
[1] DDL Operations
1. Creating tables
For example, the command to create a simple table with two columns:
hive> CREATE TABLE pokes (foo INT, bar STRING);
Another example: the command to create a table with two regular columns and one partition column:
hive> CREATE TABLE invites (foo INT, bar STRING) PARTITIONED BY (ds STRING);
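Each partition of such a table is addressed by a value of the partition column ds. As a minimal sketch (the ds value '2014-02-11' below is just an illustrative example, not part of this tutorial's data), partitions can be added and listed like this:

hive> ALTER TABLE invites ADD PARTITION (ds='2014-02-11');
hive> SHOW PARTITIONS invites;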
2. Querying table information
List all tables:
hive> SHOW TABLES;
List the tables whose names match a pattern:
hive> SHOW TABLES '.*s';
Show the details of a specific table:
hive> DESCRIBE table_name;
A sample session looks like this:
hive> show tables;
OK
Time taken: 0.011 seconds
hive> CREATE TABLE pokes (foo INT, bar STRING);
OK
Time taken: 0.028 seconds
hive> CREATE TABLE invites (foo INT, bar STRING) PARTITIONED BY (ds STRING);
OK
Time taken: 0.039 seconds
hive> show tables;
OK
invites
pokes
Time taken: 0.014 seconds, Fetched: 2 row(s)
hive> SHOW TABLES '.*s';
OK
invites
pokes
Time taken: 0.031 seconds, Fetched: 2 row(s)
hive> DESC invites;
OK
foo                     int                     None
bar                     string                  None
ds                      string                  None

# Partition Information
# col_name              data_type               comment

ds                      string                  None
Time taken: 0.049 seconds, Fetched: 8 row(s)
hive>
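DESC only lists the columns. If you also want details such as the storage location and the input/output formats, Hive's DESCRIBE EXTENDED and DESCRIBE FORMATTED variants print them, for example:

hive> DESCRIBE FORMATTED invites;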
ps: the default field delimiter of a newly created table is \001 (Ctrl-A). You can also specify your own delimiter when creating the table, but only a single character is supported (for example the tab character '\t'). Multi-character delimiters require a custom InputFormat and OutputFormat, which is not covered in this post and may be introduced separately later.
An example of creating a table with a custom delimiter:
hive> CREATE TABLE myword (word STRING, counts INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
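ROW FORMAT DELIMITED also accepts delimiters for collection and map types. A sketch for illustration only (the table and column names below are made up and not part of this tutorial's data):

hive> CREATE TABLE demo_complex (id INT, tags ARRAY<STRING>, props MAP<STRING,STRING>) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' COLLECTION ITEMS TERMINATED BY ',' MAP KEYS TERMINATED BY ':';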
For more on DDL, see: Hive Data Definition Language
[2] DML Operations
1. Prepare the test data
Create a local file named myword.txt, with fields separated by the tab character ('\t'):
micmiu.com	10
MichaelSun	3
ctosun.com	2
sjsky	8
java	9
hadoop	7
hive	6
scala	1
2. Load the data
hive> LOAD DATA LOCAL INPATH '/Users/micmiu/no_sync/testdata/hadoop/myword.txt' OVERWRITE INTO TABLE myword;
Copying data from file:/Users/micmiu/no_sync/testdata/hadoop/myword.txt
Copying file: file:/Users/micmiu/no_sync/testdata/hadoop/myword.txt
Loading data to table default.myword
Table default.myword stats: [num_partitions: 0, num_files: 1, num_rows: 0, total_size: 78, raw_data_size: 0]
OK
Time taken: 0.182 seconds
hive>
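For a partitioned table such as the invites table created earlier, a PARTITION clause tells Hive which partition the file belongs to. A sketch with a hypothetical file path and ds value:

hive> LOAD DATA LOCAL INPATH '/path/to/invites-data.txt' OVERWRITE INTO TABLE invites PARTITION (ds='2014-02-11');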
Query the imported data to verify the load:
hive> select t.word from myword t;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1392094262454_0006, Tracking URL = http://micmiu-mbp.local:8088/proxy/application_1392094262454_0006/
Kill Command = /usr/local/share/hadoop/bin/hadoop job -kill job_1392094262454_0006
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2014-02-11 15:53:15,274 Stage-1 map = 0%, reduce = 0%
2014-02-11 15:53:21,489 Stage-1 map = 100%, reduce = 0%
Ended Job = job_1392094262454_0006
MapReduce Jobs Launched:
Job 0: Map: 1 HDFS Read: 291 HDFS Write: 62 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
micmiu.com
MichaelSun
ctosun.com
sjsky
java
hadoop
hive
scala
Time taken: 22.684 seconds, Fetched: 8 row(s)
hive>
hive> select t.counts from myword t;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1392094262454_0007, Tracking URL = http://micmiu-mbp.local:8088/proxy/application_1392094262454_0007/
Kill Command = /usr/local/share/hadoop/bin/hadoop job -kill job_1392094262454_0007
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2014-02-11 15:54:41,395 Stage-1 map = 0%, reduce = 0%
2014-02-11 15:54:46,585 Stage-1 map = 100%, reduce = 0%
Ended Job = job_1392094262454_0007
MapReduce Jobs Launched:
Job 0: Map: 1 HDFS Read: 291 HDFS Write: 17 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
10
3
2
8
9
7
6
1
Time taken: 21.657 seconds, Fetched: 8 row(s)
hive>
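Rows can be filtered with a WHERE clause in the same way. For example, this sketch against the myword table keeps only the words whose count is greater than 5:

hive> select t.word, t.counts from myword t WHERE t.counts > 5;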
An ORDER BY example:
hive> select t.word,t.counts from myword t ORDER BY t.counts;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Starting Job = job_1392094262454_0008, Tracking URL = http://micmiu-mbp.local:8088/proxy/application_1392094262454_0008/
Kill Command = /usr/local/share/hadoop/bin/hadoop job -kill job_1392094262454_0008
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2014-02-11 16:05:15,960 Stage-1 map = 0%, reduce = 0%
2014-02-11 16:05:21,146 Stage-1 map = 100%, reduce = 0%
2014-02-11 16:05:27,366 Stage-1 map = 100%, reduce = 100%
Ended Job = job_1392094262454_0008
MapReduce Jobs Launched:
Job 0: Map: 1 Reduce: 1 HDFS Read: 291 HDFS Write: 79 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
scala	1
ctosun.com	2
MichaelSun	3
hive	6
hadoop	7
sjsky	8
java	9
micmiu.com	10
Time taken: 26.95 seconds, Fetched: 8 row(s)
hive>
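Sorting in descending order and limiting the output also work as expected. For example, this sketch returns only the three largest counts:

hive> select t.word, t.counts from myword t ORDER BY t.counts DESC LIMIT 3;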
The results match the contents of the source text file.
For more on DML, see: Hive Data Manipulation Language
—————– EOF @Michael Sun —————–
Original article. If you repost it, please credit: micmiu – software development & life notes [ http://www.micmiu.com/ ]
Permalink: http://www.micmiu.com/bigdata/hive/hive-tutorial-ddl-dml/