This article describes in detail how to integrate and configure Hive with HBase:
- Basic environment
- Integration configuration
- Testing and verification
[1] Basic environment
Information, roles, and purpose of each host:
| hostname | IP | Hadoop role | HBase role | Hive |
| --- | --- | --- | --- | --- |
| Master.Hadoop | 192.168.6.77 | NameNode/ResourceManager | Master | mysql (metastore) |
| Slave5.Hadoop | 192.168.8.205 | DataNode/NodeManager | RegionServer | |
| Slave6.Hadoop | 192.168.8.206 | DataNode/NodeManager | RegionServer | |
| Slave7.Hadoop | 192.168.8.207 | DataNode/NodeManager | RegionServer | |
- Hadoop 2.2.0 cluster deployment
- Hive 0.13.0 (metastore configured to use a MySQL database)
- HBase 0.98.0-hadoop2 cluster deployment
PS: The released Hive 0.12.0 package has problems integrating with Hadoop 2.x and HBase 0.98, and my attempts to build Hive 0.12.0 from source kept failing, so I built branch-0.13 instead (0.13.0 had not been officially released at the time) and used it for the integration tests:
```bash
svn co http://svn.apache.org/repos/asf/hive/branches/branch-0.13/ hive-branch-0.13
cd hive-branch-0.13/
mvn package -DskipTests -Phadoop-2,dist
```
After a successful build, the binary package apache-hive-0.13.0-bin.tar.gz can be found under packaging/target.
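A quick sketch of installing the build output; the install location /usr/local, the unpacked directory name, and the environment-variable handling are assumptions for illustration, not taken from the original setup:

```bash
# Unpack the freshly built tarball and make it the active Hive installation
tar -xzf packaging/target/apache-hive-0.13.0-bin.tar.gz -C /usr/local/
export HIVE_HOME=/usr/local/apache-hive-0.13.0-bin
export PATH=$HIVE_HOME/bin:$PATH
```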
[2] Integration configuration
Hadoop cluster deployment, HBase cluster deployment, Hive installation, and the MySQL metastore configuration for Hive have all been covered in detail in earlier articles, so they are not repeated here; this section focuses on the configuration needed to integrate Hive with HBase.
PS: The Hive 0.13.0 that I compiled myself deployed successfully as-is, without the jar replacement described below (which is a bit puzzling).
First, make sure that the HBase jars under <HIVE_HOME>/lib match the HBase version actually deployed. Take the following jars from <HBASE_HOME>/lib/:
```
hbase-client-0.98.0-hadoop2.jar
hbase-common-0.98.0-hadoop2-tests.jar
hbase-common-0.98.0-hadoop2.jar
hbase-protocol-0.98.0-hadoop2.jar
hbase-server-0.98.0-hadoop2.jar
htrace-core-2.04.jar
```
and use them to replace the following jars under <HIVE_HOME>/lib/ (a shell sketch of the replacement follows the list):
```
hbase-0.94.6.1-tests.jar
hbase-0.94.6.1.jar
```
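A minimal shell sketch of the replacement, assuming HIVE_HOME and HBASE_HOME point at the deployed installations:

```bash
# Remove the HBase 0.94 jars that ship with Hive
rm "$HIVE_HOME"/lib/hbase-0.94.6.1-tests.jar "$HIVE_HOME"/lib/hbase-0.94.6.1.jar

# Copy the HBase 0.98 jars (plus htrace) from the deployed HBase into Hive's lib directory
cp "$HBASE_HOME"/lib/hbase-client-0.98.0-hadoop2.jar \
   "$HBASE_HOME"/lib/hbase-common-0.98.0-hadoop2-tests.jar \
   "$HBASE_HOME"/lib/hbase-common-0.98.0-hadoop2.jar \
   "$HBASE_HOME"/lib/hbase-protocol-0.98.0-hadoop2.jar \
   "$HBASE_HOME"/lib/hbase-server-0.98.0-hadoop2.jar \
   "$HBASE_HOME"/lib/htrace-core-2.04.jar \
   "$HIVE_HOME"/lib/
```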
[3] Testing and verification
Before testing, start Hadoop and then HBase, in that order.
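A sketch of the startup order, assuming the stock start scripts shipped with Hadoop 2.2.0 and HBase 0.98 are on the PATH of the Master.Hadoop node:

```bash
# Start HDFS and YARN first
start-dfs.sh
start-yarn.sh

# Then start the HBase cluster (HMaster plus the RegionServers)
start-hbase.sh
```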
1. Testing the connection to a single HBase node
```bash
hive -hiveconf hbase.master=Master.Hadoop:60000
```
1.1 Create an HBase-backed table in Hive (the "hbase.columns.mapping" value ":key,cf1:val" maps the Hive column key to the HBase row key and value to the column val in column family cf1, while "hbase.table.name" names the underlying HBase table xyz):
```sql
CREATE TABLE hbase_table_1(key int, value string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
TBLPROPERTIES ("hbase.table.name" = "xyz");
```
The Hive shell session looks like this:
```
hive> show tables;
OK
micmiu_blog
micmiu_hx_master
xflow_dstip
Time taken: 0.071 seconds, Fetched: 3 row(s)
hive> CREATE TABLE hbase_table_1(key int, value string) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val") TBLPROPERTIES ("hbase.table.name" = "xyz");
OK
Time taken: 1.059 seconds
hive> show tables;
OK
hbase_table_1
micmiu_blog
micmiu_hx_master
xflow_dstip
Time taken: 0.098 seconds, Fetched: 4 row(s)
hive> desc hbase_table_1;
OK
key                     int                     from deserializer
value                   string                  from deserializer
Time taken: 0.112 seconds, Fetched: 2 row(s)
hive>
```
The table information in the MySQL database configured as the metastore:
```
$ mysql -h 192.168.6.77 -u -p
mysql> select TBL_NAME from TBLS ;
+------------------+
| TBL_NAME         |
+------------------+
| hbase_table_1    |
| micmiu_blog      |
| micmiu_hx_master |
| xflow_dstip      |
+------------------+
```
Verify in the HBase shell:
```
$ hbase shell
2014-04-01 15:30:45,189 INFO [main] Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.98.0-hadoop2, r1565492, Thu Feb 6 16:46:57 PST 2014

hbase(main):001:0> list
TABLE
test_table
xyz
2 row(s) in 1.1630 seconds

=> ["test_table", "xyz"]
hbase(main):002:0> describe "xyz"
DESCRIPTION                                        ENABLED
 'xyz', {NAME => 'cf1', DATA_BLOCK_ENCODING => 'NO true
 NE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '
 0', VERSIONS => '1', COMPRESSION => 'NONE', MIN_V
 ERSIONS => '0', TTL => '2147483647', KEEP_DELETED
 _CELLS => 'false', BLOCKSIZE => '65536', IN_MEMOR
 Y => 'false', BLOCKCACHE => 'true'}
1 row(s) in 0.2190 seconds

hbase(main):003:0>
```
This confirms that the associated table was created successfully.
1.2 Next, migrate data from Hive into HBase:
In the Hive CLI, create a table pokes and load some test data (see the note on the data file after the block):
```
hive> CREATE TABLE pokes (foo INT, bar STRING);
hive> LOAD DATA LOCAL INPATH '/home/hadoop/kv.txt' OVERWRITE INTO TABLE pokes;
hive> select * from pokes;
OK
100	micmiu.com
101	ctosun.com
102	baby.micmiu.com
```
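For reference, a sketch of how /home/hadoop/kv.txt could be produced. Since pokes is created without a ROW FORMAT clause, Hive expects its default field delimiter \001 (Ctrl-A) between the two columns; the row values below are taken from the query output above:

```bash
# Write three rows, separating foo and bar with Hive's default delimiter \001 (Ctrl-A)
printf '100\001micmiu.com\n101\001ctosun.com\n102\001baby.micmiu.com\n' > /home/hadoop/kv.txt
```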
In the Hive CLI, insert data into the associated table hbase_table_1:
```
hive> INSERT INTO TABLE hbase_table_1 SELECT * FROM pokes WHERE foo=100;
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1396231806139_0003, Tracking URL = http://Master.Hadoop:8088/proxy/application_1396231806139_0003/
Kill Command = /usr/local/hadoop/bin/hadoop job -kill job_1396231806139_0003
Hadoop job information for Stage-0: number of mappers: 1; number of reducers: 0
2014-04-04 11:31:31,494 Stage-0 map = 0%, reduce = 0%
2014-04-04 11:31:42,863 Stage-0 map = 100%, reduce = 0%, Cumulative CPU 3.17 sec
MapReduce Total cumulative CPU time: 3 seconds 170 msec
Ended Job = job_1396231806139_0003
MapReduce Jobs Launched:
Job 0: Map: 1   Cumulative CPU: 3.17 sec   HDFS Read: 262 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 3 seconds 170 msec
OK
Time taken: 28.493 seconds
hive> select * from hbase_table_1;
OK
100	micmiu.com
Time taken: 0.057 seconds, Fetched: 1 row(s)
hive>
```
Verify the inserted data in the HBase shell:
```
hbase(main):003:0> scan "xyz"
ROW                COLUMN+CELL
 100               column=cf1:val, timestamp=1396582302516, value=micmiu.com
1 row(s) in 0.0680 seconds
```
1.3 Next, insert test data from the HBase side:
```
hbase(main):004:0> put 'xyz','99','cf1:val','test.micmiu.com'
0 row(s) in 0.0700 seconds

hbase(main):005:0> scan "xyz"
ROW                COLUMN+CELL
 100               column=cf1:val, timestamp=1396582302516, value=micmiu.com
 99                column=cf1:val, timestamp=1396594065297, value=test.micmiu.com
2 row(s) in 0.0210 seconds
```
Then query in Hive again to verify the row that was just inserted:
```
hive> select * from hbase_table_1;
OK
100	micmiu.com
99	test.micmiu.com
Time taken: 0.348 seconds, Fetched: 2 row(s)
```
2. Testing the connection to the HBase cluster
```bash
hive -hiveconf hbase.zookeeper.quorum=Slave5.Hadoop,Slave6.Hadoop,Slave7.Hadoop
```
The remaining tests for creating tables and inserting data follow the same steps as above.
References:
- https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration
- https://cwiki.apache.org/confluence/display/Hive/GettingStarted
- https://cwiki.apache.org/confluence/display/Hive/HiveDeveloperFAQ
—————– EOF @Michael Sun —————–
Original article. When reposting, please credit the source: micmiu – software development and everyday notes [ http://www.micmiu.com/ ]
Permalink: http://www.micmiu.com/bigdata/hive/hive-hbase-integration/