unmatch_oppon_pro is a Hive table that now needs to be imported into HBase.
The structure of unmatch_oppon_pro in Hive is as follows:
| Field | Type |
|---|---|
| id | bigint | 
| site_id | int | 
| product_code | string | 
| product_name | string | 
| product_url | string | 
| update_time | string | 
| product_price | double | 
| appraisal_num | int | 
| sold_num | int | 
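When importing into HBase, product_code is used as the row_key of the HBase table.
The Hive and HBase integration makes it possible to read and write HBase tables through Hive; see HBaseIntegration for details.
The code is as follows: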
create table unmatch_oppon_pro_hbase 
(
    row_key         string,
    id              bigint,
    site_id         int,
    product_code    string,
    product_name    string,
    product_url     string,
    update_time     string,
    product_price   double,
    appraisal_num   int,
    sold_num        int
) STORED BY "org.apache.hadoop.hive.hbase.HBaseStorageHandler"
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,info:id,info:site_id,info:product_code,info:product_name,info:product_url,info:update_time,info:product_price,info:appraisal_num,info:sold_num")
TBLPROPERTIES ("hbase.table.name"="unmatch_oppon_pro");
insert overwrite table unmatch_oppon_pro_hbase 
select 
    product_code as row_key,
    id,
    site_id,
    product_code,
    product_name,
    product_url,
    update_time,
    product_price,
    appraisal_num,
    sold_num 
from pms.unmatch_oppon_pro;
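The code above creates a Hive managed (internal) table. For this managed table to read and write the HBase table, the following settings are needed:
STORED BY "org.apache.hadoop.hive.hbase.HBaseStorageHandler"
"hbase.columns.mapping" = ":key,info:id,info:site_id,info:product_code,info:product_name,info:product_url,info:update_time,info:product_price,info:appraisal_num,info:sold_num"
Here ":key" denotes the HBase row_key and corresponds to the first column of the Hive table, row_key string;
"info:id,info:site_id,info:product_code,info:product_name,info:product_url,info:update_time,info:product_price,info:appraisal_num,info:sold_num"
denotes the HBase column family and its columns: info is the column family of the HBase table, and id, site_id, product_code and so on are the columns under info.
"hbase.table.name"="unmatch_oppon_pro": if this is not specified, the resulting HBase table has the same name as the Hive table.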
After execution, the result is as follows:
hbase(main):001:0> describe 'unmatch_oppon_pro'
DESCRIPTION                                                          ENABLED
'unmatch_oppon_pro', {NAME => 'info', DATA_BLOCK_ENCODING => 'NONE',  true
BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1',
COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => 'FOREVER',
KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536',
IN_MEMORY => 'false', BLOCKCACHE => 'true'}
1 row(s) in 2.4440 seconds
hbase(main):002:0> scan 'unmatch_oppon_pro',{LIMIT=>1}
ROW         COLUMN+CELL
1000001232  column=info:appraisal_num, timestamp=1432528332998, value=4
1000001232  column=info:id, timestamp=1432528332998, value=112932511
1000001232  column=info:product_code, timestamp=1432528332998, value=1000001232
1000001232  column=info:product_name, timestamp=1432528332998, value=\xE4\xB8\x80\xE7
1000001232  column=info:product_price, timestamp=1432528332998, value=318.0
1000001232  column=info:product_url, timestamp=1432528332998, value=http://item.jd.com/1000001232.html
1000001232  column=info:site_id, timestamp=1432528332998, value=1001
1000001232  column=info:update_time, timestamp=1432528332998, value=2015-05-22 01:58:57.0
1 row(s) in 0.1530 seconds
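The same row can also be checked from the Hive side. The query below is a sketch that assumes the unmatch_oppon_pro_hbase table defined earlier and reuses the row key 1000001232 from the scan output; the selected columns are chosen only for illustration.

```sql
-- Look up one product by its HBase row key through the Hive table.
select row_key, id, site_id, product_name, product_price, update_time
from unmatch_oppon_pro_hbase
where row_key = '1000001232';
```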
The code for an incremental update is as follows:
create external table unmatch_oppon_pro_hbase 
(
    row_key         string,
    id              bigint,
    site_id         int,
    product_code    string,
    product_name    string,
    product_url     string,
    update_time     string,
    product_price   double,
    appraisal_num   int,
    sold_num        int
) STORED BY "org.apache.hadoop.hive.hbase.HBaseStorageHandler"
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,info:id,info:site_id,info:product_code,info:product_name,info:product_url,info:update_time,info:product_price,info:appraisal_num,info:sold_num")
TBLPROPERTIES ("hbase.table.name"="unmatch_oppon_pro");
insert overwrite table unmatch_oppon_pro_hbase 
select 
    product_code as row_key,
    id,
    site_id,
    product_code,
    product_name,
    product_url,
    update_time,
    product_price,
    appraisal_num,
    sold_num 
from pms.unmatch_oppon_pro;
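In practice, the difference between the incremental update and the full overwrite is that the incremental update creates a Hive external table.
The differences are as follows: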
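One difference documented for the Hive HBase integration is who owns the HBase table. With "create table", Hive creates the HBase table and removes it again when the Hive table is dropped; with "create external table", the HBase table is expected to exist already, and dropping the Hive table leaves it in place. Which kind a given definition produced can be checked with a sketch like the following, assuming the unmatch_oppon_pro_hbase table from this post exists:

```sql
-- In the output, the "Table Type" field reads MANAGED_TABLE for a table
-- created with "create table" and EXTERNAL_TABLE for one created with
-- "create external table".
describe formatted unmatch_oppon_pro_hbase;
```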
[Hive] HBaseIntegration: reading and writing HBase through Hive
Original: http://blog.csdn.net/yeweiouyang/article/details/46003587