4. Using ImportTsv to import Hive data into HBase
The overall flow: stage the Hive data as a tab-delimited text table, generate HFiles from it with ImportTsv, then bulk-load those HFiles into HBase with LoadIncrementalHFiles.
First, create the staging table in Hive. Its DDL is as follows:
CREATE TABLE `student`(
  `s_id` string,
  `s_name` string,
  `s_birth` string,
  `s_sex` string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
  'field.delim'='\t',
  'serialization.null.format'='')
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'hdfs://hbc-cluster/user/hive/warehouse/tmp.db/student';
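Two details in this DDL matter for ImportTsv: field.delim is the tab character that ImportTsv expects by default, and the empty serialization.null.format makes Hive write NULLs as empty fields rather than the literal \N. The post does not show how the staging table gets its data; a minimal HiveQL sketch, assuming a hypothetical source table default.student_src with the same four columns (substitute your real table):

-- default.student_src is a placeholder; run this in the database where student was created.
INSERT OVERWRITE TABLE student
SELECT s_id, s_name, s_birth, s_sex
FROM default.student_src;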
Next, run ImportTsv over the staging table's HDFS files to generate HFiles. The syntax is as follows:
$ bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.columns=a,b,c -Dimporttsv.bulk.output=hdfs://storefile-outputdir <tablename> <hdfs-data-inputdir>
importtsv.columns must include HBASE_ROW_KEY to mark which input field becomes the row key; here it is listed first because s_id, the first field of each line, serves as the row key.
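Before running the concrete command below, make sure the target table and its namespace exist. Depending on the HBase version, ImportTsv can create a missing table itself, but pre-creating it lets you choose the column family (cf in the mapping below) and region splits, which control how the bulk-output HFiles are partitioned. A minimal hbase shell sketch:

create_namespace 'stream_data_warehouse'
create 'stream_data_warehouse:student', 'cf'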
hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dmapreduce.job.queuename=queue -Dimporttsv.bulk.output=hdfs://hbc-cluster/tmp/hbase -Dimporttsv.columns="HBASE_ROW_KEY,cf:s_name,cf:s_birth,cf:s_sex" stream_data_warehouse:student hdfs://hbc-cluster/user/hive/warehouse/tmp.db/student
Finally, bulk-load the generated HFiles into the table with LoadIncrementalHFiles. The syntax is as follows (in HBase 2.x the class lives under org.apache.hadoop.hbase.tool; the org.apache.hadoop.hbase.mapreduce name used in the concrete command below is the HBase 1.x location of the same tool):
$ bin/hbase org.apache.hadoop.hbase.tool.LoadIncrementalHFiles <hdfs://storefileoutput> <tablename>
hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles hdfs://hbc-cluster/tmp/hbase/ stream_data_warehouse:student
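As a quick sanity check (not part of the original walkthrough), scan a few rows from the hbase shell once the load finishes:

scan 'stream_data_warehouse:student', {LIMIT => 5}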
Source: https://www.cnblogs.com/xiexiandong/p/13139846.html