Syncing Hive Table Data to Elasticsearch (ES)

Posted: 2019-05-23 14:19:32

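1. First, log in to the server node and switch to the relevant database.
2. Then find the table to sync and check its definition with show create table <table_name>.
Alternatively, create a new table to test with in place of the original table, for example: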

CREATE TABLE `wb_tmp`(
  `surface` string,
  `radiation` string,
  `loader_id` string)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'hdfs://ffcs/user/projectquene001/publictest/'
TBLPROPERTIES (
  'transient_lastDdlTime'='154762891');
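The table above stores plain text with FIELDS TERMINATED BY ','. The following Python sketch approximates how one line of such a file maps onto the three columns of wb_tmp: missing trailing fields become NULL (None here) and extra fields are ignored, which is our understanding of Hive's default text SerDe. The helper name split_row and the sample values are hypothetical. The format has no quoting, so a value that itself contains a comma would be split into two fields.

```python
from typing import Dict, List, Optional

# Column order of the wb_tmp table defined above.
COLUMNS = ["surface", "radiation", "loader_id"]


def split_row(line: str, columns: List[str] = COLUMNS, delimiter: str = ",") -> Dict[str, Optional[str]]:
    """Map one delimited text line onto table columns (hypothetical helper).

    Missing trailing fields become None (NULL); extra fields are ignored.
    There is no quote handling: every delimiter starts a new field.
    """
    fields = line.rstrip("\n").split(delimiter)
    return {name: (fields[i] if i < len(fields) else None) for i, name in enumerate(columns)}


if __name__ == "__main__":
    print(split_row("land,120.5,L001"))
    print(split_row("sea,98.1"))
```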

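The HDFS location can be taken from the definition of another table in the same database (for example, from its show create table output).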
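Instead of copying the location by hand, it can be pulled out of saved SHOW CREATE TABLE output. This is a minimal sketch, assuming the DDL text was saved from beeline, whose tabular output wraps each line in '|' borders; extract_location is a hypothetical helper, not part of any Hive tool.

```python
import re
from typing import Optional

# LOCATION is followed by a single-quoted URI, possibly on the next line and,
# in beeline's tabular output, separated from it by '|' table borders.
_LOCATION_RE = re.compile(r"LOCATION[\s|]+'([^']+)'", re.IGNORECASE)


def extract_location(ddl: str) -> Optional[str]:
    """Return the storage URI from SHOW CREATE TABLE output, or None if absent."""
    match = _LOCATION_RE.search(ddl)
    return match.group(1) if match else None


if __name__ == "__main__":
    sample = """| ROW FORMAT DELIMITED                              |
| LOCATION                                          |
|   'hdfs://ffcs/user/projectquene001/publictest/'  |"""
    print(extract_location(sample))
```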

If the new table has no data, you can load data in one of two ways.

load data local inpath '/projectquene001/wb.txt' into table projectquene001.wb_tmp;

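This loads a local file into the Hive table. In beeline, however, the local path was not found, probably because there are several beeline servers: LOCAL is resolved on the HiveServer2 host that runs the statement, not on the client machine. The workaround is to load the file from HDFS instead: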

load data inpath '/user/projectquene001/publictest/wb.txt' into table wb_tmp;

To upload a local file to HDFS, use: hdfs dfs -put /home/projectquene001/wb.txt /user/projectquene001/publictest
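If you need a test file like wb.txt, the sketch below writes rows as comma-delimited lines matching the three columns of wb_tmp, ready to be uploaded with hdfs dfs -put. It rejects values containing a comma or a newline, because the text format has no escaping for them. write_sample_file and the sample rows are hypothetical.

```python
from typing import Iterable, Sequence


def write_sample_file(path: str, rows: Iterable[Sequence[str]]) -> int:
    """Write rows as comma-delimited lines for wb_tmp; return the row count."""
    count = 0
    with open(path, "w", encoding="utf-8") as out:
        for row in rows:
            if len(row) != 3:
                raise ValueError(f"expected 3 fields (surface, radiation, loader_id), got {row!r}")
            # No escaping exists in this format, so delimiters inside values would corrupt the row.
            if any("," in value or "\n" in value for value in row):
                raise ValueError(f"field contains ',' or a newline: {row!r}")
            out.write(",".join(row) + "\n")
            count += 1
    return count


if __name__ == "__main__":
    n = write_sample_file("wb.txt", [("land", "120.5", "L001"), ("sea", "98.1", "L002")])
    print(f"wrote {n} rows to wb.txt")
```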

3. Set the execution engine for the sync to MapReduce (mr):

set hive.execution.engine=mr;

4. Before creating the external table mapped to ES, load the es-hadoop connector jar by running the following in Hive:

add jar hdfs://ffcs/user/feilongv3/public/elasticsearch-hadoop-6.3.2.jar;

Note: add jar only applies to the current session. You can upload the jar to your own HDFS path with hdfs dfs -put.
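Because add jar lasts only for the current session, one option is to run the engine setting, the jar and the statement that uses the ES table in a single beeline invocation. The sketch below only builds the command line and does not execute it; it assumes beeline's -u (JDBC URL) and -e (statements to run) options, and the JDBC URL and the helper name beeline_command are hypothetical placeholders.

```python
import shlex
from typing import List, Sequence


def beeline_command(jdbc_url: str, statements: Sequence[str]) -> List[str]:
    """Return a beeline argv that runs every statement over one connection."""
    terminated = []
    for stmt in statements:
        stmt = stmt.strip()
        # beeline uses ';' as the statement terminator, so make sure each one has it.
        terminated.append(stmt if stmt.endswith(";") else stmt + ";")
    return ["beeline", "-u", jdbc_url, "-e", " ".join(terminated)]


if __name__ == "__main__":
    cmd = beeline_command(
        "jdbc:hive2://hs2-host:10000/projectquene001",  # hypothetical HiveServer2 URL
        [
            "set hive.execution.engine=mr",
            "add jar hdfs://ffcs/user/feilongv3/public/elasticsearch-hadoop-6.3.2.jar",
            "insert overwrite table es_wb select surface,radiation,loader_id from wb_tmp",
        ],
    )
    print(shlex.join(cmd))
```

The same session is also needed when creating the external table, since STORED BY refers to a class inside the jar.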

5. Next, create the external table mapped to ES:

create external table ES_WB(
surface string,
radiation string,
loader_id string)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES(
'es.resource' = 'es_mytest/es_mytest',
'es.nodes'='192.168.12.141',
'es.port'='9200',
'es.index.auto.create' = 'true',
'es.index.refresh_interval' = '-1',
'es.index.number_of_replicas' = '0',
'es.batch.write.retry.count' = '6',
'es.batch.write.retry.wait' = '60s');

In es.resource = 'xx/yy', xx is the index name and yy is the type.

Note: the index name (xx) in es.resource must not contain uppercase letters. When I used uppercase, the sync failed with the following error:

[Screenshot: error reported during the sync when the index name contained uppercase letters]

Elasticsearch requires index names to be lowercase, so always use lowercase names.
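The index part of es.resource can be checked before the table is created. This Python sketch splits 'index/type' and applies the index-name rules documented for Elasticsearch 6.x: lowercase only; none of \ / * ? " < > | space , #; no leading -, _ or +; not . or ..; at most 255 bytes. parse_es_resource is a hypothetical helper, and it does not handle the dynamic index patterns that es-hadoop can also accept.

```python
from typing import Optional, Tuple

# Characters Elasticsearch 6.x rejects in index names ('/' is the index/type separator here).
_FORBIDDEN = set('\\/*?"<>| ,#')


def parse_es_resource(resource: str) -> Tuple[str, Optional[str]]:
    """Split an es.resource value 'index/type' and validate the index name."""
    index, _, doc_type = resource.partition("/")
    problems = []
    if not index:
        problems.append("index name is empty")
    if index != index.lower():
        problems.append("index name must be lowercase")
    if any(ch in _FORBIDDEN for ch in index):
        problems.append("index name contains a forbidden character")
    if index[:1] in ("-", "_", "+"):
        problems.append("index name must not start with '-', '_' or '+'")
    if index in (".", ".."):
        problems.append("index name cannot be '.' or '..'")
    if len(index.encode("utf-8")) > 255:
        problems.append("index name is longer than 255 bytes")
    if problems:
        raise ValueError(f"invalid es.resource {resource!r}: " + "; ".join(problems))
    return index, doc_type or None


if __name__ == "__main__":
    print(parse_es_resource("es_mytest/es_mytest"))
    try:
        parse_es_resource("ES_MYTEST/es_mytest")
    except ValueError as err:
        print(err)
```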

6. Then run the sync:

insert overwrite table es_wb select surface,radiation,loader_id from wb_tmp;

7. The sync succeeded:

[Screenshot: successful sync result]
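To confirm the result, the number of documents in the index can be compared with select count(*) from wb_tmp. This sketch calls the Elasticsearch _count endpoint over HTTP using only the Python standard library; the host and port come from the es.nodes and es.port settings above, and the helper names are hypothetical. If refresh is disabled on the index, newly written documents may not be counted until the index has been refreshed.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def count_url(host: str, port: int, index: str) -> str:
    """URL of the _count endpoint for one index."""
    return f"http://{host}:{port}/{quote(index)}/_count"


def parse_count(body: str) -> int:
    """Extract the document count from a _count response body."""
    return int(json.loads(body)["count"])


def es_doc_count(host: str, port: int, index: str, timeout: float = 10.0) -> int:
    """Query Elasticsearch and return the number of documents in the index."""
    with urlopen(count_url(host, port, index), timeout=timeout) as resp:
        return parse_count(resp.read().decode("utf-8"))
```

For example, es_doc_count("192.168.12.141", 9200, "es_mytest") should equal the row count of wb_tmp once the sync has finished.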


Source: https://www.cnblogs.com/caiba/p/10911276.html
