首页 > 其他 > 详细

hive:子查询

时间:2014-10-22 10:54:38      阅读:522      评论:0      收藏:0      [点我收藏+]

hive本身支持的子查询非常有限,Hive不支持where子句中的子查询,只允许子查询在from中出现

错误写法:

insert into table branch_atmzc_sum
Select  XT_OP_TRL, SA_TX_DT,"取款-存款",b.cr_tx_amt- a.cr_tx_amt as cr_tx_amt from branch_atmzc a join  branch_atmzc b  on (a.XT_OP_TRL = b.XT_OP_TRL and a.SA_TX_DT = b.SA_TX_DT and a.tran_cd = ‘ATM存款‘ and b.tran_cd = ‘ATM取款‘), counts from branch_atmzc
group by XT_OP_TRL, SA_TX_DT,cr_tx_amt,counts;

正确写法:

insert into table branch_atmzc_sum
Select a.XT_OP_TRL, a.SA_TX_DT,"取款-存款",b.cr_tx_amt- a.cr_tx_amt ,b.counts+a.counts  from branch_atmzc a join  branch_atmzc b  on (a.XT_OP_TRL = b.XT_OP_TRL and a.SA_TX_DT = b.SA_TX_DT and a.tran_cd = ‘ATM存款‘ and b.tran_cd = ‘ATM取款‘)

另外一种:考虑效率

SELECT
	t1.产品类型,
	COUNT(DISTINCT (IF(t2.用户ID IS NULL, NULL, t1.用户ID))) AS KEEP_UV
FROM
(
	SELECT
		产品类型,
		用户ID
	FROM 事实表
	WHERE (`DATE` >= 20140201 AND `DATE` <= 20140228)
) t1
LEFT OUTER JOIN
(
	SELECT
		产品类型,
		用户ID
	FROM 事实表
	WHERE (`DATE` >= 20140101 AND `DATE` <= 20140131)
) t2 ON (t1.产品类型 = t2.产品类型 AND t1.用户ID = t2.用户ID)
GROUP BY t1.产品类型

 本身表包含的字段信息多,时间跨度大。对于这种对于IN / EXISTS子查询(准确地说,这里是非相关子查询)有一种高效的实现,就是LEFT SEMI JOIN:

       LEFT SEMI JOIN implements the uncorrelated IN/EXISTS subquery semantics in an efficient way.

left join 

SELECT
	产品类型,
	COUNT(DISTINCT t1.用户ID) AS KEEP_UV
FROM
(
	SELECT
		产品类型,
		用户ID
	FROM 事实表
	WHERE (`DATE` >= 20140201 AND `DATE` <= 20140228)
) t1
LEFT SEMI JOIN
(
	SELECT
		产品类型,
		用户ID
	FROM 事实表
	WHERE (`DATE` >= 20140101 AND `DATE` <= 20140131)
) t2 ON (t1.产品类型 = t2.产品类型 AND t1.用户ID = t2.用户ID)
GROUP BY 产品类型

  

hive:子查询

原文:http://www.cnblogs.com/kxdblog/p/4042418.html

(0)
(0)
   
举报
评论 一句话评论(0
关于我们 - 联系我们 - 留言反馈 - 联系我们:wmxa8@hotmail.com
© 2014 bubuko.com 版权所有
打开技术之扣,分享程序人生!