昨天同事在线上发现一个hivequery不能跑,SQL类似如下:
select c from (select * from db1.t1 where partkey1 = ‘xxx‘) a left outer join db2.t2 b on (a.c1 = b.c1) left outer join db2.t2 c on (a.c2 = c.c2) group by c
报NPE,尝试下发现连explain都不行,说明在解析阶段就出问题了,查看日志:
atorg.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer$MetadataOnlyTaskDispatcher.dispatch(MetadataOnlyOptimizer.java:308) atorg.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:87) atorg.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:124) atorg.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:101) atorg.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer.resolve(MetadataOnlyOptimizer.java:175) atorg.apache.hadoop.hive.ql.optimizer.physical.PhysicalOptimizer.optimize(PhysicalOptimizer.java:79) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genMapRedTasks(SemanticAnalyzer.java:8399) atorg.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:8741) atorg.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:278) atorg.apache.hadoop.hive.ql.Driver.compile(Driver.java:433) atorg.apache.hadoop.hive.ql.Driver.compile(Driver.java:337) atorg.apache.hadoop.hive.ql.Driver.run(Driver.java:902) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259) atorg.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216) atorg.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:756) atorg.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614) atsun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) atsun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) atsun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) atjava.lang.reflect.Method.invoke(Method.java:597) atorg.apache.hadoop.util.RunJar.main(RunJar.java:208)
是在物理优化阶段MetadataOnlyOptimizer出了问题,具体到NPE的部分是:
while(iterator.hasNext()) { TableScanOperator tso =iterator.next(); LOG.info("Metadata only table scan for "+ tso.getConf().getAlias()); convertToMetadataOnlyQuery((MapredWork)task.getWork(), tso); }
该处是有一个Bug(HIVE-4935)的,按道理直接关闭hive.optimize.metadataonly就可以简单workaround,不过要弄清楚为什么0.11之前的版本不会触发,因为metadataonly 0.10之前就引入了.
查看Pla才知道是和0.11引入的mapjoin merge的优化有关系:
TS1->MapJoin1->TS2->MapJoin2->
由于默认开启了mapjoinmerge,并且都是noconditional的task,TSO是不需要任何column因此conf为空,进而触发了前面的Bug,这个触发本身也作为一个Bug(HIVE-4927)在0.12得到fix,而work around这个Bug也简单:
关闭hive.auto.convert.join.noconditionaltask,因此解决这个问题可以有两种方式来选择.
PS:这个hive.auto.convert.join.noconditionaltask开启后,启动的TASK是被标记为MAPJOIN_ONLY_NOBACKUP:The join task is converted to a mapjoin task. This can only happen if hive.auto.convert.join.noconditionaltaskis set to true.No conditional task was created in case the mapjoin failed.
也就是说这个这个特性启动的task是不会有backup task的,如果某个mapredlocaltask fail的话整个Job也就Failed了.
本文出自 “MIKE老毕的海贼船” 博客,请务必保留此出处http://boylook.blog.51cto.com/7934327/1365451
Hive MetadataOnlyOptimizer NPE,布布扣,bubuko.com
Hive MetadataOnlyOptimizer NPE
原文:http://boylook.blog.51cto.com/7934327/1365451