This page lists known issues with Hive 3 on MR3. Note that we do not report known bugs in Apache Hive 3 here, whether or not they are fixed in Apache Hive 4.
If you have any questions about MR3, please email us at help@datamonad.com or visit the MR3 Google Group.
1. LOG_LEVEL in kubernetes/env.sh
The environment variable LOG_LEVEL is used only for DAGAppMaster and ContainerWorkers.
LOG_LEVEL=INFO
To change the logging level for Metastore and HiveServer2, update kubernetes/conf/hive-log4j2.properties.
2. Invalid cache of DynamicValue
If the user switches to another database and then executes a query that was already executed in the previous database, Hive may return a wrong result because it internally reuses the cache of DynamicValue populated while running against the previous database.
A practical workaround is to delete all running ContainerWorkers after switching to another database and before executing the same query again.
Alternatively, the user can set the configuration key hive.io.sarg.cache.max.weight.mb to 0 in hive-site.xml (not recommended).
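The access pattern that triggers this issue can be sketched as follows. The databases db1 and db2 and the tables fact and dim are hypothetical; we assume that both databases contain tables with the same names, so that the query text is identical in both runs. Whether Hive actually generates DynamicValue for a given query depends on the optimizer (for example, on whether dynamic semijoin reduction is applied to the join), so this sketch shows only the shape of the workload, not a guaranteed reproduction.

```sql
-- Hypothetical databases db1 and db2, each containing tables fact and dim.
USE db1;
SELECT count(*)
FROM fact f JOIN dim d ON (f.dim_id = d.id)
WHERE d.category = 'A';

-- Switch to another database and run exactly the same query text.
-- Workaround: delete all running ContainerWorkers before running the next query;
-- otherwise it may reuse DynamicValue cached while running against db1.
USE db2;
SELECT count(*)
FROM fact f JOIN dim d ON (f.dim_id = d.id)
WHERE d.category = 'A';
```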
3. GroupByOperator estimating memory usage
When multiple TaskAttempts run inside a DAGAppMaster in local mode,
GroupByOperator conservatively estimates the size of memory used by a TaskAttempt.
As a result, GroupByOperator flushes hash tables more often than necessary.
The user can mitigate this issue by increasing the value of the configuration key hive.map.aggr.hash.force.flush.memory.threshold in hive-site.xml.
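As a sketch, the current value can be checked and raised for a single Beeline session with SET. We assume here that a session-level override takes effect in your deployment; this page itself recommends setting the key in hive-site.xml. In Apache Hive 3 the key defaults to 0.9 and is compared against the fraction of heap memory in use, so the value 0.95 below is only an illustration, not a tuned recommendation.

```sql
-- Print the current value (Apache Hive 3 default: 0.9).
SET hive.map.aggr.hash.force.flush.memory.threshold;

-- Illustrative value only: flush hash tables later, for this session.
SET hive.map.aggr.hash.force.flush.memory.threshold=0.95;
```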
5. Outer joins failing with NullPointerException (mapjoin_filter_on_outerjoin.q)
Outer joins may fail with NullPointerException if hive.auto.convert.join is set to true, e.g.:
SELECT * FROM src1
RIGHT OUTER JOIN src1 src2 ON (src1.key = src2.key AND src1.key < 10 AND src2.key > 10)
JOIN src src3 ON (src2.key = src3.key AND src3.key < 300)
SORT BY src1.key, src2.key, src3.key;
Caused by: java.lang.NullPointerException
at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.getFilterTag(CommonJoinOperator.java:802)
at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genObject(CommonJoinOperator.java:600)
In such a case, set hive.merge.nway.joins to false.
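A sketch of the workaround in a Beeline session, rerunning the failing query from the example above (src and src1 are the standard Hive test tables used by mapjoin_filter_on_outerjoin.q). SET changes the key only for the current session; to apply it to all sessions, set it in hive-site.xml instead.

```sql
-- Do not merge adjacent joins into a single n-way join (current session only).
SET hive.merge.nway.joins=false;

SELECT * FROM src1
RIGHT OUTER JOIN src1 src2 ON (src1.key = src2.key AND src1.key < 10 AND src2.key > 10)
JOIN src src3 ON (src2.key = src3.key AND src3.key < 300)
SORT BY src1.key, src2.key, src3.key;
```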
8. Windowing and analytic functions (vector_ptf_part_simple.q)
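For windowing and analytic functions, the result may differ from that of Hive on Tez or Hive-LLAP if ORDER BY is not used in the OVER clause.
This is not a bug: without ORDER BY, the order of rows within each partition (e.g., PARTITION BY p_mfgr) is undefined, so the result depends on how the input rows happen to be distributed and ordered. For example, the results of the following query may differ: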
SELECT
row_number() OVER(PARTITION BY p_mfgr) AS rn,
row_number() OVER(PARTITION BY p_mfgr RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS rn,
row_number() OVER(PARTITION BY p_mfgr ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS rn,
sum(p_retailprice) OVER(PARTITION BY p_mfgr ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS s,
...
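To obtain the same results across execution engines, specify ORDER BY in each OVER clause. The sketch below assumes a hypothetical table part_simple with the columns p_mfgr, p_name, and p_retailprice used above; the result is deterministic only if the ORDER BY columns uniquely identify each row within a partition.

```sql
-- part_simple is a hypothetical table with columns p_mfgr, p_name, p_retailprice.
SELECT
  p_mfgr, p_name, p_retailprice,
  -- An explicit ORDER BY fixes the row order within each p_mfgr partition.
  row_number() OVER (PARTITION BY p_mfgr ORDER BY p_name, p_retailprice) AS rn,
  sum(p_retailprice) OVER (PARTITION BY p_mfgr ORDER BY p_name, p_retailprice
                           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS s
FROM part_simple;
```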
10. load data producing wrong results (mm_loaddata.q)
If hive.llap.io.enabled is set to true, setting tez.grouping.min-size to too small a value (e.g., 1) may produce wrong results.
CREATE TABLE load0_mm (key string, value string) STORED AS textfile TBLPROPERTIES("transactional"="true", "transactional_properties"="insert_only");
LOAD DATA LOCAL INPATH 'data/files/kv2.txt' INTO TABLE load0_mm;
In practice, tez.grouping.min-size is usually set to a large value (e.g., the default value of 50 * 1024 * 1024 = 52428800 bytes), so this issue rarely arises.
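If tez.grouping.min-size has been lowered in the current session (for example, set to 1), a sketch of restoring the Tez default before loading data is shown below. It reuses the load0_mm table and the data file from the example above.

```sql
-- Restore the default minimum split-group size: 50 * 1024 * 1024 = 52428800 bytes.
SET tez.grouping.min-size=52428800;

LOAD DATA LOCAL INPATH 'data/files/kv2.txt' INTO TABLE load0_mm;
```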
11. alter table ... concatenate failing with NullPointerException (list_bucket_dml_8.q)
alter table ... concatenate may fail with NullPointerException on skewed tables (created with skewed by), e.g.:
create table list_bucketing_dynamic_part_n2 (key String, value String)
partitioned by (ds String, hr String)
skewed by (key, value) on (('484','val_484'),('51','val_14'),('103','val_103'))
stored as DIRECTORIES
STORED AS RCFILE;