This page lists known issues with Hive 3 on MR3. It does not report known bugs in Apache Hive 3 itself, whether or not they are fixed in Apache Hive 4. To ask questions about MR3, please contact DataMonad or visit the MR3 Google Group.

1. LOG_LEVEL in kubernetes/env.sh

Currently the environment variable LOG_LEVEL is not used.

LOG_LEVEL=INFO

To change the logging level for Metastore and HiveServer2, update kubernetes/conf/hive-log4j2.properties.

2. Invalid cache of DynamicValue

If the user switches to another database and re-executes a query that was executed previously, a wrong result may be returned because Hive internally reuses the cache of DynamicValue populated for the previous database. A practical workaround is to delete all running ContainerWorkers after switching to another database and before executing the same query again. Alternatively, the user can set the configuration key hive.io.sarg.cache.max.weight.mb to 0 in hive-site.xml (which is not recommended).

3. GroupByOperator estimating memory usage

When multiple TaskAttempts run inside a DAGAppMaster, GroupByOperator conservatively estimates the size of memory used by a TaskAttempt. As a result, GroupByOperator flushes hash tables more often than necessary. The user can mitigate this issue by increasing the value for the configuration key hive.map.aggr.hash.force.flush.memory.threshold in hive-site.xml.

4. Memory leak in Hive 2 on MR3

HiveServer2 leaks memory because it uses old versions of Calcite (1.2.0-incubating and 1.10). See CALCITE-1808, which is fixed in Calcite 1.15.

5. Outer joins failing with NullPointerException (mapjoin_filter_on_outerjoin.q)

Outer joins may fail with NullPointerException if hive.auto.convert.join is set to true.

SELECT * FROM src1
  RIGHT OUTER JOIN src1 src2 ON (src1.key = src2.key AND src1.key < 10 AND src2.key > 10)
  JOIN src src3 ON (src2.key = src3.key AND src3.key < 300)
  SORT BY src1.key, src2.key, src3.key;

Caused by: java.lang.NullPointerException
	at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.getFilterTag(CommonJoinOperator.java:802)
	at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genObject(CommonJoinOperator.java:600)

In such a case, set hive.merge.nway.joins to false.
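
As a minimal sketch, the workaround can be applied to the current session before rerunning the failing query. This assumes that the key may be changed at the session level in your deployment (for example, from Beeline); if not, set it in hive-site.xml instead.

```sql
-- Assumption: the current session is allowed to modify this key.
SET hive.merge.nway.joins=false;

-- Rerun the query shown above.
SELECT * FROM src1
  RIGHT OUTER JOIN src1 src2 ON (src1.key = src2.key AND src1.key < 10 AND src2.key > 10)
  JOIN src src3 ON (src2.key = src3.key AND src3.key < 300)
  SORT BY src1.key, src2.key, src3.key;
```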

6. Computing avg() failing with ClassCastException (cbo_rp_gby_empty.q)

Computing avg() over an int column may fail with ClassCastException.

SELECT 'avg' AS key, avg(c_int) AS value FROM cbo_t3;

Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.ql.exec.vector.LongColumnVector cannot be cast to org.apache.hadoop.hive.ql.exec.vector.DoubleColumnVector
	at org.apache.hadoop.hive.ql.exec.vector.expressions.gen.DoubleColDivideLongColumn.evaluate(DoubleColDivideLongColumn.java:67)

In such a case, set hive.cbo.returnpath.hiveop to false (which is the default value).
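
Because false is the default value, this failure usually means that the key has been overridden somewhere. The following sketch checks the effective value in the current session and resets it, assuming the key may be changed at the session level in your deployment.

```sql
-- Print the value currently in effect for this session.
SET hive.cbo.returnpath.hiveop;

-- Assumption: the current session is allowed to modify this key.
SET hive.cbo.returnpath.hiveop=false;

-- Rerun the query shown above.
SELECT 'avg' AS key, avg(c_int) AS value FROM cbo_t3;
```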

7. Query failing with URISyntaxException (cbo_rp_auto_join1.q)

A query may fail with URISyntaxException if it involves merging.

SELECT COUNT(*) FROM
(
  SELECT key, COUNT(*) FROM
  (
    SELECT a.key AS key, a.value AS val1, b.value AS val2 FROM tbl1_n13 a JOIN tbl2_n12 b ON a.key = b.key
  ) subq1
  GROUP BY key
) subq2;

Caused by: java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: subq2:subq1:amerge.xml
  at org.apache.hadoop.fs.Path.initialize(Path.java:259) ~[hadoop-common-3.1.2.jar:?]
  ...
  at org.apache.hadoop.hive.ql.exec.Utilities.getPlanPath(Utilities.java:639) ~[hive-exec-3.1.3.jar:3.1.3]
  at org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:415) ~[hive-exec-3.1.3.jar:3.1.3]
  at org.apache.hadoop.hive.ql.exec.Utilities.getMergeWork(Utilities.java:379) ~[hive-exec-3.1.3.jar:3.1.3]

In such a case, set hive.cbo.returnpath.hiveop to false (which is the default value). (Setting hive.rpc.query.plan to true may not help.)

8. Windowing and analytic functions (vector_ptf_part_simple.q)

For windowing and analytic functions, the result may differ from the result in Hive on Tez or Hive-LLAP if ORDER BY is not used in the OVER clause. This is not a bug: without ORDER BY, the order of rows within each partition is not defined, so the result depends on how the rows of a particular column happen to be partitioned.

SELECT 
  row_number() OVER(PARTITION BY p_mfgr) AS rn,
  row_number() OVER(PARTITION BY p_mfgr RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS rn,
  row_number() OVER(PARTITION BY p_mfgr ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS rn,
  sum(p_retailprice) OVER(PARTITION BY p_mfgr ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS s,
...
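
When a reproducible result is needed, one option is to add ORDER BY to the OVER clause, as in the sketch below. The table name part_tbl is hypothetical (the FROM clause is elided in the query above), and ordering by p_retailprice is only an illustrative choice.

```sql
-- Hypothetical table name: part_tbl.
-- ORDER BY inside OVER fixes the order of rows within each p_mfgr partition.
SELECT
  p_mfgr,
  p_retailprice,
  row_number() OVER (PARTITION BY p_mfgr ORDER BY p_retailprice) AS rn,
  sum(p_retailprice) OVER (PARTITION BY p_mfgr ORDER BY p_retailprice
                           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS s
FROM part_tbl;
```

Rows with equal p_retailprice within a partition can still be ordered arbitrarily, so a fully deterministic result requires ordering columns that together identify each row.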

9. Inserting into ORC tables failing with NullPointerException (orc_merge10.q)

When both hive.llap.io.enabled and hive.merge.tezfiles are set to true, inserting into partitioned ORC tables may fail with NullPointerException.

CREATE TABLE orcfile_merge1b_n1 (key int, value string)
  PARTITIONED BY (ds string, part string) STORED AS orc;

INSERT OVERWRITE TABLE orcfile_merge1b_n1 PARTITION (ds='1', part)
  SELECT key, value, pmod(hash(key), 2) AS part
  FROM src;

Caused by: java.lang.NullPointerException
	at org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatchCtx.addPartitionColsToBatch(VectorizedRowBatchCtx.java:560)
	at org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatchCtx.addPartitionColsToBatch(VectorizedRowBatchCtx.java:388)
	at org.apache.hadoop.hive.llap.io.api.impl.LlapRecordReader.next(LlapRecordReader.java:317)

In such a case, set hive.merge.orcfile.stripe.level to true (which is the default value).
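
A minimal sketch of the workaround at the session level, assuming the key may be changed by the current session in your deployment:

```sql
-- Print the value currently in effect; true is the default.
SET hive.merge.orcfile.stripe.level;

-- Assumption: the current session is allowed to modify this key.
SET hive.merge.orcfile.stripe.level=true;

-- Rerun the insert shown above.
INSERT OVERWRITE TABLE orcfile_merge1b_n1 PARTITION (ds='1', part)
  SELECT key, value, pmod(hash(key), 2) AS part
  FROM src;
```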

10. LOAD DATA producing wrong results (mm_loaddata.q)

If hive.llap.io.enabled is set to true, setting tez.grouping.min-size to too small a value (e.g., 1) may produce wrong results.

CREATE TABLE load0_mm (key string, value string) STORED AS textfile TBLPROPERTIES("transactional"="true", "transactional_properties"="insert_only");
LOAD DATA LOCAL INPATH 'data/files/kv2.txt' INTO TABLE load0_mm;

In practice, tez.grouping.min-size is usually set to a large value (e.g., the default value of 50 * 1024 * 1024 = 52428800), so this is not a problem.

11. Query failing with IllegalStateException (cbo_subq_exists.q)

A query may fail with IllegalStateException with the message "Must start input before invoking this method".

SELECT * FROM src_cbo b
GROUP BY key, value
HAVING NOT EXISTS
  (SELECT a.key
  FROM src_cbo a
  WHERE b.value = a.value AND a.key = b.key AND a.value > 'val_12');

Caused by: java.lang.IllegalStateException: Must start input before invoking this method
	at org.apache.tez.common.Preconditions.checkState(Preconditions.java:57)
	at org.apache.tez.runtime.library.input.OrderedGroupedKVInput.waitForInputReady(OrderedGroupedKVInput.java:182)
	at org.apache.tez.runtime.library.input.OrderedGroupedKVInput.getReader(OrderedGroupedKVInput.java:260)
  ...

In such a case, set hive.cbo.enable to false or mr3.container.runtime.auto.start.input to true.
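
The first option can be sketched as follows, assuming hive.cbo.enable may be changed at the session level in your deployment. The MR3 key is not shown here because where it should be set is not stated above.

```sql
-- Assumption: the current session is allowed to modify this key.
-- Disabling CBO affects every later query in the session, not only this one.
SET hive.cbo.enable=false;

-- Rerun the query shown above.
SELECT * FROM src_cbo b
GROUP BY key, value
HAVING NOT EXISTS
  (SELECT a.key
  FROM src_cbo a
  WHERE b.value = a.value AND a.key = b.key AND a.value > 'val_12');
```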