This page lists known issues with Hive 3 on MR3. It does not cover known bugs in Apache Hive 3 itself, whether or not they are fixed in Apache Hive 4.

1. LOG_LEVEL in kubernetes/env.sh

The environment variable LOG_LEVEL is used only for DAGAppMaster and ContainerWorkers.

LOG_LEVEL=INFO

To change the logging level for Metastore and HiveServer2, update kubernetes/conf/hive-log4j2.properties.

2. Invalid cache of DynamicValue

If the user switches to another database and re-executes a query that was executed previously, a wrong result may be returned because Hive internally reuses the cache of DynamicValue populated under the previous database. A practical workaround is to delete all running ContainerWorkers after switching to another database and before executing the same query again. Alternatively, the user can set the configuration key hive.io.sarg.cache.max.weight.mb to 0 in hive-site.xml, although this is not recommended.
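
As a minimal sketch of the second (not recommended) option, the entry in hive-site.xml could look like the following. The standard Hadoop-style property layout is assumed here, and the comment is only illustrative.

```xml
<!-- Set the SARG cache weight to 0, as described above.
     Not recommended; deleting running ContainerWorkers is the preferred workaround. -->
<property>
  <name>hive.io.sarg.cache.max.weight.mb</name>
  <value>0</value>
</property>
```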

3. GroupByOperator estimating memory usage

When multiple TaskAttempts run inside a DAGAppMaster in local mode, GroupByOperator conservatively estimates the size of memory used by a TaskAttempt. As a result, GroupByOperator flushes hash tables more often than necessary. The user can mitigate this issue by increasing the value for the configuration key hive.map.aggr.hash.force.flush.memory.threshold in hive-site.xml.
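
A minimal sketch of such a change in hive-site.xml is shown below. The value 0.95 is only an illustrative assumption (the stock Apache Hive default is believed to be 0.9); a suitable value depends on the workload and the memory available to the DAGAppMaster.

```xml
<!-- Raise the threshold so that GroupByOperator flushes its hash tables less eagerly.
     0.95 is a hypothetical example value, not a recommendation. -->
<property>
  <name>hive.map.aggr.hash.force.flush.memory.threshold</name>
  <value>0.95</value>
</property>
```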

8. Windowing and analytic functions (vector_ptf_part_simple.q)

For windowing and analytic functions, the result may differ from that of Hive on Tez or Hive-LLAP if ORDER BY is not used in the OVER clause. This is not a bug: without ORDER BY, the order of rows within each partition is not defined, so the result depends on how the rows are partitioned on the given column.

SELECT 
  row_number() OVER(PARTITION BY p_mfgr) AS rn,
  row_number() OVER(PARTITION BY p_mfgr RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS rn,
  row_number() OVER(PARTITION BY p_mfgr ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS rn,
  sum(p_retailprice) OVER(PARTITION BY p_mfgr ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS s,
...