Optimizing Query Compilation in Hive 4 on MR3

October 9, 2024 · 7 min read

MR3 Architect and Developer

Introduction

In our previous article, we evaluated the performance of Hive 4 on MR3 1.11 and Trino 453 on the 10TB TPC-DS benchmark. The results can be summarized as follows:

In terms of the total running time, Hive 4 on MR3 runs slightly faster than Trino -- Hive 4 on MR3 5744 seconds vs Trino 5798 seconds.
In terms of the geometric mean of running times, Trino responds about 15 percent faster than Hive 4 on MR3 -- Trino 17.99 seconds vs Hive 4 on MR3 21.02 seconds.

While there is no clear winner (because Trino returns wrong answers on query 23 and query 72), we observe that Trino runs significantly faster on some queries in the TPC-DS benchmark, as shown in the following table (in seconds):

Query	Hive 4 on MR3	Trino
Query 58	72.943	5.584
Query 67	842.356	325.701
Query 75	224.221	105.102

We set out to investigate the source of the poor performance of Hive 4 on MR3 on such queries. After analyzing query plans generated by Trino and Hive 4 on MR3, we identified three categories of queries for which Trino produced more efficient query plans. For each category, we implemented a patch for generating query plans similarly to Trino and/or fixing the bug hindering the performance. These patches are HIVE-28488, HIVE-28489, and HIVE-28490.

Before presenting experimental results of applying the three patches, we explain each patch in more detail (which is rather technical and can be skipped).

HIVE-28488

When a query applies union operators to 3 or more tables, Hive 4 appends a group-by operator to each union operator, resulting in a query plan that resembles a binary tree. This patch merges adjacent union operators into a single union operator. In addition, it fixes a bug in Hive 4 that inserts a redundant group-by operator. As a result, this patch reduces the number of shuffle stages in a query with multiple union operators. Queries affected by HIVE-28488: query 49 and query 75.

HIVE-28489

A group-by operator with grouping sets often emits an excessive number of records, significantly delaying query execution. On the 10TB TPC-DS benchmark, for example, a single group-by operator in query 67 emits about 40 billion (41,380,289,606) records and the corresponding shuffle stage takes up a large portion of the running time. This patch aims to reduce the number of records emitted by the group-by operator by introducing a preliminary reduce-sink operator that only performs partitioning on the column with the highest number of distinct values. Since inserting an additional shuffle stage can increase the running time, this patch selectively transforms the query plan. For example, it has no effect if the input data of the group-by operator is already partitioned.

Queries affected by HIVE-28489: query 18, query 22, and query 67.

HIVE-28490

We identified a bug in Hive 4 that the method findAscendantWorkOperators() of SharedWorkOptimizer occasionally miscalculates the set of ascendant operators of a given operator. For example, in the following query plan from query 58 of the TPC-DS benchmark, findAscendantWorkOperators() incorrectly includes SEL_140, GBY_141, and RS_142 as ascendant operators of RS_189. As a result, Hive 4 generates a much less efficient query plan than Trino. This patch fixes the bug to ensure that findAscendantWorkOperators() returns a correct set of ascendant operators.

Queries affected by HIVE-28490: query 58 and query 83.

Experiment Setup

To evaluate the performance improvement after applying the three patches, we repeat the experiment from our previous article using the following systems.

Hive 4 on MR3 (basic) -- Hive 4.0.1 on MR3 1.12 without HIVE-28488/28489/28490.
Hive 4 on MR3 (optimized) -- Hive 4.0.1 on MR3 1.12 with HIVE-28488/28489/28490.
Trino 453

We maintain the same configuration and cluster setup from the previous experiment, with the scale factor of the TPC-DS benchmark remaining at 10TB.

Test

We sequentially submit 99 queries from the TPC-DS benchmark. We report the total running time, the geometric mean of running times, and the running time of each individual query.

Raw data of the experiment results

For the reader's perusal, we attach the table containing the raw data of the experiment results. Here is a link to [Google Docs].

Analysis

#1. Queries affected by the patches

The following table shows 7 queries affected by the three patches and their running times (in seconds). No other queries are affected by the patches during query planning.

Patch	Query	Hive 4 on MR3 (basic)	Hive 4 on MR3 (optimized)	Trino
HIVE-28488	query 49	24.402	24.197	12.123
	query 75	240.678	202.841	103.489
HIVE-28489	query 18	37.135	33.289	13.256
	query 22	57.971	15.981	7.584
	query 67	843.129	432.035	306.151
HIVE-28490	query 58	75.691	8.7	5.873
	query 83	20.58	14.328	6.42

hivemr3.1.12.trino.query-fs8.png

With the patches applied to Hive 4 on MR3, we observe moderate to significant improvements in running times. In particular, the reduction in the running time of query 67 (from 866.436 seconds to 432.035 seconds) allows Hive 4 on MR3 to outperform Trino on the entire TPC-DS benchmark, as shown below.

#2. Total running time

The three patches reduce the total running time of Hive 4 on MR3 by about 10 percent. Note that in terms of the total running time, Hive 4 on MR3 (optimized) now runs clearly faster than Trino.

Hive 4 on MR3 (basic) finishes all the queries in 6092 seconds.
Hive 4 on MR3 (optimized) finishes all the queries in 5338 seconds.
Trino finishes all the queries in 5839 seconds.

hivemr3.1.12.trino.total-fs8.png

#3. Response time

In terms of the geometric mean of running times, Trino still responds about 10 percent faster than Hive 4 on MR3.

On average, Hive 4 on MR3 (basic) finishes each query in 22.02 seconds.
On average, Hive 4 on MR3 (optimized) finishes each query in 20.38 seconds.
On average, Trino finishes each query in 18.55 seconds.

hivemr3.1.12.trino.geomean-fs8.png

Conclusion

With the recent release of Hive 4.0.1 and the announcement of the end of support for all the previous versions including Hive 3.1, Apache Hive is reshaping itself by concentrating all development efforts on Hive 4. Correctness bugs are quickly fixed upon discovery, and performance improvements are consistently implemented at all layers. Ease of use, however, is still not a hallmark feature of Apache Hive. For example, installing Apache Hive requires a specific version of Hadoop and does not work on Kubernetes. Furthermore deploying it in LLAP mode is far from trivial, and performance tuning is often a long and painstaking journey (which unfortunately turns away many users). Because of its dependence on Hadoop and the execution engine Tez, it remains unclear if these problems can be resolved in the foreseeable future.

Hive on MR3 addresses these challenges of Apache Hive, providing ease of use comparable to popular solutions such as Trino and Spark. For example, Hive on MR3 can be run in standalone mode without requiring a resource manager like Hadoop or Kubernetes. In addition, it offers a performance level that surpasses Hive with LLAP and continues to evolve with new features, such as support for remote shuffle service (Apache Celeborn).

Introduction​

HIVE-28488​

HIVE-28489​

HIVE-28490​

Experiment Setup​

Test​

Raw data of the experiment results​

Analysis​

#1. Queries affected by the patches​

#2. Total running time​

#3. Response time​

Conclusion​