Skip to main content

5 posts tagged with "Hive 4"

View All Tags

TPC-DS Benchmark: Trino 476, Spark 4.0.0, and Hive 4 on MR3 2.1 (MPP vs MapReduce)

· 9 min read
Sungwoo Park
MR3 Architect and Developer

In our previous article, we evaluated the performance of Trino 468, Spark 4.0.0-RC2, and Hive 4.0.0 on MR3 2.0 using the TPC-DS Benchmark with a scale factor of 10TB.

  • Correctness. Trino returns incorrect results for both subqueries of query 23.
  • Total execution time (Sequential). Trino is the fastest, followed closely by Hive on MR3 (4,442 seconds vs 4,874 seconds). Spark is the slowest, skewed by a few outlier queries (15,678 seconds).
  • Average response time (Sequential). Trino maintains the lead in average response time, with Hive on MR3 again closely behind (17.49 seconds vs 19.76 seconds).
  • Longest execution time (Concurrent). Under concurrent workloads (10, 20, and 40 clients), Hive on MR3 consistently outperforms both Trino and Spark.

TPC-DS Benchmark: Trino 468, Spark 4.0.0-RC2, and Hive 4 on MR3 2.0

· 15 min read
Sungwoo Park
MR3 Architect and Developer

In this article, we evaluate the performance of Trino, Spark, Hive on Tez, and Hive on MR3 using the TPC-DS Benchmark with a scale factor of 10TB.

  1. Trino 468 (released in December 2024)
  2. Spark 4.0.0-RC2 (released in March 2025)
  3. Hive 4.0.0 on Tez (built in February 2025)
  4. Hive 4.0.0 on MR3 2.0 (released in April 2025)

Trino is an MPP-style query engine and is not fault-tolerant. The other three systems are fully fault-tolerant.

Optimizing Query Compilation in Hive 4 on MR3

· 7 min read
Sungwoo Park
MR3 Architect and Developer

Introduction

In our previous article, we evaluated the performance of Hive 4 on MR3 1.11 and Trino 453 on the 10TB TPC-DS benchmark. The results can be summarized as follows:

  • In terms of the total running time, Hive 4 on MR3 runs slightly faster than Trino -- Hive 4 on MR3 5744 seconds vs Trino 5798 seconds.
  • In terms of the geometric mean of running times, Trino responds about 15 percent faster than Hive 4 on MR3 -- Trino 17.99 seconds vs Hive 4 on MR3 21.02 seconds.

Hive Performance: Hive-LLAP in HDP 3.1.4 vs Hive 3/4 on MR3 0.10

· 9 min read
Sungwoo Park
MR3 Architect and Developer

Introduction

In our previous article published in October 2018, we use the TPC-DS benchmark to compare the performance of Hive-LLAP in HDP 3.0.1 (as well as HDP 2.6.4) and Hive 3 on MR3 0.4. We have shown that Hive 3 on MR3 yields consistently higher throughput than Hive-LLAP in concurrency tests, but since then, the performance of Hive-LLAP has improved considerably for concurrent queries. Thus we are interested in the question of how Hive on MR3 compares with Hive-LLAP in the latest lease of HDP.