Skip to main content

6 posts tagged with "Hive 4"

View All Tags

TPC-DS Benchmark: Trino 477 and Hive 4 on MR3 2.2

· 6 min read
Sungwoo Park
MR3 Architect and Developer

Since May 2023, we have been evaluating the performance of Trino and Hive on MR3 using the TPC-DS Benchmark. As we ran all experiments in the same cluster and with the same scale factor of 10TB, we can now observe steady performance improvements of both systems over the past two years. In our latest article, we report that Trino maintains the lead in average response time under sequential workloads, while Hive on MR3 consistently outperforms Trino under concurrent workloads.

TPC-DS Benchmark: Trino 476, Spark 4.0.0, and Hive 4 on MR3 2.1 (MPP vs MapReduce)

· 9 min read
Sungwoo Park
MR3 Architect and Developer

In our previous article, we evaluated the performance of Trino 468, Spark 4.0.0-RC2, and Hive 4.0.0 on MR3 2.0 using the TPC-DS Benchmark with a scale factor of 10TB.

  • Correctness. Trino returns incorrect results for both subqueries of query 23.
  • Total execution time (Sequential). Trino is the fastest, followed closely by Hive on MR3 (4,442 seconds vs 4,874 seconds). Spark is the slowest, skewed by a few outlier queries (15,678 seconds).
  • Average response time (Sequential). Trino maintains the lead in average response time, with Hive on MR3 again closely behind (17.49 seconds vs 19.76 seconds).
  • Longest execution time (Concurrent). Under concurrent workloads (10, 20, and 40 clients), Hive on MR3 consistently outperforms both Trino and Spark.

TPC-DS Benchmark: Trino 468, Spark 4.0.0-RC2, and Hive 4 on MR3 2.0

· 15 min read
Sungwoo Park
MR3 Architect and Developer

In this article, we evaluate the performance of Trino, Spark, Hive on Tez, and Hive on MR3 using the TPC-DS Benchmark with a scale factor of 10TB.

  1. Trino 468 (released in December 2024)
  2. Spark 4.0.0-RC2 (released in March 2025)
  3. Hive 4.0.0 on Tez (built in February 2025)
  4. Hive 4.0.0 on MR3 2.0 (released in April 2025)

Trino is an MPP-style query engine and is not fault-tolerant. The other three systems are fully fault-tolerant.

Optimizing Query Compilation in Hive 4 on MR3

· 7 min read
Sungwoo Park
MR3 Architect and Developer

Introduction

In our previous article, we evaluated the performance of Hive 4 on MR3 1.11 and Trino 453 on the 10TB TPC-DS benchmark. The results can be summarized as follows:

  • In terms of the total running time, Hive 4 on MR3 runs slightly faster than Trino -- Hive 4 on MR3 5744 seconds vs Trino 5798 seconds.
  • In terms of the geometric mean of running times, Trino responds about 15 percent faster than Hive 4 on MR3 -- Trino 17.99 seconds vs Hive 4 on MR3 21.02 seconds.

Hive Performance: Hive-LLAP in HDP 3.1.4 vs Hive 3/4 on MR3 0.10

· 9 min read
Sungwoo Park
MR3 Architect and Developer

Introduction

In our previous article published in October 2018, we use the TPC-DS benchmark to compare the performance of Hive-LLAP in HDP 3.0.1 (as well as HDP 2.6.4) and Hive 3 on MR3 0.4. We have shown that Hive 3 on MR3 yields consistently higher throughput than Hive-LLAP in concurrency tests, but since then, the performance of Hive-LLAP has improved considerably for concurrent queries. Thus we are interested in the question of how Hive on MR3 compares with Hive-LLAP in the latest lease of HDP.