21 posts tagged with "Hive"

TPC-DS Benchmark: Trino 476, Spark 4.0.0, and Hive 4 on MR3 2.1 (MPP vs MapReduce)

July 2, 2025 · 9 min read

MR3 Architect and Developer

In our previous article, we evaluated the performance of Trino 468, Spark 4.0.0-RC2, and Hive 4.0.0 on MR3 2.0 using the TPC-DS Benchmark with a scale factor of 10TB.

Correctness. Trino returns incorrect results for both subqueries of query 23.
Total execution time (Sequential). Trino is the fastest, followed closely by Hive on MR3 (4,442 seconds vs 4,874 seconds). Spark is the slowest, skewed by a few outlier queries (15,678 seconds).
Average response time (Sequential). Trino maintains the lead in average response time, with Hive on MR3 again closely behind (17.49 seconds vs 19.76 seconds).
Longest execution time (Concurrent). Under concurrent workloads (10, 20, and 40 clients), Hive on MR3 consistently outperforms both Trino and Spark.

TPC-DS Benchmark: Trino 468, Spark 4.0.0-RC2, and Hive 4 on MR3 2.0

April 21, 2025 · 15 min read

Sungwoo Park

MR3 Architect and Developer

In this article, we evaluate the performance of Trino, Spark, Hive on Tez, and Hive on MR3 using the TPC-DS Benchmark with a scale factor of 10TB.

Trino 468 (released in December 2024)
Spark 4.0.0-RC2 (released in March 2025)
Hive 4.0.0 on Tez (built in February 2025)
Hive 4.0.0 on MR3 2.0 (released in April 2025)

Trino is an MPP-style query engine and is not fault-tolerant. The other three systems are fully fault-tolerant.

Optimizing Query Compilation in Hive 4 on MR3

October 9, 2024 · 7 min read

Sungwoo Park

MR3 Architect and Developer

Introduction

In our previous article, we evaluated the performance of Hive 4 on MR3 1.11 and Trino 453 on the 10TB TPC-DS benchmark. The results can be summarized as follows:

In terms of the total running time, Hive 4 on MR3 runs slightly faster than Trino -- Hive 4 on MR3 5744 seconds vs Trino 5798 seconds.
In terms of the geometric mean of running times, Trino responds about 15 percent faster than Hive 4 on MR3 -- Trino 17.99 seconds vs Hive 4 on MR3 21.02 seconds.
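The two metrics above can rank engines differently because a total is dominated by a few long-running queries, while a geometric mean weights every query equally in relative terms. The following is a minimal sketch of that effect, not the benchmark's own tooling: the per-query times and the names `engine_a` and `engine_b` are hypothetical, and only the two reported geometric means (17.99 and 21.02 seconds) come from the summary above.

```python
import math


def total_time(times):
    """Sum of per-query running times in seconds."""
    return sum(times)


def geometric_mean(times):
    """Geometric mean of per-query running times, computed via logarithms."""
    return math.exp(sum(math.log(t) for t in times) / len(times))


# Hypothetical per-query running times in seconds (NOT benchmark data).
engine_a = [2.0, 2.0, 2.0, 100.0]   # fast on most queries, one slow outlier
engine_b = [5.0, 5.0, 5.0, 60.0]    # slower on most queries, milder outlier

# engine_a has the larger total but the smaller geometric mean.
print(total_time(engine_a), total_time(engine_b))
print(geometric_mean(engine_a), geometric_mean(engine_b))

# Relative gap between the geometric means reported in the summary above.
reported_trino = 17.99
reported_hive = 21.02
gap = (reported_hive - reported_trino) / reported_hive
print(f"Trino is {gap:.1%} faster by geometric mean")
```

With these made-up inputs, one engine wins on total time and the other on geometric mean, which is the same pattern the summary reports for Hive 4 on MR3 and Trino.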

Performance Evaluation of Hive 4 on MR3 and Trino using the TPC-DS Benchmark

August 1, 2024 · 5 min read

Sungwoo Park

MR3 Architect and Developer

Introduction

Apache Hive 4 was recently released after a hiatus of several years. We have released Hive 4 on MR3, which replaces Tez with MR3 as the default execution engine in Hive 4.

Performance Evaluation of Trino and Hive on MR3 using the TPC-DS Benchmark

January 7, 2024 · 5 min read

Sungwoo Park

MR3 Architect and Developer

Introduction

In our previous article, we evaluated the performance of Trino 418 and Hive on MR3 1.7 using the TPC-DS Benchmark with a scale factor of 10TB.

In terms of the total running time, the two systems are comparable: Trino 7424 seconds vs Hive on MR3 7415 seconds.
In terms of the geometric mean of running times, Trino is faster than Hive on MR3: Trino 21.75 seconds vs Hive on MR3 27.68 seconds.
Trino returns wrong answers on query 23 after running for 1756 seconds.
Trino fails to complete query 72 after running for 156 seconds.

Performance Tuning for Single-table Queries

December 23, 2023 · 5 min read

Sungwoo Park

MR3 Architect and Developer

Introduction

In our previous article, we showed that Hive on MR3 1.7 runs much faster than Spark 3.4.0 on the TPC-DS benchmark with a scale factor of 10TB (7415 seconds vs 19669 seconds). The performance gap is expected to widen further due to improvements in Hive on MR3 1.8 (6867 seconds with MR3 1.8 vs 7415 seconds with MR3 1.7). Still, there is a category of queries on which Hive on MR3 seems noticeably slower than Spark: single-table queries with no joins.

Hive on MR3 - from Java 8 to Java 17 (and beating Trino)

December 9, 2023 · 4 min read

Sungwoo Park

MR3 Architect and Developer

Introduction

Before MR3 1.8, Hive on MR3 was built with Java 8. Starting with MR3 1.8, we also release Hive on MR3 built with Java 17. An immediate benefit of upgrading to Java 17 is a significant improvement in speed and stability.

Performance Evaluation of Trino, Spark, and Hive on MR3

May 31, 2023 · 6 min read

Sungwoo Park

MR3 Architect and Developer

Introduction

In this article, we evaluate the performance of the following systems.

Trino 418 (released on May 17, 2023)
Spark 3.4.0 (released on Apr 13, 2023)
Hive 3.1.3 on MR3 1.7 (released on May 15, 2023)

Performance Evaluation of Spark 2, Spark 3, Hive-LLAP, and Hive on MR3

April 1, 2022 · 8 min read

Sungwoo Park

MR3 Architect and Developer

Introduction

In this article, we evaluate the performance of the following systems.

Spark 2.3.8
Spark 3.2.1
Hive 3.1.2 on MR3 1.4
Hive-LLAP in HDP 3.1.4 (3.1.0.3.1.4.0-315)

Why you should run Hive on Kubernetes, even in a Hadoop cluster

July 19, 2020 · 9 min read

Sungwoo Park

MR3 Architect and Developer

Hive and Presto

Hive and Presto have developed a tortoise-and-hare story over the past 8 years. Initially conceived at Facebook and open sourced in August 2008, Hive was hailed as a breakthrough in SQL-on-Hadoop technology and generally regarded as the de facto standard. Then in 2012, Facebook started to develop Presto as a replacement for Hive, which was considered too slow for their daily workload. As Facebook was specific about its goal in developing Presto, the future of Hive did not look so bright.
