17 posts tagged with "TPC-DS"

TPC-DS Benchmark: Trino 476, Spark 4.0.0, and Hive 4 on MR3 2.1 (MPP vs MapReduce)

· 9 min read
Sungwoo Park
MR3 Architect and Developer

In our previous article, we evaluated the performance of Trino 468, Spark 4.0.0-RC2, and Hive 4.0.0 on MR3 2.0 using the TPC-DS Benchmark with a scale factor of 10TB.

  • Correctness. Trino returns incorrect results for both subqueries of query 23.
  • Total execution time (Sequential). Trino is the fastest, followed closely by Hive on MR3 (4,442 seconds vs 4,874 seconds). Spark is the slowest, skewed by a few outlier queries (15,678 seconds).
  • Average response time (Sequential). Trino maintains the lead in average response time, with Hive on MR3 again closely behind (17.49 seconds vs 19.76 seconds).
  • Longest execution time (Concurrent). Under concurrent workloads (10, 20, and 40 clients), Hive on MR3 consistently outperforms both Trino and Spark.

TPC-DS Benchmark: Trino 468, Spark 4.0.0-RC2, and Hive 4 on MR3 2.0

· 15 min read
Sungwoo Park
MR3 Architect and Developer

In this article, we evaluate the performance of Trino, Spark, Hive on Tez, and Hive on MR3 using the TPC-DS Benchmark with a scale factor of 10TB.

  1. Trino 468 (released in December 2024)
  2. Spark 4.0.0-RC2 (released in March 2025)
  3. Hive 4.0.0 on Tez (built in February 2025)
  4. Hive 4.0.0 on MR3 2.0 (released in April 2025)

Trino is an MPP-style query engine and is not fault-tolerant. The other three systems are fully fault-tolerant.

Performance Evaluation of Trino and Hive on MR3 using the TPC-DS Benchmark

· 5 min read
Sungwoo Park
MR3 Architect and Developer

Introduction

In our previous article, we evaluated the performance of Trino 418 and Hive on MR3 1.7 using the TPC-DS Benchmark with a scale factor of 10TB.

  • In terms of the total running time, the two systems are comparable: Trino 7424 seconds vs Hive on MR3 7415 seconds.
  • In terms of the geometric mean of running times, Trino is faster than Hive on MR3: Trino 21.75 seconds vs Hive on MR3 27.68 seconds.
  • Trino returns wrong answers on query 23 after running for 1756 seconds.
  • Trino fails to complete query 72 after running for 156 seconds.
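The first two bullets use different metrics, and the two can disagree: total running time is dominated by the longest queries, whereas the geometric mean weights every query equally on a relative scale. The following is a minimal sketch of that effect. The per-query times are hypothetical values chosen for illustration and are not taken from the benchmark, and the names `engine_a` and `engine_b` are placeholders rather than real systems.

```python
import math


def total_time(times):
    """Sum of per-query running times in seconds."""
    return sum(times)


def geometric_mean(times):
    """Geometric mean, computed via logarithms to avoid overflow."""
    return math.exp(sum(math.log(t) for t in times) / len(times))


# Hypothetical per-query running times in seconds (NOT benchmark data).
# engine_a is quick on short queries but slower on the one long query.
engine_a = [2.0, 2.0, 2.0, 994.0]
engine_b = [10.0, 10.0, 10.0, 970.0]

# Both totals are identical, so the engines look comparable.
print(total_time(engine_a), total_time(engine_b))

# The geometric mean favors engine_a because it wins on most queries.
print(geometric_mean(engine_a), geometric_mean(engine_b))
```

In this made-up data the totals match while the geometric means differ by roughly a factor of three, which is the same kind of split the two bullets report.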

Why you should run Hive on Kubernetes, even in a Hadoop cluster

· 9 min read
Sungwoo Park
MR3 Architect and Developer

Hive and Presto

Hive and Presto have developed a tortoise-and-hare story over the past 8 years. Initially conceived at Facebook and open sourced in August 2008, Hive was hailed as a breakthrough in SQL-on-Hadoop technology and generally regarded as the de facto standard. Then in 2012, Facebook started to develop Presto as a replacement for Hive, which was considered too slow for their daily workload. As Facebook was specific about its goal in developing Presto, the future of Hive did not look so bright.

Hive vs SparkSQL: Hive-LLAP, Hive on MR3, SparkSQL 2.3.2

· 5 min read
Sungwoo Park
MR3 Architect and Developer

Introduction

In our previous article, published in October 2018, we used the TPC-DS benchmark to compare the performance of Hive-LLAP and SparkSQL 2.3.1 included in HDP 3.0.1 along with Hive 3.1.0 on MR3 0.4. In this article, we update the results by testing SparkSQL 2.3.2 included in HDP 3.1.4. As in the previous experiment, we use the TPC-DS benchmark.

Presto vs Hive on MR3 (Presto 317 vs Hive on MR3 0.10)

· 8 min read
Sungwoo Park
MR3 Architect and Developer

Introduction

In our previous article, we used the TPC-DS benchmark to compare the performance of three SQL-on-Hadoop systems: Impala 2.12.0+cdh5.15.2+0, Presto 0.217, and Hive 3.1.1 on MR3 0.6. It used sequential tests to draw the following conclusions:

  • Impala runs faster than Hive on MR3 on short-running queries that take less than 10 seconds.
  • For long-running queries, Hive on MR3 runs slightly faster than Impala.
  • For most queries, Hive on MR3 runs faster than Presto, sometimes an order of magnitude faster.