9 posts tagged with "Hive 3"

Performance Evaluation of Trino and Hive on MR3 using the TPC-DS Benchmark

· 5 min read
Sungwoo Park
MR3 Architect and Developer

Introduction

In our previous article, we evaluated the performance of Trino 418 and Hive on MR3 1.7 using the TPC-DS benchmark with a scale factor of 10TB.

  • In terms of the total running time, the two systems are comparable: Trino 7424 seconds vs Hive on MR3 7415 seconds.
  • In terms of the geometric mean of running times, Trino is faster than Hive on MR3: Trino 21.75 seconds vs Hive on MR3 27.68 seconds.
  • Trino returns wrong answers on query 23 after running for 1756 seconds.
  • Trino fails to complete query 72 after running for 156 seconds.
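
The two metrics above can rank systems differently, because the total running time is dominated by the longest queries while the geometric mean gives every query equal weight. A minimal sketch of both calculations follows. The running times in it are hypothetical values chosen for illustration, not measurements from the benchmark, and the function names are our own.

```python
import math


def total_time(times):
    """Sum of per-query running times in seconds."""
    return sum(times)


def geometric_mean(times):
    """Geometric mean of per-query running times (all must be positive)."""
    return math.exp(sum(math.log(t) for t in times) / len(times))


# Hypothetical per-query running times in seconds (NOT benchmark results).
system_a = [2.0, 2.0, 50.0]   # fast on short queries, slow on the long one
system_b = [8.0, 8.0, 38.0]   # slower on short queries, faster on the long one

for name, times in (("A", system_a), ("B", system_b)):
    print(f"System {name}: total = {total_time(times):.2f} s, "
          f"geometric mean = {geometric_mean(times):.2f} s")
```

With these made-up inputs, both systems have the same total running time, yet system A has the lower geometric mean because it finishes the short queries faster.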

Hive vs SparkSQL: Hive-LLAP, Hive on MR3, SparkSQL 2.3.2

· 5 min read
Sungwoo Park
MR3 Architect and Developer

Introduction

In our previous article published in October 2018, we used the TPC-DS benchmark to compare the performance of Hive-LLAP and SparkSQL 2.3.1 included in HDP 3.0.1 along with Hive 3.1.0 on MR3 0.4. In this article, we update the result by testing SparkSQL 2.3.2 included in HDP 3.1.4, again using the TPC-DS benchmark.

Presto vs Hive on MR3 (Presto 317 vs Hive on MR3 0.10)

· 7 min read
Sungwoo Park
MR3 Architect and Developer

Introduction

In our previous article, we used the TPC-DS benchmark to compare the performance of three SQL-on-Hadoop systems: Impala 2.12.0+cdh5.15.2+0, Presto 0.217, and Hive 3.1.1 on MR3 0.6. It used sequential tests to draw the following conclusions:

  • Impala runs faster than Hive on MR3 on short-running queries that take less than 10 seconds.
  • For long-running queries, Hive on MR3 runs slightly faster than Impala.
  • For most queries, Hive on MR3 runs faster than Presto, sometimes an order of magnitude faster.

Performance Evaluation of Impala, Presto, and Hive on MR3

· 8 min read
Sungwoo Park
MR3 Architect and Developer

Introduction

In our previous article, we used the TPC-DS benchmark to compare the performance of five SQL-on-Hadoop systems: Hive-LLAP, Presto, SparkSQL, Hive on Tez, and Hive on MR3. Because it used both sequential tests and concurrency tests across three separate clusters, we believe that the evaluation is thorough enough to closely reflect the current state of the SQL-on-Hadoop landscape.

Performance Evaluation of SQL-on-Hadoop Systems using the TPC-DS Benchmark

· 17 min read
Sungwoo Park
MR3 Architect and Developer

Introduction

We often ask questions about the performance of SQL-on-Hadoop systems:

  • How fast is Hive-LLAP in comparison with Presto, SparkSQL, or Hive on Tez?
  • As it is an MPP-style system, does Presto run the fastest if it successfully executes a query?
  • As it stores intermediate data in memory, does SparkSQL run much faster than Hive on Tez in general?
  • What is the best system for running concurrent queries?
  • ...