3 posts tagged with "Optimize"

Optimizing Query Compilation in Hive 4 on MR3

· 7 min read
Sungwoo Park
MR3 Architect and Developer

Introduction

In our previous article, we evaluated the performance of Hive 4 on MR3 1.11 and Trino 453 on the 10TB TPC-DS benchmark. The results can be summarized as follows:

  • In terms of the total running time, Hive 4 on MR3 runs slightly faster than Trino -- Hive 4 on MR3 5744 seconds vs Trino 5798 seconds.
  • In terms of the geometric mean of running times, Trino responds about 15 percent faster than Hive 4 on MR3 -- Trino 17.99 seconds vs Hive 4 on MR3 21.02 seconds.
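The two summary figures point in opposite directions because they weight queries differently: the total is dominated by long-running queries, while the geometric mean treats every query equally in ratio terms. As a quick sanity check, the relative gaps can be recomputed from the four numbers quoted above. This is a minimal sketch; the helper name percent_less is hypothetical and not part of either system.

```python
def percent_less(a: float, b: float) -> float:
    """Return how much smaller a is than b, as a percentage of b."""
    return (b - a) / b * 100.0

# Figures quoted in the summary above, in seconds.
total_hive, total_trino = 5744, 5798
geo_hive, geo_trino = 21.02, 17.99

# Total running time: Hive 4 on MR3 is the smaller value.
total_gap = percent_less(total_hive, total_trino)
# Geometric mean: Trino is the smaller value.
geo_gap = percent_less(geo_trino, geo_hive)

print(f"Total running time: Hive 4 on MR3 is {total_gap:.1f}% lower than Trino")
print(f"Geometric mean: Trino is {geo_gap:.1f}% lower than Hive 4 on MR3")
```

The gap in total running time comes out under 1 percent, while the gap in geometric mean is a little over 14 percent, consistent with the "about 15 percent" stated above.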

Performance Tuning for Single-table Queries

· 5 min read
Sungwoo Park
MR3 Architect and Developer

Introduction

In our previous article, we showed that Hive on MR3 1.7 runs much faster than Spark 3.4.0 on the TPC-DS benchmark with a scale factor of 10TB (7415 seconds vs 19669 seconds). The performance gap is expected to widen further due to improvements in Hive on MR3 1.8 (6867 seconds vs 7415 seconds for Hive on MR3 1.7). There is still, however, a category of queries on which Hive on MR3 seems noticeably slower than Spark: single-table queries with no joins.

Hive Performance: Hive-LLAP in HDP 3.1.4 vs Hive 3/4 on MR3 0.10

· 8 min read
Sungwoo Park
MR3 Architect and Developer

Introduction

In our previous article published in October 2018, we used the TPC-DS benchmark to compare the performance of Hive-LLAP in HDP 3.0.1 (as well as HDP 2.6.4) and Hive 3 on MR3 0.4. We showed that Hive 3 on MR3 yields consistently higher throughput than Hive-LLAP in concurrency tests, but since then, the performance of Hive-LLAP has improved considerably for concurrent queries. Thus we are interested in the question of how Hive on MR3 compares with Hive-LLAP in the latest release of HDP.