5 posts tagged with "Hive 2"


Performance Evaluation of SQL-on-Hadoop Systems using the TPC-DS Benchmark

October 30, 2018 · 18 min read

Sungwoo Park

MR3 Architect and Developer

Introduction

We often ask questions about the performance of SQL-on-Hadoop systems:

How fast is Hive-LLAP in comparison with Presto, SparkSQL, or Hive on Tez?
As it is an MPP-style system, does Presto run the fastest if it successfully executes a query?
As it stores intermediate data in memory, does SparkSQL run much faster than Hive on Tez in general?
What is the best system for running concurrent queries?
...

Performance Comparison of HDP LLAP, Presto, SparkSQL, Hive on Tez, and Hive on MR3 using the TPC-DS Benchmark

August 15, 2018 · 9 min read

Sungwoo Park

MR3 Architect and Developer

NOTE: This article is superseded by a newer article that adds results of concurrency tests on newer versions of SQL-on-Hadoop systems.

Hive on MR3 0.2 vs Hive-LLAP

May 19, 2018 · 12 min read

Sungwoo Park

MR3 Architect and Developer

Introduction

Hive running on top of MR3 0.2, or Hive-MR3 henceforth, supports LLAP (Low Latency Analytical Processing) I/O. In conjunction with the ability to execute multiple TaskAttempts concurrently inside a single ContainerWorker, the support for LLAP I/O makes Hive-MR3 functionally equivalent to Hive-LLAP. Hence Hive-MR3 can now serve as a substitute for Hive-LLAP in typical use cases.

Performance Evaluation of Hive on MR3 0.1 (Part II)

April 2, 2018 · 9 min read

Sungwoo Park

MR3 Architect and Developer

Introduction

To check whether Hive running on top of MR3, or Hive-MR3 henceforth, is ready for production environments, we should test its performance, stability, and scalability in multi-user environments where many queries run concurrently. While Hive-on-Tez does a good job in multi-user environments, an analysis of the architecture of Tez shows that support for multi-user environments could be improved further by allowing a single DAGAppMaster to manage multiple concurrent DAGs, whereas a Tez DAGAppMaster runs only one DAG at a time. One of the design goals of MR3 is to overcome this limitation of Tez so as to better support multi-user environments as a new execution engine of Hive.

Performance Evaluation of Hive on MR3 0.1 (Part I)

April 1, 2018 · 6 min read

Sungwoo Park

MR3 Architect and Developer

Introduction

Since Hive running on top of MR3, or Hive-MR3 henceforth, uses MR3 as its execution engine and borrows runtime environments from Tez, a natural question is whether using MR3 improves performance at all, in terms of execution time, turnaround time, or overall throughput. While it is difficult to accurately quantify the performance of MR3 relative to Tez as an execution engine, we can compare Hive-MR3 and Hive-on-Tez under identical conditions to see whether there is any benefit to using MR3 in place of Tez.
