Performance Evaluation of Spark 2, Spark 3, Hive-LLAP, and Hive on MR3
Introduction
In this article, we evaluate the performance of the following systems.
- Spark 2.3.8
- Spark 3.2.1
- Hive 3.1.2 on MR3 1.4
- Hive-LLAP in HDP 3.1.4 (3.1.0.3.1.4.0-315)
In our previous article published in October 2018, we used the TPC-DS benchmark to compare the performance of Hive-LLAP and SparkSQL 2.3.1 included in HDP 3.0.1 along with Hive 3.1.0 on MR3 0.4. In this article, we update the results by testing SparkSQL 2.3.2 included in HDP 3.1.4. As in the previous experiment, we use the TPC-DS benchmark.
We often ask questions about the performance of SQL-on-Hadoop systems:
Hive running on top of MR3 0.2, or Hive-MR3 henceforth, supports LLAP (Low Latency Analytical Processing) I/O. In conjunction with the ability to execute multiple TaskAttempts concurrently inside a single ContainerWorker, the support for LLAP I/O makes Hive-MR3 functionally equivalent to Hive-LLAP. Hence Hive-MR3 can now serve as a substitute for Hive-LLAP in typical use cases.