Release 1.4 (2022-2-14)
MR3
- Use Deployment instead of ReplicationController on Kubernetes.
- HistoryLogger correctly sends Vertex finish times to Timeline Server.
- Add more Prometheus metrics.
- Introduce
mr3.application.tags
andmr3.application.scheduling.properties.map
. - The logic for speculative execution uses the average execution time of Tasks (instead of the maximum execution time).
Hive on MR3
- DistCp jobs are sent to MR3, not to Hadoop. As a result, DistCp runs okay on Kubernetes.
- org.apache.tez.common.counters.Limits is initialized in HiveServer2.
- Update Log4j2 to 2.17.1 (for CVE-2021-44228).
MapReduce on MR3
MR3 is not designed specifically for Apache Hive. In fact, MR3 started its life as a general purpose execution engine and thus can easily execute MapReduce jobs by converting to DAGs with two Vertexes. As a new application of MR3, we have implemented MapReduce on MR3 which allows the execution legacy MapReduce code on Kubernetes as well as on Hadoop. All the strengths of MR3, such as concurrent DAGs, cross-DAG container reuse, fault tolerance, and autoscaling, are available when running MapReduce jobs.