Release 1.2 (2020-10-26)
MR3
- Introduce mr3.k8s.pod.worker.init.container.command to execute a shell command in a privileged init container.
- Introduce mr3.k8s.pod.master.toleration.specs and mr3.k8s.pod.worker.toleration.specs to specify tolerations for DAGAppMaster and ContainerWorker Pods.
- Setting mr3.dag.queue.scheme to individual properly implements fair scheduling among concurrent DAGs.
- Introduce mr3.k8s.pod.worker.additional.hostpaths to mount additional hostPath volumes.
- mr3.k8s.worker.total.max.memory.gb and mr3.k8s.worker.total.max.cpu.cores work okay when autoscaling is enabled.
- DAGAppMaster and ContainerWorkers can publish Prometheus metrics.
- The default value of mr3.container.task.failure.num.sleeps is 0.
- Reduce the log size of DAGAppMaster and ContainerWorker.
- TaskScheduler can process about twice as many events (TaskSchedulerEventTaskAttemptFinished) per unit time as in MR3 1.1, thus doubling the maximum cluster size that MR3 can manage.
- Optimize the use of CodecPool shared by concurrent TaskAttempts.
- The getDags command of MasterControl prints both IDs and names of DAGs.
- On Kubernetes, the updateResourceLimit command of MasterControl updates the limit on the total resources for all ContainerWorker Pods. The user can further improve resource utilization when autoscaling is enabled.
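
Several items above are configuration keys. As a rough illustration only, the following Python sketch renders a few of them as a Hadoop-style configuration file. It assumes the keys are set in a file such as mr3-site.xml, which the notes above do not state. The helper render_site_xml is hypothetical, and the two resource limits are made-up values. Only the value individual for mr3.dag.queue.scheme and the default 0 for mr3.container.task.failure.num.sleeps come from the notes.

```python
import xml.etree.ElementTree as ET


def render_site_xml(settings):
    """Render a dict of settings as Hadoop-style <configuration> XML (hypothetical helper)."""
    root = ET.Element("configuration")
    for name, value in settings.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    # ET.indent exists only in Python 3.9+; skip pretty-printing on older versions.
    if hasattr(ET, "indent"):
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


settings = {
    # Values taken from the release notes.
    "mr3.dag.queue.scheme": "individual",
    "mr3.container.task.failure.num.sleeps": 0,
    # Hypothetical limits, chosen only for illustration.
    "mr3.k8s.worker.total.max.memory.gb": 512,
    "mr3.k8s.worker.total.max.cpu.cores": 128,
}

site_xml = render_site_xml(settings)
print(site_xml)
```

The printed fragment would still need to be merged into an existing configuration file by hand, and the appropriate limits depend on the cluster.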
Hive on MR3
- Compute the memory size of ContainerWorker correctly when hive.llap.io.allocator.mmap is set to true.
- Hive expands all system properties in configuration files (such as core-site.xml) before passing to MR3.
- hive.server2.transport.mode can be set to all (with HIVE-5312).
- MR3 creates three ServiceAccounts: 1) for Metastore and HiveServer2 Pods; 2) for DAGAppMaster Pod; 3) for ContainerWorker Pods. The user can use IAM roles for ServiceAccounts.
- Docker containers start as root. In kubernetes/env.sh, DOCKER_USER should be set to root and the service principal name in HIVE_SERVER2_KERBEROS_PRINCIPAL should be root.
- Support Ranger 2.0.0 and 2.1.0.
Backend for AWS Fargate
Currently MR3 implements backends for Yarn and Kubernetes resource schedulers. Another resource scheduler under consideration is AWS Fargate. Since its unit of resource allocation is containers, AWS Fargate can make MR3 much less likely to suffer from over-provisioning of cluster resources than Yarn and Kubernetes. In conjunction with the support for autoscaling in MR3, the backend for AWS Fargate may enable MR3 to finish the execution of DAGs faster (just by creating more containers as needed) while reducing the overall cost (just by deleting idle containers as soon as possible).
Support for Prometheus
Currently MR3-UI enables users to watch the progress of individual DAGs, but MR3 does not provide a way to monitor the state of the cluster. We plan to extend MR3 so that users can use Prometheus to monitor the state of DAGAppMaster and ContainerWorkers.