Release 1.2 (2020-10-26)
MR3
- Introduce mr3.k8s.pod.worker.init.container.command to execute a shell command in a privileged init container.
- Introduce mr3.k8s.pod.master.toleration.specs and mr3.k8s.pod.worker.toleration.specs to specify tolerations for DAGAppMaster and ContainerWorker Pods.
- Setting mr3.dag.queue.scheme to individual properly implements fair scheduling among concurrent DAGs.
- Introduce mr3.k8s.pod.worker.additional.hostpaths to mount additional hostPath volumes.
- mr3.k8s.worker.total.max.memory.gb and mr3.k8s.worker.total.max.cpu.cores work correctly when autoscaling is enabled.
- DAGAppMaster and ContainerWorkers can publish Prometheus metrics.
- The default value of mr3.container.task.failure.num.sleeps is 0.
- Reduce the log size of DAGAppMaster and ContainerWorker.
- TaskScheduler can process about twice as many events (TaskSchedulerEventTaskAttemptFinished) per unit time as in MR3 1.1, thus doubling the maximum cluster size that MR3 can manage.
- Optimize the use of CodecPool shared by concurrent TaskAttempts.
- The getDags command of MasterControl prints both IDs and names of DAGs.
- On Kubernetes, the updateResourceLimit command of MasterControl updates the limit on the total resources for all ContainerWorker Pods. This lets the user further improve resource utilization when autoscaling is enabled.
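The following is a minimal sketch of why per-DAG queues help fairness. It contrasts two dispatch orders. A single shared FIFO queue dispatches every task of an earlier DAG before any task of a later one. Individual per-DAG queues, served round-robin, interleave the DAGs. The function names and the round-robin policy are illustrative assumptions. This is not MR3's actual TaskScheduler logic.

```python
from collections import deque
from typing import Dict, List, Tuple

# A task is identified by (dag_id, task_index); this model is hypothetical.
Task = Tuple[str, int]


def schedule_shared(dags: Dict[str, int]) -> List[Task]:
    """One FIFO queue shared by all DAGs: earlier DAGs are drained first."""
    queue: deque = deque()
    for dag_id, num_tasks in dags.items():
        for i in range(num_tasks):
            queue.append((dag_id, i))
    return list(queue)


def schedule_individual(dags: Dict[str, int]) -> List[Task]:
    """One queue per DAG, served round-robin so concurrent DAGs share slots."""
    queues = {d: deque((d, i) for i in range(n)) for d, n in dags.items() if n > 0}
    order: List[Task] = []
    while queues:
        # Iterate over a snapshot so that exhausted queues can be removed safely.
        for dag_id in list(queues):
            q = queues[dag_id]
            order.append(q.popleft())
            if not q:
                del queues[dag_id]
    return order
```

With two DAGs of three and two tasks, the shared queue yields all of dag1 before dag2. The individual queues alternate between dag1 and dag2 until dag2 runs out of tasks.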
Hive on MR3
- Compute the memory size of ContainerWorker correctly when hive.llap.io.allocator.mmap is set to true.
- Hive expands all system properties referenced in configuration files (such as core-site.xml) before passing them to MR3.
- hive.server2.transport.mode can be set to all (with HIVE-5312).
- MR3 creates three ServiceAccounts: 1) for Metastore and HiveServer2 Pods; 2) for the DAGAppMaster Pod; 3) for ContainerWorker Pods. This allows the user to use IAM roles for ServiceAccounts.
- Docker containers start as root. In kubernetes/env.sh, DOCKER_USER should be set to root and the service principal name in HIVE_SERVER2_KERBEROS_PRINCIPAL should be root.
- Support Ranger 2.0.0 and 2.1.0.
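A minimal sketch of expanding system-property references in configuration values before they are handed on follows. It assumes Hadoop-style `${name}` placeholders and a single substitution pass, so nested references are not resolved. The function name and the dictionary inputs are hypothetical. This is not Hive's actual implementation.

```python
import re
from typing import Dict

# Matches ${name}; this placeholder syntax is an assumption modeled on Hadoop-style configs.
_PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def expand_system_properties(conf: Dict[str, str], sys_props: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of conf with ${name} replaced by sys_props[name] (one pass)."""

    def substitute(match):
        name = match.group(1)
        # Unknown placeholders are left untouched rather than raising an error.
        return sys_props.get(name, match.group(0))

    return {key: _PLACEHOLDER.sub(substitute, value) for key, value in conf.items()}
```

For example, with the system property java.io.tmpdir set to /tmp, a value `${java.io.tmpdir}/hadoop` becomes `/tmp/hadoop`. Values without placeholders pass through unchanged.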
Backend for AWS Fargate
Currently MR3 implements backends for the Yarn and Kubernetes resource schedulers. Another resource scheduler under consideration is AWS Fargate. Since its unit of resource allocation is a container, AWS Fargate can make MR3 much less prone to over-provisioning cluster resources than Yarn and Kubernetes. In conjunction with the support for autoscaling in MR3, a backend for AWS Fargate may enable MR3 to finish executing DAGs faster (by creating more containers as needed) while reducing the overall cost (by deleting idle containers as soon as possible).
Support for Prometheus
Currently MR3-UI enables users to watch the progress of individual DAGs, but MR3 does not provide a way to monitor the state of the cluster. We plan to extend MR3 so that users can use Prometheus to monitor the state of DAGAppMaster and ContainerWorkers.