Skip to main content

Release 1.2 (2020-10-26)

MR3

  • Introduce mr3.k8s.pod.worker.init.container.command to execute a shell command in a privileged init container.
  • Introduce mr3.k8s.pod.master.toleration.specs and mr3.k8s.pod.worker.toleration.specs to specify tolerations for DAGAppMaster and ContainerWorker Pods.
  • Setting mr3.dag.queue.scheme to individual properly implements fair scheduling among concurrent DAGs.
  • Introduce mr3.k8s.pod.worker.additional.hostpaths to mount additional hostPath volumes.
  • mr3.k8s.worker.total.max.memory.gb and mr3.k8s.worker.total.max.cpu.cores work okay when autoscaling is enabled.
  • DAGAppMaster and ContainerWorkers can publish Prometheus metrics.
  • The default value of mr3.container.task.failure.num.sleeps is 0.
  • Reduce the log size of DAGAppMaster and ContainerWorker.
  • TaskScheduler can process about twice as many events (TaskSchedulerEventTaskAttemptFinished) per unit time as in MR3 1.1, thus doubling the maximum cluster size that MR3 can manage.
  • Optimize the use of CodecPool shared by concurrent TaskAttempts.
  • The getDags command of MasterControl prints both IDs and names of DAGs.
  • On Kubernetes, the updateResourceLimit command of MasterControl updates the limit on the total resources for all ContainerWorker Pods. The user can further improve resource utilization when autoscaling is enabled.

Hive on MR3

  • Compute the memory size of ContainerWorker correctly when hive.llap.io.allocator.mmap is set to true.
  • Hive expands all system properties in configuration files (such as core-site.xml) before passing to MR3.
  • hive.server2.transport.mode can be set to all (with HIVE-5312).
  • MR3 creates three ServiceAccounts: 1) for Metastore and HiveServer2 Pods; 2) for DAGAppMaster Pod; 3) for ContainerWorker Pods. The user can use IAM roles for ServiceAccounts.
  • Docker containers start as root. In kubernetes/env.sh, DOCKER_USER should be set to root and the service principal name in HIVE_SERVER2_KERBEROS_PRINCIPAL should be root.
  • Support Ranger 2.0.0 and 2.1.0.

Backend for AWS Fargate

Currently MR3 implements backends for Yarn and Kubernetes resource schedulers. Another resource scheduler under consideration is AWS Fargate. Since its unit of resource allocation is containers, AWS Fargate can make MR3 much less likely to suffer from over-provisioning of cluster resources than Yarn and Kubernetes. In conjunction with the support for autoscaling in MR3, the backend for AWS Fargate may enable MR3 to finish the execution of DAGs faster (just by creating more containers as needed) while reducing the overall cost (just by deleting idle containers as soon as possible).

Support for Prometheus

Currently MR3-UI enables users to watch the progress of individual DAGs, but MR3 does not provide a way to monitor the state of the cluster. We plan to extend MR3 so that users can use Prometheus to monitor the state of DAGAppMaster and ContainerWorkers.