Enabling autoscaling

The user can enable autoscaling by setting the configuration key mr3.enable.auto.scaling to true in mr3-site.xml, as explained in Autoscaling. With mr3.enable.auto.scaling set to true, DAGAppMaster automatically adjusts the number of ContainerWorker Pods by analyzing the aggregate utilization of the cluster. In a cloud environment such as Amazon EKS, adjusting the number of ContainerWorker Pods can also trigger commissioning and decommissioning of worker nodes.

Exploiting MasterControl

In conjunction with the utility MasterControl, the user can further improve resource utilization when autoscaling is enabled. MR3 ensures that the total resources for all ContainerWorker Pods never exceed the limit specified by the configuration keys mr3.k8s.worker.total.max.memory.gb and mr3.k8s.worker.total.max.cpu.cores in mr3-site.xml (whether autoscaling is enabled or not). The command updateResourceLimit of MasterControl, however, allows the user to dynamically update this limit. Thus we can combine autoscaling with MasterControl to have fine control of resources consumed by ContainerWorker Pods.

  • By executing MasterControl, we set a limit on the total resources that is appropriate for the current workload.
  • With autoscaling, we try to make the best use of resources within the limit.

The user can execute MasterControl either manually (e.g., right after starting and finishing a particular workload) or automatically after understanding the pattern of workloads. Here are a couple of examples of executing MasterControl automatically.

  • The user can set up an external monitor (such as Prometheus) so as to execute MasterControl automatically whenever the resource utilization is too high or too low.
  • The user can set up cron jobs to execute MasterControl at specific times of the day.

In an on-premise environment, it may make sense to rely solely on MasterControl without enabling autoscaling. In such a case, executing MasterControl is meant to dedicate a specific amount of resources to Hive on MR3 that are not be shared by other applications. In practice, however, a weak form of autoscaling is still operational because a ContainerWorker Pod terminates itself after it remains idle for a period specified by the configuration key mr3.container.idle.timeout.ms in mr3-site.xml.