Hive on MR3 provides three schemes for assigning Vertexes to ContainerGroups: per-vertex, per-map-reduce, and all-in-one.
Under the per-vertex scheme, each Vertex is assigned its own ContainerGroup. The per-vertex scheme is not useful for typical queries because no ContainerWorkers can be shared between Vertexes.
Under the per-map-reduce scheme, all Map Vertexes are grouped in the Map ContainerGroup, while all Reduce Vertexes are grouped in the Reduce ContainerGroup. The per-map-reduce scheme can be useful because Map Vertexes are usually responsible for reading input data and thus have different runtime characteristics from Reduce Vertexes, which primarily process intermediate data produced on the fly. For example, with the per-map-reduce scheme, Map Vertexes can send all their TaskAttempts to ContainerWorkers with LLAP I/O, while Reduce Vertexes execute their TaskAttempts in ordinary ContainerWorkers running in YARN containers.
Under the all-in-one scheme, all Vertexes are grouped in a single ContainerGroup. The all-in-one scheme is the ideal choice in most situations because it allows any TaskAttempt to take any ContainerWorker, thereby achieving uniform utilization across all ContainerWorkers. For example, a ContainerWorker is deallocated only if no TaskAttempts from any Vertex are ready in the queue of the DAGAppMaster.
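The following Python sketch illustrates how the three schemes group Vertexes and why sharing a ContainerGroup matters for deallocation. It is a simplified model, not MR3 code: the function names, the ContainerGroup names (group-..., Map, Reduce, All-in-one), and the deallocation check are hypothetical, and the check encodes only the necessary condition described above (no ready TaskAttempts in the same ContainerGroup).

```python
SCHEMES = ("per-vertex", "per-map-reduce", "all-in-one")


def container_group_of(vertex: str, is_map: bool, scheme: str) -> str:
    """Return a hypothetical ContainerGroup name for one Vertex."""
    if scheme == "per-vertex":
        return f"group-{vertex}"          # one ContainerGroup per Vertex
    if scheme == "per-map-reduce":
        return "Map" if is_map else "Reduce"
    if scheme == "all-in-one":
        return "All-in-one"               # every Vertex shares one group
    raise ValueError(f"unknown scheme: {scheme}")


def may_deallocate(group: str, ready_attempts: list[tuple[str, str]]) -> bool:
    """Simplified check: a ContainerWorker in `group` is a candidate for
    deallocation only if no ready TaskAttempt belongs to that group.
    `ready_attempts` holds (attempt_id, container_group) pairs."""
    return all(g != group for _, g in ready_attempts)


# Example DAG: two Map Vertexes feeding one Reduce Vertex.
vertices = {"Map 1": True, "Map 2": True, "Reducer 3": False}

for scheme in SCHEMES:
    groups = {v: container_group_of(v, m, scheme) for v, m in vertices.items()}
    # Only a TaskAttempt of the Reduce Vertex is ready in the queue.
    ready = [("attempt_r3_0", groups["Reducer 3"])]
    # Prints the scheme, the number of ContainerGroups, and whether a
    # ContainerWorker serving "Map 1" could be deallocated.
    print(scheme, len(set(groups.values())),
          may_deallocate(groups["Map 1"], ready))
```

Running it prints `per-vertex 3 True`, `per-map-reduce 2 True`, and `all-in-one 1 False`: only under the all-in-one scheme does the pending Reduce TaskAttempt keep the ContainerWorker that served a Map Vertex from being deallocated, because both Vertexes draw from the same pool.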
Specifying the ContainerGroup scheme
The ContainerGroup scheme is specified with a configuration key, which accepts the following values:
per-vertex: the per-vertex scheme
per-map-reduce: the per-map-reduce scheme
all-in-one: the all-in-one scheme (the default)
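As a minimal sketch of how a configured value maps to an effective scheme, the hypothetical helper below (not part of Hive or MR3) accepts the three values listed above and falls back to all-in-one when the value is unset. Treating a blank value as unset and trimming whitespace are assumptions made for illustration.

```python
from typing import Optional

VALID_SCHEMES = {"per-vertex", "per-map-reduce", "all-in-one"}
DEFAULT_SCHEME = "all-in-one"  # the default scheme according to this page


def resolve_scheme(value: Optional[str]) -> str:
    """Return the effective ContainerGroup scheme for a configured value.

    An unset (None) or blank value falls back to the default; any other
    value must be one of the three supported schemes."""
    if value is None or not value.strip():
        return DEFAULT_SCHEME
    scheme = value.strip()
    if scheme not in VALID_SCHEMES:
        raise ValueError(f"unsupported ContainerGroup scheme: {value!r}")
    return scheme


print(resolve_scheme(None))              # falls back to the default
print(resolve_scheme("per-map-reduce"))
```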
Under the per-vertex or per-map-reduce scheme, the following four keys specify the resources to be assigned to each ContainerWorker:
hive.mr3.map.containergroup.vcores: number of CPU cores in a Map ContainerWorker
hive.mr3.map.containergroup.memory.mb: memory (in MB) in a Map ContainerWorker
hive.mr3.reduce.containergroup.vcores: number of CPU cores in a Reduce ContainerWorker
hive.mr3.reduce.containergroup.memory.mb: memory (in MB) in a Reduce ContainerWorker
Under the all-in-one scheme, the following two keys specify the resources to be assigned to each ContainerWorker:
hive.mr3.all-in-one.containergroup.vcores: number of CPU cores
hive.mr3.all-in-one.containergroup.memory.mb: memory (in MB)
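The Python sketch below picks which pair of resource keys sizes the ContainerWorker serving a given Vertex, then renders chosen values in the Hadoop-style `<configuration>`/`<property>` XML layout used by files such as hive-site.xml. The key names come from the lists above; the helper names are hypothetical, the resource values (4 cores, 16384 MB) are purely illustrative, and two points are assumptions: that these keys are set in hive-site.xml, and that under the per-vertex scheme a Map Vertex uses the Map keys and a Reduce Vertex the Reduce keys.

```python
import xml.etree.ElementTree as ET

# (vcores key, memory key) pairs listed on this page.
_KEYS = {
    "map": ("hive.mr3.map.containergroup.vcores",
            "hive.mr3.map.containergroup.memory.mb"),
    "reduce": ("hive.mr3.reduce.containergroup.vcores",
               "hive.mr3.reduce.containergroup.memory.mb"),
    "all-in-one": ("hive.mr3.all-in-one.containergroup.vcores",
                   "hive.mr3.all-in-one.containergroup.memory.mb"),
}


def resource_keys(scheme: str, is_map: bool) -> tuple[str, str]:
    """Return the (vcores key, memory key) that size the ContainerWorker
    serving a Vertex under the given scheme."""
    if scheme == "all-in-one":
        return _KEYS["all-in-one"]
    if scheme in ("per-vertex", "per-map-reduce"):
        return _KEYS["map" if is_map else "reduce"]
    raise ValueError(f"unknown scheme: {scheme}")


def to_hive_site(props: dict[str, str]) -> str:
    """Render properties as <configuration><property><name/><value/>..."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# Illustrative sizes only: 4 cores and 16384 MB per all-in-one ContainerWorker.
vcores_key, memory_key = resource_keys("all-in-one", is_map=True)
print(to_hive_site({vcores_key: "4", memory_key: "16384"}))
```

Under the all-in-one scheme the Vertex type is ignored, since every ContainerWorker belongs to the single shared ContainerGroup and is sized by the same two keys.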