Hive on MR3 provides three schemes for assigning ContainerGroups to Vertexes: per-vertex, per-map-reduce, and all-in-one.
Per-vertex scheme
Under the per-vertex scheme, each Vertex is assigned its own ContainerGroup. The per-vertex scheme is not useful for typical queries because no ContainerWorkers can be shared between Vertexes.
Per-map-reduce scheme
Under the per-map-reduce scheme, all Map Vertexes are grouped in the Map ContainerGroup while all Reduce Vertexes are grouped in the Reduce ContainerGroup. The per-map-reduce scheme can be useful because Map Vertexes are usually responsible for reading input data and thus have different runtime characteristics from Reduce Vertexes, which are responsible primarily for processing intermediate data produced on the fly. For example, with the per-map-reduce scheme, Map Vertexes can send all their TaskAttempts to ContainerWorkers with LLAP I/O while Reduce Vertexes execute their TaskAttempts in ordinary ContainerWorkers residing in YARN containers.
All-in-one scheme
Under the all-in-one scheme, all Vertexes are grouped in a single ContainerGroup. The all-in-one scheme is an ideal choice in most situations because it allows any TaskAttempt to run in any ContainerWorker, thereby achieving uniform utilization across all ContainerWorkers. For example, ContainerWorkers are deallocated only if no TaskAttempts (from any Vertex) are ready in the queue in the DAGAppMaster.
Specifying the ContainerGroup scheme
The ContainerGroup scheme can be specified with the key hive.mr3.containergroup.scheme in hive-site.xml:
per-vertex for the per-vertex scheme
per-map-reduce for the per-map-reduce scheme
all-in-one for the all-in-one scheme (which is the default scheme)
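As a minimal sketch, the scheme could be selected in hive-site.xml as follows. The key and its permitted values are those listed above; the choice of per-map-reduce here is only an illustration, and the enclosing configuration element is the standard Hadoop-style layout that hive-site.xml normally uses.

```xml
<configuration>
  <!-- Illustrative choice only: per-vertex, per-map-reduce, or all-in-one (the default) -->
  <property>
    <name>hive.mr3.containergroup.scheme</name>
    <value>per-map-reduce</value>
  </property>
</configuration>
```

Leaving the key unset should be equivalent to choosing all-in-one, since that is the default scheme.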
Under the per-vertex or per-map-reduce scheme, the following four keys specify the resources to be assigned to each ContainerWorker:
hive.mr3.map.containergroup.vcores for CPU cores in a Map ContainerWorker
hive.mr3.map.containergroup.memory.mb for memory (in MB) in a Map ContainerWorker
hive.mr3.reduce.containergroup.vcores for CPU cores in a Reduce ContainerWorker
hive.mr3.reduce.containergroup.memory.mb for memory (in MB) in a Reduce ContainerWorker
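The four keys might be set together as in the sketch below. The numeric values are hypothetical placeholders, not recommendations; suitable sizes depend on the cluster and workload.

```xml
<configuration>
  <!-- Hypothetical values for illustration; tune for your cluster -->
  <property>
    <name>hive.mr3.map.containergroup.vcores</name>
    <value>1</value>
  </property>
  <property>
    <name>hive.mr3.map.containergroup.memory.mb</name>
    <value>4096</value>
  </property>
  <property>
    <name>hive.mr3.reduce.containergroup.vcores</name>
    <value>1</value>
  </property>
  <property>
    <name>hive.mr3.reduce.containergroup.memory.mb</name>
    <value>8192</value>
  </property>
</configuration>
```

Giving Map and Reduce ContainerWorkers different sizes, as in this example, is one way to reflect their different runtime characteristics.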
Under the all-in-one scheme, the following two keys specify the resources to be assigned to each ContainerWorker:
hive.mr3.all-in-one.containergroup.vcores for CPU cores
hive.mr3.all-in-one.containergroup.memory.mb for memory (in MB)
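For the all-in-one scheme, a corresponding sketch could look like this. As before, the numeric values are hypothetical and serve only to show the shape of the configuration.

```xml
<configuration>
  <property>
    <name>hive.mr3.containergroup.scheme</name>
    <value>all-in-one</value>
  </property>
  <!-- Hypothetical values for illustration; every ContainerWorker gets the same resources -->
  <property>
    <name>hive.mr3.all-in-one.containergroup.vcores</name>
    <value>2</value>
  </property>
  <property>
    <name>hive.mr3.all-in-one.containergroup.memory.mb</name>
    <value>8192</value>
  </property>
</configuration>
```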