Hive 2.x and later running on top of MR3 support LLAP (Low Latency Analytical Processing) I/O. If a ContainerWorker starts with LLAP I/O enabled, it wraps every HiveInputFormat object with an LlapInputFormat object so as to cache all data read via HiveInputFormat. In conjunction with the ability to execute multiple TaskAttempts concurrently inside a single ContainerWorker, the support for LLAP I/O makes Hive on MR3 functionally equivalent to Hive-LLAP.

Thanks to the DaemonTasks already available in MR3, implementing LLAP I/O in Hive on MR3 is straightforward. If LLAP I/O is enabled, a ContainerGroup creates an MR3 DaemonTask that is responsible for managing LLAP I/O. When a ContainerWorker starts, a DaemonTaskAttempt is created to initialize the LLAP I/O module. Once initialized, the LLAP I/O module works in the background to serve requests from ordinary TaskAttempts. The following code shows the entire implementation of DaemonTaskAttempts for LLAP I/O in Java (excluding the header section):

public class LLAPDaemonProcessor extends AbstractLogicalIOProcessor {
  public LLAPDaemonProcessor(ProcessorContext context) {
    super(context);
  }

  @Override
  public void initialize() throws IOException {
    Configuration conf = TezUtils.createConfFromUserPayload(getContext().getUserPayload());
    LlapProxy.initializeLlapIo(conf);
  }

  @Override
  public void run(Map<String, LogicalInput> inputs, Map<String, LogicalOutput> outputs) throws Exception {
  }

  @Override
  public void handleEvents(List<Event> arg0) {
  }

  @Override
  public void close() throws IOException {
  }
}

Since the LLAP I/O module does not communicate with any other component, every method other than initialize() takes no action.

Configuring LLAP I/O

Hive on MR3 configures LLAP I/O with exactly the same configuration keys that Hive-LLAP uses:

  • hive.llap.io.enabled specifies whether or not to enable LLAP I/O. If set to true, Hive attaches an MR3 DaemonTask for LLAP I/O to the unique ContainerGroup under the all-in-one scheme, and to the Map ContainerGroup under the per-map-reduce scheme.
  • hive.llap.io.memory.size specifies the size of the memory for caching data.
  • hive.llap.io.threadpool.size specifies the number of threads for serving requests in LLAP I/O.
  • hive.llap.client.consistent.splits should be set to true in order to use consistent hashing of InputSplits (so that the same InputSplit is always mapped to the same ContainerWorker).
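As an illustration, hive-site.xml might enable LLAP I/O as follows (the cache size of 32Gb and the thread count of 10 are example values, not recommendations):

```xml
<property>
  <name>hive.llap.io.enabled</name>
  <value>true</value>
</property>
<property>
  <name>hive.llap.io.memory.size</name>
  <value>32Gb</value>
</property>
<property>
  <name>hive.llap.io.threadpool.size</name>
  <value>10</value>
</property>
<property>
  <name>hive.llap.client.consistent.splits</name>
  <value>true</value>
</property>
```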

Unlike Hive-LLAP, however, the size of the headroom for Java VM overhead (in MB) can be specified explicitly with the configuration key hive.mr3.llap.headroom.mb (which is new in Hive on MR3). The following diagram shows the memory composition of ContainerWorkers with LLAP I/O under the all-in-one scheme:


Note that the heap size of the Java VM (for the -Xmx option) is obtained by multiplying the memory size for all TaskAttempts (e.g., specified with the configuration key hive.mr3.all-in-one.containergroup.memory.mb under the all-in-one scheme) by the factor specified with the configuration key hive.mr3.container.max.java.heap.fraction. Here are a couple of examples of configuring LLAP I/O when hive.llap.io.enabled is set to true:

  • hive.mr3.all-in-one.containergroup.memory.mb=40960, hive.mr3.llap.headroom.mb=8192, hive.llap.io.memory.size=32Gb, hive.mr3.container.max.java.heap.fraction=1.0
    Memory for TaskAttempts = 40960MB = 40GB
    ContainerWorker size = 40GB + 8GB + 32GB = 80GB
    Heap size = 40960MB * 1.0 = 40GB
    Memory for Java VM overhead = Headroom size = 8GB
  • hive.mr3.all-in-one.containergroup.memory.mb=40960, hive.mr3.llap.headroom.mb=0, hive.llap.io.memory.size=40Gb, hive.mr3.container.max.java.heap.fraction=0.8
    Memory for TaskAttempts = 40960MB = 40GB
    ContainerWorker size = 40GB + 0GB + 40GB = 80GB
    Heap size = 40960MB * 0.8 = 32GB
    Memory for Java VM overhead = Memory for TaskAttempts - Heap size = 8GB
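The arithmetic above can be sketched as follows (a hypothetical helper class for illustration, not part of Hive on MR3):

```java
// Sketch of the ContainerWorker memory arithmetic for LLAP I/O
// under the all-in-one scheme. All sizes are in MB.
public class LlapMemoryCalc {
  // Heap size = memory for all TaskAttempts * heap fraction
  static long heapMb(long taskMemoryMb, double heapFraction) {
    return (long) (taskMemoryMb * heapFraction);
  }

  // ContainerWorker size = TaskAttempt memory + headroom + cache size
  static long workerMb(long taskMemoryMb, long headroomMb, long cacheMb) {
    return taskMemoryMb + headroomMb + cacheMb;
  }

  public static void main(String[] args) {
    // Example 1: 8GB headroom, 32GB cache, heap fraction 1.0
    assert workerMb(40960, 8192, 32768) == 81920;  // 80GB
    assert heapMb(40960, 1.0) == 40960;            // 40GB

    // Example 2: no headroom, 40GB cache, heap fraction 0.8
    assert workerMb(40960, 0, 40960) == 81920;     // 80GB
    assert heapMb(40960, 0.8) == 32768;            // 32GB
    System.out.println("ok");
  }
}
```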

In order to use LLAP I/O in Hive on MR3, the jar files for LLAP I/O should be explicitly listed under the configuration key hive.aux.jars.path in hive-site.xml.
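A hypothetical example is shown below; the jar version 3.1.3 and the path /opt/hive/lib are assumptions, so substitute the actual locations of the LLAP jar files in the Hive installation:

```xml
<property>
  <name>hive.aux.jars.path</name>
  <value>/opt/hive/lib/hive-llap-server-3.1.3.jar,/opt/hive/lib/hive-llap-common-3.1.3.jar,/opt/hive/lib/hive-llap-client-3.1.3.jar</value>
</property>
```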


Since LLAP I/O in Hive on MR3 does not depend on ZooKeeper, the following configuration keys should be set appropriately in hive-site.xml so that no communication with ZooKeeper is attempted:

  • hive.llap.hs2.coordinator.enabled should be set to false.
  • hive.llap.daemon.service.hosts should be set to an empty list.
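In hive-site.xml, these two settings look like:

```xml
<property>
  <name>hive.llap.hs2.coordinator.enabled</name>
  <value>false</value>
</property>
<property>
  <name>hive.llap.daemon.service.hosts</name>
  <value></value>
</property>
```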

Configuring LLAP I/O with hive.llap.io.allocator.mmap set to true

If the configuration key hive.llap.io.allocator.mmap is set to true in hive-site.xml, LLAP I/O uses memory-mapped files (instead of memory) to cache data read via HiveInputFormat. The memory-mapped files are created (but not visible to the user) under the directory specified by the configuration key hive.llap.io.allocator.mmap.path.

Since LLAP I/O does not consume memory for caching data, the memory composition of ContainerWorkers with LLAP I/O is slightly different (under the all-in-one scheme). Essentially the configuration key hive.llap.io.memory.size only specifies the total size of the memory-mapped files for caching data, and does not affect the memory size of ContainerWorkers.
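For example, to cache data in memory-mapped files under an SSD-backed directory (the path /mnt/ssd/llap-cache is an assumption for illustration):

```xml
<property>
  <name>hive.llap.io.allocator.mmap</name>
  <value>true</value>
</property>
<property>
  <name>hive.llap.io.allocator.mmap.path</name>
  <value>/mnt/ssd/llap-cache</value>
</property>
```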