The behavior of the Tez runtime is specified by the configuration file tez-site.xml
on the classpath.
MR3 inherits all configuration keys for the Tez runtime from the original Tez.
For example, tez.runtime.io.sort.mb
specifies the amount of memory, in MB, used for sorting the output.
MR3 also introduces additional configuration keys that are specific to its new features, and it may interpret some existing configuration keys differently from the original Tez. The table below describes these configuration keys.
Name | Default value | Description |
---|---|---|
tez.runtime.pipelined.sorter.use.soft.reference | false | true: use soft references for ByteBuffers allocated in PipelinedSorter. These soft references are reused across TaskAttempts running in the same ContainerWorker. false: do not use soft references. For more details, see Basic Performance Tuning. |
tez.shuffle-vertex-manager.enable.auto-parallel | false | true: enable auto parallelism for ShuffleVertexManager. false: disable auto parallelism. For more details, see Auto Parallelism and Basic Performance Tuning. |
tez.shuffle-vertex-manager.auto-parallel.min.num.tasks | 20 | Minimum number of Tasks to trigger auto parallelism. For example, if the value is set to 20, only those Vertexes with at least 20 Tasks are considered for auto parallelism. The user can effectively disable auto parallelism by setting this configuration key to a large value. |
tez.shuffle-vertex-manager.auto-parallel.max.reduction.percentage | 10 | Specifies the percentage of Tasks that can be kept after applying auto parallelism. For example, if the value is set to 10, the number of Tasks can be reduced by up to 100 - 10 = 90 percent, thereby leaving 10 percent of Tasks. |
tez.shuffle-vertex-manager.use-stats-auto-parallelism | false | true: analyze input statistics when applying auto parallelism. false: do not use input statistics. |
tez.shuffle.vertex.manager.auto.parallelism.min.percent | 20 | Specifies the lower limit when normalizing input statistics. For example, if the value is set to 20, input statistics are normalized between 20 and 100. That is, an input size of zero is normalized to 20 while the maximum input size is mapped to 100. |
tez.am.shuffle.auxiliary-service.id | mapreduce_shuffle | Service ID for the external shuffle service. Set to tez_shuffle in order to use the shuffle handler of MR3. |
tez.shuffle.max.threads | 0 | Number of threads in each shuffle handler. With the default value of zero, each shuffle handler creates twice as many threads as the number of cores. For more details, see Basic Performance Tuning. |
tez.shuffle.port | 13563 | Default port number for the shuffle handler of MR3. |
tez.runtime.shuffle.connect.timeout | 12500 | Maximum time in milliseconds for trying to connect to the shuffle service or the built-in shuffle handler before reporting fetch-failures. For more details, see Fault Tolerance and Basic Performance Tuning. |
tez.runtime.local.fetch.compare.port | true | true: compare with local ports in the shuffle handler for fetching directly from local disks. false: do not compare. |
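
As a minimal sketch of how the keys above might be set, the following tez-site.xml fragment enables auto parallelism and selects the shuffle handler of MR3. It assumes the standard Hadoop-style configuration layout (`configuration`/`property`/`name`/`value`); the values are taken from the table for illustration only and are not tuning recommendations.

```xml
<?xml version="1.0"?>
<!-- Illustrative tez-site.xml fragment; values are examples, not recommendations. -->
<configuration>
  <!-- Enable auto parallelism for ShuffleVertexManager (default: false). -->
  <property>
    <name>tez.shuffle-vertex-manager.enable.auto-parallel</name>
    <value>true</value>
  </property>

  <!-- Only Vertexes with at least 20 Tasks are considered for auto parallelism. -->
  <property>
    <name>tez.shuffle-vertex-manager.auto-parallel.min.num.tasks</name>
    <value>20</value>
  </property>

  <!-- Keep at least 10 percent of Tasks: a Vertex with 200 Tasks
       can be reduced to as few as about 20 Tasks. -->
  <property>
    <name>tez.shuffle-vertex-manager.auto-parallel.max.reduction.percentage</name>
    <value>10</value>
  </property>

  <!-- Use the shuffle handler of MR3 instead of the external shuffle service. -->
  <property>
    <name>tez.am.shuffle.auxiliary-service.id</name>
    <value>tez_shuffle</value>
  </property>
</configuration>
```

Keys that are omitted from the file keep the default values listed in the table.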