Release 1.3: 2021-8-18
- Introduce a new configuration key mr3.container.max.num.workers to limit the number of ContainerWorkers.
- Introduce a new configuration key mr3.k8s.pod.worker.node.affinity.specs to specify node affinity for ContainerWorker Pods.
- No longer use
- Support ContainerWorker recycling (which is different from ContainerWorker reuse) with
- Introduce a new configuration key mr3.am.task.no.retry.errors to specify the names of errors that prevent the re-execution of Tasks (e.g.,
- For reporting to MR3-UI, MR3 uses System.currentTimeMillis() instead of MonotonicClock.
- DAGAppMaster correctly reports to MR3Client the time from DAG submission to DAG execution.
- Introduce a new configuration key mr3.container.localize.python.working.dir.unsafe to localize Python scripts in working directories of ContainerWorkers. Localizing Python scripts is an unsafe operation: 1) Python scripts are shared by all DAGs; 2) once localized, Python scripts are not deleted.
- The image pull policy specified in mr3.k8s.pod.image.pull.policy applies to init containers as well as ContainerWorker containers.
- Introduce a new configuration key mr3.auto.scale.out.num.initial.containers which specifies the number of new ContainerWorkers to create in a scale-out operation when no ContainerWorkers are running.
- Introduce a new configuration key mr3.container.runtime.auto.start.input to automatically start LogicalInputs in RuntimeTasks.
- On both Hadoop and Kubernetes, there is no limit on the aggregate memory of ContainerWorkers, so MR3 can run in a cluster of any size.
- Speculative execution works on Vertexes with a single Task.
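The MR3-UI item above replaces MonotonicClock with System.currentTimeMillis(). The release notes do not give the reason; the sketch below only illustrates the general difference between the two kinds of clock, using Python's time module as a stand-in for the Java calls. The function names are hypothetical and not part of MR3.

```python
import time


def wall_clock_millis() -> int:
    """Milliseconds since the Unix epoch (the analogue of System.currentTimeMillis()).

    The value is meaningful outside this process, so a UI can display it
    as a calendar time.
    """
    return int(time.time() * 1000)


def elapsed_millis(start_ns: int) -> float:
    """Milliseconds elapsed since start_ns, a reading of time.monotonic_ns().

    A monotonic clock never goes backwards, but its origin is arbitrary,
    so a single reading cannot be shown as a date.
    """
    return (time.monotonic_ns() - start_ns) / 1e6


if __name__ == "__main__":
    start = time.monotonic_ns()
    time.sleep(0.01)
    print("reported timestamp (ms since epoch):", wall_clock_millis())
    print("measured duration (ms):", elapsed_millis(start))
```

Wall-clock readings suit timestamps that are reported to another system, while monotonic readings suit measuring durations within one process.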
Hive on MR3
- Metastore correctly uses MR3 for compaction on Kubernetes.
- Auto parallelism is correctly enabled or disabled according to the result of compiling queries by overriding
tez.shuffle-vertex-manager.enable.auto-parallel can be set to false.
- Support the TRANSFORM clause with Python scripts (with mr3.container.localize.python.working.dir.unsafe set to true in
- Introduce a new configuration key hive.mr3.llap.orc.memory.per.thread.mb to specify the memory allocated to each ORC manager in low-level LLAP I/O threads.
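As an illustration of the TRANSFORM item above, here is a minimal sketch of a Python script of the kind Hive can invoke. By default Hive streams each input row to the script on standard input as tab-separated fields and reads tab-separated rows back from standard output. The script name upper.py and its transformation (upper-casing the first column) are hypothetical.

```python
import sys


def transform_line(line: str) -> str:
    """Upper-case the first tab-separated field of one input row."""
    fields = line.rstrip("\n").split("\t")
    fields[0] = fields[0].upper()  # hypothetical transformation
    return "\t".join(fields)


def main(stdin=sys.stdin, stdout=sys.stdout) -> None:
    # Emit exactly one output row for each input row.
    for line in stdin:
        stdout.write(transform_line(line) + "\n")


if __name__ == "__main__":
    main()
```

A query would typically reference such a script as SELECT TRANSFORM(col1, col2) USING 'python upper.py' AS (col1, col2) FROM some_table, where the table and column names are placeholders.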
Spark on MR3
- Initial release
Release 1.2: 2020-10-26
- Introduce a new configuration key mr3.k8s.pod.worker.init.container.command to execute a shell command in a privileged init container.
- Introduce a new configuration key mr3.k8s.pod.worker.toleration.specs to specify tolerations for DAGAppMaster and ContainerWorker Pods.
individual properly implements fair scheduling among concurrent DAGs.
- Introduce a new configuration key mr3.k8s.pod.worker.additional.hostpaths to mount additional hostPath volumes.
mr3.k8s.worker.total.max.cpu.cores work okay when autoscaling is enabled.
- DAGAppMaster and ContainerWorkers can publish Prometheus metrics.
- The default value of mr3.container.task.failure.num.sleeps is 0.
- Reduce the log size of DAGAppMaster and ContainerWorker.
- TaskScheduler can process about twice as many events (TaskSchedulerEventTaskAttemptFinished) per unit time as in MR3 1.1, thus doubling the maximum cluster size that MR3 can manage.
- Optimize the use of CodecPool shared by concurrent TaskAttempts.
- The getDags command of MasterControl prints both IDs and names of DAGs.
- On Kubernetes, the updateResourceLimit command of MasterControl updates the limit on the total resources for all ContainerWorker Pods. The user can further improve resource utilization when autoscaling is enabled.
Hive on MR3
- Compute the memory size of ContainerWorker correctly when hive.llap.io.allocator.mmap is set to true.
- Hive expands all system properties in configuration files (such as core-site.xml) before passing to MR3.
hive.server2.transport.mode can be set to
- MR3 creates three ServiceAccounts: 1) for Metastore and HiveServer2 Pods; 2) for DAGAppMaster Pod; 3) for ContainerWorker Pods. The user can use IAM roles for ServiceAccounts.
- Docker containers start as
DOCKER_USER should be set to root and the service principal name in
- Support Ranger 2.0.0 and 2.1.0.
Release 1.1: 2020-7-19
- Support DAG scheduling schemes (specified by
- Optimize DAGAppMaster by freeing memory for messages to Tasks when fault tolerance is disabled (with mr3.am.task.max.failed.attempts set to 1).
- Fix a minor memory leak in DaemonTask (which also prevents MR3 from running more than 2^30 DAGs when using the shuffle handler).
- Improve the chance of assigning TaskAttempts to ContainerWorkers that match location hints.
- TaskScheduler can use location hints produced by
- TaskScheduler can use location hints from HDFS when assigning TaskAttempts to ContainerWorker Pods on Kubernetes (with
- Introduce a new configuration key mr3.k8s.pod.cpu.cores.max.multiplier to specify the multiplier for the limit of CPU cores.
- Introduce a new configuration key mr3.k8s.pod.memory.max.multiplier to specify the multiplier for the limit of memory.
- Introduce a new configuration key mr3.k8s.pod.worker.security.context.sysctls to configure kernel parameters of ContainerWorker Pods using init containers.
- Support speculative execution of TaskAttempts (with
- A ContainerWorker can run multiple shuffle handlers each with a different port. The configuration key mr3.use.daemon.shufflehandler now specifies the number of shuffle handlers in each ContainerWorker.
- With speculative execution and the use of multiple shuffle handlers in a single ContainerWorker, fetch delays rarely occur.
- A ContainerWorker Pod can run shuffle handlers in a separate container (with
- On Kubernetes, DAGAppMaster uses ReplicationController instead of Pod, thus making recovery much faster.
- On Kubernetes, ConfigMaps
mr3conf-configmap-worker survive MR3, so the user should delete them manually.
- Java 8u251/8u252 can be used on Kubernetes 1.17 and later.
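The two multiplier keys above, mr3.k8s.pod.cpu.cores.max.multiplier and mr3.k8s.pod.memory.max.multiplier, can be read as scaling a Pod's resource request to obtain its resource limit. The sketch below works through that arithmetic under the assumption that limit = request x multiplier; the release notes do not spell out the formula, and the function name is hypothetical.

```python
def pod_limit(request: float, multiplier: float) -> float:
    """Return the resource limit for a Pod, assuming limit = request * multiplier.

    This formula is an assumption made for illustration. Kubernetes itself
    requires a limit to be at least as large as the request, hence the check
    that the multiplier is not below 1.0.
    """
    if request < 0:
        raise ValueError("request must be non-negative")
    if multiplier < 1.0:
        raise ValueError("multiplier must be at least 1.0")
    return request * multiplier


if __name__ == "__main__":
    # A ContainerWorker Pod requesting 4 CPU cores with a multiplier of 2.0.
    print(pod_limit(4.0, 2.0))
    # A multiplier of 1.0 makes the limit equal to the request.
    print(pod_limit(16.0, 1.0))
```

Under this reading, a multiplier above 1.0 lets a Pod burst beyond its request when the node has spare capacity.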
Hive on MR3
- CrossProductHandler asks MR3 DAGAppMaster to set TEZ_CARTESIAN_PRODUCT_MAX_PARALLELISM (Cf. HIVE-16690, Hive 3/4).
- Hive 4 on MR3 is stable (currently using 4.0.0-SNAPSHOT).
- No longer support Hive 1.
- Ranger uses a local directory (emptyDir volume) for logging.
- The open file limit for Solr (in Ranger) is not limited to 1024.
- HiveServer2 and DAGAppMaster create readiness and liveness probes.
Release 1.0: 2020-2-17
- Support DAG priority schemes (specified by mr3.dag.priority.scheme) and Vertex priority schemes (specified by
- Support secure shuffle (using SSL mode) without requiring separate configuration files.
- ContainerWorker tries to avoid OutOfMemoryErrors by sleeping after a TaskAttempt fails (specified by
- Errors from InputInitializers are properly passed to MR3Client.
- MasterControl supports two new commands for gracefully stopping DAGAppMaster and ContainerWorkers.
Hive on MR3
- Allow fractions for CPU cores (with
- Support rolling updates.
- Hive on MR3 can access S3 using AWS credentials (with or without Helm).
- On Amazon EKS, the user can use S3 instead of PersistentVolumes on EFS.
- Hive on MR3 can use environment variables
AWS_SECRET_ACCESS_KEY to access S3 outside Amazon AWS.
Release 0.11: 2019-12-4
- Support autoscaling.
Hive on MR3
- Memory and CPU cores for Tasks can be set to zero.
- Support autoscaling on Amazon EMR.
- Support autoscaling on Amazon EKS.
Release 0.10: 2019-10-18
- TaskScheduler supports a new scheduling policy (specified by mr3.taskattempt.queue.scheme) which significantly improves the throughput for concurrent queries.
- DAGAppMaster recovers from OutOfMemoryErrors due to the exhaustion of threads.
Hive on MR3
- Compaction sends DAGs to MR3, instead of MapReduce, when hive.mr3.compaction.using.mr3 is set to true.
- LlapDecider asks MR3 DAGAppMaster for the number of Reducers.
- ConvertJoinMapJoin asks MR3 DAGAppMaster for the current number of Nodes to estimate the cost of Bucket Map Join.
- Support Hive 3.1.2 and 2.3.6.
- Support Helm charts.
- Compaction works okay on Kubernetes.
Release 0.9: 2019-7-25
- Each DAG uses its own ClassLoader.
Hive on MR3
- LLAP I/O works properly on Kubernetes.
- UDFs work okay on Kubernetes.
Release 0.8: 2019-6-22
- A new DAGAppMaster properly recovers DAGs that have not been completed in the previous DAGAppMaster.
- Fault tolerance after fetch failures works much faster.
- On Kubernetes, the shutdown handler of DAGAppMaster deletes all running Pods.
- On both Yarn and Kubernetes, MR3Client automatically connects to a new DAGAppMaster after an initial DAGAppMaster is killed.
Hive on MR3
- Hive 3 for MR3 supports high availability on Yarn via ZooKeeper.
- On both Yarn and Kubernetes, multiple HiveServer2 instances can share a common MR3 DAGAppMaster (and thus all its ContainerWorkers as well).
- Support Apache Ranger on Kubernetes.
- Support Timeline Server on Kubernetes.
Release 0.7: 2019-4-26
- Resolve deadlock when Tasks fail or ContainerWorkers are killed.
- Support fault tolerance after fetch failures.
- Support node blacklisting.
Hive on MR3
- Introduce a new configuration key
- Apply HIVE-20618.
Release 0.6: 2019-3-21
- DAGAppMaster can run in its own Pod on Kubernetes.
- Support elastic execution of RuntimeTasks in ContainerWorkers.
- MR3-UI requires only Timeline Server.
Hive on MR3
- Support memory monitoring when loading hash tables for Map-side join.
Release 0.5: 2019-2-18
- Support Kubernetes.
- Support the use of the built-in shuffle handler.
Hive on MR3
- Support Hive 3.1.1 and 2.3.5.
- Initial release for Hive on MR3 on Kubernetes
Release 0.4: 2018-10-29
- Support auto parallelism for reducers with
- Auto parallelism can use input statistics when reassigning partitions to reducers.
- Support ByteBuffer sharing among RuntimeTasks.
Hive on MR3
- Support Hive 3.1.0.
- Hive 1 uses Tez 0.9.1.
- Metastore checks the inclusion of __HIVE_DEFAULT_PARTITION__ when retrieving column statistics.
- MR3JobMonitor returns immediately from MR3 DAGAppMaster when the DAG completes.
Release 0.3: 2018-8-15
- Extend the runtime to support Hive 3.
Hive on MR3
- Support Hive 3.0.0.
- Support query re-execution.
- Support per-query cache in Hive 2 and 3.
Release 0.2: 2018-5-18
- Support asynchronous logging (with
- Delete DAG-local directories after each DAG is finished.
Hive on MR3
- Support LLAP I/O for Hive 2.
- Support Hive 2.2.0.
- Use Hive 2.3.3 instead of Hive 2.3.2.
Release 0.1: 2018-3-31
- Initial release
Hive on MR3
- Initial release
Patches backported to Hive 3.1.0 (in MR3 1.3)
- HIVE-24849 Create external table socket timeout when location has large number of files
- HIVE-20001 With doas set to true, running select query as hrt_qa user on external table fails due to permission denied to read /warehouse/tablespace/managed directory.
- HIVE-19748 Add appropriate null checks to DecimalColumnStatsAggregator
- HIVE-25170 Fix wrong colExprMap generated by SemanticAnalyzer
- HIVE-24224 Fix skipping header/footer for Hive on Tez on compressed files
- HIVE-24093 Remove unused hive.debug.localtask
- HIVE-23509 Fixing MapJoin Capacity Assertion Error
- HIVE-22476 Hive datediff function provided inconsistent results when hive.fetch.task.conversion is set to none
- HIVE-22769 Incorrect query results and query failure during split generation for compressed text files
- HIVE-5312 Let HiveServer2 run simultaneously in HTTP (over thrift) and Binary (normal thrift transport) mode
- HIVE-22948 QueryCache: Treat query cache locations as temporary storage
- HIVE-22762 Leap day is incorrectly parsed during cast in Hive
- HIVE-22763 0 is accepted in 12-hour format during timestamp cast
- HIVE-22653 Remove commons-lang leftovers
- HIVE-22685 Fix TestHiveSqlDateTimeFormatter To Work With New Year 2020
- HIVE-22511 Fix case of Month token in datetime to string conversion
- HIVE-22422 Missing documentation from HiveSqlDateTimeFormatter: list of date-based patterns
- HIVE-21580 Introduce ISO 8601 week numbering SQL:2016 formats
- HIVE-21579 Introduce more complex SQL:2016 datetime formats
- HIVE-21578 Introduce SQL:2016 formats FM, FX, and nested strings
- HIVE-22945 Hive ACID Data Corruption: Update command mess the other column data and produces incorrect result
- HIVE-21660 Wrong result when union all and later view with explode is used
- HIVE-22891 Skip PartitionDesc Extraction In CombineHiveRecord For Non-LLAP Execution Mode
- HIVE-22815 reduce the unnecessary file system object creation in MROutput
- HIVE-22400 UDF minute with time returns NULL
- HIVE-22700 Compactions may leak memory when unauthorized
- HIVE-22485 Cross product should set the conf in UnorderedPartitionedKVEdgeConfig
- HIVE-22532 PTFPPD may push limit incorrectly through Rank/DenseRank function
- HIVE-22507 KeyWrapper comparator create field comparator instances at every comparison
- HIVE-22435 Exception when using VectorTopNKeyOperator operator
- HIVE-22513 Constant propagation of casted column in filter ops can cause incorrect results
- HIVE-22464 Implement support for NULLS FIRST/LAST in TopNKeyOperator
- HIVE-22429 Migrated clustered tables using bucketing_version 1 on hive 3 uses bucketing_version 2 for inserts
- HIVE-22406 TRUNCATE TABLE fails due MySQL limitations on limit value
- HIVE-22360 MultiDelimitSerDe returns wrong results in last column when the loaded file has more columns than those in table schema
- HIVE-22373 File Merge tasks fail when containers are reused
- HIVE-22336 Updates should be pushed to the Metastore backend DB before creating the notification event
- HIVE-22332 Hive should ensure valid schema evolution settings since ORC-540
- HIVE-22331 unix_timestamp without argument returns timestamp in millisecond instead of second
- HIVE-21924 Split text files even if header/footer exists
- HIVE-22275 OperationManager.queryIdOperation does not properly clean up multiple queryIds
- HIVE-22269 Stats miss with hive.optimize.sort.dynamic.partition (SortedDynPartitionOptimizer) leads to wrong reducer count
- HIVE-22273 Access check is failed when a temporary directory is removed
- HIVE-22208 Column name with reserved keyword is unescaped when query including join on table with mask column is re-written
- HIVE-22232 NPE when hive.order.columnalignment is set to false
- HIVE-22243 Align Apache Thrift version to 0.9.3-1 in standalone-metastore as well
- HIVE-22197 Common Merge join throwing class cast exception.
- HIVE-22231 Hive query with big size via knox fails with Broken pipe Write failed
- HIVE-20113 Shuffle avoidance: Disable 1-1 edges for sorted shuffle
- HIVE-22219 Bringing a node manager down blocks restart of LLAP service
- HIVE-22201 ConvertJoinMapJoin#checkShuffleSizeForLargeTable throws ArrayIndexOutOfBoundsException if no big table is selected
- HIVE-22170 from_unixtime and unix_timestamp should use user session time zone
- HIVE-22169 Tez: SplitGenerator tries to look for plan files which won’t exist for Tez
- HIVE-22055 select count gives incorrect result after loading data from text file
- HIVE-22204 Beeline option to show/not show execution report
- HIVE-15956 StackOverflowError when drop lots of partitions
- HIVE-22164 Vectorized Limit operator returns wrong number of results with offset
- HIVE-22106 Remove cross-query synchronization for the partition-eval
- HIVE-22168 Remove very expensive logging from the llap cache hotpath
- HIVE-22099 Several date related UDFs can’t handle Julian dates properly since HIVE-20007
- HIVE-22161 UDF: FunctionRegistry synchronizes on org.apache.hadoop.hive.ql.udf.UDFType class
- HIVE-22121 Turning on hive.tez.bucket.pruning produce wrong results
- HIVE-22134 HIVE-22129: Remove glassfish.jersey and mssql-jdbc classes from jdbc-standalone jar
- HIVE-21828 Tez: Use a pre-parsed TezConfiguration from DagUtils - Addendum2
- HIVE-21828 Tez: Use a pre-parsed TezConfiguration from DagUtils - Addendum
- HIVE-21828 Tez: Use a pre-parsed TezConfiguration from DagUtils
- HIVE-22115 Prevent the creation of query routing appender if property is set to false
- HIVE-22120 Fix wrong results/ArrayOutOfBound exception in left outer map joins on specific boundary conditions
- HIVE-22113 Prevent LLAP shutdown on AMReporter related RuntimeException
- HIVE-13457 Create HS2 REST API endpoints for monitoring information
- HIVE-22045 HIVE-21711 introduced regression in data load
- HIVE-21173 Upgrade Apache Thrift to 0.9.3-1
- HIVE-22009 CTLV with user specified location is not honoured.
- HIVE-21711 Regression caused by HIVE-21279 for blobstorage fs
- HIVE-21224 Upgrade tests JUnit3 to JUnit4
- HIVE-21868 Vectorize CAST…FORMAT
- HIVE-21915 Hive with TEZ UNION ALL and UDTF results in data loss
- HIVE-21905 Generics improvement around the FetchOperator class
- HIVE-21902 HiveServer2 UI: jetty response header needs X-Frame-Options
- HIVE-21576 Introduce CAST…FORMAT and limited list of SQL:2016 datetime formats
- HIVE-19661 switch Hive UDFs to use Re2J regex engine
- HIVE-21799 NullPointerException in DynamicPartitionPruningOptimization, when join key is on aggregation column
- HIVE-21796 ArrayWritableObjectInspector.equals can take O(2^nesting_depth) time
- HIVE-21681 Describe formatted shows incorrect information for multiple primary keys
- HIVE-21685 Wrong simplification in query with multiple IN clauses
- HIVE-21509 LLAP may cache corrupted column vectors and return wrong query result
- HIVE-21386 Extend the fetch task enhancement done in HIVE-21279 to make it work with query result cache
- HIVE-21467 Remove deprecated junit.framework.Assert imports
- HIVE-21460 ACID: Load data followed by a select * query results in incorrect results
- HIVE-16924 Support distinct in presence of Group By
- HIVE-21182 Skip setting up hive scratch dir during planning
- HIVE-21279 Avoid moving/rename operation in FileSink op for SELECT queries
- HIVE-21270 A UDTF to show schema (column names and types) of given query
- HIVE-21329 Custom Tez runtime unordered output buffer size depending on operator pipeline
- HIVE-21167 Bucketing: Bucketing version 1 is incorrectly partitioning data
- HIVE-685 add UDFquote
- HIVE-21206 Bootstrap replication is slow as it opens lot of metastore connections
- HIVE-21009 Adding ability for user to set bind user
- HIVE-21171 Skip creating scratch dirs for tez if RPC is on
- HIVE-17020 Aggressive RS dedup can incorrectly remove OP tree branch
- HIVE-21134 Hive Build Version as UDF
- HIVE-21113 For HPL/SQL that contains boolean expression with NOT, incorrect SQL may be generated
- HIVE-20748 Disable materialized view rewriting when plan pattern is not allowed
- HIVE-21041 NPE, ParseException in getting schema from logical plan
- HIVE-16100 Dynamic Sorted Partition optimizer loses sibling operators
- HIVE-20979 Fix memory leak in hive streaming
- HIVE-20985 If select operator inputs are temporary columns vectorization may reuse some of them as output
- HIVE-20827 Inconsistent results for empty arrays
- HIVE-20953 Remove a function from function registry when it can not be added to the metastore when creating it.
- HIVE-20976 JDBC queries containing joins gives wrong results
- HIVE-20873 Use Murmur hash for VectorHashKeyWrapperTwoLong to reduce hash collision
- HIVE-20949 Improve PKFK cardinality estimation in physical planning
- HIVE-20952 Cleaning VectorizationContext.java
- HIVE-20937 Postgres jdbc query fail with “LIMIT must not be negative”
- HIVE-20918 Flag to enable/disable pushdown of computation from Calcite into JDBC connection
- HIVE-20910 Insert in bucketed table fails due to dynamic partition sort optimization
- HIVE-20676 HiveServer2: PrivilegeSynchronizer is not set to daemon status
- HIVE-19701 getDelegationTokenFromMetaStore doesn’t need to be synchronized
- HIVE-20682 Async query execution can potentially fail if shared sessionHive is closed by master thread
- HIVE-20881 Constant propagation oversimplifies projections
- HIVE-20813 udf to_epoch_milli need to support timestamp without time zone as well.
- HIVE-16839 Unbalanced calls to openTransaction/commitTransaction when alter the same partition concurrently
- HIVE-20868 SMB Join fails intermittently when TezDummyOperator has child op in getFinalOp in MapRecordProcessor
- HIVE-20486 Vectorization support for Kafka Storage Handler (addendum)
- HIVE-20486 Vectorization support for Kafka Storage Handler
- HIVE-20817 Reading Timestamp datatype via HiveServer2 gives errors
- HIVE-20834 Hive QueryResultCache entries keeping reference to SemanticAnalyzer from cached query
- HIVE-20815 JdbcRecordReader.next shall not eat exception
- HIVE-20821 Rewrite SUM0 into SUM + COALESCE combination
- HIVE-20830 JdbcStorageHandler range query assertion failure in some cases
- HIVE-20829 JdbcStorageHandler range split throws NPE
- HIVE-20820 MV partition on clause position
- HIVE-20792 Inserting timestamp with zones truncates the data
- HIVE-20638 Upgrade version of Jetty to 9.3.25.v20180904
- HIVE-20763 Add google cloud storage (gs) to the exim uri schema whitelist
- HIVE-20703 Put dynamic sort partition optimization under cost based decision
- HIVE-20768 Adding Tumbling Window UDF
- HIVE-20762 NOTIFICATION_LOG cleanup interval is hardcoded as 60s and is too small
- HIVE-20477 OptimizedSql is not shown if the expression contains INs
- HIVE-20720 Add partition column option to JDBC handler
- HIVE-20735 Adding Support for Kerberos Auth, Removed start/end offset columns, remove the best effort mode and made 2pc default for EOS
- HIVE-20761 Select for update on notification_sequence table has retry interval and retries count too small
- HIVE-20731 keystore file in JdbcStorageHandler should be authorized
- HIVE-20696 msck_*.q tests are broken
- HIVE-20702 Account for overhead from datastructure aware estimations during mapjoin selection
- HIVE-20704 Extend HivePreFilteringRule to support other functions
- HIVE-20644 Avoid exposing sensitive infomation through a Hive Runtime exception
- HIVE-20712 HivePointLookupOptimizer should extract deep cases
- HIVE-20705 Vectorization: Native Vector MapJoin doesn’t support Complex Big Table values
- HIVE-20710 Constant folding may not create null constants without types
- HIVE-20649 LLAP aware memory manager for Orc writers
- HIVE-20639 Add ability to Write Data from Hive Table/Query to Kafka Topic
- HIVE-20648 LLAP: Vector group by operator should use memory per executor
- HIVE-20692 Enable folding of NOT x IS (NOT) [TRUE|FALSE] expressions
- HIVE-20623 Shared work: Extend sharing of map-join cache entries in LLAP
- HIVE-20651 JdbcStorageHandler password should be encrypted
- HIVE-20646 Partition filter condition is not pushed down to metastore query if it has IS NOT NULL
- HIVE-20652 JdbcStorageHandler push join of two different datasource to jdbc driver
- HIVE-20563 Vectorization: CASE WHEN expression fails when THEN/ELSE type and result type are different
- HIVE-20691 Fix org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[cttl]
- HIVE-20657 pre-allocate LLAP cache at init time
- HIVE-20609 Create SSD cache dir if it doesnt exist already
- HIVE-20618 During join selection BucketMapJoin might be choosen for non bucketed tables
- HIVE-10296 Cast exception observed when hive runs a multi join query on metastore (postgres), since postgres pushes the filter into the join, and ignores the condition before applying cast
- HIVE-20552 Get Schema from LogicalPlan faster
- HIVE-20627 Concurrent async queries intermittently fails with LockException and cause memory leak
- HIVE-20540 Vectorization : Support loading bucketed tables using sorted dynamic partition optimizer - II
- HIVE-18871 hive on tez execution error due to set hive.aux.jars.path to hdfs://
- HIVE-20603 “Wrong FS” error when inserting to partition after changing table location filesystem
- HIVE-20620 manifest collisions when inserting into bucketed sorted MM tables with dynamic partitioning
- HIVE-20625 Regex patterns not working in SHOW MATERIALIZED VIEWS ‘’
- HIVE-20095 Fix feature to push computation to jdbc external tables
- HIVE-20568 There is no need to convert the dbname to pattern while pulling tablemeta
- HIVE-20507 Beeline: Add a utility command to retrieve all uris from beeline-site.xml
- HIVE-20498 Support date type for column stats autogather
- HIVE-20583 Use canonical hostname only for kerberos auth in HiveConnection
- HIVE-20561 Use the position of the Kafka Consumer to track progress instead of Consumer Records offsets
- HIVE-20558 Change default of hive.hashtable.key.count.adjustment to 0.99
- HIVE-20524 Schema Evolution checking is broken in going from Hive version 2 to version 3 for ALTER TABLE VARCHAR to DECIMAL
- HIVE-20537 Multi-column joins estimates with uncorrelated columns different in CBO and Hive
- HIVE-20541 REPL DUMP on external table with add partition event throws NoSuchElementException
- HIVE-20503 Use datastructure aware estimations during mapjoin selection
- HIVE-20412 NPE in HiveMetaHook
- HIVE-20296 Improve HivePointLookupOptimizerRule to be able to extract from more sophisticated contexts
- HIVE-17921 Aggregation with struct in LLAP produces wrong result
- HIVE-20481 Add the Kafka Key record as part of the row
- HIVE-20513 Vectorization: Improve Fast Vector MapJoin Bytes Hash Tables
- HIVE-20508 Hive does not support user names of type “user@realm”
- HIVE-20522 HiveFilterSetOpTransposeRule may throw assertion error due to nullability of fields
- HIVE-20510 Vectorization : Support loading bucketed tables using sorted dynamic partition optimizer
- HIVE-20515 Empty query results when using results cache and query temp dir, results cache dir in different filesystems
- HIVE-20499 GetTablesOperation pull all the tables meta irrespective of auth.
- HIVE-15932 Add support for: “explain ast”
- HIVE-20377 Hive Kafka Storage Handler
- HIVE-19662 Upgrade Avro to 1.8.2
- HIVE-19993 Using a table alias which also appears as a column name is not possible
- HIVE-20432 Rewrite BETWEEN to IN for integer types for stats estimation
- HIVE-20491 Fix mapjoin size estimations for Fast implementation
- HIVE-20476 CopyUtils used by REPL LOAD and EXPORT/IMPORT operations ignore distcp error
- HIVE-20496 Vectorization: Vectorized PTF IllegalStateException
- HIVE-20225 SerDe to support Teradata Binary Format
- HIVE-20433 Implicit String to Timestamp conversion is slow
- HIVE-20439 addendum
- HIVE-20439 Use the inflated memory limit during join selection for llap
- HIVE-20013 Add an Implicit cast to date type for to_date function
- HIVE-20187 Incorrect query results in hive when hive.convert.join.bucket.mapjoin.tez is set to true
- HIVE-20455 Log spew from security.authorization.PrivilegeSynchonizer.run
- HIVE-20315 Vectorization: Fix more NULL / Wrong Results issues and avoid unnecessary casts/conversions
- HIVE-20339 Vectorization: Lift unneeded restriction causing some PTF with RANK not to be vectorized
- HIVE-20352 Vectorization: Support grouping function
- HIVE-20367 Vectorization: Support streaming for PTF AVG, MAX, MIN, SUM
- HIVE-20399 CTAS w/a custom table location that is not fully qualified fails for MM tables
- HIVE-20443 txn stats cleanup in compaction txn handler is unneeded
- HIVE-20418 LLAP IO may not handle ORC files that have row index disabled correctly for queries with no columns selected
- HIVE-20409 Hive ACID: Update/delete/merge does not clean hdfs staging directory
- HIVE-20366 TPC-DS query78 stats estimates are off for is null filter
- HIVE-20406 Addendum patch
- HIVE-20406 Nested Coalesce giving incorrect results
- HIVE-20368 Remove VectorTopNKeyOperator lock
- HIVE-20410 aborted Insert Overwrite on transactional table causes “Not enough history available for…” error
- HIVE-20321 Vectorization: Cut down memory size of 1 col VectorHashKeyWrapper to <1 CacheLine
- HIVE-20391 HiveAggregateReduceFunctionsRule may infer wrong return type when decomposing aggregate function
- HIVE-14898 HS2 shouldn’t log callstack for an empty auth header error
- HIVE-20389 NPE in SessionStateUserAuthenticator when authenticator=SessionStateUserAuthenticator
- HIVE-18620 Improve error message while dropping a table that is part of a materialized view
- HIVE-20345 Drop database may hang if the tables get deleted from a different call
- HIVE-20379 Rewriting with partitioned materialized views may reference wrong column
- HIVE-19316 StatsTask fails due to ClassCastException
- HIVE-20329 Long running repl load (incr/bootstrap) causing OOM error
- HIVE-19924 Tag distcp jobs run by Repl Load
- HIVE-20354 Semijoin hints dont work with merge statements
- HIVE-20340 Druid Needs Explicit CASTs from Timestamp to STRING when the output of timestamp function is used as String
- HIVE-20344 PrivilegeSynchronizer for SBA might hit AccessControlException
- HIVE-20316 Skip external table file listing for create table event
- HIVE-20336 Masking and filtering policies for materialized views
- HIVE-20337 CachedStore: getPartitionsByExpr is not populating the partition list correctly
- HIVE-20335 Add tests for materialized view rewriting with composite aggregation functions
- HIVE-20326 Create constraints with RELY as default instead of NO RELY
- HIVE-19408 Improve show materialized views statement to show more information about invalidation
- HIVE-20118 SessionStateUserAuthenticator.getGroupNames() is always empty
- HIVE-20290 Lazy initialize ArrowColumnarBatchSerDe so it doesn’t allocate buffers during GetSplits
- HIVE-20277 Vectorization: Case expressions that return BOOLEAN are not supported for FILTER
- HIVE-19097 related equals and in operators may cause inaccurate stats estimations
- HIVE-20314 Include partition pruning in materialized view rewriting
- HIVE-20301 Enable vectorization for materialized view rewriting tests
- HIVE-20302 LLAP: non-vectorized execution in IO ignores virtual columns, including ROW__ID
- HIVE-20294 Vectorization: Fix NULL / Wrong Results issues in COALESCE / ELT
- HIVE-20281 SharedWorkOptimizer fails with ‘operator cache contents and actual plan differ’
- HIVE-20260 NDV of a column shouldn’t be scaled when row count is changed by filter on another column
- HIVE-14493 Partitioning support for materialized views
- HIVE-18201 Disable XPROD_EDGE for sq_count_check() created for scalar subqueries
- HIVE-20130 Better logging for information schema synchronizer
- HIVE-20244 forward port HIVE-19704 to master
- HIVE-20101 BloomKFilter: Avoid using the local byte arrays entirely
- HIVE-20177 Vectorization: Reduce KeyWrapper allocation in GroupBy Streaming mode
- HIVE-20245 Vectorization: Fix NULL / Wrong Results issues in BETWEEN / IN
- HIVE-20247 cleanup issues in LLAP IO after cache OOM ADDENDUM
- HIVE-20247 cleanup issues in LLAP IO after cache OOM
- HIVE-20249 LLAP IO: NPE during refCount decrement
- HIVE-20263 Typo in HiveReduceExpressionsWithStatsRule variable
- HIVE-20209 Metastore connection fails for first attempt in repl dump
- HIVE-19770 Support for CBO for queries with multiple same columns in select
- HIVE-20035 write booleans as long when serializing to druid
- HIVE-20105 Druid-Hive: tpcds query on timestamp throws java.lang.IllegalArgumentException: Cannot create timestamp, parsing error
- HIVE-18729 Druid Time column type
- HIVE-20213 Upgrade Calcite to 1.17.0
- HIVE-20212 Hiveserver2 in http mode emitting metric default.General.open_connections incorrectly
- HIVE-20240 Semijoin Reduction : Use local variable to check for external table condition
- HIVE-20228 configure repl configuration directories based on user running hiveserver2
- HIVE-20203 Arrow SerDe leaks a DirectByteBuffer
- HIVE-20207 Vectorization: Fix NULL / Wrong Results issues in Filter / Compare
- HIVE-19935 Hive WM session killed: Failed to update LLAP tasks count
- HIVE-20082 HiveDecimal to string conversion doesn’t format the decimal correctly
- HIVE-20164 Murmur Hash : Make sure CTAS and IAS use correct bucketing version
- HIVE-19891 inserting into external tables with custom partition directories may cause data loss
- HIVE-17683 Add explain locks command
- HIVE-20192 HS2 with embedded metastore is leaking JDOPersistenceManager objects
- HIVE-20204 Type conversion during IN () comparisons is using different rules from other comparison operations
- HIVE-20127 fix some issues with LLAP Parquet cache
- HIVE-4367 enhance TRUNCATE syntax to drop data of external table
- HIVE-17896 order3.q fails with NullPointerException if hive.cbo.enable=false and hive.optimize.topnkey=true
- HIVE-17896 TopNKey: Create a standalone vectorizable TopNKey operator
- HIVE-20149 TestHiveCli failing/timing out
- HIVE-19360 CBO: Add an “optimizedSQL” to QueryPlan object
- HIVE-20120 Incremental repl load DAG generation is causing OOM error
- HIVE-20183 Inserting from bucketed table can cause data loss, if the source table contains empty bucket
- HIVE-20197 Vectorization: Add DECIMAL_64 testing, add Date/Interval/Timestamp arithmetic, and add more GROUP BY Aggregation tests
- HIVE-20172 StatsUpdater failed with GSS Exception while trying to connect to remote metastore
- HIVE-20165 Enable ZLIB for streaming ingest
- HIVE-20116 TezTask is using parent logger
- HIVE-20152 reset db state, when repl dump fails, so rename table can be done
- HIVE-20174 Vectorization: Fix NULL / Wrong Results issues in GROUP BY Aggregation Functions
- HIVE-19992 Vectorization: Follow-on to HIVE-19951 --> add call to SchemaEvolution.isOnlyImplicitConversion to disable encoded LLAP I/O for ORC only when data type conversion is not implicit
- HIVE-20185 Backport HIVE-20111 to branch-3
- HIVE-20147 Hive streaming ingest is contended on synchronized logging
- HIVE-20019 Ban commons-logging and log4j
- HIVE-20088 Beeline config location path is assembled incorrectly
- HIVE-20069 Fix reoptimization in case of DPP and Semijoin optimization
- HIVE-19387 Truncate table for Acid tables conflicts with ResultSet cache
- HIVE-20093 LlapOutputFormatService: Use ArrowBuf with Netty for Accounting
- HIVE-20129 Revert to position based schema evolution for orc tables
- HIVE-20100 OpTraits : Select Optraits should stop when a mismatch is detected
- HIVE-20099 Fix logger for LlapServlet
- HIVE-20184 Backport HIVE-20085 to branch-3
- HIVE-20098 Statistics: NPE when getting Date column partition statistics
- HIVE-20182 Backport HIVE-20067 to branch-3
- HIVE-19850 Dynamic partition pruning in Tez is leading to ‘No work found for tablescan’ error
- HIVE-20039 Bucket pruning: Left Outer Join on bucketed table gives wrong result
- HIVE-19860 HiveServer2 ObjectInspectorFactory memory leak with cachedUnionStructObjectInspector
- HIVE-19326 stats auto gather: incorrect aggregation during UNION queries (may lead to incorrect results)
- HIVE-20051 Skip authorization for temp tables
- HIVE-17840 HiveMetaStore eats exception if transactionalListeners.notifyEvent fail
- HIVE-20059 Hive streaming should try shade prefix unconditionally on exception
- HIVE-19951 Vectorization: Need to disable encoded LLAP I/O for ORC when there is data type conversion (Schema Evolution)
- HIVE-19812 Disable external table replication by default via a configuration property
- HIVE-20008 Fix second compilation errors in ql
- HIVE-20038 Update queries on non-bucketed + partitioned tables throws NPE
- HIVE-19995 Aggregate row traffic for acid tables
- HIVE-19970 Replication dump has a NPE when table is empty
- HIVE-20028 Metastore client cache config is used incorrectly
- HIVE-20004 Wrong scale used by ConvertDecimal64ToDecimal results in incorrect results
- HIVE-20009 Fix runtime stats for merge statement
- HIVE-19989 Metastore uses wrong application name for HADOOP2 metrics
- HIVE-19967 SMB Join : Need Optraits for PTFOperator ala GBY Op
- HIVE-20011 Move away from append mode in proto logging hook
- HIVE-18786 NPE in Hive windowing functions
- HIVE-19829 Incremental replication load should create tasks in execution phase rather than semantic phase
- HIVE-18140 Partitioned tables statistics can go wrong in basic stats mixed case
- HIVE-20231 Backport HIVE-19981 to branch-3
- HIVE-19564 Vectorization: Fix NULL / Wrong Results issues in Arithmetic
- HIVE-19866 improve LLAP cache purge
- HIVE-19663 refactor LLAP IO report generation
- Revert “HIVE-19866 improve LLAP cache purge”
- HIVE-19759 Flaky test: TestRpc#testServerPort
- HIVE-19432 GetTablesOperation is too slow if the hive has too many databases and tables
- HIVE-6980 Drop table by using direct sql
- HIVE-18906 Lower Logging for “Using direct SQL”
- HIVE-19285 Add logs to the subclasses of MetaDataOperation
- HIVE-18986 Table rename will run java.lang.StackOverflowError in dataNucleus if the table contains large number of columns
- HIVE-19104 When test MetaStore is started with retry the instances should be independent
- HIVE-18827 useless dynamic value exceptions strike back
- HIVE-24331 Add Jenkinsfile for branch-3.1
- HIVE-19170 Fix TestMiniDruidKafkaCliDriver – addendum patch
- HIVE-19170 Fix TestMiniDruidKafkaCliDriver
- HIVE-23323 Add qsplits profile
- HIVE-23044 Make sure Cleaner doesn’t delete delta directories for running queries
- HIVE-23088 Using Strings from log4j breaks non-log4j users
- HIVE-22704 Distribution package incorrectly ships the upgrade.order files from the metastore module
- HIVE-22708 Fix for HttpTransport to replace String.equals
- HIVE-22407 Hive metastore upgrade scripts have incorrect (or outdated) comment syntax
- HIVE-22241 Implement UDF to interpret date/timestamp using its internal representation and Gregorian-Julian hybrid calendar
- HIVE-21508 ClassCastException when initializing HiveMetaStoreClient on JDK10 or newer
- HIVE-19667 Remove distribution management tag from pom.xml
- HIVE-21980 Parsing time can be high in case of deeply nested subqueries
- HIVE-22105 Update ORC to 1.5.6 in branch-3
- HIVE-20057 For ALTER TABLE t SET TBLPROPERTIES ('EXTERNAL'='TRUE'); TBL_TYPE attribute change not reflecting for non-CAPS
- HIVE-18874 JDBC: HiveConnection shades log4j interfaces
- HIVE-21821 Backport HIVE-21739 to branch-3.1
- HIVE-21786 Update repo URLs in poms - branch 3.1 version
- HIVE-21755 Backport HIVE-21462 to branch-3 Upgrading SQL server backed metastore when changing data type of a column with constraints
- HIVE-21758 DBInstall tests broken on master and branch-3.1
- HIVE-21291 Restore historical way of handling timestamps in Avro while keeping the new semantics at the same time
- HIVE-21564 Load data into a bucketed table is ignoring partitions specs and loads data into default partition
- HIVE-20593 Load Data for partitioned ACID tables fails with bucketId out of range: -1
- HIVE-21600 GenTezUtils.removeSemiJoinOperator may throw out of bounds exception for TS with multiple children
- HIVE-21613 Queries with join condition having timestamp or timestamp with local time zone literal throw SemanticException
- HIVE-18624 Parsing time is extremely high (~10 min) for queries with complex select expressions
- HIVE-21540 Query with join condition having date literal throws SemanticException
- HIVE-21342 Analyze compute stats for column leave behind staging dir on hdfs
- HIVE-21290 Restore historical way of handling timestamps in Parquet while keeping the new semantics at the same time
- HIVE-20126 OrcInputFormat does not pass conf to orc reader options
- HIVE-21376 Incompatible change in Hive bucket computation
- HIVE-21236 SharedWorkOptimizer should check table properties
- HIVE-21156 SharedWorkOptimizer may preserve filter in TS incorrectly
- HIVE-21039 CURRENT_TIMESTAMP returns value in UTC time zone
- HIVE-20010 Fix create view over literals
- HIVE-20420 Provide a fallback authorizer when no other authorizer is in use
- HIVE-18767 Some alterPartitions invocations throw ‘NumberFormatException: null’
- HIVE-18778 Needs to capture input/output entities in explain
- HIVE-20555 HiveServer2: Preauthenticated subject for http transport is not retained for entire duration of http communication in some cases
- HIVE-20227 Exclude glassfish javax.el dependency
- HIVE-19027 Make materializations invalidation cache work with multiple active remote metastores
- HIVE-20102 Add a couple of additional tests for query parsing
- HIVE-20123 Fix masking tests after HIVE-19617
- HIVE-20076 ACID: Fix Synthetic ROW__ID generation for vectorized orc readers (addendum)
- HIVE-20076 ACID: Fix Synthetic ROW__ID generation for vectorized orc readers
- HIVE-20135 Fix incompatible change in TimestampColumnVector to default to UTC