For asking questions on MR3, please visit the MR3 Google Group.

1. Metastore fails to find a MySQL connector jar file in the classpath with ClassNotFoundException.

2020-07-18T04:03:14,856 ERROR [main] tools.HiveSchemaHelper: Unable to find driver class
java.lang.ClassNotFoundException: com.mysql.jdbc.Driver

The classpath of Metastore includes the directories /opt/mr3-run/lib and /opt/mr3-run/host-lib inside the Metastore Pod. The user can place a MySQL connector jar file in one of these two directories in three different ways.

  • When building a Docker image, set HIVE_MYSQL_DRIVER in env.sh (not kubernetes/env.sh) to the path of the jar file. Then the jar file is found in the directory /opt/mr3-run/lib inside the Metastore Pod.

    bash-4.2$ ls /opt/mr3-run/lib
    mysql-connector-java-5.1.49.jar
    

    In this case, the user should comment out the following lines in kubernetes/yaml/metastore.yaml so that the directory /opt/mr3-run/lib is not overridden by a subdirectory in the PersistentVolume.

          # - name: work-dir-volume
          #   mountPath: /opt/mr3-run/lib
          #   subPath: lib
    

    With Helm, the user should set metastore.mountLib to false in kubernetes/helm/hive/values.yaml.

    metastore:
      mountLib: false
    
  • If the Docker image is not built with the jar file, the user can copy it to the subdirectory lib in the PersistentVolume and use PersistentVolumeClaim work-dir-volume in kubernetes/yaml/metastore.yaml. Then the jar file is mounted in the directory /opt/mr3-run/lib inside the Metastore Pod. See On Minikube with a Pre-built Docker Image and Copying a MySQL connector jar file to EFS in Creating a PersistentVolume using EFS for examples.

        - name: work-dir-volume
          mountPath: /opt/mr3-run/lib
          subPath: lib
    

    With Helm, the user should set metastore.mountLib to true in kubernetes/helm/hive/values.yaml.

    metastore:
      mountLib: true
    
  • If the Docker image is not built with the jar file and a PersistentVolume is not available (e.g., when using S3 instead of EFS on Amazon EKS), the user can mount it in the directory /opt/mr3-run/host-lib using a hostPath volume. See Downloading a MySQL connector in Creating an EKS cluster for an example.

    With Helm, kubernetes/helm/hive/values.yaml should set metastore.hostLib to true and set metastore.hostLibDir to a common local directory containing the jar file on all worker nodes.

    metastore:
      hostLib: true
      hostLibDir: "/home/ec2-user/lib"
    
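For the first option above, the relevant line in env.sh might look as follows. The path is hypothetical; substitute the actual location of the connector jar on the build machine.

```shell
# env.sh (not kubernetes/env.sh): path to the MySQL connector jar to be
# included when building the Docker image (hypothetical path)
HIVE_MYSQL_DRIVER=/home/user/lib/mysql-connector-java-5.1.49.jar
```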

2. When running a query, ContainerWorker Pods never get launched and Beeline gets stuck.

Try adjusting the resources for the DAGAppMaster and ContainerWorker Pods. In kubernetes/conf/mr3-site.xml, the user can adjust the resources for the DAGAppMaster Pod.

<property>
  <name>mr3.am.resource.memory.mb</name>
  <value>16384</value>
</property>

<property>
  <name>mr3.am.resource.cpu.cores</name>
  <value>2</value>
</property>

In kubernetes/conf/hive-site.xml, the user can adjust the resources for ContainerWorker Pods (assuming that the configuration key hive.mr3.containergroup.scheme is set to all-in-one).

<property>
  <name>hive.mr3.map.task.memory.mb</name>
  <value>8192</value>
</property>

<property>
  <name>hive.mr3.map.task.vcores</name>
  <value>1</value>
</property>

<property>
  <name>hive.mr3.reduce.task.memory.mb</name>
  <value>8192</value>
</property>

<property>
  <name>hive.mr3.reduce.task.vcores</name>
  <value>1</value>
</property>

<property>
  <name>hive.mr3.all-in-one.containergroup.memory.mb</name>
  <value>16384</value>
</property>

<property>
  <name>hive.mr3.all-in-one.containergroup.vcores</name>
  <value>2</value>
</property>
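
With the settings above, the number of tasks that can run concurrently in a single ContainerWorker Pod follows from simple division. This is a rough sketch of the packing logic, assuming tasks are packed into a container group by both memory and vcores:

```shell
# all-in-one container group resources (from hive-site.xml above)
container_memory_mb=16384
container_vcores=2
# per-task resources
task_memory_mb=8192
task_vcores=1

by_memory=$((container_memory_mb / task_memory_mb))   # 2
by_vcores=$((container_vcores / task_vcores))         # 2
# concurrent tasks per ContainerWorker = the smaller of the two
if [ "$by_memory" -lt "$by_vcores" ]; then
  concurrent_tasks=$by_memory
else
  concurrent_tasks=$by_vcores
fi
echo "$concurrent_tasks"   # prints 2
```

If ContainerWorker Pods never get scheduled, also check that the container group resources do not exceed the allocatable resources of any worker node in the Kubernetes cluster.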

3. A query fails with the message No space available in any of the local directories

A query may fail with the following error from Beeline:

ERROR : Terminating unsuccessfully: Vertex failed, vertex_2134_0000_1_01, Some(Task unsuccessful: Map 1, task_2134_0000_1_01_000000, java.lang.RuntimeException: org.apache.hadoop.util.DiskChecker$DiskErrorException: No space available in any of the local directories.
  at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:370)
...
Caused by: org.apache.hadoop.util.DiskChecker$DiskErrorException: No space available in any of the local directories.

In such a case, check if the configuration key mr3.k8s.pod.worker.hostpaths in kubernetes/conf/mr3-site.xml is properly set, e.g.:

<property>
  <name>mr3.k8s.pod.worker.hostpaths</name>
  <value>/data1/k8s,/data2/k8s,/data3/k8s,/data4/k8s,/data5/k8s,/data6/k8s</value>
</property>

In addition, check that the directories listed in mr3.k8s.pod.worker.hostpaths are writable by the user running the Pods.
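
The checks above can be scripted. The helper below is a sketch; substitute the directories from mr3.k8s.pod.worker.hostpaths and run it on every worker node (here /tmp stands in for the real directories):

```shell
# sketch: report whether each hostPath directory exists and is writable,
# and how much space is left on its filesystem
check_hostpaths() {
  status=0
  for dir in "$@"; do
    if [ -d "$dir" ] && [ -w "$dir" ]; then
      free_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
      echo "OK: $dir (${free_kb} KB free)"
    else
      echo "FAIL: $dir missing or not writable"
      status=1
    fi
  done
  return $status
}

# in practice, pass the real directories, e.g. /data1/k8s /data2/k8s ...
check_hostpaths /tmp
```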

4. A query accessing S3 fails with SdkClientException: Unable to execute HTTP request: Timeout waiting for connection from pool

A query accessing S3 may fail with an error message SdkClientException: Unable to execute HTTP request: Timeout waiting for connection from pool. This can happen in DAGAppMaster when running InputInitializer, in which case Beeline and the DAGAppMaster Pod generate errors such as:

### from Beeline 
ERROR : FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Terminating unsuccessfully: Vertex failed, vertex_22169_0000_1_02, Some(RootInput web_sales failed on Vertex Map 1: com.datamonad.mr3.api.common.AMInputInitializerException: web_sales)Map 1            1 task           2922266 milliseconds: Failed
### from the DAGAppMaster Pod
Caused by: java.lang.RuntimeException: ORC split generation failed with exception: java.io.InterruptedIOException: Failed to open s3a://hivemr3-partitioned-2-orc/web_sales/ws_sold_date_sk=2451932/000001_0 at 14083 on s3a://hivemr3-partitioned-2-orc/web_sales/ws_sold_date_sk=2451932/000001_0: com.amazonaws.SdkClientException: Unable to execute HTTP request: Timeout waiting for connection from pool

This can also happen in ContainerWorkers, in which case ContainerWorker Pods generate errors such as:

...... com.amazonaws.SdkClientException: Unable to execute HTTP request: Timeout waiting for connection from pool
at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.closeWriters(FileSinkOperator.java:202) ~[hive-exec-3.1.2.jar:3.1.2]
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:1276) ~[hive-exec-3.1.2.jar:3.1.2]
...
...... com.amazonaws.SdkClientException: Unable to execute HTTP request: Timeout waiting for connection from pool
at org.apache.hadoop.fs.s3a.S3AUtils.translateInterruptedException(S3AUtils.java:340) ~[hadoop-aws-3.1.2.jar:?]
...
Caused by: com.amazonaws.SdkClientException: Unable to execute HTTP request: Timeout waiting for connection from pool

Depending on the settings for S3 buckets and the properties of datasets, the user may have to change the values for the following configuration keys in kubernetes/conf/core-site.xml.

  • increase the value for fs.s3a.connection.maximum (e.g., to 2000 or higher)
  • increase the value for fs.s3a.threads.max
  • increase the value for fs.s3a.threads.core
  • set fs.s3a.blocking.executor.enabled to false
  • set fs.s3a.connection.ssl.enabled to false
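
For example, the first three keys might be set in kubernetes/conf/core-site.xml as follows. The values are illustrative starting points, not definitive recommendations:

```xml
<property>
  <name>fs.s3a.connection.maximum</name>
  <value>2000</value>
</property>

<property>
  <name>fs.s3a.threads.max</name>
  <value>100</value>
</property>

<property>
  <name>fs.s3a.threads.core</name>
  <value>100</value>
</property>
```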

5. A query accessing S3 makes no progress because Map vertexes get stuck in the state of Initializing.

If DAGAppMaster fails to resolve host names, the execution of a query may get stuck with every Map vertex remaining in the state of Initializing. In such a case, check if the configuration key mr3.k8s.host.aliases is set properly in kubernetes/conf/mr3-site.xml. For example, if the user sets the environment variable HIVE_DATABASE_HOST in env.sh to the host name (instead of the address) of the MySQL server, its address should be specified in mr3.k8s.host.aliases.

HIVE_DATABASE_HOST=orange0
<property>
  <name>mr3.k8s.host.aliases</name>
  <value>orange0=11.11.11.11</value>
</property>

Internally the class AmazonS3Client (running inside InputInitializer of MR3) throws java.net.UnknownHostException, but the exception is swallowed and never propagated to DAGAppMaster. As a consequence, no error is reported to Beeline and the query gets stuck.
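
To verify name resolution, one can run a quick check from inside the DAGAppMaster container. This assumes that entries in mr3.k8s.host.aliases appear in the Pod's /etc/hosts; orange0 is the example host name from above:

```shell
# sketch: check whether the database host resolves inside the Pod
# (orange0 is the example host name from the text)
if getent hosts orange0 >/dev/null; then
  echo "orange0 resolves"
else
  echo "orange0 does NOT resolve; check mr3.k8s.host.aliases"
fi
```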

6. DAGAppMaster Pod does not start because mr3-conf.properties does not exist.

MR3 generates a property file mr3-conf.properties from the ConfigMap mr3conf-configmap-master and mounts it inside the DAGAppMaster Pod. If the DAGAppMaster Pod fails with the following message, either the ConfigMap mr3conf-configmap-master is corrupt or mr3-conf.properties has not been generated.

2020-05-15T10:35:10,255 ERROR [main] DAGAppMaster: Error in starting DAGAppMaster
java.lang.IllegalArgumentException: requirement failed: Properties file mr3-conf.properties does not exist

In such a case, try again after manually deleting the ConfigMap mr3conf-configmap-master so that Hive on MR3 on Kubernetes can create a fresh ConfigMap of the same name.
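
For example, the ConfigMap can be deleted as follows. The namespace hivemr3 is an assumption; substitute the namespace actually in use:

```shell
# delete the stale ConfigMap so that it is recreated on the next start
# (the namespace hivemr3 is hypothetical)
kubectl delete configmap mr3conf-configmap-master -n hivemr3
```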

7. Metastore fails with Version information not found in metastore

The following error occurs if the MySQL database for Metastore has not been initialized.

MetaException(message:Version information not found in metastore. )
...
Caused by: MetaException(message:Version information not found in metastore. )
  at org.apache.hadoop.hive.metastore.ObjectStore.checkSchema(ObjectStore.java:7564)
...

The user can initialize the schema when starting Metastore by updating kubernetes/yaml/metastore.yaml as follows:

        args: ["start", "--init-schema"]

For more details, see Running Metastore.

8. HiveServer2 repeatedly prints an error message server.TThreadPoolServer: Error occurred during processing of message.

HiveServer2 may repeatedly print the same error message at regular intervals, such as:

2020-08-13T05:28:47,351 ERROR [HiveServer2-Handler-Pool: Thread-27] server.TThreadPoolServer: Error occurred during processing of message.
java.lang.RuntimeException: org.apache.thrift.transport.TSaslTransportException: No data or no sasl data in the stream
	at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:219) ~[hive-exec-3.1.2.jar:3.1.2]
	at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:269) ~[hive-exec-3.1.2.jar:3.1.2]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_252]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_252]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_252]
Caused by: org.apache.thrift.transport.TSaslTransportException: No data or no sasl data in the stream
...
2020-08-13T04:51:23,697 ERROR [HiveServer2-Handler-Pool: Thread-27] server.TThreadPoolServer: Error occurred during processing of message.
java.lang.RuntimeException: org.apache.thrift.transport.TTransportException: javax.net.ssl.SSLHandshakeException: Remote host closed connection during handshake
	at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:219) ~[hive-exec-3.1.2.jar:3.1.2]
	at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:269) ~[hive-exec-3.1.2.jar:3.1.2]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_232]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_232]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_232]
Caused by: org.apache.thrift.transport.TTransportException: javax.net.ssl.SSLHandshakeException: Remote host closed connection during handshake
...
Caused by: javax.net.ssl.SSLHandshakeException: Remote host closed connection during handshake
...
Caused by: java.io.EOFException: SSL peer shut down incorrectly
...

This error message is printed when readiness and liveness probes contact HiveServer2, or when the LoadBalancer sends ping signals to HiveServer2 to check its health. Hence it does not indicate a real problem and can be safely ignored. For the user who wants to suppress such error messages, the following workaround is available.

  1. Download TThreadPoolServer.java from the Thrift source repository (https://github.com/apache/thrift/) and copy it to a new directory ql/src/java/org/apache/thrift/server/ in the source code of Hive for MR3.

    $ wget https://raw.githubusercontent.com/apache/thrift/0.9.3.1/lib/java/src/org/apache/thrift/server/TThreadPoolServer.java
    $ mkdir -p ql/src/java/org/apache/thrift/server/
    $ cp TThreadPoolServer.java ql/src/java/org/apache/thrift/server/
    
  2. Locate the line that prints the error message.

      } catch (Exception x) {
        LOGGER.error("Error occurred during processing of message.", x);
      } finally {
    

    Then modify the line to suppress the error message or the stack trace. For example, the user can suppress the stack trace as follows:

      } catch (Exception x) {
        LOGGER.error("Error occurred during processing of message: " + x.getClass().getName());
      } finally {
    
  3. Rebuild Hive for MR3 and create a new Docker image.