We assume that an external MySQL database for Metastore is running. For testing Hive on MR3, the user can create a temporary MySQL database directly on an EC2 instance. In the following example, the user logs on to an EC2 instance whose internal address is 192.168.77.14.

[ec2-user@ip-192-168-77-14 ~]$ sudo yum localinstall http://repo.mysql.com/mysql-community-release-el6-7.noarch.rpm
[ec2-user@ip-192-168-77-14 ~]$ sudo yum install mysql-community-server
[ec2-user@ip-192-168-77-14 ~]$ sudo service mysqld start
Starting mysqld (via systemctl):                           [  OK  ]

[ec2-user@ip-192-168-77-14 ~]$ mysql -h 192.168.77.14 -u root -p    # use a blank password 
mysql> GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' IDENTIFIED BY 'passwd';
mysql> FLUSH PRIVILEGES;

Now the user can connect to the temporary MySQL database with user root and password passwd.
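
To quickly verify the connection (using the address and password from the example above), the user can run:

$ mysql -h 192.168.77.14 -u root -ppasswd -e "SELECT VERSION();"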

The user can configure Metastore and HiveServer2 in the same way as on ordinary Kubernetes. Below we review the configurations relevant to running Hive on MR3 on Amazon EKS.

kubernetes/env.sh

Set the following environment variables appropriately:

$ vi kubernetes/env.sh

METASTORE_USE_PERSISTENT_VOLUME=false

DOCKER_HIVE_IMG=mr3project/hive3:latest

CREATE_KEYTAB_SECRET=false
CREATE_WORKER_SECRET=false
CREATE_RANGER_SECRET=false
CREATE_ATS_SECRET=false

HIVE_DATABASE_HOST=192.168.77.14
HIVE_WAREHOUSE_DIR=s3a://mr3-tpcds-partitioned-2-orc/warehouse

METASTORE_SECURE_MODE=false

HIVE_SERVER2_HEAPSIZE=2048
HIVE_SERVER2_AUTHENTICATION=NONE

TOKEN_RENEWAL_HDFS_ENABLED=false

HIVE_METASTORE_HEAPSIZE=2048
  • Set METASTORE_USE_PERSISTENT_VOLUME to false because the PersistentVolumeClaim workdir-pvc has already been created.
  • Set METASTORE_SECURE_MODE to false to disable Kerberos authentication.
  • Set HIVE_DATABASE_HOST to the host name or address of the MySQL database.
  • Set HIVE_WAREHOUSE_DIR to the path to the data warehouse on S3. Note that the user should use the file system s3a, not s3.
  • Set HIVE_SERVER2_AUTHENTICATION to NONE to disable Kerberos authentication.
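
As a quick sanity check, the user can source env.sh and print a few of the values (a sketch; sourcing env.sh may set additional variables):

$ source kubernetes/env.sh
$ echo $HIVE_DATABASE_HOST $HIVE_WAREHOUSE_DIR
192.168.77.14 s3a://mr3-tpcds-partitioned-2-orc/warehouse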

kubernetes/conf/core-site.xml

$ vi kubernetes/conf/core-site.xml

<property>
  <name>hadoop.security.authentication</name>
  <value>simple</value>
</property>
  • Set the configuration key hadoop.security.authentication to simple to disable Kerberos authentication.
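
To confirm the change, the user can inspect the file (an illustrative check; the surrounding lines may differ):

$ grep -A1 'hadoop.security.authentication' kubernetes/conf/core-site.xml
  <name>hadoop.security.authentication</name>
  <value>simple</value>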

kubernetes/conf/mr3-site.xml

$ vi kubernetes/conf/mr3-site.xml

<property>
  <name>mr3.am.resource.memory.mb</name>
  <value>2048</value>
</property>

<property>
  <name>mr3.am.resource.cpu.cores</name>
  <value>1</value>
</property>

<property>
  <name>mr3.am.launch.cmd-opts</name>
  <value>... -Djava.security.krb5.conf=/opt/mr3-run/conf/krb5.conf -Djava.security.auth.login.config=/opt/mr3-run/conf/jgss.conf -Dsun.security.jgss.debug=true</value>
</property>

<property>
  <name>mr3.k8s.pod.worker.emptydirs</name>
  <value></value>
</property>

<property>
  <name>mr3.k8s.pod.worker.hostpaths</name>
  <value>/home/ec2-user</value>
</property>

<property>
  <name>mr3.k8s.pod.image.pull.policy</name>
  <value>IfNotPresent</value>
</property>
  • Remove -Djavax.net.ssl.trustStore=/opt/mr3-run/key/hivemr3-ssl-certificate.jks -Djavax.net.ssl.trustStoreType=jks from the values of the configuration keys mr3.am.launch.cmd-opts and mr3.container.launch.cmd-opts. When HDFS is the data source, setting the property javax.net.ssl.trustStore is harmless whether or not DAGAppMaster uses SSL. When S3 buckets are the data source, however, DAGAppMaster internally makes secure connections to S3, so the property javax.net.ssl.trustStore does affect its behavior and must be unset. Otherwise InputInitializers inside DAGAppMaster never return and queries stall. See the sed sketch after this list.
  • Set the configuration key mr3.k8s.pod.worker.hostpaths to /home/ec2-user so that ContainerWorkers write intermediate data under the home directory, which resides on the root partition and is thus guaranteed to exist on every EC2 instance. Here we assume that no additional local disks are attached to the EC2 instances. Note that a ContainerWorker may run out of disk space for writing intermediate data depending on the size of the root partition (20GB by default).
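
The following commands sketch both points above: the sed command removes the trustStore options (assuming they appear verbatim in mr3-site.xml), and the df command, run on an EC2 instance, checks the free space on the root partition:

$ sed -i 's| -Djavax.net.ssl.trustStore=/opt/mr3-run/key/hivemr3-ssl-certificate.jks -Djavax.net.ssl.trustStoreType=jks||g' kubernetes/conf/mr3-site.xml

[ec2-user@ip-192-168-77-14 ~]$ df -h /home/ec2-user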

kubernetes/conf/hive-site.xml

$ vi kubernetes/conf/hive-site.xml

<property>
  <name>hive.metastore.pre.event.listeners</name>
  <value></value>
</property>
<property>
  <name>metastore.pre.event.listeners</name>
  <value></value>
</property>

<property>
  <name>hive.security.authorization.manager</name>
  <value>org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdHiveAuthorizerFactory</value>
</property>

<property>
  <name>hive.mr3.map.task.memory.mb</name>
  <value>1024</value>
</property>

<property>
  <name>hive.mr3.map.task.vcores</name>
  <value>1</value>
</property>

<property>
  <name>hive.mr3.reduce.task.memory.mb</name>
  <value>1024</value>
</property>

<property>
  <name>hive.mr3.reduce.task.vcores</name>
  <value>1</value>
</property>

<property>
  <name>hive.mr3.all-in-one.containergroup.memory.mb</name>
  <value>1024</value>
</property>

<property>
  <name>hive.mr3.all-in-one.containergroup.vcores</name>
  <value>1</value>
</property>
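
With these settings, a ContainerWorker of 1024MB can accommodate at most one map or reduce Task at a time (each Task requests 1024MB). To confirm a value after editing, the user can query the file (an illustrative check, assuming xmllint is installed):

$ xmllint --xpath "//property[name='hive.mr3.all-in-one.containergroup.memory.mb']/value/text()" kubernetes/conf/hive-site.xml
1024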

kubernetes/yaml/hive.yaml

Check the following fields:

  • spec/template/spec/containers/image
  • spec/template/spec/containers/imagePullPolicy
  • spec/template/spec/containers/resources

kubernetes/yaml/metastore.yaml

Check the following fields:

  • spec/template/spec/containers/image
  • spec/template/spec/containers/imagePullPolicy
  • spec/template/spec/containers/resources
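
A quick way to inspect the image fields in both files (illustrative; the resources field spans several lines and is best reviewed in an editor):

$ grep -E 'image:|imagePullPolicy:' kubernetes/yaml/hive.yaml kubernetes/yaml/metastore.yaml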

kubernetes/yaml/hiveserver2-service.yaml

Set the field spec/externalIPs to the public IP address of one of the EC2 instances. This field, however, has no effect on the operation of HiveServer2.
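
For example (the address 3.14.213.77 is hypothetical; substitute the public IP address of one of the EC2 instances):

$ vi kubernetes/yaml/hiveserver2-service.yaml

spec:
  externalIPs:
  - 3.14.213.77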