We assume that an external MySQL database for Metastore is running.
For testing Hive on MR3, the user can create a temporary MySQL database directly on an EC2 instance.
In the following example, the user logs on to an EC2 instance whose internal address is 192.168.77.14.
[ec2-user@ip-192-168-77-14 ~]$ sudo yum localinstall http://repo.mysql.com/mysql-community-release-el6-7.noarch.rpm
[ec2-user@ip-192-168-77-14 ~]$ sudo yum install mysql-community-server
[ec2-user@ip-192-168-77-14 ~]$ sudo service mysqld start
Starting mysqld (via systemctl): [ OK ]
[ec2-user@ip-192-168-77-14 ~]$ mysql -h 192.168.77.14 -u root -p # use a blank password
mysql> GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' IDENTIFIED BY 'passwd';
mysql> FLUSH PRIVILEGES;
Now the user can connect to the temporary MySQL database with ID root and password passwd.
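To verify that the account works, the user can connect with the new password and run a simple query (using the example address and password from above):
[ec2-user@ip-192-168-77-14 ~]$ mysql -h 192.168.77.14 -u root -ppasswd -e 'SELECT VERSION();'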
The user can configure Metastore and HiveServer2 in the same way as on any Kubernetes cluster. Below we review those configurations relevant to running Hive on MR3 on Amazon EKS.
kubernetes/env.sh
Set the following environment variables appropriately:
$ vi kubernetes/env.sh
METASTORE_USE_PERSISTENT_VOLUME=false
DOCKER_HIVE_IMG=mr3project/hive3:latest
CREATE_KEYTAB_SECRET=false
CREATE_WORKER_SECRET=false
CREATE_RANGER_SECRET=false
CREATE_ATS_SECRET=false
HIVE_DATABASE_HOST=192.168.77.14
HIVE_WAREHOUSE_DIR=s3a://mr3-tpcds-partitioned-2-orc/warehouse
METASTORE_SECURE_MODE=false
HIVE_SERVER2_HEAPSIZE=2048
HIVE_SERVER2_AUTHENTICATION=NONE
TOKEN_RENEWAL_HDFS_ENABLED=false
HIVE_METASTORE_HEAPSIZE=2048
- Set METASTORE_USE_PERSISTENT_VOLUME to false because the PersistentVolumeClaim workdir-pvc has already been created.
- Set METASTORE_SECURE_MODE to false to disable Kerberos authentication.
- Set HIVE_DATABASE_HOST to the host name or address of the MySQL database.
- Set HIVE_WAREHOUSE_DIR to the path to the data warehouse on S3. Note that the user should use the file system s3a, not s3.
- Set HIVE_SERVER2_AUTHENTICATION to NONE to disable Kerberos authentication.
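As a quick sanity check before starting Metastore, the user can confirm that the warehouse directory is reachable with the AWS CLI (a sketch assuming the AWS CLI is configured with credentials that can read the bucket; the bucket name is the example from env.sh above):
$ aws s3 ls s3://mr3-tpcds-partitioned-2-orc/warehouse/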
kubernetes/conf/core-site.xml
$ vi kubernetes/conf/core-site.xml
<property>
<name>hadoop.security.authentication</name>
<value>simple</value>
</property>
- Set the configuration key hadoop.security.authentication to simple to disable Kerberos authentication.
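If the EC2 instances do not carry an IAM instance profile that grants access to the S3 bucket, the user may also have to supply S3 credentials in core-site.xml. A minimal sketch using the standard Hadoop s3a configuration keys (the placeholder values are hypothetical, not part of this guide):
<property>
<name>fs.s3a.access.key</name>
<value>_your_access_key_</value>
</property>
<property>
<name>fs.s3a.secret.key</name>
<value>_your_secret_key_</value>
</property>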
kubernetes/conf/mr3-site.xml
$ vi kubernetes/conf/mr3-site.xml
<property>
<name>mr3.am.resource.memory.mb</name>
<value>2048</value>
</property>
<property>
<name>mr3.am.resource.cpu.cores</name>
<value>1</value>
</property>
<property>
<name>mr3.am.launch.cmd-opts</name>
<value>... -Djava.security.krb5.conf=/opt/mr3-run/conf/krb5.conf -Djava.security.auth.login.config=/opt/mr3-run/conf/jgss.conf -Dsun.security.jgss.debug=true</value>
</property>
<property>
<name>mr3.k8s.pod.worker.emptydirs</name>
<value></value>
</property>
<property>
<name>mr3.k8s.pod.worker.hostpaths</name>
<value>/home/ec2-user</value>
</property>
<property>
<name>mr3.k8s.pod.image.pull.policy</name>
<value>IfNotPresent</value>
</property>
- Remove -Djavax.net.ssl.trustStore=/opt/mr3-run/key/hivemr3-ssl-certificate.jks -Djavax.net.ssl.trustStoreType=jks from the values for the configuration keys mr3.am.launch.cmd-opts and mr3.container.launch.cmd-opts. When HDFS is the data source, setting the property javax.net.ssl.trustStore is harmless whether or not DAGAppMaster uses SSL. When S3 buckets are the data source, however, DAGAppMaster internally makes secure connections to S3, so the property javax.net.ssl.trustStore does affect its behavior and should be unset. Otherwise InputInitializers inside DAGAppMaster never return, thereby stalling queries.
- Set the configuration key mr3.k8s.pod.worker.hostpaths to /home/ec2-user so that ContainerWorkers can write intermediate data to the home directory, which resides on the root partition and is thus guaranteed to exist on every EC2 instance. Here we assume that no additional local disks are provisioned for the EC2 instances. Note that a ContainerWorker may run out of disk space for writing intermediate data depending on the size of the root partition (20GB by default); see the check below.
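The user can check how much space is actually available for intermediate data by inspecting the root partition directly on an EC2 instance (the prompt matches the example instance used above):
[ec2-user@ip-192-168-77-14 ~]$ df -h /home/ec2-user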
kubernetes/conf/hive-site.xml
$ vi kubernetes/conf/hive-site.xml
<property>
<name>hive.metastore.pre.event.listeners</name>
<value></value>
</property>
<property>
<name>metastore.pre.event.listeners</name>
<value></value>
</property>
<property>
<name>hive.security.authorization.manager</name>
<value>org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdHiveAuthorizerFactory</value>
</property>
<property>
<name>hive.mr3.map.task.memory.mb</name>
<value>1024</value>
</property>
<property>
<name>hive.mr3.map.task.vcores</name>
<value>1</value>
</property>
<property>
<name>hive.mr3.reduce.task.memory.mb</name>
<value>1024</value>
</property>
<property>
<name>hive.mr3.reduce.task.vcores</name>
<value>1</value>
</property>
<property>
<name>hive.mr3.all-in-one.containergroup.memory.mb</name>
<value>1024</value>
</property>
<property>
<name>hive.mr3.all-in-one.containergroup.vcores</name>
<value>1</value>
</property>
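With the example values above, each all-in-one ContainerGroup is allocated 1024MB of memory and a single core, and a map or reduce task also takes 1024MB and a single core, so each ContainerWorker runs exactly one task at a time (1024MB / 1024MB = 1). To run multiple concurrent tasks per worker, the user can raise hive.mr3.all-in-one.containergroup.memory.mb and hive.mr3.all-in-one.containergroup.vcores to a multiple of the per-task values.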
kubernetes/yaml/hive.yaml
Check the following fields:
spec/template/spec/containers/image
spec/template/spec/containers/imagePullPolicy
spec/template/spec/containers/resources
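For reference, these fields appear as follows in hive.yaml (a sketch with illustrative resource values, not the shipped defaults; choose resources that accommodate HIVE_SERVER2_HEAPSIZE set in env.sh):
spec:
  template:
    spec:
      containers:
      - image: mr3project/hive3:latest
        imagePullPolicy: IfNotPresent
        resources:
          requests:
            cpu: 2
            memory: 4Gi
          limits:
            cpu: 2
            memory: 4Gi
The same three fields appear in metastore.yaml below.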
kubernetes/yaml/metastore.yaml
Check the following fields:
spec/template/spec/containers/image
spec/template/spec/containers/imagePullPolicy
spec/template/spec/containers/resources
kubernetes/yaml/hiveserver2-service.yaml
Set the field spec/externalIPs
to the public IP address of one of the EC2 instances.
Note, however, that this field has no effect on the operation of HiveServer2.
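For reference, the field appears as follows (a sketch; the IP address shown is a hypothetical placeholder):
spec:
  externalIPs:
  - 1.2.3.4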