Setting YAML files
Below we provide details about YAML files in the yaml directory.
These files should be updated to reflect the settings in env.sh
and the specific configuration of the Kubernetes cluster.
Since YAML files do not read environment variables, all necessary updates must be made manually.
Common to all Pods
namespace.yaml
This manifest defines a namespace for all Kubernetes objects.
The name field should match the namespace specified in MR3_NAMESPACE in kubernetes/env.sh.
vi yaml/namespace.yaml
name: hivemr3
Similarly, the namespace field in every other YAML file should match this namespace.
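Because the namespace is repeated across many files, a quick scan can catch a file that was missed. The following is a minimal sketch, not part of the MR3 scripts: mismatched_namespaces is a hypothetical helper, and it assumes that each namespace field sits on its own line with no trailing comment. It does not look at the name field of namespace.yaml, which should be checked separately.

```shell
# Print "file:value" for every "namespace:" field whose value differs
# from the expected namespace. Prints nothing when all files agree.
mismatched_namespaces() {
  # $1 = expected namespace (the value of MR3_NAMESPACE)
  # $2 = directory containing the YAML files
  grep -H -E '^[[:space:]]*namespace:' "$2"/*.yaml |
    sed -E 's/:[[:space:]]*namespace:[[:space:]]*/:/' |
    grep -v -E ":$1\$" || true
}
```

For example, mismatched_namespaces hivemr3 yaml lists only the files that still need to be edited.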
hive-service-account.yaml
This manifest defines a ServiceAccount.
The name of the ServiceAccount object (hive-service-account) is read in run-hive.sh,
so there is no need to update this file.
cluster-role.yaml
This manifest defines a ClusterRole.
The name of the ClusterRole resource (node-reader) is read in run-hive.sh,
so there is no need to update this file.
metastore-role.yaml
This manifest defines a Role for a Metastore Pod.
The name of the Role resource (metastore-role) is read in run-metastore.sh,
so there is no need to update this file.
hive-role.yaml
This manifest defines a Role for HiveServer2 Pods.
The name of the Role resource (hive-role) is read in run-hive.sh,
so there is no need to update this file.
workdir-pv.yaml
This manifest defines a PersistentVolume for copying the result of running a query from ContainerWorkers to HiveServer2. The user should update it in order to use a desired type of PersistentVolume.
workdir-pvc.yaml
This manifest defines a PersistentVolumeClaim which references the PersistentVolume created by workdir-pv.yaml.
The user should specify the size of the storage.
vi yaml/workdir-pvc.yaml
storage: 10Gi
Configuring Metastore
metastore-service.yaml
This manifest defines a governing Service required for the StatefulSet for Metastore.
The user should use the same port number specified by the environment variable HIVE_METASTORE_PORT in env.sh.
vi yaml/metastore-service.yaml
ports:
- name: tcp
  port: 9850
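Since the port has to be kept in sync by hand, the comparison can be scripted. The sketch below makes two assumptions: env.sh can be sourced safely in a subshell, and the first port field in metastore-service.yaml is the Metastore port. The function names are hypothetical and not part of the MR3 scripts.

```shell
# Print the value of a variable defined in env.sh. The file is sourced
# in a subshell, so nothing leaks into the calling shell.
# Pass a path that contains a slash (e.g. ./env.sh).
env_value() (
  # $1 = variable name, $2 = path to env.sh
  . "$2" >/dev/null 2>&1
  eval "printf '%s\n' \"\${$1}\""
)

# Print the first "port:" value found in a Service manifest.
service_port() {
  # $1 = path to the Service manifest
  sed -n -E 's/^[[:space:]]*-?[[:space:]]*port:[[:space:]]*([0-9]+).*/\1/p' "$1" |
    head -n 1
}

# Succeed only when the Service port equals HIVE_METASTORE_PORT.
metastore_port_matches() {
  # $1 = path to env.sh, $2 = path to metastore-service.yaml
  [ -n "$(service_port "$2")" ] &&
    [ "$(service_port "$2")" = "$(env_value HIVE_METASTORE_PORT "$1")" ]
}
```

A call such as metastore_port_matches ./env.sh yaml/metastore-service.yaml returns a non-zero status when the two values differ.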
metastore.yaml
This manifest defines a Pod for running Metastore. The user should update several sections in this file according to Kubernetes cluster settings.
In the spec.template.spec.containers section:
- The image field should match the Docker image specified by DOCKER_HIVE_IMG in env.sh.
- The resources.requests and resources.limits fields specify the resources to be allocated to a Metastore Pod.
- The ports.containerPort field should match the port number specified in metastore-service.yaml.
vi yaml/metastore.yaml
spec:
  template:
    spec:
      containers:
      - image: mr3project/hive:4.0.0.mr3.2.2
        resources:
          requests:
            cpu: 2
            memory: 16Gi
          limits:
            cpu: 2
            memory: 16Gi
        ports:
        - containerPort: 9850
          protocol: TCP
In the spec.template.spec.volumes section:
- The configMap.name field under conf-k8s-volume should match the name specified by CONF_DIR_CONFIGMAP in env.sh.
- The secret.secretName field under key-k8s-volume should match the name specified by KEYTAB_SECRET in env.sh.
vi yaml/metastore.yaml
spec:
  template:
    spec:
      volumes:
      - name: conf-k8s-volume
        configMap:
          name: hivemr3-conf-configmap
      - name: key-k8s-volume
        secret:
          secretName: hivemr3-keytab-secret
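The image, ConfigMap, and Secret names above all have to agree with env.sh, so it can help to compare them in one pass. The sketch below is an illustration under stated assumptions: env.sh can be sourced safely in a subshell, the manifest is laid out as in the examples above (the configMap or secret block directly follows the volume name), and all function names are hypothetical. It is not part of the MR3 scripts.

```shell
# Print the value of a variable defined in env.sh (sourced in a subshell).
env_value() (
  # $1 = variable name, $2 = path to env.sh (must contain a slash)
  . "$2" >/dev/null 2>&1
  eval "printf '%s\n' \"\${$1}\""
)

# Print the first "image:" value in a manifest.
manifest_image() {
  sed -n -E 's/^[[:space:]]*-?[[:space:]]*image:[[:space:]]*//p' "$1" | head -n 1
}

# Print the value of a field found within the two lines that follow
# "- name: VOLUME" in a manifest.
volume_field() {
  # $1 = volume name, $2 = field (name or secretName), $3 = manifest
  grep -A2 -E "^[[:space:]]*-[[:space:]]*name:[[:space:]]*$1[[:space:]]*\$" "$3" |
    sed -n -E "s/^[[:space:]]*$2:[[:space:]]*//p" | head -n 1
}

# Print a message and fail when two values differ.
report_mismatch() {
  # $1 = label, $2 = value from env.sh, $3 = value from the manifest
  [ "$2" = "$3" ] && return 0
  printf '%s: env.sh has "%s" but the manifest has "%s"\n' "$1" "$2" "$3"
  return 1
}

# Compare the three fields of a manifest against env.sh.
check_manifest() {
  # $1 = path to env.sh, $2 = path to the manifest
  rc=0
  report_mismatch image "$(env_value DOCKER_HIVE_IMG "$1")" \
    "$(manifest_image "$2")" || rc=1
  report_mismatch configMap.name "$(env_value CONF_DIR_CONFIGMAP "$1")" \
    "$(volume_field conf-k8s-volume name "$2")" || rc=1
  report_mismatch secret.secretName "$(env_value KEYTAB_SECRET "$1")" \
    "$(volume_field key-k8s-volume secretName "$2")" || rc=1
  return "$rc"
}
```

Running check_manifest ./env.sh yaml/metastore.yaml prints one line per mismatch and returns a non-zero status if any field differs. The same call can be pointed at any other manifest that uses the same field layout.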
In the spec.template.spec.hostAliases section:
HIVE_DATABASE_HOST in env.sh specifies the host where the database for Metastore is running. If it uses a host unknown to the default DNS, the user should add its alias. The following example adds host names red0 and indigo20 that are unknown to the default DNS.
vi yaml/metastore.yaml
spec:
  template:
    spec:
      hostAliases:
      - ip: "10.1.91.4"
        hostnames:
        - "red0"
      - ip: "10.1.91.41"
        hostnames:
        - "indigo20"
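When several hosts need aliases, the entries can be generated rather than typed. The helper below is a hypothetical convenience, not part of the MR3 scripts. It prints one hostAliases entry in the same shape as the example above, and the output still has to be indented to the level of the hostAliases list when pasted into the manifest.

```shell
# Print one hostAliases entry for an IP address and its host names.
host_alias_entry() {
  # $1 = IP address, remaining arguments = host names for that address
  ip=$1
  shift
  printf '%s\n' "- ip: \"$ip\"" "  hostnames:"
  for h in "$@"; do
    printf '%s\n' "  - \"$h\""
  done
}
```

For example, host_alias_entry 10.1.91.4 red0 prints the first entry of the example above.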
Configuring HiveServer2
hiveserver2-service.yaml
This manifest defines a Service for exposing HiveServer2 to the outside of the Kubernetes cluster.
The user should specify a public IP address with a valid host name and a port number
(with name thrift) for HiveServer2
so that clients can connect to it from the outside of the Kubernetes cluster.
Another port number with name http should be specified if HTTP transport is enabled
(by setting the configuration key hive.server2.transport.mode to all or http in conf/hive-site.xml).
The host name is necessary in order for Ranger to securely communicate with HiveServer2.
vi yaml/hiveserver2-service.yaml
ports:
- protocol: TCP
  port: 9852
  targetPort: 9852
  name: thrift
- protocol: TCP
  port: 10001
  targetPort: 10001
  name: http
externalIPs:
- 10.1.91.41
In our example, we use 10.1.91.41:9852 as the full address of HiveServer2. The user should make sure that the IP address exists with a valid host name and is not already taken.
hive.yaml
This manifest defines a Pod for running HiveServer2 (by creating a Deployment). The user should update several sections in this file according to Kubernetes cluster settings.
In the spec.template.spec.containers section:
- The image field should match the Docker image specified by DOCKER_HIVE_IMG in env.sh.
- The args field specifies the DAGAppMaster mode: --localthread for LocalThread mode, --localprocess for LocalProcess mode, and --kubernetes for Kubernetes mode.
- The resources.requests and resources.limits fields specify the resources to be allocated to a HiveServer2 Pod.
- The three fields ports.containerPort, readinessProbe.tcpSocket.port, and livenessProbe.tcpSocket.port should match the port number specified in hiveserver2-service.yaml.
vi yaml/hive.yaml
spec:
  template:
    spec:
      containers:
      - image: mr3project/hive:4.0.0.mr3.2.2
        args: ["start", "--kubernetes"]
        resources:
          requests:
            cpu: 4
            memory: 32Gi
          limits:
            cpu: 4
            memory: 32Gi
        ports:
        - containerPort: 9852
        readinessProbe:
          tcpSocket:
            port: 9852
        livenessProbe:
          tcpSocket:
            port: 9852
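The three port fields are easy to leave out of sync, so a small check can confirm that they all carry the port exposed in hiveserver2-service.yaml. This is a sketch with hypothetical function names, and it assumes that hive.yaml contains no other containerPort or port fields (for example, an extra port for HTTP transport). If it does, inspect the output of pod_ports by eye instead.

```shell
# Print every containerPort and "port:" value found in a Pod manifest,
# one per line, in file order.
pod_ports() {
  # $1 = path to hive.yaml
  sed -n -E 's/^[[:space:]]*-?[[:space:]]*(containerPort|port):[[:space:]]*([0-9]+).*/\2/p' "$1"
}

# Succeed only if at least one port is found and all of them equal
# the expected port.
all_ports_equal() {
  # $1 = expected port, $2 = path to hive.yaml
  [ -n "$(pod_ports "$2")" ] || return 1
  for p in $(pod_ports "$2"); do
    [ "$p" = "$1" ] || return 1
  done
  return 0
}
```

With the example above, all_ports_equal 9852 yaml/hive.yaml succeeds, and it fails as soon as one of the three fields is changed.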
In the spec.template.spec.volumes section:
- The configMap.name field under conf-k8s-volume should match the name specified by CONF_DIR_CONFIGMAP in env.sh.
- The secret.secretName field under key-k8s-volume should match the name specified by KEYTAB_SECRET in env.sh.
vi yaml/hive.yaml
spec:
  template:
    spec:
      volumes:
      - name: conf-k8s-volume
        configMap:
          name: hivemr3-conf-configmap
      - name: key-k8s-volume
        secret:
          secretName: hivemr3-keytab-secret
The spec.template.spec.hostAliases field can list aliases for hosts that may not be found in the default DNS.
For example, the host running Metastore may be unknown to the default DNS,
in which case the user can add an alias for it.