Accessing S3 when using Helm
The user can access datasets on Amazon S3 (Simple Storage Service)
by adapting the instructions in Accessing Amazon S3 for Helm.
Accessing S3 takes three steps.
First, set the configuration key fs.s3a.aws.credentials.provider in kubernetes/conf/core-site.xml.
<property>
<name>fs.s3a.aws.credentials.provider</name>
<value>com.amazonaws.auth.EnvironmentVariableCredentialsProvider</value>
</property>
The class EnvironmentVariableCredentialsProvider attempts to read AWS credentials
from two environment variables, AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY.
Next, set the two environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY in kubernetes/helm/hive/env-secret.sh.
export AWS_ACCESS_KEY_ID=_your_aws_access_key_id_
export AWS_SECRET_ACCESS_KEY=_your_aws_secret_access_key_
Since kubernetes/helm/hive/env-secret.sh is mounted as a Secret inside Metastore and HiveServer2 Pods,
it is safe to write the AWS access key ID and secret access key in kubernetes/helm/hive/env-secret.sh.
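Because EnvironmentVariableCredentialsProvider relies on both variables, it can help to confirm that they are non-empty before installing the chart. The following is a minimal sketch, not part of the MR3 distribution: check_aws_env is a hypothetical helper that only inspects the current shell environment and never prints the values themselves.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with MR3): reports which of the two
# AWS credential variables are missing or empty in the current environment.
check_aws_env() {
  missing=""
  for name in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY; do
    # Indirect lookup of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      missing="$missing $name"
    fi
  done
  if [ -n "$missing" ]; then
    # Only variable names are printed, never their values.
    echo "missing:$missing"
    return 1
  fi
  echo "ok"
}
```

Assuming env-secret.sh contains only export lines like those shown above, one way to use the helper is to source the file in a throwaway shell and then call check_aws_env.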
Optionally, the user may use an S3 bucket as the data warehouse by updating kubernetes/helm/hive/values.yaml.
metastore:
  warehouseDir: s3a://your-warehouse-dir/warehouse # optional
Finally, append AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
to the values of the configuration keys mr3.am.launch.env and mr3.container.launch.env
in kubernetes/conf/mr3-site.xml.
Note that, for security reasons, the user should NOT write the actual AWS access key ID and secret access key in kubernetes/conf/mr3-site.xml.
Appending just the two variable names suffices
because MR3 automatically sets the two environment variables by reading their values from the system environment.
<property>
<name>mr3.am.launch.env</name>
<value>LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HADOOP_HOME/lib/native/,AWS_ACCESS_KEY_ID,AWS_SECRET_ACCESS_KEY</value>
</property>
<property>
<name>mr3.container.launch.env</name>
<value>LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/mr3-run/hadoop/apache-hadoop/lib/native,AWS_ACCESS_KEY_ID,AWS_SECRET_ACCESS_KEY</value>
</property>
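The two value strings above mix entries of the form NAME=value with bare names. The following sketch illustrates the distinction; it is not MR3's implementation. resolve_launch_env is a hypothetical function that splits such a string on commas, echoes NAME=value entries unchanged, and for each bare name reports only whether the current environment supplies a value, so no secret is printed.

```shell
#!/bin/sh
# Illustration only, NOT MR3 code. resolve_launch_env is a hypothetical name.
# The body runs in a subshell "( ... )" so the IFS and globbing changes
# do not leak into the calling shell.
resolve_launch_env() (
  IFS=,
  set -f    # disable globbing while the string is split on commas
  for entry in $1; do
    case $entry in
      *=*)
        # Explicit NAME=value entry: shown exactly as written.
        printf '%s\n' "$entry" ;;
      *[!A-Za-z0-9_]*|'')
        # Not a plain variable name: skipped rather than evaluated.
        printf 'skipped: %s\n' "$entry" ;;
      *)
        # Bare name: check whether the environment provides a value.
        eval "value=\${$entry:-}"
        if [ -n "$value" ]; then
          printf '%s=<set>\n' "$entry"
        else
          printf '%s=<unset>\n' "$entry"
        fi ;;
    esac
  done
)
```

Calling resolve_launch_env with the value of mr3.am.launch.env in single quotes (so that $LD_LIBRARY_PATH is not expanded by the shell) shows at a glance whether the two AWS variables would be available from the environment in which the function runs.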