Hive on MR3 supports four different ways to access S3 buckets within an EKS cluster.
- Use environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
- Update IAM (Identity and Access Management) roles for node groups in the EKS cluster
- Use IAM roles for ServiceAccounts
- Use IAM roles for ServiceAccounts created by eksctl (e.g., on EKS)
Accessing S3 buckets with environment variables works the same way whether Hive on MR3 runs inside or outside AWS, so the user can follow the instructions in Accessing Amazon S3. The remaining three ways rely on IAM roles to manage access to S3.
2. Update IAM roles for node groups in the EKS cluster
If an IAM policy for accessing S3 buckets is available before creating an EKS cluster, the user can include its ARN in the iam/attachPolicyARNs field of the node groups mr3-master and mr3-worker in kubernetes/eks/cluster.yaml. Then every Pod is allowed to access S3 buckets.
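As a rough illustration, the Python sketch below assembles the attachPolicyARNs list for the mr3-master and mr3-worker node groups and prints it as JSON, which can then be transcribed into the iam section of each node group in kubernetes/eks/cluster.yaml. It assumes, based on our reading of the eksctl documentation, that setting attachPolicyARNs replaces the policies eksctl would otherwise attach by default, so the default node policies are listed explicitly. The policy ARN arn:aws:iam::111111111111:policy/s3 and the helper names are placeholders; compare the result with the actual cluster.yaml before editing it.

```python
import json

# Default node policies. Assumption (from our reading of the eksctl docs):
# once attachPolicyARNs is set, eksctl no longer attaches these by itself.
DEFAULT_NODE_POLICIES = [
    "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy",
    "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy",
    "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly",
]


def node_group_iam(s3_policy_arn: str) -> dict:
    """Return the iam stanza of a node group with the S3 policy attached."""
    arns = list(DEFAULT_NODE_POLICIES)
    if s3_policy_arn not in arns:
        arns.append(s3_policy_arn)
    return {"attachPolicyARNs": arns}


def mr3_node_groups(s3_policy_arn: str) -> list:
    """nodeGroups fragment for cluster.yaml; all other node group fields omitted."""
    return [
        {"name": name, "iam": node_group_iam(s3_policy_arn)}
        for name in ("mr3-master", "mr3-worker")
    ]


if __name__ == "__main__":
    # Placeholder ARN of an existing IAM policy for accessing S3 buckets.
    fragment = {"nodeGroups": mr3_node_groups("arn:aws:iam::111111111111:policy/s3")}
    print(json.dumps(fragment, indent=2))
```

The printed keys (nodeGroups, name, iam, attachPolicyARNs) mirror the YAML structure of cluster.yaml; JSON is used only to avoid depending on a YAML library.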
If an EKS cluster is created without using an IAM policy for accessing S3 buckets, find the IAM roles for the mr3-master and mr3-worker node groups (which typically look like eksctl-hive-mr3-nodegroup-mr3-mas-NodeInstanceRole-448MRIYIQ3F8 and eksctl-hive-mr3-nodegroup-mr3-wor-NodeInstanceRole-E19NHT8X0UJ7).
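A minimal sketch of picking out the two node group roles from a list of role names, assuming they follow the eksctl naming pattern shown above (eksctl-CLUSTER-nodegroup-NODEGROUP-NodeInstanceRole-SUFFIX, where the node group name may be truncated, as in mr3-mas). The helper find_node_instance_roles is hypothetical, and the role names are hard-coded here; in practice they would come from listing the IAM roles of the account.

```python
import re

# Hypothetical helper. eksctl names node group roles like
# eksctl-hive-mr3-nodegroup-mr3-mas-NodeInstanceRole-448MRIYIQ3F8, with the
# node group name possibly truncated (mr3-master -> mr3-mas).
ROLE_PATTERN = re.compile(
    r"^eksctl-(?P<cluster>.+)-nodegroup-(?P<ng>.+)-NodeInstanceRole-[A-Za-z0-9]+$"
)


def find_node_instance_roles(role_names, cluster, node_groups):
    """Map each node group name to the NodeInstanceRole that belongs to it."""
    found = {}
    for role in role_names:
        m = ROLE_PATTERN.match(role)
        if m is None or m.group("cluster") != cluster:
            continue
        for ng in node_groups:
            # The (possibly truncated) name in the role must be a prefix of ng.
            if ng.startswith(m.group("ng")):
                found[ng] = role
    return found


if __name__ == "__main__":
    # The first two names come from the example above; the third is a
    # made-up role that should not match.
    roles = [
        "eksctl-hive-mr3-nodegroup-mr3-mas-NodeInstanceRole-448MRIYIQ3F8",
        "eksctl-hive-mr3-nodegroup-mr3-wor-NodeInstanceRole-E19NHT8X0UJ7",
        "eksctl-hive-mr3-cluster-ServiceRole-EXAMPLE",
    ]
    matches = find_node_instance_roles(roles, "hive-mr3", ["mr3-master", "mr3-worker"])
    for ng, role in matches.items():
        print(ng, "->", role)
```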
For both IAM roles, add the following inline policy (or a variant of it) so that every Pod can access the target S3 bucket. Adjust the Action field to restrict the set of operations permitted to Pods.
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:*"
      ],
      "Resource": [
        "arn:aws:s3:::mr3-tpcds-partitioned-2-orc",
        "arn:aws:s3:::mr3-tpcds-partitioned-2-orc/*"
      ]
    }
  ]
}
```
Now all Pods can access the target S3 bucket.
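One way to act on the advice about the Action field is to replace s3:* with explicit actions, split between bucket-level actions (whose Resource is arn:aws:s3:::BUCKET) and object-level actions (whose Resource is arn:aws:s3:::BUCKET/*). The action lists in this sketch are an assumption for read/write access through S3A, not a set verified for Hive on MR3; the Hadoop S3A documentation is the place to check what the connector actually needs.

```python
import json

# Assumed action sets for read/write access through S3A; trim or extend as needed.
BUCKET_ACTIONS = ["s3:ListBucket", "s3:GetBucketLocation", "s3:ListBucketMultipartUploads"]
OBJECT_ACTIONS = [
    "s3:GetObject",
    "s3:PutObject",
    "s3:DeleteObject",
    "s3:AbortMultipartUpload",
    "s3:ListMultipartUploadParts",
]


def s3_inline_policy(bucket: str, read_only: bool = False) -> dict:
    """Return an IAM policy document limited to a single bucket."""
    object_actions = ["s3:GetObject"] if read_only else list(OBJECT_ACTIONS)
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # Bucket-level actions target the bucket ARN itself.
                "Effect": "Allow",
                "Action": list(BUCKET_ACTIONS),
                "Resource": [f"arn:aws:s3:::{bucket}"],
            },
            {   # Object-level actions target the keys inside the bucket.
                "Effect": "Allow",
                "Action": object_actions,
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(s3_inline_policy("mr3-tpcds-partitioned-2-orc"), indent=2))
```

The printed JSON could be used as the inline policy for each node group role in place of the s3:* example.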
Depending on the ownership of the target S3 bucket, the user may also have to create a bucket policy. If the target S3 bucket is owned by the same user creating the EKS cluster, a bucket policy is unnecessary.
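When the target bucket belongs to a different AWS account, one common shape for the bucket policy is to name the node group roles as principals. The sketch below assumes that case; the role ARNs reuse the example role names with the placeholder account 111111111111, and the actions are deliberately minimal. For a cross-account request to succeed, both this bucket policy and the identity policy attached to the role must allow the action.

```python
import json


def cross_account_bucket_policy(bucket: str, role_arns: list) -> dict:
    """Bucket policy letting IAM roles from another account use `bucket`."""
    principal = {"AWS": list(role_arns)}
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowListBucket",
                "Effect": "Allow",
                "Principal": principal,
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
            },
            {
                "Sid": "AllowObjectAccess",
                "Effect": "Allow",
                "Principal": principal,
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            },
        ],
    }


if __name__ == "__main__":
    # Node group role ARNs in the account that runs the EKS cluster (placeholders).
    roles = [
        "arn:aws:iam::111111111111:role/eksctl-hive-mr3-nodegroup-mr3-mas-NodeInstanceRole-448MRIYIQ3F8",
        "arn:aws:iam::111111111111:role/eksctl-hive-mr3-nodegroup-mr3-wor-NodeInstanceRole-E19NHT8X0UJ7",
    ]
    print(json.dumps(cross_account_bucket_policy("mr3-tpcds-partitioned-2-orc", roles), indent=2))
```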
3. Use IAM roles for ServiceAccounts
By default, Hive on MR3 creates three ServiceAccounts specified by hive-service-account.yaml, master-service-account.yaml, and worker-service-account.yaml in the directory kubernetes/yaml:
- ServiceAccount hive-service-account for Metastore and HiveServer2 Pods
- ServiceAccount master-service-account for the DAGAppMaster Pod
- ServiceAccount worker-service-account for ContainerWorker Pods
If the EKS cluster has enabled IAM roles for ServiceAccounts, the user can create an IAM role with a policy for accessing the target S3 bucket and associate it with these ServiceAccounts. Then every Pod can access the target S3 bucket.
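Associating an IAM role with ServiceAccounts works through the role's trust policy: it allows tokens that the cluster's OIDC provider issues for specific ServiceAccounts to assume the role via sts:AssumeRoleWithWebIdentity. The sketch below builds such a trust policy for the three ServiceAccounts in namespace hivemr3; the account ID, region, and OIDC provider ID are placeholders, the helper name is hypothetical, and the AWS User Guide remains the authoritative description.

```python
import json

SERVICE_ACCOUNTS = ["hive-service-account", "master-service-account", "worker-service-account"]


def irsa_trust_policy(account_id, region, oidc_id, namespace="hivemr3",
                      service_accounts=SERVICE_ACCOUNTS):
    """Trust policy letting the given ServiceAccounts assume the role."""
    issuer = f"oidc.eks.{region}.amazonaws.com/id/{oidc_id}"
    subjects = [f"system:serviceaccount:{namespace}:{sa}" for sa in service_accounts]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Federated": f"arn:aws:iam::{account_id}:oidc-provider/{issuer}"},
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    # A list of values for one key matches if any value matches.
                    "StringEquals": {
                        f"{issuer}:sub": subjects,
                        f"{issuer}:aud": "sts.amazonaws.com",
                    }
                },
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder OIDC provider ID; the real one appears in the cluster's OIDC issuer URL.
    print(json.dumps(irsa_trust_policy("111111111111", "ap-northeast-1", "EXAMPLEOIDCID"), indent=2))
```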
- Enable IAM roles for ServiceAccounts by creating an OIDC identity provider.
  For more information, see AWS User Guide.

  ```
  $ eksctl utils associate-iam-oidc-provider --cluster hive-mr3 --approve
  [ℹ] eksctl version 0.27.0
  [ℹ] using region ap-northeast-1
  [ℹ] will create IAM Open ID Connect provider for cluster "hive-mr3" in "ap-northeast-1"
  [✔] created IAM Open ID Connect provider for cluster "hive-mr3" in "ap-northeast-1"
  ```
- Create an IAM role with a policy for accessing the target S3 bucket.
  The user may follow the instructions in AWS User Guide, but should not manually create a new ServiceAccount with eksctl, because Hive on MR3 creates its own ServiceAccounts.
- Associate the IAM role with the ServiceAccounts by adding an annotation.
  The following example shows how to add the annotation in hive-service-account.yaml, where NEW_IAM_ROLE_NAME is the name of the IAM role created in the previous step. Annotate master-service-account.yaml and worker-service-account.yaml in the same way.

  ```
  $ vi kubernetes/yaml/hive-service-account.yaml

  apiVersion: v1
  kind: ServiceAccount
  metadata:
    namespace: hivemr3
    name: hive-service-account
    annotations:
      eks.amazonaws.com/role-arn: arn:aws:iam::111111111111:role/NEW_IAM_ROLE_NAME
  ```
- Set the configuration key fs.s3a.aws.credentials.provider to com.amazonaws.auth.InstanceProfileCredentialsProvider in kubernetes/conf/core-site.xml.

  ```
  $ vi kubernetes/conf/core-site.xml

  <property>
    <name>fs.s3a.aws.credentials.provider</name>
    <value>com.amazonaws.auth.InstanceProfileCredentialsProvider</value>
  </property>
  ```
- If necessary (on Kubernetes 1.18 and earlier), rebuild the Docker image so that all containers run as user root.
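Because the same annotation goes into all three ServiceAccount manifests, a small script can render them together. The sketch below prints YAML for hive-service-account, master-service-account, and worker-service-account with the eks.amazonaws.com/role-arn annotation. It assumes the manifests contain only the fields shown in the hive-service-account.yaml example, so compare its output with the files in kubernetes/yaml before overwriting them; NEW_IAM_ROLE_NAME is the placeholder from that example, and the helper names are hypothetical.

```python
SERVICE_ACCOUNTS = ["hive-service-account", "master-service-account", "worker-service-account"]

# Assumed manifest layout, copied from the hive-service-account.yaml example.
TEMPLATE = """apiVersion: v1
kind: ServiceAccount
metadata:
  namespace: {namespace}
  name: {name}
  annotations:
    eks.amazonaws.com/role-arn: {role_arn}
"""


def render_service_account(name, role_arn, namespace="hivemr3"):
    """Return the YAML text of one annotated ServiceAccount."""
    return TEMPLATE.format(namespace=namespace, name=name, role_arn=role_arn)


def render_all(account_id, role_name, namespace="hivemr3"):
    """Map each manifest file name to its annotated YAML text."""
    role_arn = f"arn:aws:iam::{account_id}:role/{role_name}"
    return {
        f"{sa}.yaml": render_service_account(sa, role_arn, namespace)
        for sa in SERVICE_ACCOUNTS
    }


if __name__ == "__main__":
    for filename, text in render_all("111111111111", "NEW_IAM_ROLE_NAME").items():
        print(f"# kubernetes/yaml/{filename}")
        print(text)
```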
4. Use IAM roles for ServiceAccounts created by eksctl
Alternatively, the user can create ServiceAccounts with eksctl and use WebIdentityTokenCredentialsProvider instead of InstanceProfileCredentialsProvider. On EKS, we recommend WebIdentityTokenCredentialsProvider.
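For orientation: as far as we know, WebIdentityTokenCredentialsProvider in the AWS SDK for Java (v1) reads the role ARN and the token file location from the environment variables AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE, which the EKS Pod Identity webhook injects into Pods whose ServiceAccount carries a role annotation; it also needs a region, hence the AWS_REGION step below. The sketch checks an environment for these variables and whether the token file is readable by the current user; the helper names are hypothetical, and the same check could be run by inspecting the environment inside a running Pod.

```python
import os

REQUIRED_VARS = ("AWS_ROLE_ARN", "AWS_WEB_IDENTITY_TOKEN_FILE", "AWS_REGION")


def missing_web_identity_vars(env=None):
    """Return the required variables that are unset or empty, in order."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


def token_file_readable(env=None):
    """True if the projected token file exists and this user can read it."""
    env = os.environ if env is None else env
    path = env.get("AWS_WEB_IDENTITY_TOKEN_FILE")
    return bool(path) and os.access(path, os.R_OK)


if __name__ == "__main__":
    missing = missing_web_identity_vars()
    print("missing variables:", missing or "none")
    print("token file readable:", token_file_readable())
```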
- Set the environment variable CREATE_SERVICE_ACCOUNTS to false in kubernetes/env.sh (because we will create ServiceAccounts with eksctl later). When using Helm, set the field create/serviceAccount to false in values.yaml.

  ```
  $ vi kubernetes/env.sh

  CREATE_SERVICE_ACCOUNTS=false
  ```
- Set the environment variable AWS_REGION to a string representing the AWS region in kubernetes/env.sh.

  ```
  $ export AWS_REGION=ap-northeast-1   # to be able to execute 'eksctl' without '--region'
  $ vi kubernetes/env.sh

  export AWS_REGION=ap-northeast-1
  ```
  Append AWS_REGION to the values of the configuration keys mr3.am.launch.env and mr3.container.launch.env in kubernetes/conf/mr3-site.xml.

  ```
  $ vi kubernetes/conf/mr3-site.xml

  <property>
    <name>mr3.am.launch.env</name>
    <value>LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HADOOP_HOME/lib/native/,HADOOP_CREDSTORE_PASSWORD,AWS_ACCESS_KEY_ID,AWS_SECRET_ACCESS_KEY,AWS_REGION,USE_JAVA_17</value>
  </property>

  <property>
    <name>mr3.container.launch.env</name>
    <value>LD_LIBRARY_PATH=/opt/mr3-run/hadoop/apache-hadoop/lib/native,HADOOP_CREDSTORE_PASSWORD,AWS_ACCESS_KEY_ID,AWS_SECRET_ACCESS_KEY,AWS_REGION,USE_JAVA_17</value>
  </property>
  ```
  Without the environment variable AWS_REGION set appropriately, WebIdentityTokenCredentialsProvider fails with the following error:

  ```
  WARNING: Unable to retrieve the requested metadata (/latest/dynamic/instance-identity/document). Failed to connect to service endpoint:
  com.amazonaws.SdkClientException: Failed to connect to service endpoint:
      at com.amazonaws.internal.EC2ResourceFetcher.doReadResource(EC2ResourceFetcher.java:100)
      ...
      at com.amazonaws.util.EC2MetadataUtils.getEC2InstanceRegion(EC2MetadataUtils.java:282)
      ...
      at com.amazonaws.auth.WebIdentityTokenCredentialsProvider.getCredentials(WebIdentityTokenCredentialsProvider.java:76)
      at org.apache.hadoop.fs.s3a.AWSCredentialProviderList.getCredentials(AWSCredentialProviderList.java:117)
  ```
- Enable IAM roles for ServiceAccounts by creating an OIDC identity provider.

  ```
  $ eksctl utils associate-iam-oidc-provider --cluster hive-mr3 --approve
  ```
- Create an IAM policy for accessing the target S3 bucket. (eksctl creates the IAM role itself in the next step.)
- Create ServiceAccounts with eksctl using the IAM policy (e.g., arn:aws:iam::111111111111:policy/s3).

  ```
  $ eksctl create iamserviceaccount --name hive-service-account --namespace hivemr3 --cluster hive-mr3 --attach-policy-arn arn:aws:iam::111111111111:policy/s3 --approve --override-existing-serviceaccounts
  [ℹ] eksctl version 0.27.0
  [ℹ] using region ap-northeast-1
  [ℹ] 1 iamserviceaccount (hivemr3/hive-service-account) was included (based on the include/exclude rules)
  [!] metadata of serviceaccounts that exist in Kubernetes will be updated, as --override-existing-serviceaccounts was set
  [ℹ] 1 task: { 2 sequential sub-tasks: { create IAM role for serviceaccount "hivemr3/hive-service-account", create serviceaccount "hivemr3/hive-service-account" } }
  [ℹ] building iamserviceaccount stack "eksctl-hive-mr3-addon-iamserviceaccount-hivemr3-hive-service-account"
  [ℹ] deploying stack "eksctl-hive-mr3-addon-iamserviceaccount-hivemr3-hive-service-account"
  [ℹ] created namespace "hivemr3"
  [ℹ] created serviceaccount "hivemr3/hive-service-account"

  $ eksctl create iamserviceaccount --name master-service-account --namespace hivemr3 --cluster hive-mr3 --attach-policy-arn arn:aws:iam::111111111111:policy/s3 --approve --override-existing-serviceaccounts

  $ eksctl create iamserviceaccount --name worker-service-account --namespace hivemr3 --cluster hive-mr3 --attach-policy-arn arn:aws:iam::111111111111:policy/s3 --approve --override-existing-serviceaccounts

  $ eksctl get iamserviceaccount --namespace hivemr3 --cluster hive-mr3
  NAMESPACE  NAME                    ROLE ARN
  hivemr3    hive-service-account    arn:aws:iam::111111111111:role/eksctl-hive-mr3-addon-iamserviceaccount-hive-Role1-RERICJ8FK7AM
  hivemr3    master-service-account  arn:aws:iam::111111111111:role/eksctl-hive-mr3-addon-iamserviceaccount-hive-Role1-Z3SPAHKYB1UI
  hivemr3    worker-service-account  arn:aws:iam::111111111111:role/eksctl-hive-mr3-addon-iamserviceaccount-hive-Role1-18BQ9YHYM8JTV
  ```
- Set the configuration key fs.s3a.aws.credentials.provider to com.amazonaws.auth.WebIdentityTokenCredentialsProvider in kubernetes/conf/core-site.xml.

  ```
  $ vi kubernetes/conf/core-site.xml

  <property>
    <name>fs.s3a.aws.credentials.provider</name>
    <value>com.amazonaws.auth.WebIdentityTokenCredentialsProvider</value>
  </property>
  ```
- If necessary (on Kubernetes 1.18 and earlier), rebuild the Docker image so that all containers run as user root.
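The XML edits in this section (setting fs.s3a.aws.credentials.provider in core-site.xml and appending AWS_REGION to mr3.am.launch.env and mr3.container.launch.env in mr3-site.xml) can also be scripted. The sketch below uses Python's xml.etree.ElementTree on a Hadoop-style configuration document. The helper names are hypothetical, AWS_REGION is appended at the end of the list on the assumption that its position does not matter, and ElementTree drops XML comments by default, so it is safer to diff the output than to overwrite the files in place.

```python
import xml.etree.ElementTree as ET


def _value_element(root, name):
    """Return the <value> element of property `name`, or None if absent."""
    for prop in root.findall("property"):
        if (prop.findtext("name") or "").strip() == name:
            value = prop.find("value")
            if value is None:
                value = ET.SubElement(prop, "value")
            return value
    return None


def set_property(root, name, text):
    """Set a Hadoop-style property, appending a new <property> if needed."""
    value = _value_element(root, name)
    if value is None:
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        value = ET.SubElement(prop, "value")
    value.text = text


def append_env(root, name, var):
    """Append `var` to a comma-separated list such as mr3.am.launch.env (idempotent)."""
    value = _value_element(root, name)
    if value is None:
        raise KeyError(name)
    items = [s.strip() for s in (value.text or "").split(",") if s.strip()]
    if var not in items:
        items.append(var)
    value.text = ",".join(items)


if __name__ == "__main__":
    mr3_site = ET.fromstring(
        "<configuration><property><name>mr3.container.launch.env</name>"
        "<value>LD_LIBRARY_PATH=/opt/mr3-run/hadoop/apache-hadoop/lib/native,"
        "HADOOP_CREDSTORE_PASSWORD,AWS_ACCESS_KEY_ID,AWS_SECRET_ACCESS_KEY,USE_JAVA_17"
        "</value></property></configuration>"
    )
    append_env(mr3_site, "mr3.container.launch.env", "AWS_REGION")
    print(ET.tostring(mr3_site, encoding="unicode"))

    core_site = ET.fromstring("<configuration/>")
    set_property(core_site, "fs.s3a.aws.credentials.provider",
                 "com.amazonaws.auth.WebIdentityTokenCredentialsProvider")
    print(ET.tostring(core_site, encoding="unicode"))
```

For the real files, one would load them with ET.parse(path), apply the helpers to tree.getroot(), and write the result with tree.write(path) after reviewing the diff.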