Workspace Setup

Frequently Asked Questions about Workspace

How do I configure KMS encryption for EBS volumes when Karpenter creates nodes in a Kubernetes cluster?

If your organization mandates the use of a Customer Managed KMS Key for EBS volume encryption, you might face permission issues while using Karpenter. Follow these steps to ensure deployment success:

Step 1: Update KMS Key Policy

  1. Go to Key Management Service: Open the AWS Management Console and navigate to the Key Management Service (KMS).

  2. Select your KMS key: Choose the relevant Customer Managed Key.

  3. Update the Key Policy:

    • Add the following policy block to the key policy.

    • Replace <WORKSPACE> with the name of your workspace.

    • Replace <ACCOUNT_ID> with the ACCOUNT ID of your AWS account.

{
  "Sid": "Allow use of the key",
  "Effect": "Allow",
  "Principal": {
    "AWS": "arn:aws:iam::<ACCOUNT_ID>:role/e6data-<WORKSPACE>-karpenter-oidc-role"
  },
  "Action": [
    "kms:Encrypt",
    "kms:Decrypt",
    "kms:ReEncrypt*",
    "kms:GenerateDataKey*",
    "kms:DescribeKey",
    "kms:CreateGrant"
  ],
  "Resource": "*"
}

Following these steps should resolve any permission issues and ensure that Karpenter can successfully create nodes with encrypted EBS volumes using your Customer Managed KMS Key.

Note:

  • The ARN format (arn:aws:iam::<ACCOUNT_ID>:role/e6data-<WORKSPACE>-karpenter-oidc-role) matches the OIDC role created through Terraform for Karpenter.

  • If you created the setup manually, replace karpenter-oidc-role in the ARN with the name of the role you are using for Karpenter.
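
If you manage the key policy programmatically rather than through the console, the sketch below shows one way to append the statement above using boto3 (the AWS SDK for Python). The key ID, account ID, and workspace name are illustrative placeholders; substitute your own values.

import json

import boto3

KEY_ID = "1234abcd-12ab-34cd-56ef-1234567890ab"  # placeholder: your CMK ID
ACCOUNT_ID = "111122223333"                      # placeholder: your AWS account ID
WORKSPACE = "myworkspace"                        # placeholder: your workspace name

kms = boto3.client("kms")

# put_key_policy replaces the entire policy, so fetch the current default
# policy first and append the new statement to it.
policy = json.loads(kms.get_key_policy(KeyId=KEY_ID, PolicyName="default")["Policy"])
policy["Statement"].append({
    "Sid": "Allow use of the key",
    "Effect": "Allow",
    "Principal": {
        "AWS": f"arn:aws:iam::{ACCOUNT_ID}:role/e6data-{WORKSPACE}-karpenter-oidc-role"
    },
    "Action": [
        "kms:Encrypt",
        "kms:Decrypt",
        "kms:ReEncrypt*",
        "kms:GenerateDataKey*",
        "kms:DescribeKey",
        "kms:CreateGrant",
    ],
    "Resource": "*",
})

# "default" is the only key policy name KMS supports.
kms.put_key_policy(KeyId=KEY_ID, PolicyName="default", Policy=json.dumps(policy))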

Benefits of Using Graviton Instances in Kubernetes Clusters

Q: Our organization has an SCP (Service Control Policy) that restricts the use of Graviton instances. Should we reconsider this policy?

A: Yes, it is advisable to review and potentially revise the SCP policy that restricts Graviton instances. These instances, powered by AWS's ARM-based processors, provide notable advantages in both cost and performance, which can be beneficial for your Kubernetes workloads.

Key Benefits of Graviton Instances:

  1. Cost Savings: Graviton instances typically offer lower pricing compared to x86-based instances, making them a more economical choice for running large-scale workloads.

  2. Enhanced Performance: Graviton instances often deliver better price-performance, particularly for compute-intensive applications, owing to their processing efficiency and high memory bandwidth.

  3. Energy Efficiency: Graviton processors are designed to consume less power, leading to reduced operational costs and a smaller environmental impact.

  4. Broad Compatibility: Most modern software, including containerized applications, is compatible with the ARM architecture used by Graviton instances, making it easy to adopt and integrate into your current infrastructure.

Recommendation:

We suggest revisiting the SCP policy to permit the use of Graviton instances, at least for testing. Evaluating their performance and cost-effectiveness in your specific workloads and regions could reveal substantial benefits. Many organizations have experienced significant improvements in cost and performance by adopting Graviton instances, and your organization could benefit similarly.
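
As a starting point for such an evaluation, the sketch below uses boto3 to list the current-generation arm64 (Graviton) instance types offered in a region; the region is a placeholder.

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # placeholder: your region

# Page through all instance types that are current generation and arm64.
arm_types = []
for page in ec2.get_paginator("describe_instance_types").paginate(
    Filters=[
        {"Name": "processor-info.supported-architecture", "Values": ["arm64"]},
        {"Name": "current-generation", "Values": ["true"]},
    ]
):
    arm_types.extend(t["InstanceType"] for t in page["InstanceTypes"])

print(f"{len(arm_types)} current-generation arm64 instance types available")
print(sorted(arm_types)[:10])  # a small sample for a first look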

Connecting to a Private Hive Metastore in a Different VPC

Q: How can we connect our e6data VPC to a private Hive Metastore located in a different VPC?

A: To connect your e6data VPC with a private Hive Metastore in a different VPC, you can establish VPC peering between the two VPCs. VPC peering allows secure and direct communication between instances in these VPCs, functioning as though they were within the same network.

Steps to Establish VPC Peering:

  1. Set Up VPC Peering: Use the Terraform scripts provided to establish a VPC peering connection between the e6data VPC and the VPC where the Hive Metastore resides. This will enable seamless communication between the resources in both VPCs.

    • For AWS: Use the Terraform scripts available here to create the VPC peering connection in AWS.

    • For GCP: Use the Terraform scripts available here for setting up network peering in GCP.

  2. Security Group and NACL Adjustments: Ensure that security groups and network access control lists (NACLs) are configured to allow traffic between the e6data VPC and the Hive Metastore VPC. This step is crucial for enabling access between services in the peered VPCs.
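
As an illustration of steps 1 and 2, the boto3 sketch below requests and accepts a peering connection and then opens the metastore port. All IDs and CIDRs are placeholders, and 9083 is assumed as the usual Hive Metastore Thrift port; adjust it if your deployment differs.

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # placeholder: your region

E6DATA_VPC_ID = "vpc-0aaaaaaaaaaaaaaaa"     # placeholder: e6data VPC
METASTORE_VPC_ID = "vpc-0bbbbbbbbbbbbbbbb"  # placeholder: metastore VPC
METASTORE_SG_ID = "sg-0ccccccccccccccccc"   # placeholder: metastore security group
E6DATA_VPC_CIDR = "10.0.0.0/16"             # placeholder: e6data VPC CIDR

# Request the peering connection (add PeerOwnerId/PeerRegion for a
# cross-account or cross-region peer).
pcx = ec2.create_vpc_peering_connection(
    VpcId=E6DATA_VPC_ID, PeerVpcId=METASTORE_VPC_ID
)["VpcPeeringConnection"]["VpcPeeringConnectionId"]

# Wait until the request is visible, then accept it. For a cross-account
# peer, run the accept call with credentials from the peer account.
ec2.get_waiter("vpc_peering_connection_exists").wait(VpcPeeringConnectionIds=[pcx])
ec2.accept_vpc_peering_connection(VpcPeeringConnectionId=pcx)

# Allow the e6data VPC to reach the metastore on its Thrift port.
ec2.authorize_security_group_ingress(
    GroupId=METASTORE_SG_ID,
    IpPermissions=[{
        "IpProtocol": "tcp",
        "FromPort": 9083,
        "ToPort": 9083,
        "IpRanges": [{"CidrIp": E6DATA_VPC_CIDR, "Description": "e6data VPC"}],
    }],
)

Note that each VPC's route tables also need routes pointing the peer's CIDR block at the peering connection; the Terraform scripts referenced above handle this for you.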

Additional Notes:

  • Network Configuration: Verify that the CIDR blocks of the two VPCs do not overlap, as this can lead to routing issues.

Accessing a Private Hive Metastore and Data Across Multiple Projects/Accounts

Scenario

  • e6data is deployed in one project/account.

  • A private Hive metastore is hosted in a second project/account.

  • Data is stored in a third project/account.

Solution

To securely access the Hive metastore and the data across these different projects/accounts, follow these steps:

  1. Establish a Secure Connection to the Hive Metastore:

    • Use VPC Peering: Set up VPC peering between the VPC where e6data is deployed and the VPC hosting the Hive metastore in the second project/account. This will ensure a secure and private connection to the Hive metastore.

    • For guidance on setting up VPC peering, refer to the relevant Terraform scripts:

      • For AWS: Use the Terraform scripts available here to create the VPC peering connection in AWS.

      • For GCP: Use the Terraform scripts available here for setting up network peering in GCP.

    • Additional Notes:

      • Network Configuration: Verify that the CIDR blocks of the two VPCs do not overlap, as this can lead to routing issues.

  2. Grant Access to Data in the Third Project:

    • Cross-Project Catalog Configuration: To access data stored in the third project/account, follow the steps outlined in the documentation to configure the required roles and permissions. An illustrative sketch of one such grant follows below.
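
The exact configuration depends on your catalog, but as an illustration, the sketch below uses boto3 to grant the e6data engine role read access to a bucket in the third project/account through a bucket policy. The bucket name and role ARN are placeholders; run it with credentials for the account that owns the bucket.

import json

import boto3

DATA_BUCKET = "my-data-bucket"  # placeholder: bucket in the third account
ENGINE_ROLE_ARN = (
    "arn:aws:iam::111122223333:role/<WORKSPACE_NAME>-engine-role-<RANDOM_STRING>"
)  # placeholder: the e6data engine role

policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "AllowE6dataEngineRead",
        "Effect": "Allow",
        "Principal": {"AWS": ENGINE_ROLE_ARN},
        "Action": ["s3:GetObject", "s3:ListBucket"],
        "Resource": [
            f"arn:aws:s3:::{DATA_BUCKET}",
            f"arn:aws:s3:::{DATA_BUCKET}/*",
        ],
    }],
}

# put_bucket_policy overwrites any existing policy; merge your existing
# statements into the document above before applying it.
boto3.client("s3").put_bucket_policy(Bucket=DATA_BUCKET, Policy=json.dumps(policy))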

How Do I Configure e6data to Access a KMS-Encrypted S3 Bucket?

Your organization uses Amazon S3 buckets encrypted with a customer-managed KMS Key to store data. You need to ensure that e6data can securely access this encrypted S3 bucket. Follow these steps to configure the necessary access.

Step 1: Update KMS Key Policy

  1. Go to Key Management Service: Open the AWS Management Console and navigate to the Key Management Service (KMS).

  2. Select your KMS key: Choose the relevant Customer Managed Key.

  3. Update the Key Policy:

    • Add the following policy block to the key policy.

{
  "Sid": "Allow use of the key",
  "Effect": "Allow",
  "Principal": {
    "AWS": "arn:aws:iam::<ACCOUNT_ID>:role/<WORKSPACE_NAME>-engine-role-<RANDOM_STRING>"
  },
  "Action": [
    "kms:Encrypt",
    "kms:Decrypt",
    "kms:ReEncrypt*",
    "kms:GenerateDataKey*",
    "kms:DescribeKey",
    "kms:CreateGrant"
  ],
  "Resource": "*"
}

Note:

  • The ARN format (arn:aws:iam::<ACCOUNT_ID>:role/<WORKSPACE_NAME>-engine-role-<RANDOM_STRING>) matches the engine role created through Terraform for the e6data engine.

  • If you created the setup manually, replace the role name in the ARN with the name of the role you created earlier for the engine (see Infrastructure & Permissions for e6data in the Product Documentation).
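
Before editing the key policy, it can help to confirm which KMS key the bucket actually uses for default encryption. Below is a minimal boto3 sketch; the bucket name is a placeholder, and note that individual objects may have been written with a different key than the bucket default.

import boto3

BUCKET = "my-encrypted-bucket"  # placeholder: your S3 bucket

enc = boto3.client("s3").get_bucket_encryption(Bucket=BUCKET)
for rule in enc["ServerSideEncryptionConfiguration"]["Rules"]:
    sse = rule["ApplyServerSideEncryptionByDefault"]
    # For SSE-KMS, KMSMasterKeyID holds the key ARN or alias to update.
    print(sse["SSEAlgorithm"], sse.get("KMSMasterKeyID", "(AWS managed key)"))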

Cost Allocation Tags Not Showing in AWS Cost Management Portal

Q: Why aren't our cost allocation tags, specified through Terraform or manual infrastructure setup, visible in the AWS Cost Management portal?

A: Even if you specify cost tags through Terraform or during manual resource creation, they may not appear in the AWS Cost Management portal if they are not activated. To make these tags visible in your cost and usage reports, you need to activate them in the AWS Billing and Cost Management console.

For more details on how to activate cost allocation tags, please refer to the AWS Documentation on Activating Tags.
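
Activation can also be scripted. The boto3 sketch below activates two example tag keys via the Cost Explorer API; the tag keys are placeholders, the call must be made from the management (payer) account, and a tag key only becomes eligible for activation after it has appeared on at least one billed resource.

import boto3

# Cost Explorer is a global service served out of us-east-1.
ce = boto3.client("ce", region_name="us-east-1")

resp = ce.update_cost_allocation_tags_status(
    CostAllocationTagsStatus=[
        {"TagKey": "e6data-workspace", "Status": "Active"},  # placeholder tag key
        {"TagKey": "cost_center", "Status": "Active"},       # placeholder tag key
    ]
)
print(resp.get("Errors", []))  # any keys that could not be activated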
