Workspace Setup

Frequently Asked Questions about Workspace

How do I configure KMS encryption for EBS volumes when Karpenter creates nodes in a Kubernetes cluster?

If your organization mandates the use of a Customer Managed KMS Key for EBS volume encryption, you might face permission issues while using Karpenter. Follow these steps to ensure deployment success:

Step 1: Update KMS Key Policy

  1. Go to Key Management Service: Open the AWS Management Console and navigate to the Key Management Service (KMS).

  2. Select your KMS key: Choose the relevant Customer Managed Key.

  3. Update the Key Policy:

    • Add the following policy block to the key policy.

    • Replace <WORKSPACE> with the name of your workspace.

    • Replace <ACCOUNT_ID> with your AWS account ID.

{
  "Sid": "Allow use of the key",
  "Effect": "Allow",
  "Principal": {
    "AWS": "arn:aws:iam::<ACCOUNT_ID>:role/e6data-<WORKSPACE>-karpenter-oidc-role"
  },
  "Action": [
    "kms:Encrypt",
    "kms:Decrypt",
    "kms:ReEncrypt*",
    "kms:GenerateDataKey*",
    "kms:DescribeKey",
    "kms:CreateGrant"
  ],
  "Resource": "*"
}

Following these steps should resolve any permission issues and ensure that Karpenter can successfully create nodes with encrypted EBS volumes using your Customer Managed KMS Key.

Note:

  • The ARN format (arn:aws:iam::<ACCOUNT_ID>:role/e6data-<WORKSPACE>-karpenter-oidc-role) matches the OIDC role created through Terraform for Karpenter.

  • If you created the setup manually, replace karpenter-oidc-role in the ARN with the name of the role you are using for Karpenter.
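Because KMS key policies are replaced as a whole rather than patched, it can help to merge the new statement into the existing policy document programmatically before pushing it back. The sketch below is a minimal illustration (the helper name is hypothetical; the statement mirrors the policy block shown above):

```python
import json

def add_karpenter_statement(key_policy_json: str, account_id: str, workspace: str) -> str:
    """Append the Karpenter OIDC role statement to an existing KMS key policy.

    Assumes the Terraform-created role name e6data-<WORKSPACE>-karpenter-oidc-role;
    substitute your own role name for a manual setup.
    """
    policy = json.loads(key_policy_json)
    statement = {
        "Sid": "Allow use of the key",
        "Effect": "Allow",
        "Principal": {
            "AWS": f"arn:aws:iam::{account_id}:role/e6data-{workspace}-karpenter-oidc-role"
        },
        "Action": [
            "kms:Encrypt",
            "kms:Decrypt",
            "kms:ReEncrypt*",
            "kms:GenerateDataKey*",
            "kms:DescribeKey",
            "kms:CreateGrant",
        ],
        "Resource": "*",
    }
    policy.setdefault("Statement", []).append(statement)
    return json.dumps(policy)
```

The merged document can then be applied with `aws kms put-key-policy --policy-name default --key-id <KEY_ID> --policy file://policy.json`.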

Benefits of Using Graviton Instances in Kubernetes Clusters

Q: Our organization has an SCP (Service Control Policy) that restricts the use of Graviton instances. Should we reconsider this policy?

A: Yes, it is advisable to review and potentially revise the SCP policy that restricts Graviton instances. These instances, powered by AWS's ARM-based processors, provide notable advantages in both cost and performance, which can be beneficial for your Kubernetes workloads.

Key Benefits of Graviton Instances:

  1. Cost Savings: Graviton instances typically offer lower pricing compared to x86-based instances, making them a more economical choice for running large-scale workloads.

  2. Enhanced Performance: Graviton instances often deliver superior performance, particularly in compute-intensive applications, due to their efficient processing power and memory bandwidth.

  3. Energy Efficiency: Graviton processors are designed to consume less power, leading to reduced operational costs and a smaller environmental impact.

  4. Broad Compatibility: Most modern software, including containerized applications, is compatible with the ARM architecture used by Graviton instances, making it easy to adopt and integrate into your current infrastructure.

Recommendation:

We suggest revisiting the SCP policy to permit the use of Graviton instances, at least for testing. Evaluating their performance and cost-effectiveness in your specific workloads and regions could reveal substantial benefits. Many organizations have experienced significant improvements in cost and performance by adopting Graviton instances, and your organization could benefit similarly.
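If the SCP is relaxed only partially, you may want a quick way to tell which instance types in a node pool are Graviton-based. The heuristic below is an illustration, not an official AWS API: Graviton families (e.g. m7g, c6gn, r8g, t4g, im4gn) carry a "g" after the generation digit, and the authoritative check is the SupportedArchitectures field returned by `ec2 describe-instance-types`:

```python
import re

def is_graviton(instance_type: str) -> bool:
    """Rough heuristic: Graviton families have a 'g' after the generation digit.

    Examples: m7g, c6gn, t4g are Graviton; m5, g5 (GPU family) are not.
    Prefer ec2 describe-instance-types (SupportedArchitectures) for certainty.
    """
    family = instance_type.split(".")[0]
    return re.search(r"\d[a-z]*g", family) is not None
```

For example, `is_graviton("m7g.2xlarge")` is true while `is_graviton("g5.xlarge")` is false, since the "g" in the g5 GPU family precedes the generation digit.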

Connecting to a Private Hive Metastore in a Different VPC

Q: How can we connect our e6data VPC to a private Hive Metastore located in a different VPC?

A: To connect your e6data VPC with a private Hive Metastore in a different VPC, you can establish VPC peering between the two VPCs. VPC peering allows secure and direct communication between instances in these VPCs, functioning as though they were within the same network.

Steps to Establish VPC Peering:

  1. Set Up VPC Peering: Use the Terraform scripts provided to establish a VPC peering connection between the e6data VPC and the VPC where the Hive Metastore resides. This will enable seamless communication between the resources in both VPCs.

    • For AWS: Use the Terraform scripts available here to create the VPC peering connection in AWS.

    • For GCP: Use the Terraform scripts available here for setting up network peering in GCP.

  2. Security Group and NACL Adjustments: Ensure that security groups and network access control lists (NACLs) are configured to allow traffic between the e6data VPC and the Hive Metastore VPC. This step is crucial for enabling access between services in the peered VPCs.

Additional Notes:

  • Network Configuration: Verify that the CIDR blocks of the two VPCs do not overlap, as this can lead to routing issues.
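The CIDR overlap check above can be done locally with Python's standard ipaddress module before creating the peering connection:

```python
import ipaddress

def cidrs_overlap(cidr_a: str, cidr_b: str) -> bool:
    """Return True if the two CIDR blocks share any addresses."""
    return ipaddress.ip_network(cidr_a).overlaps(ipaddress.ip_network(cidr_b))
```

For instance, 10.0.0.0/16 and 10.0.128.0/17 overlap (peering would cause routing issues), while 10.0.0.0/16 and 10.1.0.0/16 do not.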

Accessing a Private Hive Metastore and Data Across Multiple Projects/Accounts

Scenario

  • e6data is deployed in one project/account.

  • A private Hive metastore is hosted in a second project/account.

  • Data is stored in a third project/account.

Solution

To securely access the Hive metastore and the data across these different projects/accounts, follow these steps:

  1. Establish a Secure Connection to the Hive Metastore:

    • Use VPC Peering: Set up VPC peering between the VPC where e6data is deployed and the VPC hosting the Hive metastore in the second project/account. This will ensure a secure and private connection to the Hive metastore.

    • For guidance on setting up VPC peering, refer to the relevant Terraform scripts:

      • For AWS: Use the Terraform scripts available here to create the VPC peering connection in AWS.

      • For GCP: Use the Terraform scripts available here for setting up network peering in GCP.

    • Additional Notes:

      • Network Configuration: Verify that the CIDR blocks of the two VPCs do not overlap, as this can lead to routing issues.

  2. Grant Access to Data in the Third Project:

    • Cross-Project Catalog Configuration: To access data stored in the third project/account, follow the steps outlined in the documentation to configure roles and permissions.

How Do I Configure e6data to Access a KMS-Encrypted S3 Bucket?

Your organization uses Amazon S3 buckets encrypted with a customer-managed KMS Key to store data. You need to ensure that e6data can securely access this encrypted S3 bucket. Follow these steps to configure the necessary access.

Step 1: Update KMS Key Policy

  1. Go to Key Management Service: Open the AWS Management Console and navigate to the Key Management Service (KMS).

  2. Select your KMS key: Choose the relevant Customer Managed Key.

  3. Update the Key Policy:

    • Add the following policy block to the key policy.

{
  "Sid": "Allow use of the key",
  "Effect": "Allow",
  "Principal": {
    "AWS": "arn:aws:iam::<ACCOUNT_ID>:role/<WORKSPACE_NAME>-engine-role-<RANDOM_STRING>"
  },
  "Action": [
    "kms:Encrypt",
    "kms:Decrypt",
    "kms:ReEncrypt*",
    "kms:GenerateDataKey*",
    "kms:DescribeKey",
    "kms:CreateGrant"
  ],
  "Resource": "*"
}

Note:

  • The ARN format (arn:aws:iam::<ACCOUNT_ID>:role/<WORKSPACE_NAME>-engine-role-<RANDOM_STRING>) matches the OIDC role created through Terraform for the e6data engine.

  • If you created the setup manually, replace the engine role name in the ARN with the name of the role you created earlier for the engine (see Infrastructure & Permissions for e6data in the Product Documentation).

Cost Allocation Tags Not Showing in AWS Cost Management Portal

Q: Why aren't our cost allocation tags, specified through Terraform or manual infrastructure setup, visible in the AWS Cost Management portal?

A: Even if you specify cost tags through Terraform or during manual resource creation, they may not appear in the AWS Cost Management portal if they are not activated. To make these tags visible in your cost and usage reports, you need to activate them in the AWS Billing and Cost Management console.

For more details on how to activate cost allocation tags, please refer to the AWS Documentation on Activating Tags.

Should we allow the creation of an Internet Gateway (IGW) in our organization if we currently have a Service Control Policy (SCP) that restricts this action for existing VPCs?

An Internet Gateway (IGW) is crucial for enabling communication between an AWS Virtual Private Cloud (VPC) and the public internet. It allows resources in public subnets to send and receive traffic, provided they have public IP addresses. If the ability to create or attach an IGW is denied, it will restrict connectivity, preventing instances from accessing external services or being reachable from the internet. This would also hinder e6data control plane connectivity. Therefore, maintaining the IGW is essential for ensuring operational flexibility and connectivity.

What steps should be taken to ensure that e6data infrastructure resources across AWS, Azure, and GCP operate without restrictions, and how should existing policies, including third-party restrictions, be reviewed to avoid operational impediments?

To ensure that e6data infrastructure resources across AWS, Azure, and Google Cloud Platform (GCP) operate without restrictions, it is essential to verify that no relevant policies hinder their functionality.

For AWS, confirm that no Service Control Policies (SCPs) affect key services such as EKS, EC2, S3, IAM, Subnet, VPC, NAT Gateway, Internet Gateway, VPC Endpoint, Security Group, SQS, CloudWatch, WAF, and ELB. SCPs act as permission guardrails within AWS Organizations, controlling the maximum permissions for IAM users and roles in member accounts. Since SCPs do not grant permissions themselves, review existing SCPs to ensure they do not impose limitations on these essential services that could impede operations or resource utilization.

For Azure, it is crucial to check for any Azure policies that might restrict services like AKS (Azure Kubernetes Service), Key Vault, Storage Account, Managed Identities, NAT Gateway, Virtual Network, Public IP Addresses, and Load Balancing.

For GCP, ensure that no Organization Policies limit the operations of services such as Kubernetes Engine (GKE), Key Management, Cloud Storage, IAM & Admin, Service Accounts, Cloud NAT, VPC Network, IP Addresses, and Load Balancing.

Finally, review any third-party restriction policies, such as custodian policies, to ensure they do not impose additional constraints that could hinder operational needs or resource utilization.

What are the necessary NACL rules for e6data to function effectively?

The following inbound rules are essential for e6data to function smoothly and without interruption:

  • Allow HTTPS traffic (TCP, port 443) from any source (0.0.0.0/0), ensuring secure communication for data transmissions.

  • Allow HTTP traffic (TCP, port 80) from any source, facilitating standard web access.

  • Allow a custom TCP rule across ports 1001 to 65535, enabling the application-specific requests critical for dynamic data operations.

  • Allow all traffic from the internal network range of the existing VPC, while denying all other external traffic (0.0.0.0/0).
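The inbound rules above can be captured as data and fed to `aws ec2 create-network-acl-entry` or Terraform aws_network_acl_rule resources. A sketch follows; the rule numbers and the 10.0.0.0/16 VPC CIDR are illustrative placeholders, not values from the original document:

```python
# Inbound NACL entries matching the rules described above.
# Rule numbers and the 10.0.0.0/16 VPC CIDR are illustrative placeholders.
VPC_CIDR = "10.0.0.0/16"

INBOUND_RULES = [
    {"rule_number": 100, "protocol": "tcp", "port_range": (443, 443),    "cidr": "0.0.0.0/0", "action": "allow"},  # HTTPS
    {"rule_number": 110, "protocol": "tcp", "port_range": (80, 80),      "cidr": "0.0.0.0/0", "action": "allow"},  # HTTP
    {"rule_number": 120, "protocol": "tcp", "port_range": (1001, 65535), "cidr": "0.0.0.0/0", "action": "allow"},  # app/ephemeral ports
    {"rule_number": 130, "protocol": "-1",  "port_range": None,          "cidr": VPC_CIDR,    "action": "allow"},  # all intra-VPC traffic
    # NACLs end with an implicit deny-all rule (*), which covers the
    # "deny all other external traffic" requirement.
]
```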

What steps should one take to ensure they use the appropriate instance types for their workload while maximizing cost savings?

You should check if you have reserved instance types that belong to the families we've requested. If you don’t have those reserved, please use on-demand instances instead. Additionally, if you do have reserved instance types for the requested families, the system will automatically apply those reservations, enabling cost savings associated with reserved capacity. This approach ensures flexibility while maximizing the benefits of existing reservations.
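One quick way to check coverage is to compare the instance families of your active reservations against the requested families. The helper below is illustrative (family names are examples; live data would come from `ec2 describe-reserved-instances`):

```python
def covered_families(reserved_types, requested_families):
    """Return the requested families covered by at least one reservation.

    reserved_types: instance types from active reservations, e.g. ["r6g.4xlarge"].
    requested_families: families requested for the workspace, e.g. ["r6g", "c7g"]
    (illustrative values, not the actual requested list).
    """
    reserved_families = {t.split(".")[0] for t in reserved_types}
    return sorted(f for f in requested_families if f in reserved_families)
```

Requested families not in the returned list would fall back to on-demand instances; covered families pick up the reservation pricing automatically.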

What steps to follow to install the AWS Load Balancer Controller (ALB Controller) in the existing Amazon EKS setup, and what IAM policy permissions need to be added?

To install the AWS Load Balancer Controller (ALB Controller) in an existing Amazon EKS setup, follow the steps outlined in the AWS Load Balancer Controller Installation Guide. Ensure that the necessary IAM policy with the required permissions is added as specified here. This will facilitate the successful configuration and deployment of the ALB Controller in your EKS cluster.

What are the steps to install Terraform on Windows, and how can a Terraform script be configured to use PowerShell as the interpreter?

To install Terraform on Windows, you can follow the detailed instructions in the documentation available at e6data's official site. This guide will walk you through the necessary steps to get Terraform up and running on your local machine. When executing a Terraform script from Windows, it is important to set the interpreter to PowerShell. Here’s an example of how to specify the interpreter in your Terraform configuration:

resource "null_resource" "waiting" {
  provisioner "local-exec" {
    # Use PowerShell instead of the default shell on Windows
    interpreter = ["PowerShell", "-Command"]
    command     = "Start-Sleep -Seconds 30"
  }
}
