Prerequisite Infrastructure
The following components must be in place before setting up the infrastructure needed by e6data. They are commonly present in most cloud environments; if any are missing, follow the guides below to create them.
Amazon Elastic Kubernetes Service (EKS) cluster
To provide secure connectivity between e6data clusters and data buckets within your AWS account.
To scale the infrastructure for e6data clusters.
Set up AWS Load Balancer Controller
For connectivity between the e6data Console and e6data clusters.
To allow connectivity between third-party tools and e6data clusters.
Create a VPC, Subnets & other VPC Resources
Open the Amazon VPC console at https://console.aws.amazon.com/vpc/
On the VPC dashboard, choose Create VPC.
For Resources to create, choose VPC and more.
Keep Name tag auto-generation selected to create Name tags for the VPC resources or clear it to provide your own Name tags for the VPC resources.
For IPv4 CIDR block, enter an IPv4 address range for the VPC. A VPC must have an IPv4 address range.
(Optional) To support IPv6 traffic, choose IPv6 CIDR block, Amazon-provided IPv6 CIDR block.
Choose a Tenancy option to determine whether EC2 instances in the VPC run on shared hardware or hardware dedicated to your use.
Default tenancy allows instances to use the tenancy specified at launch, while Dedicated tenancy ensures all instances run on dedicated hardware. Learn more about Dedicated Instances.
AWS Outposts require private connectivity, so you must use Default tenancy with them. For more details, refer to Launch an instance using defined parameters in the Amazon EC2 User Guide for Linux Instances.
For Number of Availability Zones (AZs), it's recommended to provision subnets in at least two AZs for production. To select specific AZs, expand Customize AZs; otherwise, AWS will choose them automatically.
To configure your subnets, choose values for Number of public subnets and Number of private subnets. To choose the IP address ranges for your subnets, expand Customize subnets CIDR blocks. Otherwise, let AWS choose them for you.
A NAT Gateway is required to export logs and metrics from the private subnet where e6data resources are deployed.
Choose the number of Availability Zones (AZs) in which to create NAT gateways.
For production environments, it's recommended to deploy a NAT Gateway in each AZ that contains resources needing access to the public internet.
⚠️ Note: NAT Gateways incur additional cost. For more information, see the NAT Gateway Pricing page.
(Optional) If you need to access Amazon S3 directly from your VPC, choose VPC endpoints, S3 Gateway. This creates a gateway VPC endpoint for Amazon S3. For more information, see Gateway VPC endpoints in the AWS PrivateLink Guide.
(Optional) For DNS options, both options for domain name resolution are enabled by default. If the default doesn't meet your needs, you can disable these options.
(Optional) To add a tag to your VPC, expand Additional tags, choose Add new tag, and enter a tag key and a tag value.
When using a separate VPC for e6data, we recommend adding a tag to help monitor usage and costs, for example:
app=e6data
In the Preview pane, you can visualize relationships between the VPC resources you've configured.
Solid lines show resource relationships; dotted lines indicate network traffic paths to NAT gateways, internet gateways, and gateway endpoints.
After creating the VPC, you can view this diagram anytime via the Resource map tab.
For more details, see Visualize the resources in your VPC.
When you are finished configuring your VPC, choose Create VPC.
Make a note of the VPC Region; it is required when creating the Workspace in the e6data Console.
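When customizing subnet CIDR blocks in the steps above, it can help to plan the ranges before entering them in the console. The following is a minimal sketch, assuming a hypothetical helper named plan_subnets, an example VPC range of 10.0.0.0/16, a /20 subnet size, and example AZ names. None of these values are prescribed by e6data or AWS, and the console may allocate ranges differently when it chooses them for you.

```python
import ipaddress


def plan_subnets(vpc_cidr, az_names, subnet_prefix=20):
    """Split a VPC CIDR into one public and one private subnet per AZ.

    Hypothetical planning helper: it only computes non-overlapping ranges
    and does not call any AWS API.
    """
    vpc = ipaddress.ip_network(vpc_cidr)
    # subnets() yields consecutive, non-overlapping blocks of the given size.
    blocks = vpc.subnets(new_prefix=subnet_prefix)
    plan = []
    for tier in ("public", "private"):
        for az in az_names:
            try:
                block = next(blocks)
            except StopIteration:
                raise ValueError(
                    f"{vpc_cidr} is too small for the requested subnets"
                ) from None
            plan.append({"tier": tier, "az": az, "cidr": str(block)})
    return plan


if __name__ == "__main__":
    # Example values only; substitute your own CIDR block and AZ names.
    for subnet in plan_subnets("10.0.0.0/16", ["us-east-1a", "us-east-1b"]):
        print(subnet["tier"], subnet["az"], subnet["cidr"])
```

Each computed range can then be entered under Customize subnets CIDR blocks. Allocating the public subnets first and the private ones afterwards is just one possible layout.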
Create EKS Cluster & Default Node Group
Create EKS Cluster
To set up an Amazon Elastic Kubernetes Service (EKS) cluster, follow the AWS documentation: Creating an Amazon EKS cluster.
Make a note of the EKS Cluster Name; it is required when creating the Workspace in the e6data Console.
Enable OpenID Connect (OIDC)
Enabling IAM roles for service accounts and creating an OpenID Connect (OIDC) provider for your EKS cluster gives e6data clusters secure access to the data buckets within your AWS account.
e6data uses OIDC for more secure access as it provides least privilege, credential isolation & auditability.
To enable IAM roles for service accounts and create an OIDC (OpenID Connect) provider for your EKS cluster, please refer to the documentation Creating an IAM OIDC provider for your cluster - Amazon EKS.
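To illustrate how an OIDC provider ties an IAM role to a single Kubernetes service account, the sketch below builds the kind of trust policy used with IAM roles for service accounts. It is only an illustration: the function name irsa_trust_policy, the account ID, the issuer URL, and the namespace and service account names are all placeholder assumptions, not values taken from e6data.

```python
import json


def irsa_trust_policy(account_id, oidc_issuer_url, namespace, service_account):
    """Build a trust policy that lets one service account assume a role."""
    # The IAM OIDC provider is identified by the issuer URL without its scheme.
    provider = oidc_issuer_url
    if provider.startswith("https://"):
        provider = provider[len("https://"):]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{provider}"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        # Only tokens issued for STS are accepted ...
                        f"{provider}:aud": "sts.amazonaws.com",
                        # ... and only for this namespace and service account.
                        f"{provider}:sub": (
                            f"system:serviceaccount:{namespace}:{service_account}"
                        ),
                    }
                },
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder values; replace them with your own account and cluster details.
    policy = irsa_trust_policy(
        account_id="111122223333",
        oidc_issuer_url="https://oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE",
        namespace="e6data",            # hypothetical namespace
        service_account="e6data-sa",   # hypothetical service account
    )
    print(json.dumps(policy, indent=2))
```

Restricting the sub claim to one service account is what provides the least-privilege and credential-isolation properties mentioned above: other pods in the cluster cannot assume the role.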
Set up Karpenter
To set up Karpenter for your Amazon Elastic Kubernetes Service (EKS) cluster, refer to the official Karpenter documentation.
Use a Restricted IAM Policy: Instead of the broad permissions outlined in the Karpenter documentation for the Karpenter controller OIDC policy, you can use a more restricted IAM policy. Here is an example policy that limits permissions to only e6data-managed resources.
Karpenter has two main components:
EC2 NodeClass
Karpenter NodeClasses are cloud-provider-specific blueprints for worker nodes; on AWS, the NodeClass is an EC2NodeClass. It defines crucial details including the AMI family (OS), security groups, subnets, and IAM roles.
NodePool
A single Karpenter NodePool handles various pods, eliminating the need for multiple node groups. The NodePool references an EC2NodeClass, which uses securityGroupSelectorTerms and subnetSelectorTerms for resource discovery. Setting the consolidation policy to WhenEmpty reduces costs by removing empty nodes.
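The two components described above can be sketched as Kubernetes manifests generated from Python. This is a hedged illustration, not e6data's manifests: it assumes the Karpenter v1 API versions (older releases use v1beta1 with slightly different fields), the karpenter.sh/discovery tag convention for selecting subnets and security groups, an al2023@latest AMI alias, and hypothetical resource names, cluster name, and node role.

```python
import json


def ec2_node_class(name, cluster_name, node_role):
    """EC2NodeClass: which AMI, subnets, security groups, and role nodes use."""
    # Assumed tag convention; the subnets and security groups must carry this tag.
    discovery = {"karpenter.sh/discovery": cluster_name}
    return {
        "apiVersion": "karpenter.k8s.aws/v1",
        "kind": "EC2NodeClass",
        "metadata": {"name": name},
        "spec": {
            "role": node_role,
            "amiSelectorTerms": [{"alias": "al2023@latest"}],
            "subnetSelectorTerms": [{"tags": discovery}],
            "securityGroupSelectorTerms": [{"tags": discovery}],
        },
    }


def node_pool(name, node_class_name):
    """NodePool: scheduling requirements plus the WhenEmpty consolidation policy."""
    return {
        "apiVersion": "karpenter.sh/v1",
        "kind": "NodePool",
        "metadata": {"name": name},
        "spec": {
            "template": {
                "spec": {
                    "nodeClassRef": {
                        "group": "karpenter.k8s.aws",
                        "kind": "EC2NodeClass",
                        "name": node_class_name,
                    },
                    "requirements": [
                        {
                            "key": "karpenter.sh/capacity-type",
                            "operator": "In",
                            "values": ["on-demand"],
                        }
                    ],
                }
            },
            # Remove nodes only once they are empty, after a short delay.
            "disruption": {
                "consolidationPolicy": "WhenEmpty",
                "consolidateAfter": "30s",
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical names; replace them with your own cluster and role.
    items = [
        ec2_node_class("e6data-nodeclass", "my-eks-cluster", "KarpenterNodeRole-example"),
        node_pool("e6data-nodepool", "e6data-nodeclass"),
    ]
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": items}, indent=2))
```

Since kubectl accepts JSON as well as YAML, the printed output could be reviewed and then applied with kubectl apply -f once the names and tags match your environment.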
Set up AWS Load Balancer Controller (ALB)
An AWS Load Balancer (ALB) is required in the EKS Cluster for connectivity between the e6data Console and e6data Cluster, as well as for providing connectivity between querying/BI tools and the e6data Query Engine.
To install the AWS Load Balancer Controller (ALB) in your Amazon Elastic Kubernetes Service (EKS) cluster, follow the steps outlined in the official AWS documentation at Installing the AWS Load Balancer Controller add-on - Amazon EKS.
Use a Restricted IAM Policy: Instead of using the broad permissions mentioned in the ALB Controller documentation, you can use a more restricted IAM policy. Here is an example policy that you can use to limit permissions to only e6data-managed resources.