Prerequisite Infrastructure
The following components must be in place before you set up the infrastructure required by e6data. They are present in most cloud environments; if any are missing, follow the linked guides below to create them.
Amazon Elastic Kubernetes Service (EKS) cluster
To provide secure connectivity between e6data clusters and data buckets within your AWS account.
To scale the infrastructure for e6data clusters.
Set up AWS Load Balancer Controller
For connectivity between e6data Console & e6data clusters
To allow connectivity between 3rd party tools & e6data clusters.
Create a VPC, Subnets & other VPC Resources
Optional: only required if no VPC is available for creating the EKS cluster, or if you want to install e6data in a new VPC.
Open the Amazon VPC console at https://console.aws.amazon.com/vpc/
On the VPC dashboard, choose Create VPC.
For Resources to create, choose VPC and more.
Keep Name tag auto-generation selected to create Name tags for the VPC resources, or clear it to provide your own Name tags for the VPC resources.
For IPv4 CIDR block, enter an IPv4 address range for the VPC. A VPC must have an IPv4 address range.
(Optional) To support IPv6 traffic, choose IPv6 CIDR block, Amazon-provided IPv6 CIDR block.
Choose a Tenancy option. This option defines whether EC2 instances that you launch into the VPC run on hardware that's shared with other AWS accounts or on hardware that's dedicated for your use only. If you choose the tenancy of the VPC to be Default, EC2 instances launched into this VPC use the tenancy attribute specified when you launch the instance. For more information, see Launch an instance using defined parameters in the Amazon EC2 User Guide for Linux Instances. If you choose the tenancy of the VPC to be Dedicated, the instances always run as Dedicated Instances on hardware that's dedicated for your use. If you're using AWS Outposts, your Outpost requires private connectivity; you must use Default tenancy.
For Number of Availability Zones (AZs), we recommend that you provision subnets in at least two Availability Zones for a production environment. To choose the AZs for your subnets, expand Customize AZs. Otherwise, let AWS choose them for you.
To configure your subnets, choose values for Number of public subnets and Number of private subnets. To choose the IP address ranges for your subnets, expand Customize subnets CIDR blocks. Otherwise, let AWS choose them for you.
A NAT Gateway is required to export logs/metrics from the private subnet in which the e6data resources are deployed. Choose the number of AZs in which to create NAT gateways. In production, we recommend that you deploy a NAT gateway in each AZ with resources that need access to the public internet. Note that there is a cost associated with NAT gateways. For more information, see Pricing.
(Optional) If you need to access Amazon S3 directly from your VPC, choose VPC endpoints, S3 Gateway. This creates a gateway VPC endpoint for Amazon S3. For more information, see Gateway VPC endpoints in the AWS PrivateLink Guide.
(Optional) For DNS options, both options for domain name resolution are enabled by default. If the default doesn't meet your needs, you can disable these options.
(Optional) To add a tag to your VPC, expand Additional tags, choose Add new tag, and enter a tag key and a tag value.
When using a separate VPC for e6data, we recommend adding a tag such as app=e6data to help monitor usage & costs.
In the Preview pane, you can visualize the relationships between the VPC resources that you've configured. Solid lines represent relationships between resources. Dotted lines represent network traffic to NAT gateways, internet gateways, and gateway endpoints. After you create the VPC, you can visualize the resources in your VPC in this format at any time using the Resource map tab. For more information, see Visualize the resources in your VPC.
When you are finished configuring your VPC, choose Create VPC.
Please make note of the VPC Region; it will be required when creating the Workspace in the e6data Console.
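If you expand Customize subnets CIDR blocks instead of letting AWS choose, it helps to check up front that the VPC range can hold one public and one private subnet in every AZ (the private subnets are where the e6data resources run). The Python sketch below uses only the standard-library `ipaddress` module. The /20 subnet size, the "all public first, then all private" allocation order, and the example CIDR and AZ names are illustrative assumptions, not e6data requirements.

```python
import ipaddress
import itertools


def plan_subnets(vpc_cidr: str, azs: list[str], new_prefix: int = 20) -> dict[str, dict[str, str]]:
    """Split a VPC CIDR into one public and one private subnet per AZ.

    Assumptions (not from the e6data docs): equally sized /new_prefix
    subnets, all public subnets allocated first, then all private ones.
    """
    vpc = ipaddress.ip_network(vpc_cidr)
    needed = 2 * len(azs)
    # Take only as many candidate blocks as we need, in address order.
    blocks = list(itertools.islice(vpc.subnets(new_prefix=new_prefix), needed))
    if len(blocks) < needed:
        raise ValueError(
            f"{vpc_cidr} holds only {len(blocks)} /{new_prefix} subnets, {needed} needed"
        )
    return {
        az: {"public": str(blocks[i]), "private": str(blocks[len(azs) + i])}
        for i, az in enumerate(azs)
    }


if __name__ == "__main__":
    # Hypothetical VPC range and AZ names.
    for az, subnets in plan_subnets("10.0.0.0/16", ["us-east-1a", "us-east-1b"]).items():
        print(az, subnets)
```

With a 10.0.0.0/16 VPC and two AZs, this yields 10.0.0.0/20 and 10.0.16.0/20 as public subnets and 10.0.32.0/20 and 10.0.48.0/20 as private subnets, leaving the rest of the range free for later growth.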
Create EKS Cluster & Default Node Group
Optional: only required if no EKS cluster is available, or if you want to install e6data in a new EKS cluster.
Create EKS Cluster
To get started with setting up an Amazon Elastic Kubernetes Service (EKS) cluster, please follow the comprehensive documentation provided by AWS: Creating an Amazon EKS cluster.
Please make note of the EKS Cluster Name; it will be required when creating the Workspace in the e6data Console.
Enable OpenID Connect (OIDC)
Enabling IAM roles for service accounts (IRSA) and creating an OpenID Connect (OIDC) provider for your EKS cluster is what allows e6data clusters to access the data buckets in your AWS account securely.
e6data uses OIDC because it provides least privilege, credential isolation & auditability: each Kubernetes service account assumes its own IAM role and receives short-lived credentials, so no long-lived access keys need to be stored in the cluster.
To enable IAM roles for service accounts and create an OIDC (OpenID Connect) provider for your EKS cluster, please refer to the documentation Creating an IAM OIDC provider for your cluster - Amazon EKS.
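Once the OIDC provider exists, every IAM role that a Kubernetes service account assumes needs a trust policy that names the provider and pins the service account. The sketch below builds such a policy in the standard IRSA shape (`sts:AssumeRoleWithWebIdentity` with `aud` and `sub` conditions). The issuer URL is what `aws eks describe-cluster --name <cluster> --query "cluster.identity.oidc.issuer" --output text` returns. It assumes the commercial `aws` partition; the account ID, issuer ID, namespace and service-account name are placeholders, and the service accounts e6data actually uses may differ.

```python
import json


def irsa_trust_policy(account_id: str, oidc_issuer_url: str, namespace: str, service_account: str) -> dict:
    """Build an IAM role trust policy for IAM Roles for Service Accounts (IRSA)."""
    # IAM identifies the provider by the issuer URL without the https:// scheme.
    provider = oidc_issuer_url.removeprefix("https://")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Assumes the commercial partition ("aws"), not GovCloud or China.
                "Principal": {"Federated": f"arn:aws:iam::{account_id}:oidc-provider/{provider}"},
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        # Token must be issued for STS ...
                        f"{provider}:aud": "sts.amazonaws.com",
                        # ... and only to this one Kubernetes service account.
                        f"{provider}:sub": f"system:serviceaccount:{namespace}:{service_account}",
                    }
                },
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder account ID, issuer ID, namespace and service account.
    policy = irsa_trust_policy(
        "111122223333",
        "https://oidc.eks.us-east-1.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE",
        "e6data",
        "e6data-engine",
    )
    print(json.dumps(policy, indent=2))
```

Pinning `sub` to a single service account is what gives the credential isolation described above: pods running under any other service account cannot assume the role.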
Set up Karpenter
To set up Karpenter for your Amazon Elastic Kubernetes Service (EKS) cluster, refer to the official Karpenter documentation.
Use a Restricted IAM Policy: Instead of the broad permissions that the Karpenter documentation grants to the Karpenter controller's IAM role (assumed via OIDC), you can use a more restricted IAM policy. Here is an example policy that limits permissions to e6data-managed resources only.
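As a hedged illustration of how such a restriction typically works (this is not the e6data example policy), the sketch below builds an IAM statement that allows an action on EC2 instances only when they already carry a given tag, using the `aws:ResourceTag/<key>` condition key. The default tag `app=e6data` reuses the VPC tagging suggestion above; whether e6data's own policy keys on this tag is an assumption.

```python
import json


def tag_scoped_statement(actions: list[str], tag_key: str, tag_value: str) -> dict:
    """Allow `actions` on EC2 instances only when the instance carries tag_key=tag_value."""
    return {
        "Effect": "Allow",
        "Action": actions,
        "Resource": "arn:aws:ec2:*:*:instance/*",
        # aws:ResourceTag/<key> is evaluated against the tags already on the instance.
        "Condition": {"StringEquals": {f"aws:ResourceTag/{tag_key}": tag_value}},
    }


def restricted_policy(tag_key: str = "app", tag_value: str = "e6data") -> dict:
    """Hypothetical policy fragment: illustrates the pattern, not the e6data policy."""
    return {
        "Version": "2012-10-17",
        "Statement": [tag_scoped_statement(["ec2:TerminateInstances"], tag_key, tag_value)],
    }


if __name__ == "__main__":
    print(json.dumps(restricted_policy(), indent=2))
```

A working Karpenter controller policy needs further statements (for example, for launching instances and managing launch templates), which are usually scoped with `aws:RequestTag` conditions at creation time; treat this only as a pattern.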
Karpenter is configured through two main resources:
EC2 NodeClass
A Karpenter NodeClass is a cloud-provider-specific blueprint for worker nodes; on AWS it is called an EC2NodeClass. It defines key details such as the AMI family (OS), security groups, subnets, and the IAM role for the nodes.
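The fields listed above map directly onto the EC2NodeClass spec. The Python sketch below emits a minimal manifest as JSON, which `kubectl apply -f -` accepts. It assumes the Karpenter v1 API (`karpenter.k8s.aws/v1`; older `v1beta1` releases use an `amiFamily` field instead of an AMI alias), Amazon Linux 2023 as the OS, and subnets and security groups tagged `karpenter.sh/discovery=<cluster name>` as in the Karpenter getting-started guide. All names are hypothetical.

```python
import json


def ec2_node_class(name: str, cluster_name: str, node_role: str) -> dict:
    """Minimal EC2NodeClass for the Karpenter v1 API (karpenter.k8s.aws/v1)."""

    def discovery_term() -> dict:
        # Select subnets / security groups tagged karpenter.sh/discovery=<cluster>.
        return {"tags": {"karpenter.sh/discovery": cluster_name}}

    return {
        "apiVersion": "karpenter.k8s.aws/v1",
        "kind": "EC2NodeClass",
        "metadata": {"name": name},
        "spec": {
            # OS image: in the v1 API the AMI family is chosen through an alias.
            "amiSelectorTerms": [{"alias": "al2023@latest"}],
            # Name of the IAM role that the launched nodes will use.
            "role": node_role,
            "subnetSelectorTerms": [discovery_term()],
            "securityGroupSelectorTerms": [discovery_term()],
        },
    }


if __name__ == "__main__":
    # Hypothetical names; pipe the output to `kubectl apply -f -`.
    print(json.dumps(ec2_node_class("e6data", "my-cluster", "KarpenterNodeRole-my-cluster"), indent=2))
```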
NodePool
A single Karpenter NodePool can handle a wide variety of pods, eliminating the need for multiple node groups. Use the command below to create a NodePool; resources are discovered through securityGroupSelectorTerms and subnetSelectorTerms, which are defined on the EC2NodeClass that the NodePool references. Setting the consolidation policy to WhenEmpty reduces costs by removing empty nodes.
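To make the NodePool structure concrete, the sketch below emits a minimal NodePool manifest as JSON with the `WhenEmpty` consolidation policy. It assumes the Karpenter v1 API (`karpenter.sh/v1`, where `consolidateAfter` is required and `nodeClassRef` carries a `group`); the `node_class` argument must match the name of an existing EC2NodeClass. The instance family, CPU limit and 30-second delay are hypothetical values, not e6data's settings.

```python
import json


def node_pool(name: str, node_class: str, instance_families: list[str], cpu_limit: str) -> dict:
    """Minimal NodePool for the Karpenter v1 API (karpenter.sh/v1)."""
    return {
        "apiVersion": "karpenter.sh/v1",
        "kind": "NodePool",
        "metadata": {"name": name},
        "spec": {
            "template": {
                "spec": {
                    # Points at an existing EC2NodeClass, which owns the
                    # subnet / security-group selector terms.
                    "nodeClassRef": {
                        "group": "karpenter.k8s.aws",
                        "kind": "EC2NodeClass",
                        "name": node_class,
                    },
                    "requirements": [
                        {
                            "key": "karpenter.k8s.aws/instance-family",
                            "operator": "In",
                            "values": instance_families,
                        }
                    ],
                }
            },
            # Upper bound on the total CPU this pool may provision.
            "limits": {"cpu": cpu_limit},
            # Remove a node only after it has had no workload pods for consolidateAfter.
            "disruption": {"consolidationPolicy": "WhenEmpty", "consolidateAfter": "30s"},
        },
    }


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    print(json.dumps(node_pool("e6data", "e6data", ["r7g"], "1000"), indent=2))
```

`WhenEmpty` only reclaims nodes with no workload pods; it never moves running pods to pack them onto fewer nodes, which avoids disrupting queries in flight at the cost of less aggressive bin-packing.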
Set up AWS Load Balancer Controller (ALB)
The AWS Load Balancer Controller is required in the EKS cluster. It provisions the AWS load balancers (such as an Application Load Balancer, ALB) that provide connectivity between the e6data Console and e6data clusters, as well as between querying/BI tools and the e6data Query Engine.
To install the AWS Load Balancer Controller in your Amazon EKS cluster, follow the steps in the official AWS documentation: Installing the AWS Load Balancer Controller add-on - Amazon EKS.
Please take note of the following:
Ensure that the AWS Load Balancer Controller is version v2.5 or higher.
Configure the ALB controller with the parameters listed below:
Use a Restricted IAM Policy: Instead of using the broad permissions mentioned in the ALB Controller documentation, you can use a more restricted IAM policy. Here is an example policy that you can use to limit permissions to only e6data-managed resources.
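To confirm the v2.5 minimum, you can read the controller's image tag, for example with `kubectl get deployment -n kube-system aws-load-balancer-controller -o jsonpath='{.spec.template.spec.containers[0].image}'` (the namespace and deployment name assume the default install from the AWS guide). The sketch below parses that tag and compares major/minor numbers; the example image string is illustrative.

```python
import re


def version_from_image(image: str) -> str:
    """Return the tag of a container image reference such as 'repo/name:v2.7.2'."""
    _, sep, tag = image.rpartition(":")
    # A ':' inside the registry host (e.g. a port) is followed by '/', so it is not a tag.
    if not sep or "/" in tag:
        raise ValueError(f"image has no tag: {image!r}")
    return tag


def meets_min_version(version: str, minimum: tuple[int, int] = (2, 5)) -> bool:
    """True if a version like 'v2.7.2' is at least `minimum` (major, minor)."""
    match = re.fullmatch(r"v?(\d+)\.(\d+)(?:\D.*)?", version.strip())
    if match is None:
        raise ValueError(f"unrecognised version: {version!r}")
    # Compare numerically so that v2.10 correctly sorts after v2.5.
    return (int(match.group(1)), int(match.group(2))) >= minimum


if __name__ == "__main__":
    # Example image reference; substitute the output of the kubectl command above.
    image = "public.ecr.aws/eks/aws-load-balancer-controller:v2.7.2"
    print(meets_min_version(version_from_image(image)))  # True
```

Comparing integer tuples rather than raw strings matters here: a plain string comparison would wrongly rank "v2.10" below "v2.5".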