Setup using Terraform in GCP
Deploying an e6data Workspace in GCP using Terraform
Please ensure you have signed up for an e6data account before creating workspaces.
Once logged into the e6data platform, it’s time to configure an e6data workspace in GCP. We keep it simple - all you need is an existing GCP account with the prerequisites listed below:
Prerequisites
A Google Cloud Platform (GCP) account with sufficient permissions to create and manage resources.
A local development environment with Terraform installed. The installation steps are outlined below.
A Google Kubernetes Engine (GKE) cluster.
Add the IP address of the e6data Control Plane to the list of authorized networks; this allows connectivity between the e6data Control Plane and the e6data workspace (see the sketch below).
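If the GKE cluster is itself managed with Terraform, the authorized network can be added roughly as in the sketch below; the cluster name, region, and CIDR are placeholders, and the actual Control Plane IP is provided by e6data.

```hcl
# Illustrative fragment only; replace the placeholder CIDR with the e6data Control Plane IP.
resource "google_container_cluster" "e6data" {
  name               = "e6data-gke-cluster"   # placeholder cluster name
  location           = "us-central1"          # placeholder region
  initial_node_count = 1

  master_authorized_networks_config {
    cidr_blocks {
      cidr_block   = "203.0.113.10/32"        # placeholder: e6data Control Plane IP
      display_name = "e6data-control-plane"
    }
  }
}
```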
Create the e6data Workspace
Log in to the e6data Console
Navigate to Workspaces in the left-side navigation bar or click Create Workspace.
Select GCP as the Cloud Provider.
Proceed to the next section to deploy the Workspace.
Installing the e6data Workspace
Using a Terraform script, the e6data Workspace will be deployed inside a GKE cluster. The subsequent sections provide instructions to configure the two Terraform files required for the deployment: provider.tf and terraform.tfvars.
If a GKE cluster is not available, please follow these instructions to create one.
If Terraform is not installed, please follow these instructions.
Download e6data Terraform Scripts
Please download/clone the e6x-labs/terraform repo from GitHub.
Configure provider.tf
The google provider blocks are used to configure the credentials you use to authenticate with GCP, as well as the default project and location of your resources.
Extract the scripts downloaded in the previous step and navigate to the ./scripts/gcp/terraform folder.
Edit the provider.tf file according to your requirements. Please refer to the official Terraform documentation for instructions on the authentication method most appropriate to your environment.
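For illustration, a minimal provider.tf could look like the sketch below; the project, region, and use of Application Default Credentials are assumptions to adapt to your environment.

```hcl
# Minimal provider.tf sketch; all values are placeholders.
provider "google" {
  project = "my-gcp-project-id"   # placeholder GCP project ID
  region  = "us-central1"         # placeholder default region

  # By default the provider resolves credentials via Application Default Credentials.
  # Alternatively, point it at a service account key file:
  # credentials = file("path/to/service-account-key.json")
}
```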
Specifying Google Cloud Storage (GCS) Bucket for Terraform State file
To specify a Google Cloud Storage (GCS) bucket for storing the Terraform state when using Google Cloud Platform (GCP) as the provider, you can add the following configuration to the Terraform script:
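A minimal sketch of this backend block is shown below; the prefix value is only an example.

```hcl
terraform {
  backend "gcs" {
    bucket = "<bucket_name_to_store_the_tfstate_file>"   # GCS bucket for the Terraform state
    prefix = "e6data/terraform/state"                    # example path within the bucket
  }
}
```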
This configures the backend to use a GCS bucket for storing the Terraform state when using the Google Cloud provider. Replace <bucket_name_to_store_the_tfstate_file> with the name of the GCS bucket you want to use.
Additionally, the prefix parameter allows you to specify a directory or path within the bucket where the Terraform state file will be stored. Adjust the prefix value according to your requirements.
Make sure that the credentials used by Terraform have the necessary permissions to read from and write to the specified GCS bucket.
For more information and to explore additional backend options, you can refer to the Terraform Backend Configuration documentation.
Configuration Variables in terraform.tfvars File
The terraform.tfvars file contains the following variables that need to be configured before executing the Terraform script.
Edit the following variables in the terraform.tfvars file according to your needs, updating their values to match the specific configuration details of your environment:
| Variable | Description |
| --- | --- |
| workspace_name | The name of the e6data workspace to be created. |
| gcp_project_id | The Google Cloud Platform (GCP) project ID in which to deploy the e6data workspace. |
| gcp_region | The GCP region in which to run the e6data workspace. |
| helm_chart_version | The e6data workspace Helm chart version to be used. |
| gke_subnet_ip_cidr_range | The subnet IP CIDR range of the GKE cluster. |
| cluster_name | The name of the Kubernetes cluster for e6data. |
| gke_e6data_master_ipv4_cidr_block | The IP range, in CIDR notation, to use for the hosted master network. |
| gke_version | The version of GKE to use. |
| gke_encryption_state | The encryption state for GKE (enabling encryption is recommended). |
| gke_dns_cache_enabled | The status of the NodeLocal DNSCache addon. |
| spot_enabled | A boolean indicating whether the underlying node VMs are spot instances. |
| kubernetes_cluster_zone | The Kubernetes cluster zone (only required for zonal clusters). |
| max_instances_in_nodepool | The maximum number of instances in a node pool. |
| default_nodepool_instance_type | The default instance type for the node pool. |
| gke_e6data_initial_node_count | The initial number of nodes in the GKE cluster. |
| gke_e6data_max_pods_per_node | The maximum number of pods per node in the GKE cluster. |
| gke_e6data_instance_type | The instance type for the GKE nodes. |
| kubernetes_namespace | The Kubernetes namespace in which to deploy the e6data workspace. |
| cost_labels | Cost labels for tracking costs. |
| buckets | The list of bucket names that the e6data engine queries and therefore requires read access to. The default is ["*"], which means all buckets; it is advisable to change this. |
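For illustration, a partial terraform.tfvars might look like the sketch below; every value is a placeholder, and the remaining variables from the table above follow the same pattern.

```hcl
# Illustrative terraform.tfvars fragment; replace all values with your own.
workspace_name                    = "e6data-workspace"
gcp_project_id                    = "my-gcp-project-id"
gcp_region                        = "us-central1"
cluster_name                      = "e6data-gke-cluster"
gke_subnet_ip_cidr_range          = "10.10.0.0/16"
gke_e6data_master_ipv4_cidr_block = "172.16.0.0/28"
kubernetes_namespace              = "e6data"
max_instances_in_nodepool         = 20
default_nodepool_instance_type    = "e2-standard-4"
gke_e6data_instance_type          = "c2-standard-30"
buckets                           = ["my-data-bucket"]   # restrict from the default ["*"]
```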
Execution Commands
Once you have configured the necessary variables in the provider.tf and terraform.tfvars files, you can proceed with the deployment of the e6data workspace. Follow the steps below to initiate the deployment:
Navigate to the directory containing the Terraform files. It is essential to be in the correct directory for the Terraform commands to execute successfully.
Initialize Terraform:
terraform init
Generate a Terraform plan and save it to a file (e.g. e6.plan):
terraform plan -var-file="terraform.tfvars" --out="e6.plan"
The -var-file flag specifies the input variable file (terraform.tfvars) that contains the necessary configuration values for the deployment.
Review the generated plan.
Apply the changes using the generated plan file:
terraform apply "e6.plan"
This command applies the changes specified in the plan file (e6.plan) to deploy the e6data workspace in your environment.
Make note of the values returned by the script.
Return to the e6data Console and enter the values returned in the previous step.
Deployment Overview and Resource Provisioning
This section provides a comprehensive overview of the resources deployed using the Terraform script for the e6data workspace deployment.
Only the e6data engine, which resides within the customer account, has permission to access data stores.
The cross-account role does not have access to data stores; therefore, access to data stores from the e6data platform is not possible.
Permissions
Engine Permissions
The e6data Engine, which is deployed inside the customer boundary, requires the following permissions:
Read-only access to buckets containing the data to be queried:
Read-write access to a bucket created by e6data to store query results, logs, etc.:
Permissions for connectivity with the e6data control plane
The workloadIdentityUser role requires the following permissions for authentication and interaction with the cluster:
The e6dataclusterViewer role requires the following permissions to monitor e6data cluster health:
The globalAddresses role requires the following permissions to manage global addresses:
The securityPolicies role requires the following permissions to manage security policies:
The targetPools role requires the following permissions to get the instance list for targetpool or backendconfig.
Resources Created
This Terraform configuration sets up a network, subnetwork, router, and NAT gateway on Google Cloud Platform (GCP). The network is created without auto-creating subnetworks. The subnetwork is configured with a specified IP CIDR range, region, and enables private Google access. VPC flow logs can be configured for the subnetwork. A router is created and attached to the network. A Cloud NAT gateway is provisioned for internet access for private nodes, with automatic IP allocation and NAT configuration for all subnetwork IP ranges.
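A condensed sketch of this networking setup is shown below; the resource names, region, and CIDR ranges are illustrative rather than the exact values used by the script.

```hcl
# Condensed networking sketch; names, region, and ranges are placeholders.
resource "google_compute_network" "e6data" {
  name                    = "e6data-network"
  auto_create_subnetworks = false                      # subnetworks are created explicitly
}

resource "google_compute_subnetwork" "e6data" {
  name                     = "e6data-subnet"
  network                  = google_compute_network.e6data.id
  region                   = "us-central1"             # placeholder region
  ip_cidr_range            = "10.10.0.0/16"            # placeholder range
  private_ip_google_access = true

  log_config {                                         # optional VPC flow logs
    aggregation_interval = "INTERVAL_5_SEC"
    flow_sampling        = 0.5
    metadata             = "INCLUDE_ALL_METADATA"
  }
}

resource "google_compute_router" "e6data" {
  name    = "e6data-router"
  network = google_compute_network.e6data.id
  region  = "us-central1"
}

resource "google_compute_router_nat" "e6data" {
  name                               = "e6data-nat"
  router                             = google_compute_router.e6data.name
  region                             = "us-central1"
  nat_ip_allocate_option             = "AUTO_ONLY"                       # automatic IP allocation
  source_subnetwork_ip_ranges_to_nat = "ALL_SUBNETWORKS_ALL_IP_RANGES"   # NAT for all subnet IP ranges
}
```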
This Terraform configuration sets up a private and regional Google Kubernetes Engine (GKE) cluster. The cluster is configured with a specified name, region, minimum master version, monitoring and logging services, network, subnetwork, and initial node count. Vertical Pod Autoscaling is enabled, and workload identity is configured.
The cluster's private configuration includes private nodes, a master IPv4 CIDR block, and a disabled private endpoint. The IP allocation policy, HTTP load balancing, and DNS cache configuration are also defined. Resource labels and master authorized networks are configured, and a lifecycle block specifies that a new cluster is created before any existing one is destroyed, minimizing downtime.
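A fragment illustrating the private-cluster settings and the lifecycle rule described above; all values are placeholders.

```hcl
# Fragment of a GKE cluster resource; values are placeholders.
resource "google_container_cluster" "workspace" {
  name               = "e6data-gke-cluster"      # placeholder name
  location           = "us-central1"             # placeholder region (regional cluster)
  min_master_version = "1.27"                    # placeholder GKE version
  network            = "e6data-network"          # placeholder network name
  subnetwork         = "e6data-subnet"           # placeholder subnetwork name
  initial_node_count = 1

  private_cluster_config {
    enable_private_nodes    = true
    enable_private_endpoint = false
    master_ipv4_cidr_block  = "172.16.0.0/28"    # placeholder master CIDR block
  }

  lifecycle {
    create_before_destroy = true                 # build the new cluster before removing the old one
  }
}
```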
GKE Node Pool for Workspace: Provides a dedicated node pool in GKE to host the e6data workspace, with autoscaling and location policies for scalability and performance.
GCS Bucket for Query Results: Establishes a dedicated Google Cloud Storage (GCS) bucket within the e6data workspace to store query results.
Service Account for Workspace: The deployment includes creating a dedicated service account that ensures secure access to e6data workspace resources. This service account is assigned a custom role that grants read access to GCS buckets, enabling the e6data engine to retrieve data for querying and processing operations. Additionally, the service account is also provided with read/write access to the e6data workspace bucket. This access allows the e6data platform to write query results to the bucket, providing efficient storage and management of workspace-related data.
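A simplified sketch of the bucket, service account, and access bindings described above; names are placeholders, and the predefined roles shown here stand in for the custom role that the script actually defines.

```hcl
# Simplified sketch; the script defines a custom role rather than the predefined roles used here.
resource "google_storage_bucket" "workspace" {
  name     = "e6data-workspace-results"   # placeholder bucket for query results
  location = "US"
}

resource "google_service_account" "workspace" {
  account_id   = "e6data-workspace-sa"    # placeholder account ID
  display_name = "e6data workspace service account"
}

# Read-only access to a data bucket queried by the e6data engine
resource "google_storage_bucket_iam_member" "data_read" {
  bucket = "my-data-bucket"               # placeholder data bucket
  role   = "roles/storage.objectViewer"
  member = "serviceAccount:${google_service_account.workspace.email}"
}

# Read/write access to the workspace bucket that stores query results
resource "google_storage_bucket_iam_member" "workspace_rw" {
  bucket = google_storage_bucket.workspace.name
  role   = "roles/storage.objectAdmin"
  member = "serviceAccount:${google_service_account.workspace.email}"
}
```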
IAM Policy Bindings for Workspace Workload Identity: This creates an IAM policy binding, enabling the workspace service account to act as a workload identity user in the Kubernetes cluster. This binding grants the necessary permissions for authentication and interaction within the cluster, facilitating seamless integration between the e6data workspace and Kubernetes infrastructure.
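A sketch of such a binding; the project ID, namespace, and Kubernetes service account name are placeholders.

```hcl
# Lets the Kubernetes service account impersonate the workspace GCP service account
# via Workload Identity; all identifiers are placeholders.
resource "google_service_account_iam_member" "workload_identity" {
  service_account_id = "projects/my-gcp-project-id/serviceAccounts/e6data-workspace-sa@my-gcp-project-id.iam.gserviceaccount.com"
  role               = "roles/iam.workloadIdentityUser"
  member             = "serviceAccount:my-gcp-project-id.svc.id.goog[e6data/e6data-workspace-ksa]"
}
```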
IAM Policy Bindings for Platform Workload Identity: This creates an IAM policy binding between the platform service account and the Kubernetes cluster, granting the platform service account the "roles/container.clusterViewer" role. This role provides the necessary permissions to view and access the Kubernetes cluster.
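A sketch of such a binding; the project ID and the platform service account email are placeholders supplied during onboarding.

```hcl
# Grants the e6data platform service account read-only visibility into the cluster.
resource "google_project_iam_member" "platform_cluster_viewer" {
  project = "my-gcp-project-id"                                                # placeholder project ID
  role    = "roles/container.clusterViewer"
  member  = "serviceAccount:platform@e6data-example.iam.gserviceaccount.com"   # placeholder platform SA
}
```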
Helm Release: The Helm release in the provided Terraform code provisions and assigns cluster roles to the e6data control plane user. These cluster roles grant specific permissions and access within the Kubernetes cluster to the e6data control plane user. The defined permissions include the ability to manage various resources such as pods, nodes, services, ingresses, configmaps, secrets, jobs, deployments, daemonsets, statefulsets, and replicasets. By deploying these cluster roles through the Helm release, the e6data control plane user is equipped with the necessary permissions to effectively manage and interact with the resources within the cluster, enabling seamless operation and configuration of the e6data platform.
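A condensed sketch of such a Helm release; the repository URL, chart name, version, and values are placeholders, since the actual chart and its values ship with the e6data scripts.

```hcl
# Condensed sketch; repository, chart, and values are placeholders.
resource "helm_release" "e6data_cluster_roles" {
  name       = "e6data-workspace"
  repository = "https://example.github.io/e6data-helm-charts"   # placeholder repository URL
  chart      = "e6data-workspace"                                # placeholder chart name
  version    = "2.0.5"                                           # placeholder chart version
  namespace  = "e6data"                                          # placeholder namespace

  set {
    name  = "controlPlaneUser"                                   # placeholder value key
    value = "platform@e6data-example.iam.gserviceaccount.com"    # placeholder platform SA
  }
}
```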