Setup using Terraform in GCP
Deploying an e6data Workspace in GCP using Terraform
Once logged into the e6data platform, it’s time to configure an e6data workspace in GCP. We keep it simple - all you need is an existing GCP account with the prerequisites listed below:
A Google Cloud Platform (GCP) account with sufficient permissions to create the required resources.
A local development environment with Terraform installed.
A Google Kubernetes Engine (GKE) cluster.
Network connectivity between the e6data Control Plane & the e6data workspace.
Login to the e6data Console
Navigate to Workspaces in the left-side navigation bar or click Create Workspace.
Select GCP as the Cloud Provider.
Proceed to the steps below to deploy the Workspace.
Using a Terraform script, the e6data Workspace will be deployed inside a GKE Cluster. The subsequent sections will provide instructions to create the two Terraform files required for the deployment:
The google provider blocks are used to configure the credentials you use to authenticate with GCP, as well as the default project and location of your resources.
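A minimal provider block might look like the following sketch; the project ID and region shown here are placeholders, not values from the e6data scripts:

```hcl
# provider.tf - minimal sketch; replace the placeholder values
# with your own project ID and default region before use.
provider "google" {
  project = "my-gcp-project-id" # placeholder GCP project ID
  region  = "us-central1"       # placeholder default region
}
```

Credentials can be supplied via Application Default Credentials (e.g. `gcloud auth application-default login`) or a service account key, depending on your environment.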
Extract the scripts downloaded in the previous step and navigate to the ./scripts/gcp/terraform folder.
To specify a Google Cloud Storage (GCS) bucket for storing the Terraform state when using Google Cloud Platform (GCP) as the provider, you can add the following configuration to the Terraform script:
When using the Google Cloud provider, configure the backend to use a GCS bucket for storing the Terraform state. Replace <bucket_name_to_store_the_tfstate_file> with the name of the GCS bucket you want to use.
Additionally, the prefix parameter allows you to specify a directory or path within the bucket where the Terraform state file will be stored. Adjust the prefix value according to your requirements.
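The backend configuration described above can be sketched as follows; the prefix value here is only an example:

```hcl
# Store Terraform state in a GCS bucket instead of locally.
terraform {
  backend "gcs" {
    bucket = "<bucket_name_to_store_the_tfstate_file>" # your GCS bucket name
    prefix = "e6data/workspace"                        # example path; adjust as needed
  }
}
```

Note that backend blocks cannot reference variables; the bucket name must be a literal value or supplied via `terraform init -backend-config=...`.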
terraform.tfvars File
The terraform.tfvars file contains the following variables that need to be configured before executing the Terraform script.
Edit the following variables in the terraform.tfvars file to match the specific configuration details for your environment:
workspace_name: The name of the e6data workspace to be created.
gcp_project_id: The Google Cloud Platform (GCP) project ID in which to deploy the e6data workspace.
gcp_region: The GCP region in which to run the e6data workspace.
helm_chart_version: The e6data workspace Helm chart version to be used.
gke_subnet_ip_cidr_range: The subnet IP range of the GKE cluster.
cluster_name: The name of the Kubernetes cluster for e6data.
gke_e6data_master_ipv4_cidr_block: The IP range, in CIDR notation, to use for the hosted master network.
gke_version: The version of GKE to use.
gke_encryption_state: The encryption state for GKE (enabling is recommended).
gke_dns_cache_enabled: The status of the NodeLocal DNSCache addon.
spot_enabled: A boolean indicating whether the underlying node VMs are spot instances.
kubernetes_cluster_zone: The Kubernetes cluster zone (only required for zonal clusters).
max_instances_in_nodepool: The maximum number of instances in a node pool.
default_nodepool_instance_type: The default instance type for the node pool.
gke_e6data_initial_node_count: The initial number of nodes in the GKE cluster.
gke_e6data_max_pods_per_node: The maximum number of pods per node in the GKE cluster.
gke_e6data_instance_type: The instance type for the GKE nodes.
kubernetes_namespace: The Kubernetes namespace in which to deploy the e6data workspace.
cost_labels: Cost labels for tracking costs.
buckets: List of bucket names that the e6data engine queries and therefore requires read access to. The default is ["*"], which means all buckets; it is advisable to change this.
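As a rough illustration, a filled-in terraform.tfvars might look like the sketch below. All values are placeholders chosen for this example, and only a subset of the variables is shown; consult the full variable list above for your actual configuration:

```hcl
# terraform.tfvars - illustrative values only; adjust for your environment.
workspace_name       = "demo-workspace"     # placeholder workspace name
gcp_project_id       = "my-gcp-project-id"  # placeholder project ID
gcp_region           = "us-central1"        # placeholder region
cluster_name         = "e6data-gke"         # placeholder cluster name
kubernetes_namespace = "e6data"             # placeholder namespace

gke_subnet_ip_cidr_range          = "10.10.0.0/16"   # example subnet CIDR
gke_e6data_master_ipv4_cidr_block = "172.16.0.0/28"  # example master CIDR

spot_enabled = true                     # use spot VMs for cost savings
buckets      = ["my-data-bucket"]       # restrict read access instead of ["*"]
```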
Once you have configured the necessary variables in the provider.tf & terraform.tfvars files, you can proceed with the deployment of the e6data workspace. Follow the steps below to initiate the deployment:
Navigate to the directory containing the Terraform files. It is essential to be in the correct directory for the Terraform commands to execute successfully.
Initialize Terraform: terraform init
Generate a Terraform plan and save it to a file (e.g. e6.plan): terraform plan -var-file="terraform.tfvars" --out="e6.plan"
The -var-file flag specifies the input variable file (terraform.tfvars) that contains the necessary configuration values for the deployment.
Review the generated plan.
Apply the changes using the generated plan file: terraform apply "e6.plan"
This command applies the changes specified in the plan file (e6.plan) to deploy the e6data workspace in your environment.
Make note of the values returned by the script.
Return to the e6data Console and enter the values returned in the previous step.
This section provides a comprehensive overview of the resources deployed using the Terraform script for the e6data workspace deployment.
Only the e6data engine, residing within the customer account, has access to data stores.
The cross-account role does not have access to data stores; therefore, access to data stores from the e6data platform is not possible.
The e6data Engine, which is deployed inside the customer boundary, requires the following permissions:
Read-only access to buckets containing the data to be queried:
Read-write access to a bucket created by e6data to store query results, logs, etc.:
The workloadIdentityUser role requires the following permissions for authentication and interaction with the cluster:
The e6dataclusterViewer role requires the following permissions to monitor e6data cluster health:
The globalAddresses role requires the following permissions to manage global addresses:
The securityPolicies role requires the following permissions to manage security policies:
The targetPools role requires the following permissions to get the instance list for a target pool or backend config:
This Terraform configuration sets up a network, subnetwork, router, and NAT gateway on Google Cloud Platform (GCP). The network is created without auto-creating subnetworks. The subnetwork is configured with a specified IP CIDR range, region, and enables private Google access. VPC flow logs can be configured for the subnetwork. A router is created and attached to the network. A Cloud NAT gateway is provisioned for internet access for private nodes, with automatic IP allocation and NAT configuration for all subnetwork IP ranges.
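The network pieces described above can be sketched with standard google provider resources; the names, region, and CIDR range below are illustrative placeholders, not the values used by the e6data script:

```hcl
# Sketch of the network, subnetwork, router, and NAT gateway described above.
resource "google_compute_network" "e6data" {
  name                    = "e6data-network" # placeholder name
  auto_create_subnetworks = false            # subnetworks are created explicitly
}

resource "google_compute_subnetwork" "e6data" {
  name                     = "e6data-subnet"
  network                  = google_compute_network.e6data.id
  ip_cidr_range            = "10.10.0.0/16" # placeholder CIDR
  region                   = "us-central1"  # placeholder region
  private_ip_google_access = true           # private Google access enabled
}

resource "google_compute_router" "e6data" {
  name    = "e6data-router"
  network = google_compute_network.e6data.id
  region  = "us-central1"
}

# Cloud NAT gives private nodes outbound internet access.
resource "google_compute_router_nat" "e6data" {
  name                               = "e6data-nat"
  router                             = google_compute_router.e6data.name
  region                             = "us-central1"
  nat_ip_allocate_option             = "AUTO_ONLY"
  source_subnetwork_ip_ranges_to_nat = "ALL_SUBNETWORKS_ALL_IP_RANGES"
}
```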
This Terraform configuration sets up a private and regional Google Kubernetes Engine (GKE) cluster. The cluster is configured with a specified name, region, minimum master version, monitoring and logging services, network, subnetwork, and initial node count. Vertical Pod Autoscaling is enabled, and workload identity is configured.
The cluster's private configuration includes private nodes, a master IPv4 CIDR block, and a disabled private endpoint. IP allocation policy, HTTP load balancing, and DNS cache configuration are also defined. Resource labels and master authorized networks are configured, and a lifecycle block uses create_before_destroy so a replacement cluster is created before any existing one is destroyed, minimizing downtime.
GKE Node Pool for Workspace: Provides a dedicated node pool in GKE to host the e6data workspace, with autoscaling and location policies for scalability and performance.
GCS Bucket for Query Results: Establishes a dedicated Google Cloud Storage (GCS) bucket within the e6data workspace to store query results.
Service Account for Workspace: The deployment includes creating a dedicated service account that ensures secure access to e6data workspace resources. This service account is assigned a custom role that grants read access to GCS buckets, enabling the e6data engine to retrieve data for querying and processing operations. Additionally, the service account is also provided with read/write access to the e6data workspace bucket. This access allows the e6data platform to write query results to the bucket, providing efficient storage and management of workspace-related data.
IAM Policy Bindings for Workspace Workload Identity: This creates an IAM policy binding, enabling the workspace service account to act as a workload identity user in the Kubernetes cluster. This binding grants the necessary permissions for authentication and interaction within the cluster, facilitating seamless integration between the e6data workspace and Kubernetes infrastructure.
IAM Policy Bindings for Platform Workload Identity:
This creates an IAM policy binding between the platform service account and the Kubernetes cluster, granting the platform service account the "roles/container.clusterViewer" role. This role provides the necessary permissions to view and access the Kubernetes cluster.
Helm Release: The Helm release in the provided Terraform code provisions and assigns cluster roles to the e6data control plane user. These cluster roles grant specific permissions and access within the Kubernetes cluster to the e6data control plane user. The defined permissions include the ability to manage various resources such as pods, nodes, services, ingresses, configmaps, secrets, jobs, deployments, daemonsets, statefulsets, and replicasets. By deploying these cluster roles through the Helm release, the e6data control plane user is equipped with the necessary permissions to effectively manage and interact with the resources within the cluster, enabling seamless operation and configuration of the e6data platform.
If a GKE cluster is not available, please follow the instructions below to create one.
If Terraform is not installed, please install it by following the steps below.
Please download/clone the e6x-labs/terraform repo.
Edit the provider.tf file according to your requirements. Please refer to the Google provider documentation for instructions on the authentication method most appropriate to your environment.
For more information and to explore additional backend options, you can refer to the Terraform backend documentation.
To create a new Google Kubernetes Engine (GKE) cluster, you'll need to have the Google Cloud SDK installed and configured on your local machine. If you don't have it installed, you can follow the Google Cloud SDK installation instructions to set it up.
For detailed instructions and more advanced configurations, you can refer to the official Google Cloud documentation on creating GKE clusters.
If you don't have kubectl installed, follow the official Kubernetes documentation to install kubectl on your local machine.
Visit the official Terraform website and navigate to the "Downloads" page to download the Terraform binary for your platform.