LogoLogo
  • Welcome to e6data
  • Introduction to e6data
    • Concepts
    • Architecture
      • e6data in VPC Deployment Model
      • Connect to e6data serverless compute
  • Get Started
  • Sign Up
  • Setup
    • AWS Setup
      • In VPC Deployment (AWS)
        • Prerequisite Infrastructure
        • Infrastructure & Permissions for e6data
        • Setup Kubernetes Components
        • Setup using Terraform in AWS
          • Update a AWS Terraform for your Workspace
        • AWS PrivateLink and e6data
        • VPC Peering | e6data on AWS
      • Connect to e6data serverless compute (AWS)
        • Workspace Creation
        • Catalog Creation
          • Glue Metastore
          • Hive Metastore
          • Unity Catalog
        • Cluster Creation
    • GCP Setup
      • In VPC Deployment (GCP)
        • Prerequisite Infrastructure
        • Infrastructure & Permissions for e6data
        • Setup Kubernetes Components
        • Setup using Terraform in GCP
        • Update a GCP Terraform for your Workspace
      • Connect to e6data serverless compute (GCP)
    • Azure Setup
      • Prerequisite Infrastructure
      • Infrastructure & Permissions for e6data
      • Setup Kubernetes Components
      • Setup using Terraform in AZURE
        • Update a AZURE Terraform for your Workspace
  • Workspaces
    • Create Workspaces
    • Enable/Disable Workspaces
    • Update a Workspace
    • Delete a Workspace
  • Catalogs
    • Create Catalogs
      • Hive Metastore
        • Connect to a Hive Metastore
        • Edit a Hive Metastore Connection
        • Delete a Hive Metastore Connection
      • Glue Metastore
        • Connect to a Glue Metastore
        • Edit a Glue Metastore Connection
        • Delete a Glue Metastore Connection
      • Unity Catalog
        • Connect to Unity Catalog
        • Edit Unity Catalog
        • Delete Unity Catalog
      • Cross-account Catalog Access
        • Configure Cross-account Catalog to Access AWS Hive Metastore
        • Configure Cross-account Catalog to Access Unity Catalog
        • Configure Cross-account Catalog to Access AWS Glue
        • Configure Cross-account Catalog to Access GCP Hive Metastore
    • Manage Catalogs
    • Privileges
      • Access Control
      • Column Masking
      • Row Filter
  • Clusters
    • Edit & Delete Clusters
    • Suspend & Resume Clusters
    • Cluster Size
    • Load Based Sizing
    • Auto Suspension
    • Query Timeout
    • Monitoring
    • Connection Info
  • Pools
    • Delete Pools
  • Query Editor
    • Editor Pane
    • Results Pane
    • Schema Explorer
    • Data Preview
  • Notebook
    • Editor Pane
    • Results Pane
    • Schema Explorer
    • Data Preview
  • Query History
    • Query Count API
  • Connectivity
    • IP Sets
    • Endpoints
    • Cloud Resources
    • Network Firewall
  • Access Control
    • Users
    • Groups
    • Roles
      • Permissions
      • Policies
    • Single Sign-On (SSO)
      • AWS SSO
      • Okta
      • Microsoft My Apps-SSO
      • Icons for IdP
    • Service Accounts
    • Multi-Factor Authentication (Beta)
  • Usage and Cost Management
  • Audit Log
  • User Settings
    • Profile
    • Personal Access Tokens (PAT)
  • Advanced Features
    • Cross-Catalog & Cross-Schema Querying
  • Supported Data Types
  • SQL Command Reference
    • Query Syntax
      • General functions
    • Aggregate Functions
    • Mathematical Functions & Operators
      • Arithematic Operators
      • Rounding and Truncation Functions
      • Exponential and Root Functions
      • Trigonometric Functions
      • Logarithmic Functions
    • String Functions
    • Date-Time Functions
      • Constant Functions
      • Conversion Functions
      • Date Truncate Function
      • Addition and Subtraction Functions
      • Extraction Functions
      • Format Functions
      • Timezone Functions
    • Conditional Expressions
    • Conversion Functions
    • Window Functions
    • Comparison Operators & Functions
    • Logical Operators
    • Statistical Functions
    • Bitwise Functions
    • Array Functions
    • Regular Expression Functions
    • Generate Functions
    • Cardinality Estimation Functions
    • JSON Functions
    • Checksum Functions
    • Unload Function (Copy into)
    • Struct Functions
  • Equivalent Functions & Operators
  • Connectors & Drivers
    • DBeaver
    • DbVisualiser
    • Apache Superset
    • Jupyter Notebook
    • Tableau Cloud
    • Tableau Desktop
    • Power BI
    • Metabase
    • Zeppelin
    • Python Connector
      • Code Samples
    • JDBC Driver
      • Code Samples
      • API Support
    • Configure Cluster Ingress
      • ALB Ingress in Kubernetes
      • GCE Ingress in Kubernetes
      • Ingress-Nginx in Kubernetes
  • Security & Trust
    • Best Practices
      • AWS Best Practices
    • Features & Responsibilities Matrix
    • Data Protection Addendum(DPA)
  • Tutorials and Best Practices
    • How to configure HIVE metastore if you don't have one?
    • How-To Videos
  • Known Limitations
    • SQL Limitations
    • Other Limitations
    • Restart Triggers
    • Cloud Provider Limitations
  • Error Codes
    • General Errors
    • User Account Errors
    • Workspace Errors
    • Catalog Errors
    • Cluster Errors
    • Data Governance Errors
    • Query History Errors
    • Query Editor Errors
    • Pool Errors
    • Connectivity Errors
  • Terms & Condition
  • Privacy Policy
    • Cookie Policy
  • FAQs
    • Workspace Setup
    • Security
    • Catalog Privileges
  • Services Utilised for e6data Deployment
    • AWS supported regions
    • GCP supported regions
    • AZURE supported regions
  • Release Notes & Updates
    • 6th Sept 2024
    • 6th June 2024
    • 18th April 2024
    • 9th April 2024
    • 30th March 2024
    • 16th March 2024
    • 14th March 2024
    • 12th March 2024
    • 2nd March 2024
    • 10th February 2024
    • 3rd February 2024
    • 17th January 2024
    • 9th January 2024
    • 3rd January 2024
    • 18th December 2023
    • 12th December 2023
    • 9th December 2023
    • 4th December 2023
    • 27th November 2023
    • 8th September 2023
    • 4th September 2023
    • 26th August 2023
    • 21st August 2023
    • 19th July 2023
    • 23rd May 2023
    • 5th May 2023
    • 28th April 2023
    • 19th April 2023
    • 15th April 2023
    • 10th April 2023
    • 30th March 2023
Powered by GitBook
On this page
  • Prerequisites
  • Create the e6data Workspace
  • Installing the e6data Workspace
  • Download e6data Terraform Scripts
  • Configure provider.tf
  • Configuration Variables in terraform.tfvars File
  • Execution Commands
  • Deployment Overview and Resource Provisioning
  • Permissions
  • Resources Created
  1. Setup
  2. GCP Setup
  3. In VPC Deployment (GCP)

Setup using Terraform in GCP

Deploying an e6data Workspace in GCP using Terraform

PreviousSetup Kubernetes ComponentsNextUpdate a GCP Terraform for your Workspace

Last updated 10 months ago

Please ensure to before creating workspaces.

Once logged into the e6data platform, it’s time to configure an e6data workspace in GCP. We keep it simple - all you need is an existing GCP account with the prerequisites listed below:

Prerequisites

  • A Google Cloud Platform (GCP) account with sufficient .

  • A local development environment with Terraform installed.

  • A Google Kubernetes Engine (GKE) cluster.

    • . This allows connectivity between the e6data Control Plane & the e6data workspace.

Create the e6data Workspace

  1. Login to the e6data Console

  2. Navigate to Workspaces in the left-side navigation bar or click Create Workspace

  3. Select GCP as the Cloud Provider.

  4. Proceed to the to deploy the Workspace.

Installing the e6data Workspace

Using a Terraform script, the e6data Workspace will be deployed inside a GKE Cluster. The subsequent sections will provide instructions to create the two Terraform files required for the deployment:

Download e6data Terraform Scripts

Configure provider.tf

The google provider blocks are used to configure the credentials you use to authenticate with GCP, as well as the default project and location of your resources.

Extract the scripts downloaded in the previous step and navigate to the ./scripts/gcp/terraform folder.

Sample provider.tf
provider.tf (sample)
terraform {
  required_providers {
    google = {
      source = "hashicorp/google"
      version = "4.72.0"
    }
  }
}

provider "google" {
    project = var.gcp_project_id
    region = var.gcp_region
    credentials = "{{GOOGLE_CLOUD_KEYFILE_JSON}}"
    #access_token = "{{ gcp_access_token }}"

  backend "gcs" {
    bucket  = "<bucket_name_to_store_the_tfstate_file>"
    prefix  = "terraform/state"
  }
}

Specifying Google Cloud Storage (GCS) Bucket for Terraform State file

To specify a Google Cloud Storage (GCS) bucket for storing the Terraform state when using Google Cloud Platform (GCP) as the provider, you can add the following configuration to the Terraform script:

When using the Google Cloud provider and configuring the backend to use a GCS bucket for storing the Terraform state. Replace the <bucket_name_to_store_the_tfstate_file> with the desired GCS bucket name that you want to use.

Additionally, the prefix parameter allows you to specify a directory or path within the bucket where the Terraform state file will be stored. Adjust the prefix value according to your requirements.

  • Make sure that the credentials used by Terraform have the necessary permissions to read from and write to the specified GCS bucket.

Configuration Variables in terraform.tfvars File

The terraform.tfvars file contains the following variables that need to be configured before executing the Terraform script.

Edit the following variables in theterraform.tfvars file according to your needs:

Sample terraform.tfvars
terraform.tfvars (sample)
# GCP Variables
gcp_region                          = "us-central1"                          # The region where the e6data resources will be created
gcp_project_id                      = "<project_ID>"                          # The ID of the GCP project

# e6data Workspace Variables
workspace_name                      = "workspace"                             # The name of the e6data workspace
# Note: The variable workspace_name should meet the following criteria:
# a) Only lowercase alphanumeric characters are accepted.
# b) Must have a minimum of 3 characters.

helm_chart_version                  = "2.0.5"                               ### e6data workspace Helm chart version to be used.

# Network Variables
gke_subnet_ip_cidr_range            = "10.100.0.0/18"                        # The subnet IP range for GKE

gke_e6data_master_ipv4_cidr_block   = "10.103.4.0/28" 
# The IP range in CIDR notation to use for the hosted master network
# This range will be used for assigning private IP addresses to the cluster master(s) and the ILB VIP
# This range must not overlap with any other ranges in use within the cluster's network, and it must be a /28 subnet

# Kubernetes Variables
gke_version                         = "1.28.8"                               # The version of GKE to use                
gke_encryption_state                = "DECRYPTED"                            # The encryption state for GKE (It is recommended to use encryption)
gke_dns_cache_enabled               = true                                   # The status of the NodeLocal DNSCache addon.
spot_enabled                        = true                                   # A boolean that represents whether the underlying node VMs are spot.

# GKE Cluster variables
cluster_name                        = "gke-cluster-name"                      # The name of the GKE cluster
default_nodepool_instance_type      = "e2-standard-2"                        # The default instance type for the node pool

gke_e6data_initial_node_count       = 1                                      # The initial number of nodes in the GKE cluster
gke_e6data_max_pods_per_node        = 64                                      # The maximum number of pods per node in the GKE cluster
gke_e6data_instance_type            = "c2-standard-30"                       # The instance type for the GKE nodes
max_instances_in_nodepool          = 50                                      # The maximum number of instances in a node group

# Kubernetes Namespace
kubernetes_namespace                = "namespace"                            # The namespace to use for Kubernetes resources

# Cost Labels
cost_labels = {}                            # Cost labels for tracking costs
# Note: The variable cost_labels only accepts lowercase letters ([a-z]), numeric characters ([0-9]), underscores (_) and dashes (-).

buckets                             = ["*"]                                 ### List of bucket names that the e6data engine queries and therefore, require read access to. The default is ["*"] which means all buckets, it is advisable to change this.

Please update the values of these variables in the terraform.tfvars file to match the specific configuration details for your environment:

workspace_name

The name of the e6data workspace to be created.

gcp_project_id

The Google Cloud Platform (GCP) project ID to deploy the e6data workspace.

gcp_region

The GCP region to run the e6data workspace.

helm_chart_version

e6data workspace Helm chart version to be used.

gke_subnet_ip_cidr_range

The subnet IP range of GKE

cluster_name

The name of the Kubernetes cluster for e6data.

gke_e6data_master_ipv4_cidr_block

The IP range in CIDR notation to use for the hosted master network.

gke_version

The version of GKE to use

gke_encryption_state

The encryption state for GKE (recommended)

gke_dns_cache_enabled

The status of the NodeLocal DNSCache addon.

spot_enabled

A boolean that represents whether the underlying node VMs are spot.

kubernetes_cluster_zone

The Kubernetes cluster zone (only required for zonal clusters).

max_instances_in_nodepool

The maximum number of instances in a nodepool.

default_nodepool_instance_type

The default instance type for the node pool.

gke_e6data_initial_node_count

The initial number of nodes in the GKE cluster

gke_e6data_max_pods_per_node

The maximum number of pods per node in the GKE cluster

gke_e6data_instance_type

The instance type for the GKE nodes

kubernetes_namespace

The Kubernetes namespace to deploy the e6data workspace.

cost_labels

Cost labels for tracking costs

buckets

List of bucket names that the e6data engine queries and therefore, require read access to. Default is ["*"] which means all buckets, it is advisable to change this.

Execution Commands

Once you have configured the necessary variables in the provider.tf & terraform.tfvars files, you can proceed with the deployment of the e6data workspace. Follow the steps below to initiate the deployment:

  1. Navigate to the directory containing the Terraform files. It is essential to be in the correct directory for the Terraform commands to execute successfully.

  2. Initialize Terraform: terraform init

  3. Generate a Terraform plan and save it to a file (e.g. e6.plan): terraform plan -var-file="terraform.tfvars" --out="e6.plan".

    • The -var-file flag specifies the input variable file (terraform.tfvars) that contains the necessary configuration values for the deployment.

  4. Review the generated plan.

  5. Apply the changes using the generated plan file: terraform apply "e6.plan"

This command applies the changes specified in the plan file (e6.plan) to deploy the e6data workspace in your environment.

  1. Make note of the values returned by the script.

  2. Return to the e6data Console and enter the values returned in the previous step.

Deployment Overview and Resource Provisioning

This section provides a comprehensive overview of the resources deployed using the Terraform script for the e6data workspace deployment.

  • Only the e6data engine, residing within the customer account has permission access to data stores.

  • The cross-account role does not have access to data stores, therefore access to data stores from the e6data platform is not possible.

Permissions

Engine Permissions

The e6data Engine which is deployed inside the customer boundary, requires the following permissions:

Read-only access to buckets containing the data to be queried:

permissions = [
    "storage.objects.getIamPolicy",
    "storage.objects.get",
    "storage.objects.list",
]

Read-write access to a bucket created by e6data to store query results, logs, etc.:

permissions = [
    "storage.objects.setIamPolicy",
    "storage.objects.getIamPolicy",
    "storage.objects.update",
    "storage.objects.create",
    "storage.objects.delete",
    "storage.objects.get",
    "storage.objects.list",
]

Permissions for connectivity with the e6data control plane

The workloadIdentityUser role requires the following permissions for authentication and interaction with the cluster:

permissions  = [
    "iam.serviceAccounts.get",
    "iam.serviceAccounts.getAccessToken",
    "iam.serviceAccounts.getOpenIdToken",
    "iam.serviceAccounts.list"
]

The e6dataclusterViewer role requires the following permissions to monitor e6data cluster health:

permissions  = [
    "container.clusters.get",
    "container.clusters.list",
    "container.roleBindings.get",
    "container.backendConfigs.get",
    "container.backendConfigs.create",
    "container.backendConfigs.delete",
    "container.backendConfigs.update",
    "resourcemanager.projects.get",
    "compute.sslCertificates.get",
    "compute.forwardingRules.list"
]

The globalAddresses role requires the following permissions to manage glabladdress :

    "compute.globalAddresses.create",
    "compute.globalAddresses.delete",
    "compute.globalAddresses.get",
    "compute.globalAddresses.setLabels"

The securityPolicies role requires the following permissions to manage glabladdress :

"compute.securityPolicies.create",
"compute.securityPolicies.get",
"compute.securityPolicies.delete",
"compute.securityPolicies.update"

The targetPools role requires the following permissions to get the instance list for targetpool or backendconfig.

   "compute.instances.get",
    "compute.targetPools.get",
    "compute.targetPools.list"

Resources Created

  1. This Terraform configuration sets up a network, subnetwork, router, and NAT gateway on Google Cloud Platform (GCP). The network is created without auto-creating subnetworks. The subnetwork is configured with a specified IP CIDR range, region, and enables private Google access. VPC flow logs can be configured for the subnetwork. A router is created and attached to the network. A Cloud NAT gateway is provisioned for internet access for private nodes, with automatic IP allocation and NAT configuration for all subnetwork IP ranges.

  2. This Terraform configuration sets up a private and regional Google Kubernetes Engine (GKE) cluster. The cluster is configured with a specified name, region, minimum master version, monitoring and logging services, network, subnetwork, and initial node count. Vertical Pod Autoscaling is enabled, and workload identity is configured.

  3. The cluster's private configuration includes private nodes, a master IPv4 CIDR block, and a disabled private endpoint. IP allocation policy, HTTP load balancing, and DNS cache configuration are also defined. Resource labels and master authorized networks are configured, and a lifecycle block specifies to creation of the cluster before destroying any existing one to ensure less downtime.

  4. GKE Node Pool for Workspace: Provides a dedicated node pool in GKE to host the e6data workspace, with autoscaling and location policies for scalability and performance.

  5. GCS Bucket for Query Results: Establishes a dedicated Google Cloud Storage (GCS) bucket within the e6data workspace to store query results.

  6. Service Account for Workspace: The deployment includes creating a dedicated service account that ensures secure access to e6data workspace resources. This service account is assigned a custom role that grants read access to GCS buckets, enabling the e6data engine to retrieve data for querying and processing operations. Additionally, the service account is also provided with read/write access to the e6data workspace bucket. This access allows the e6data platform to write query results to the bucket, providing efficient storage and management of workspace-related data.

  7. IAM Policy Bindings for Workspace Workload Identity: This creates an IAM policy binding, enabling the workspace service account to act as a workload identity user in the Kubernetes cluster. This binding grants the necessary permissions for authentication and interaction within the cluster, facilitating seamless integration between the e6data workspace and Kubernetes infrastructure.

  8. IAM Policy Bindings for Platform Workload Identity:

    This creates an IAM policy binding between the platform service account and the Kubernetes cluster, granting the platform service account the "roles/container.clusterViewer" role. This role provides the necessary permissions to view and access the Kubernetes cluster.

  9. Helm Release: The Helm release in the provided Terraform code provisions and assigns cluster roles to the e6data control plane user. These cluster roles grant specific permissions and access within the Kubernetes cluster to the e6data control plane user. The defined permissions include the ability to manage various resources such as pods, nodes, services, ingresses, configmaps, secrets, jobs, deployments, daemonsets, statefulsets, and replicasets. By deploying these cluster roles through the Helm release, the e6data control plane user is equipped with the necessary permissions to effectively manage and interact with the resources within the cluster, enabling seamless operation and configuration of the e6data platform.

Creating a GKE Cluster (optional)

Once you have the Google Cloud SDK ready, follow these steps to create a new GKE cluster:

  1. Open a terminal or command prompt.

  2. Authenticate with your Google Cloud account by running the following command and following the instructions:

    gcloud auth login

  3. Set the default project for your cluster by running the following command and replacing [PROJECT_ID] with your Google Cloud project ID:

    gcloud config set project [PROJECT_ID]

  4. Run the following command, replacing [CLUSTER_NAME] with your desired name for the GKE cluster and [REGION] with the target region:

    gcloud container clusters create [CLUSTER_NAME] --region [REGION] --num-nodes=1 --workload-pool=[PROJECT_ID].svc.id.goog

    This command creates a GKE cluster with the specified name and configuration.

    • The --num-nodes flag specifies the desired number of default nodes in the cluster.

    • The --workload-pool flag enables Workload Identity for the cluster by specifying the workload pool to use. Replace [PROJECT_ID] with your actual project ID. The project ID is a unique identifier for your Google Cloud project.

    Please note that the cluster creation process may take some time as Google Cloud provisions the necessary resources and sets up the cluster.

  1. Connecting to the GKE Cluster and Installing kubectl

    After the cluster creation is complete, run the following command to retrieve the cluster credentials and configure kubectl to connect to the GKE cluster:

gcloud container clusters get-credentials [CLUSTER_NAME] --region=[REGION]

  • Replace [CLUSTER_NAME] with the name of your GKE cluster and [REGION] with the region where the cluster is located.

  • This command fetches the necessary cluster credentials and configures kubectl to connect to the GKE cluster.

  • Verify the connection to the GKE cluster by running the following command:

kubectl get nodes

This command lists the nodes in your GKE cluster. If the connection is successful, you should see a list of nodes displayed.

  • Once kubectl is installed, you can verify the installation by running:

kubectl version --short

This command displays the version information of kubectl, indicating that it is successfully installed.

Congratulations! You have now created a GKE cluster. You can now deploy and manage e6data workspace on this cluster using the Terraform script below.

Installing Terraform Locally (optional)

To install Terraform on your local machine, you can follow the steps, adapted from the official HashiCorp Terraform documentation:

  1. Download the appropriate package for your operating system (Windows, macOS, Linux).

  2. Extract the downloaded package to a directory of your choice.

  3. Add the Terraform executable to your system's PATH environment variable.

    • Windows:

      1. Open the Start menu and search for "Environment Variables."

      2. Select "Edit the system environment variables."

      3. Click the "Environment Variables" button.

      4. Under "System variables," find the "Path" variable and click "Edit."

      5. Add the path to the directory where you extracted the Terraform executable (e.g., C:\\terraform) to the list of paths.

      6. Click "OK" to save the changes.

    • For macOS and Linux:

      1. Open a terminal.

      2. Run the following command, replacing <path_to_extracted_binary> with the path to the directory where you extracted the Terraform executable: export PATH=$PATH:<path_to_extracted_binary>

      3. Optionally, you can add this command to your shell's profile file (e.g., ~/.bash_profile, ~/.bashrc, ~/.zshrc) to make it persistent across terminal sessions.

  4. Verify the installation by opening a new terminal window and running the following command: terraform version. If Terraform is installed correctly, you should see the version number displayed.

If a GKE cluster is not available, please to create one.

If Terraform is not installed, please .

Please download/clone the e6x-labs/terraform repo from .

Edit the provider.tf file according to your requirements. Please refer to the to find instructions to use the authentication method most appropriate to your environment.

For more information and to explore additional backend options, you can refer to the documentation.

To create a new Google Kubernetes Engine (GKE) cluster, you'll need to have the Google Cloud SDK installed and configured on your local machine. If you don't have it installed, you can follow the instructions at to set it up.

For detailed instructions and more advanced configurations, you can refer to the official Google Cloud documentation on .

If you don't have kubectl installed, follow the official Kubernetes documentation to install kubectl on your local machine:

Visit the official Terraform website at

Navigate to the "Downloads" page or click to directly access the downloads page.

Github
official Terraform documentation
Terraform Backend Configuration
install the gcloud CLI | Google Cloud
cloud container clusters create | Google Cloud CLI Documentation
Install kubectl
Terraform by HashiCorp
here
sign up for an e6data account
Add the IP address of the e6data Control Plane to the list of authorized networks
permissions to create and manage resources
The installation steps are outlined below.
next section
provider.tf
terraform.tfvars
follow these instructions
follow these instructions
GitHub - e6x-labs/terraformGitHub
Download e6data Workspace Deployment Terraform scripts from Github
Logo