
Infrastructure & Permissions for e6data


Last updated 11 months ago

The specific infrastructure and permissions required by e6data and instructions to create them are provided below:

Required Infrastructure

The following infrastructure required to run e6data must be created before setup:

  1. GKE Nodepool

    • Create a GKE node pool in an existing GKE cluster or a newly created GKE cluster for e6data.

  2. GCS Bucket

    • To store e6data operational logs, cache & usage data.

Create a GCS Bucket for e6data

A GCS bucket is required to store data needed for the operation of the e6data workspace, e.g. service logs, query results, and state information.

When creating a GCS bucket, it is advisable to follow the GCP documentation.

Please make note of the GCS bucket name; it will be required when creating the Workspace in the e6data Console.
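If you prefer the CLI over the console, the bucket can also be created with `gcloud`. This is a minimal sketch; the bucket name, project ID, and region placeholders are examples you should replace with your own values:

```shell
# Create a GCS bucket for e6data workspace data
# (replace <E6DATA_BUCKET_NAME>, <PROJECT_ID> and <REGION> with your values)
gcloud storage buckets create gs://<E6DATA_BUCKET_NAME> \
    --project=<PROJECT_ID> \
    --location=<REGION> \
    --uniform-bucket-level-access
```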

Create a Service account for the e6data Query Engine

The e6data Query Engine requires access to the GCS buckets containing the target data for querying. To provision the required access we need to create a custom role and associate it with a service account in GCP.

This configuration establishes a secure connection between the Kubernetes environment and GCP. Once these custom roles are associated with the service account, any Pods within the e6data clusters that are configured to use this service account will inherit the permissions defined in them.

Create a Custom Role for write access to the e6data bucket:

Create a custom role that grants write access to the "e6data" bucket for the "workspace" service account, which will be created in the next step:

  • Go to the Google Cloud Console and navigate to IAM & Admin > Roles.

  • Click on "Create role."

  • Enter a title and description for the role (e.g., "e6data Custom Role").

  • In the "Permissions" section, add the following permissions (replace [BUCKET_NAME] with the name of the e6data workspace bucket which we created earlier):

permissions = [
  "storage.objects.getIamPolicy",
  "storage.objects.update",
  "storage.objects.create",
  "storage.objects.delete",
  "storage.objects.get",
  "storage.objects.list",
]

condition {
  title       = "Workspace Write Access"
  description = "Write access to e6data workspace GCS bucket"
  expression  = "resource.name.startsWith(\"projects/_/buckets/[BUCKET_NAME]/\")"
}
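The same custom role can be created from the CLI. A sketch with an assumed role ID (`e6dataWorkspaceWriteAccess`); note that in GCP the bucket-scoped condition is applied on the IAM binding, not on the role definition itself:

```shell
# Create the custom role granting write-access permissions
# (the bucket condition shown above is attached when the role is bound)
gcloud iam roles create e6dataWorkspaceWriteAccess \
    --project=<PROJECT_ID> \
    --title="e6data Workspace Write Access" \
    --description="Write access to e6data workspace GCS bucket" \
    --permissions=storage.objects.getIamPolicy,storage.objects.update,storage.objects.create,storage.objects.delete,storage.objects.get,storage.objects.list
```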

Create a Service Account:

  • Go to the Google Cloud Console and navigate to IAM & Admin > Service accounts.

  • Click on "Create service account."

  • Enter a name and description for the service account (e.g., "e6data-service-account").

  • Click "Create" and then select the two custom roles for the service account that we created in the previous steps.

  • Click "Continue" and then "Done" to create the service account.
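The console steps above can also be sketched with `gcloud`; the account name below matches the example used in this guide:

```shell
# Create the workspace service account
gcloud iam service-accounts create e6data-service-account \
    --project=<PROJECT_ID> \
    --display-name="e6data workspace service account"
```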

Create a Custom Role for read access to the data buckets to query:

Create a custom role that grants read access to the data buckets for the "workspace" service account to query.

  • Go to the Google Cloud Console and navigate to IAM & Admin > Roles.

  • Click on "Create role."

  • Enter a title and description for the role (e.g., "e6data Custom Role").

  • In the "Permissions" section, add the following permissions (replace [BUCKET_NAME] with the name of the e6data workspace bucket which we created earlier):

permissions = [
  "storage.objects.getIamPolicy",
  "storage.objects.get",
  "storage.objects.list",
]

Attach the read permission to the buckets which need to be queried:

  1. Navigate to the Cloud Storage section: Click on the menu icon in the top left corner of the console, then click on "Cloud Storage" to open the Cloud Storage browser.

  2. Select a bucket: Click on the bucket to which you want to assign the IAM role.

  3. Open the "Permissions" tab: In the bucket details page, click on the "Permissions" tab to view the current IAM permissions for the bucket.

  4. Add a new member: Click on the "+ Add" button to add a new member to the bucket's IAM policy.

  5. Select the role: Select the role that you want to assign to the member. If you want to assign a custom role, click on "Select a role" and choose "Custom" to enter the role name that we created earlier.

  6. Save the changes: Click on the "Save" button to save the new IAM policy for the bucket.
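The per-bucket grant above can equally be done from the CLI. A sketch, assuming the read role was created with role ID `e6dataReadAccess` (replace the role ID, bucket, and project placeholders with your own):

```shell
# Grant the custom read role to the workspace service account
# on each bucket that contains data to be queried
gcloud storage buckets add-iam-policy-binding gs://<DATA_BUCKET_NAME> \
    --member="serviceAccount:e6data-service-account@<PROJECT_ID>.iam.gserviceaccount.com" \
    --role="projects/<PROJECT_ID>/roles/e6dataReadAccess"
```

Repeat the command for every bucket the e6data Query Engine should be able to read.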

Create IAM policy binding for the workspace service account and Kubernetes cluster

Create a workloadIdentityUser role with the following permissions to enable authentication and interaction with the cluster:

  1. Navigate to IAM & Admin: Click on the menu icon in the top left corner of the console, then navigate to the "IAM & Admin" section and click on "IAM" to open the IAM & Admin page.

  2. Create a custom IAM role:

    • Click on the "Roles" tab.

    • Click on the "+ Create Role" button.

    • Enter a Role ID for your custom role (e.g., workloadIdentityUser).

    • Enter a Title for your custom role (e.g., e6data <E6DATA_WORKSPACE_NAME> workloadIdentityUser Access).

    • Enter a Description for your custom role (e.g., e6data custom workload identity user role).

    • Add the following permissions to your custom role:

      • iam.serviceAccounts.get

      • iam.serviceAccounts.getAccessToken

      • iam.serviceAccounts.getOpenIdToken

      • iam.serviceAccounts.list

    • Click on the "Create" button to create your custom IAM role.

  3. Bind the custom IAM role to a service account:

    • Click on the "IAM" tab.

    • Click on the "+ Add" button to add a new IAM policy binding.

    • Select your GCP project from the "Select a project" dropdown.

    • Enter the following in the "New Members" field: serviceAccount:<your_gcp_project_id>.svc.id.goog[<kubernetes_namespace>/<E6DATA_WORKSPACE_NAME>]

    • Select the custom role you created from the "Select a role" dropdown.

    • Click on the "Save" button to save the IAM policy binding.
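The Workload Identity binding above can be sketched as a single `gcloud` command, assuming the custom role ID is `workloadIdentityUser` as in the example above:

```shell
# Bind the custom workload identity role to the Kubernetes service account
# member format: serviceAccount:<project_id>.svc.id.goog[<namespace>/<ksa_name>]
gcloud projects add-iam-policy-binding <PROJECT_ID> \
    --role="projects/<PROJECT_ID>/roles/workloadIdentityUser" \
    --member="serviceAccount:<PROJECT_ID>.svc.id.goog[<KUBERNETES_NAMESPACE>/<E6DATA_WORKSPACE_NAME>]"
```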

Create IAM policy binding for Platform Service and Kubernetes cluster

The e6dataclusterViewer role requires the following permissions to monitor e6data cluster health:

  1. Navigate to IAM & Admin: Click on the menu icon in the top left corner of the console, then navigate to the "IAM & Admin" section and click on "IAM" to open the IAM & Admin page.

  2. Create a custom IAM role:

    • Click on the "Roles" tab.

    • Click on the "+ Create Role" button.

    • Enter a Role ID for your custom role (e.g., e6dataclusterViewer).

    • Enter a Title for your custom role (e.g., e6data-<E6DATA_WORKSPACE_NAME>-clusterViewer).

    • Enter a description of your custom role (e.g., kubernetes container clusterViewer access).

    • Add the following permissions to your custom role:

      • container.clusters.get

      • container.clusters.list

      • container.roleBindings.get

      • container.backendConfigs.get

      • container.backendConfigs.create

      • container.backendConfigs.delete

      • container.backendConfigs.update

      • resourcemanager.projects.get

      • compute.sslCertificates.get

      • compute.forwardingRules.list

    • Click on the "Create" button to create your custom IAM role.

  3. Create a custom IAM role for target pools:

    • Click on the "Roles" tab.

    • Click on the "+ Create Role" button.

    • Enter a Role ID for your custom role (e.g., targetPools role).

    • Enter a Title for your custom role (e.g., e6data-<E6DATA_WORKSPACE_NAME>-targetPools).

    • Enter a description of your custom role (e.g., kubernetes targetPools access).

    • Add the following permissions to your custom role:

      • compute.instances.get

      • compute.targetPools.get

      • compute.targetPools.list

    • Click on the "Create" button to create your custom IAM role.

  4. Create a custom IAM role for global addresses:

    • Click on the "Roles" tab.

    • Click on the "+ Create Role" button.

    • Enter a Role ID for your custom role (e.g., global address role).

    • Enter a Title for your custom role (e.g., e6data-<E6DATA_WORKSPACE_NAME>-global_address).

    • Enter a description of your custom role (e.g., kubernetes global_address access).

    • Add the following permissions to your custom role:

      • compute.globalAddresses.delete

      • compute.globalAddresses.create

      • compute.globalAddresses.get

      • compute.globalAddresses.setLabels

    • Click on the "Create" button to create your custom IAM role.

  5. Create a custom IAM role for security policies:

    • Click on the "Roles" tab.

    • Click on the "+ Create Role" button.

    • Enter a Role ID for your custom role (e.g., Endpoints role).

    • Enter a Title for your custom role (e.g., e6data-<E6DATA_WORKSPACE_NAME>-security_policy).

    • Enter a description of your custom role (e.g., kubernetes security_policy access).

    • Add the following permissions to your custom role:

      • compute.securityPolicies.create

      • compute.securityPolicies.get

      • compute.securityPolicies.delete

      • compute.securityPolicies.update

    • Click on the "Create" button to create your custom IAM role.

  6. Bind the custom IAM role to a service account:

    • Click on the "IAM" tab.

    • Click on the "+ Add" button to add a new IAM policy binding.

    • Select your GCP project from the "Select a project" dropdown.

    • Enter the following in the "New Members" field: serviceAccount:<service-account-email>

    • Replace <service-account-email> with the email of the service account you want to bind the roles to.

    • Select the custom role you created (e.g., projects/<project-id>/roles/e6dataclusterViewer) from the "Select a role" dropdown.

    • Select the custom role you created (e.g., projects/<project-id>/roles/e6data-<E6DATA_WORKSPACE_NAME>-targetPools) from the "Select a role" dropdown.

    • Select the custom role you created (e.g., projects/<project-id>/roles/e6data-<E6DATA_WORKSPACE_NAME>-security_policy) from the "Select a role" dropdown and include a condition to limit access to resources named e6data. This condition could be formulated as follows:

      { "expression": "resource.name.startsWith(\"projects/<PROJECT_ID>/global/securityPolicies/e6data-\")", "title": "security policy condition", "description": "" }

    • Select the custom role you created (e.g., projects/<project-id>/roles/e6data-<E6DATA_WORKSPACE_NAME>-global_address) from the "Select a role" dropdown and include a condition to limit access to resources named e6data. This condition could be formulated as follows:

      { "expression": "resource.name.startsWith(\"projects/<PROJECT_ID>/global/addresses/e6data-\")", "title": "globaladdress policy condition", "description": "" }

    • Click on the "Save" button to save the IAM policy binding.
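The bindings above can be sketched from the CLI as well. The role IDs below are assumptions (GCP role IDs may not contain hyphens, so the console titles shown earlier cannot be used verbatim as IDs); conditions are attached per binding, as in the security-policy example:

```shell
# Unconditional binding for the clusterViewer custom role
gcloud projects add-iam-policy-binding <PROJECT_ID> \
    --member="serviceAccount:<service-account-email>" \
    --role="projects/<PROJECT_ID>/roles/e6dataclusterViewer" \
    --condition=None

# Conditional binding for the security-policy custom role,
# restricted to resources prefixed with "e6data-"
gcloud projects add-iam-policy-binding <PROJECT_ID> \
    --member="serviceAccount:<service-account-email>" \
    --role="projects/<PROJECT_ID>/roles/e6dataSecurityPolicy" \
    --condition='expression=resource.name.startsWith("projects/<PROJECT_ID>/global/securityPolicies/e6data-"),title=security policy condition'
```

The targetPools and global_address roles are bound the same way, the latter with its own `startsWith` condition on `global/addresses/e6data-`.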

Create a GKE Nodepool

The GKE node pool represents a set of worker nodes within the GKE cluster, responsible for running the workload containers.

  1. Select your project: If you have multiple projects, select the project where you want to create the GKE node pool from the project selector at the top of the page.

  2. Go to Kubernetes Engine: In the left navigation menu, under the "Compute" section, click on "Kubernetes Engine" to access the Kubernetes Engine dashboard.

  3. Select your cluster: In the Kubernetes Engine dashboard, locate the cluster where you want to add the node pool and click on its name to open its details.

  4. Add a node pool: In the cluster details page, click on the "Add Node Pool" button to start the process of adding a new node pool to the cluster.

  5. Configure the node pool: Fill in the necessary details for the new node pool, including the name, machine type, disk size, and other configuration options. You can refer to your Terraform configuration for the values of these parameters.

Specify the properties as mentioned in the table below:

| Property | Value | Description |
| --- | --- | --- |
| Kubernetes Taints | Key=e6data-workspace-name, Value=<E6DATA_WORKSPACE_NAME>, Effect=NoSchedule | Specifies Kubernetes taints for the nodes. Taints affect which pods can be scheduled on the nodes. Nodes with the taint "e6data-workspace-name=<E6DATA_WORKSPACE_NAME>:NoSchedule" will only allow pods that tolerate this taint to be scheduled on them. Nodes without this taint have no scheduling restrictions imposed by it. |
| labels | {"app" = "e6data", "e6data-workspace-name" = <E6DATA_WORKSPACE_NAME>} | Key-value map of Kubernetes labels. |
| machine_type | c2-standard-30 | Instance type recommended by e6data. |
| autoscaling | enabled | Enables cluster autoscaling for the node pool. |
| total_min_node_count | 0 | Specifies the minimum number of nodes that should be maintained in the cluster. In this case, it's set to 0, meaning the cluster can scale down to no nodes if necessary. |
| total_max_node_count | 20 | Sets the maximum number of nodes that the cluster can scale up to. Contact e6data support for help with sizing. |
| location_policy | ANY | Instructs the cluster autoscaler to prioritize utilization of unused reservations and to account for current resource availability constraints (e.g. stock-outs). |
| Boot disk size | 100 | Sets the disk size in gigabytes (GB) for each node in the cluster. At least 100 GB is recommended by e6data. |
| Enable nodes on spot VMs | TRUE | Optional; see the warning below about SPOT availability interruptions. |
| GCE instance metadata | mode=GKE_METADATA | Enables the GKE metadata server on the nodes, required for Workload Identity. |
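The node pool configuration in the table above can be sketched as a single `gcloud` command. The node pool name is an example; replace the cluster, project, and region placeholders with your own values:

```shell
# Create the e6data GKE node pool with the properties from the table above
gcloud container node-pools create e6data-workspace-nodepool \
    --cluster=<GKE_CLUSTER_NAME> \
    --project=<PROJECT_ID> \
    --region=<REGION> \
    --machine-type=c2-standard-30 \
    --disk-size=100 \
    --spot \
    --enable-autoscaling \
    --total-min-nodes=0 \
    --total-max-nodes=20 \
    --location-policy=ANY \
    --node-labels=app=e6data,e6data-workspace-name=<E6DATA_WORKSPACE_NAME> \
    --node-taints=e6data-workspace-name=<E6DATA_WORKSPACE_NAME>:NoSchedule \
    --workload-metadata=GKE_METADATA
```

Omit `--spot` to use on-demand instances instead; see the warning below.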

Please make note of the following parameters, they will be required when creating the Workspace in the e6data Console:

  • GKE Nodepool Name

  • GKE Nodepool Maximum Size

  • Kubernetes Namespace

WARNING: Choosing SPOT may cause unexpected downtime due to availability interruptions. Use it only for Workspaces containing non-critical workloads. You can choose between ON_DEMAND and SPOT instances based on your cost and availability requirements.
