Infrastructure & Permissions for e6data
The specific infrastructure and permissions required by e6data, along with instructions to create them, are provided below. The following infrastructure must be created before setup:
GKE Nodepool
Create a GKE node pool in an existing GKE cluster or a newly created GKE cluster for e6data.
GCS Bucket
A GCS bucket is required to store data needed for the operation of the e6data workspace, e.g. service logs, query results, cache, usage data, and state information.
When creating the GCS bucket, it is advisable to follow the GCP documentation: Create buckets | Cloud Storage | Google Cloud.
Please make note of the GCS Bucket Name, it will be required when creating the Workspace in the e6data Console.
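For reference, the bucket can also be created with the gcloud CLI. This is a minimal sketch; the bucket name, project, and location are placeholders, not values mandated by e6data:

```bash
# Placeholder bucket name, project, and location; substitute your own values.
gcloud storage buckets create gs://e6data-workspace-bucket \
  --project=my-gcp-project \
  --location=us-central1 \
  --uniform-bucket-level-access
```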
The e6data Query Engine requires access to the GCS buckets containing the target data for querying. To provision the required access we need to create a custom role and associate it with a service account in GCP.
This configuration allows us to establish a secure connection between the Kubernetes environment and GCP. Once this IAM Role is associated with the service account, any Pods within the e6data clusters that are configured to use this service account will inherit the permissions defined in the IAM Role.
Create a custom role that grants write access to the "e6data" bucket for the "workspace" service account, which will be created in the next step:
Go to the Google Cloud Console and navigate to IAM & Admin > Roles.
Click on "Create role."
Enter a title and description for the role (e.g., "e6data Custom Role").
In the "Permissions" section, add the following permissions (replace [BUCKET_NAME] with the name of the e6data workspace bucket which we created earlier):
Go to the Google Cloud Console and navigate to IAM & Admin > Service accounts.
Click on "Create service account."
Enter a name and description for the service account (e.g., "e6data-service-account").
Click "Create" and then select the two custom roles for the service account that we created in the previous steps.
Click "Continue" and then "Done" to create the service account.
Create a Custom Role for read access to the data buckets to query:
Create a custom role that grants the "workspace" service account read access to the data buckets to be queried.
Go to the Google Cloud Console and navigate to IAM & Admin > Roles.
Click on "Create role."
Enter a title and description for the role (e.g., "e6data Custom Role").
In the "Permissions" section, add the following permissions (replace [BUCKET_NAME]
with the name of the e6data workspace bucket which we created earlier):
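As a scripted alternative, a sketch with an assumed read-only permission set; verify the exact list against the e6data documentation (role ID and project are placeholders):

```bash
# Assumed read permissions; confirm against the e6data-provided list.
gcloud iam roles create e6dataDataRead \
  --project=my-gcp-project \
  --title="e6data Read Role" \
  --description="Read access to the data buckets queried by e6data" \
  --permissions=storage.buckets.get,storage.objects.get,storage.objects.list
```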
Attach the read permission to the buckets which need to be queried:
Open the Google Cloud Console: Go to https://console.cloud.google.com/ and log in to your Google Cloud account.
Navigate to the Cloud Storage section: Click on the menu icon in the top left corner of the console, then click on "Cloud Storage" to open the Cloud Storage browser.
Select a bucket: Click on the bucket to which you want to assign the IAM role.
Open the "Permissions" tab: In the bucket details page, click on the "Permissions" tab to view the current IAM permissions for the bucket.
Add a new member: Click on the "+ Add" button to add a new member to the bucket's IAM policy.
Enter the member's email address: Enter the email address of the service account created earlier (e.g., "e6data-service-account").
Select the role: Select the role that you want to assign to the member. If you want to assign a custom role, click on "Select a role" and choose "Custom" to enter the role name that we created earlier.
Save the changes: Click on the "Save" button to save the new IAM policy for the bucket.
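The same binding can be applied per bucket with the gcloud CLI; the bucket name, service account email, and role ID below are placeholders:

```bash
# Grants the custom read role on one data bucket; repeat for each bucket to query.
gcloud storage buckets add-iam-policy-binding gs://my-data-bucket \
  --member="serviceAccount:e6data-service-account@my-gcp-project.iam.gserviceaccount.com" \
  --role="projects/my-gcp-project/roles/e6dataDataRead"
```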
The workloadIdentityUser role requires the following permissions for authentication and interaction with the cluster:
Open the Google Cloud Console: Go to https://console.cloud.google.com/ and log in to your Google Cloud account.
Navigate to IAM & Admin: Click on the menu icon in the top left corner of the console, then navigate to the "IAM & Admin" section and click on "IAM" to open the IAM & Admin page.
Create a custom IAM role:
Click on the "Roles" tab.
Click on the "+ Create Role" button.
Enter a Role ID for your custom role (e.g., workloadIdentityUser).
Enter a Title for your custom role (e.g., e6data <E6DATA_WORKSPACE_NAME> workloadIdentityUser Access).
Enter a Description for your custom role (e.g., e6data custom workload identity user role).
Add the following permissions to your custom role:
iam.serviceAccounts.get
iam.serviceAccounts.getAccessToken
iam.serviceAccounts.getOpenIdToken
iam.serviceAccounts.list
Click on the "Create" button to create your custom IAM role.
Bind the custom IAM role to a service account:
Click on the "IAM" tab.
Click on the "+ Add" button to add a new IAM policy binding.
Select your GCP project from the "Select a project" dropdown.
Enter the following in the "New Members" field: serviceAccount:<your_gcp_project_id>.svc.id.goog[<kubernetes_namespace>/<E6DATA_WORKSPACE_NAME>]
Select the custom role you created from the "Select a role" dropdown.
Click on the "Save" button to save the IAM policy binding.
The e6dataclusterViewer role requires the following permissions to monitor e6data cluster health:
Open the Google Cloud Console: Go to https://console.cloud.google.com/ and log in to your Google Cloud account.
Navigate to IAM & Admin: Click on the menu icon in the top left corner of the console, then navigate to the "IAM & Admin" section and click on "IAM" to open the IAM & Admin page.
Create a custom IAM role:
Click on the "Roles" tab.
Click on the "+ Create Role" button.
Enter a Role ID for your custom role (e.g., e6dataclusterViewer).
Enter a Title for your custom role (e.g., e6data-<E6DATA_WORKSPACE_NAME>-clusterViewer).
Enter a description of your custom role (e.g., kubernetes container clusterViewer access).
Add the following permissions to your custom role:
container.clusters.get
container.clusters.list
container.roleBindings.get
container.backendConfigs.get
container.backendConfigs.create
container.backendConfigs.delete
container.backendConfigs.update
resourcemanager.projects.get
compute.sslCertificates.get
compute.forwardingRules.list
Click on the "Create" button to create your custom IAM role.
Create a custom IAM role:
Click on the "Roles" tab.
Click on the "+ Create Role" button.
Enter a Role ID for your custom role (e.g., targetPoolsRole; role IDs cannot contain spaces).
Enter a Title for your custom role (e.g., e6data-<E6DATA_WORKSPACE_NAME>-targetPools).
Enter a description of your custom role (e.g., kubernetes targetPools access).
Add the following permissions to your custom role:
compute.instances.get
compute.targetPools.get
compute.targetPools.list
Click on the "Create" button to create your custom IAM role.
Create a custom IAM role:
Click on the "Roles" tab.
Click on the "+ Create Role" button.
Enter a Role ID for your custom role (e.g., globalAddressRole).
Enter a Title for your custom role (e.g., e6data-<E6DATA_WORKSPACE_NAME>-global_address).
Enter a description of your custom role (e.g., kubernetes global_address access).
Add the following permissions to your custom role:
compute.globalAddresses.delete
compute.globalAddresses.create
compute.globalAddresses.get
compute.globalAddresses.setLabels
Click on the "Create" button to create your custom IAM role.
Create a custom IAM role:
Click on the "Roles" tab.
Click on the "+ Create Role" button.
Enter a Role ID for your custom role (e.g., securityPolicyRole).
Enter a Title for your custom role (e.g., e6data-<E6DATA_WORKSPACE_NAME>-security_policy).
Enter a description of your custom role (e.g., kubernetes security_policy access).
Add the following permissions to your custom role:
compute.securityPolicies.create
compute.securityPolicies.get
compute.securityPolicies.delete
compute.securityPolicies.update
Click on the "Create" button to create your custom IAM role.
Bind the custom IAM role to a service account:
Click on the "IAM" tab.
Click on the "+ Add" button to add a new IAM policy binding.
Select your GCP project from the "Select a project" dropdown.
Enter the following in the "New Members" field: serviceAccount:<service-account-email>, replacing <service-account-email> with the email of the service account to which you want to bind the roles.
Select the custom role you created (e.g., projects/<project-id>/roles/e6dataclusterViewer) from the "Select a role" dropdown.
Select the custom role you created (e.g., projects/<project-id>/roles/e6data-<E6DATA_WORKSPACE_NAME>-targetPools) from the "Select a role" dropdown.
Select the custom role you created (e.g., projects/<project-id>/roles/e6data-<E6DATA_WORKSPACE_NAME>-security_policy) from the "Select a role" dropdown and include a condition to limit access to resources named e6data. The condition could be formulated as follows:
{
"expression": "resource.name.startsWith(\"projects/<PROJECT_ID>/global/securityPolicies/e6data-\")",
"title": "security policy condition",
"description": ""
}
Select the custom role you created (e.g., projects/<project-id>/roles/e6data-<E6DATA_WORKSPACE_NAME>-global_address) from the "Select a role" dropdown and include a condition to limit access to resources named e6data. The condition could be formulated as follows:
{
"expression": "resource.name.startsWith(\"projects/<PROJECT_ID>/global/addresses/e6data-\")",
"title": "globaladdress policy condition",
"description": ""
}
Click on the "Save" button to save the IAM policy binding.
The GKE node pool represents a set of worker nodes within the GKE cluster, responsible for running the workload containers.
Navigate to the Google Cloud Console: Go to the Google Cloud Console at https://console.cloud.google.com/.
Select your project: If you have multiple projects, select the project where you want to create the GKE node pool from the project selector at the top of the page.
Go to Kubernetes Engine: In the left navigation menu, under the "Compute" section, click on "Kubernetes Engine" to access the Kubernetes Engine dashboard.
Select your cluster: In the Kubernetes Engine dashboard, locate the cluster where you want to add the node pool and click on its name to open its details.
Add a node pool: In the cluster details page, click on the "Add Node Pool" button to start the process of adding a new node pool to the cluster.
Configure the node pool: Fill in the necessary details for the new node pool, including the name, machine type, disk size, and other configuration options. You can refer to your Terraform configuration for the values of these parameters.
Specify the properties as mentioned in the table below:
Please make note of the following parameters, they will be required when creating the Workspace in the e6data Console:
GKE Nodepool Name
GKE Nodepool Maximum Size
Kubernetes Namespace
| Property | Value | Description |
|---|---|---|
| Kubernetes Taints | Key=`e6data-workspace-name`, Value=`<E6DATA_WORKSPACE_NAME>`, Effect=`NoSchedule` | Specifies Kubernetes taints for the nodes. Taints affect which pods can be scheduled: nodes with the taint `e6data-workspace-name=<E6DATA_WORKSPACE_NAME>:NoSchedule` only accept pods that tolerate this taint, while nodes without it have no scheduling restrictions imposed by this taint. |
| labels | `{"app" = "e6data", "e6data-workspace-name" = <E6DATA_WORKSPACE_NAME>}` | Key-value map of Kubernetes labels. |
| machine_type | `c2-standard-30` | Instance type recommended by e6data. |
| autoscaling | enabled | Enables autoscaling for the node pool. |
| total_min_node_count | 0 | Minimum number of nodes maintained in the node pool. Set to 0 so the pool can scale down to no nodes when idle. |
| total_max_node_count | 20 | Maximum number of nodes the pool can scale up to. Contact e6data support for help with sizing. |
| location_policy | ANY | Instructs the cluster autoscaler to prioritize utilization of unused reservations and to account for current resource availability constraints (e.g. stock-outs). |
| Boot disk size | 100 | Disk size in gigabytes (GB) for each node. At least 100 GB is recommended by e6data. |
| Enable nodes on spot VMs | TRUE | WARNING: choosing SPOT can cause unexpected downtime due to availability interruptions; use only for Workspaces containing non-critical workloads. Choose between ON_DEMAND and SPOT based on your cost and availability requirements. |
| Workload metadata config | `mode=GKE_METADATA` | Exposes the GKE metadata server to the nodes, enabling Workload Identity on the node pool (required for the service account bindings above). |
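For reference, a node pool matching the table can be created with the gcloud CLI. The pool name, cluster name, and region are placeholders; replace <E6DATA_WORKSPACE_NAME> as elsewhere in this guide:

```bash
gcloud container node-pools create e6data-workspace-pool \
  --cluster=my-gke-cluster \
  --region=us-central1 \
  --machine-type=c2-standard-30 \
  --disk-size=100 \
  --spot \
  --enable-autoscaling \
  --total-min-nodes=0 \
  --total-max-nodes=20 \
  --location-policy=ANY \
  --node-labels=app=e6data,e6data-workspace-name=<E6DATA_WORKSPACE_NAME> \
  --node-taints=e6data-workspace-name=<E6DATA_WORKSPACE_NAME>:NoSchedule \
  --workload-metadata=GKE_METADATA
```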