Setup using Terraform in Azure

Deploying e6data Workspace in Microsoft Azure using Terraform

In this documentation, we will walk you through the process of deploying e6data on Azure using Terraform.

Terraform is an open-source infrastructure-as-code tool developed by HashiCorp. It allows you to define and manage your infrastructure in a declarative way, making it easier to provision and manage resources across various cloud providers, including Azure.

Prerequisites

Before you begin, ensure that you have the following prerequisites in place:

  1. An Azure account with appropriate permissions to create and manage resources.

  2. A local development environment with Terraform installed. The installation steps are outlined in the next section.
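Before moving on, you can sanity-check both prerequisites from your terminal; a quick sketch, assuming the Azure CLI (az) is what you use to authenticate (these commands require Terraform, the Azure CLI, and an active Azure login):

```shell
# Verify Terraform is installed and on the PATH
terraform -version

# Verify you are logged in to the intended Azure subscription
# (requires the Azure CLI; run `az login` first if needed)
az account show --query "{name:name, id:id}" --output table
```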

Create the e6data Workspace

  1. Log in to the e6data Console.

  2. Navigate to Workspaces in the left-side navigation bar or click Create Workspace.

  3. Select Azure as the Cloud Provider.

  4. In the e6data UI, after selecting Azure, copy the Identity ID and Identity Pool ID.

  5. Proceed to the next step to deploy the prerequisite resources using Terraform.

Setup e6data

Using the Terraform script, the e6data Workspace will be deployed inside an Azure AKS Cluster. The subsequent sections will provide instructions to edit two Terraform files required for the deployment:

  1. provider.tf

  2. terraform.tfvars

If an Azure AKS cluster is not available, please follow these instructions.

If Terraform is not installed, please follow these instructions.

Download e6data Terraform Scripts

Please download/clone the e6x-labs/terraform repo from GitHub.
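For example, assuming the repository URL follows the standard GitHub pattern for e6x-labs/terraform:

```shell
# Clone the e6data Terraform scripts
# (alternatively, download and extract the ZIP from GitHub)
git clone https://github.com/e6x-labs/terraform.git
cd terraform
```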

Configure provider.tf

The Azure provider (azurerm) in Terraform allows you to manage Azure resources efficiently. Before using the provider, however, you must configure it with the appropriate credentials.

Extract the scripts downloaded in the previous step and navigate to the _workspace folder.

Edit the provider.tf file according to your requirements. Please refer to the official Terraform documentation to find instructions to use the authentication method most appropriate to your environment.
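For example, when authenticating with a service principal and client secret, the azurerm provider reads credentials from these environment variables, which keeps secrets out of your .tf files (all values below are placeholders):

```shell
# Service principal credentials for the azurerm provider (placeholder values)
export ARM_CLIENT_ID="<application_id>"
export ARM_CLIENT_SECRET="<client_secret>"
export ARM_TENANT_ID="<tenant_id>"
export ARM_SUBSCRIPTION_ID="<subscription_id>"
```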

sample provider.tf
terraform {
  backend "azurerm" {
    resource_group_name  = "<resource_group_name>"
    storage_account_name = "<storage_account_name>"
    container_name       = "<container_name>"
    key                  = "terraform.tfstate"
  }

  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "3.110.0"
    }
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = "2.23.0"
    }
    kubectl = {
      source  = "alekc/kubectl"
      version = "2.0.4"
    }
  }
}

provider "azurerm" {
  features {}
  subscription_id   = "<subscription_id>"
}

Specifying an Azure Storage account for the Terraform state file

Utilizing an Azure Blob container for Terraform state storage provides a reliable and scalable solution for managing infrastructure state on Azure.

We are using the azurerm provider and configuring the backend to use an Azure Storage Account for storing the Terraform state. Replace <resource_group_name> and <storage_account_name> with the names of the resource group and Azure Storage Account you want to use, and <container_name> with the name of the container within the storage account where you want to store the state file.

The key parameter specifies the name of the state file within the container. It is set to "terraform.tfstate", but you can adjust it according to your needs.

Ensure that the Azure credentials used for authenticating Terraform have the appropriate permissions to read from and write to the specified Azure Storage Account and container.

Note:

  • Before configuring the backend, make sure you have already created the Azure Storage Account and container in the desired Azure subscription and resource group.

  • For more information and to explore additional backend options, you can refer to the Terraform Backend Configuration documentation.
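If the backend storage does not exist yet, it can be created with the Azure CLI; a sketch using placeholder names (substitute your own resource group, account, container, and region):

```shell
# Create the resource group, storage account, and container that will hold the state
# (all names below are placeholders)
az group create --name "<resource_group_name>" --location "<region>"
az storage account create \
  --name "<storage_account_name>" \
  --resource-group "<resource_group_name>" \
  --sku Standard_LRS
az storage container create \
  --name "<container_name>" \
  --account-name "<storage_account_name>"
```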

Configuration Variables in terraform.tfvars File

The terraform.tfvars file contains the following variables that need to be configured before running Terraform:

Sample terraform.tfvars
# General configuration
prefix                          = "<prefix>"                       # Prefix for resources
region                          = "<region>"                       # Azure region
workspace_name                  = "<workspace_name>"               # Name of the e6data workspace to be created

# AKS cluster details
subscription_id                 = "<subscription_id>"              # Subscription ID of Azure subscription
aks_resource_group_name         = "<aks_resource_group_name>"      # Resource group name for AKS cluster
aks_cluster_name                = "<aks_cluster_name>"             # AKS cluster name
kube_version                    = "1.30"                           # Kubernetes version
kubernetes_namespace            = "<kubernetes_namespace>"         # Namespace to deploy e6data workspace
private_cluster_enabled         = "false"                          # Private cluster enabled (true/false)

# Networking
cidr_block                      = ["10.220.0.0/16"]                # CIDR block for the VNet

# Node pool configuration
nodepool_instance_family        = ["D", "F"]                       # Instance families for node pools
nodepool_instance_arch          = ["arm64"]                        # Instance architecture for node pools
priority                        = ["spot"]                         # VM priority (Regular or Spot)

# Data storage configuration
data_storage_account_name       = "<data_storage_account_name>"    # Storage account name
data_resource_group_name        = "<data_resource_group_name>"     # Resource group for storage account
list_of_containers              = ["*"]                            # Containers to access in storage account

# Helm chart version
helm_chart_version              = "2.0.9"                          # Helm chart version for e6data workspace

# Cost allocation tags
cost_tags = {
  App = "e6data"
}

# Default Node pool variables
default_node_pool_vm_size       = "Standard_B2s"                   # VM size for the default node pool
default_node_pool_node_count    = 3                                # Number of nodes in the default node pool
default_node_pool_name          = "default"                        # Name of the default node pool

# Identity Pool Variables
identity_pool_id                = "<identity_pool_id>"             # The identity pool ID available in the e6data console after clicking on the "Create Workspace" button and selecting Azure
identity_id                     = "<identity_id>"                  # The identity ID available in the e6data console, used for authentication and authorization in the workspace

# Karpenter Variables
karpenter_namespace             = "kube-system"                    # Namespace for Karpenter deployment
karpenter_service_account_name  = "karpenter"                      # Service account name for Karpenter
karpenter_release_version       = "0.6.0"                          # Karpenter release version

# Key Vault Configuration
key_vault_name                  = ""                               # Please provide the Key Vault name in which the certificate for the domain is present. If left blank, a new Key Vault will be created in the AKS resource group.
key_vault_rg_name               = ""                               # The resource group for the specified Key Vault. If left blank, it will default to the AKS resource group.

# Nginx Ingress Controller Configuration
nginx_ingress_controller_namespace = "kube-system"                # Namespace where the Nginx Ingress Controller will be deployed
nginx_ingress_controller_version   = "4.7.1"                      # Version of the Nginx Ingress Controller to be installed

Please update the values of these variables in the terraform.tfvars file to match the specific configuration details for your environment:

| Variable | Description |
| --- | --- |
| prefix | Prefix for resources |
| region | Azure region |
| workspace_name | Name of the e6data workspace to be created |
| subscription_id | Subscription ID of the Azure subscription |
| aks_resource_group_name | Resource group name for the AKS cluster |
| aks_cluster_name | AKS cluster name |
| kube_version | Kubernetes version |
| kubernetes_namespace | Namespace to deploy the e6data workspace |
| private_cluster_enabled | Private cluster enabled (true/false) |
| cidr_block | CIDR block for the VNet |
| nodepool_instance_family | Instance families for node pools |
| nodepool_instance_arch | Instance architecture for node pools |
| priority | VM priority (Regular or Spot) |
| data_storage_account_name | Storage account name |
| data_resource_group_name | Resource group for the storage account |
| list_of_containers | Containers to access in the storage account |
| helm_chart_version | Helm chart version for the e6data workspace |
| cost_tags | Tags used for cost allocation and management; here, the tag "App" is set to "e6data" |
| default_node_pool_vm_size | VM size for the default node pool |
| default_node_pool_node_count | Number of nodes in the default node pool |
| default_node_pool_name | Name of the default node pool |
| identity_pool_id | The identity pool ID available in the e6data console after clicking the "Create Workspace" button and selecting Azure |
| identity_id | The identity ID available in the e6data console, used for authentication and authorization in the workspace |
| karpenter_namespace | Namespace for the Karpenter deployment |
| karpenter_service_account_name | Service account name for Karpenter |
| karpenter_release_version | Karpenter release version |
| key_vault_name | Name of the Key Vault in which the certificate for the domain is present; if left blank, a new Key Vault is created in the AKS resource group |
| key_vault_rg_name | Resource group of the specified Key Vault; if left blank, defaults to the AKS resource group |
| nginx_ingress_controller_namespace | Namespace where the Nginx Ingress Controller will be deployed |
| nginx_ingress_controller_version | Version of the Nginx Ingress Controller to be installed |

Execution Commands

Once you have configured the necessary variables in the terraform.tfvars file, you can proceed with the execution of the Terraform script to deploy the e6data workspace. Follow the steps below to initiate the deployment:

  1. Navigate to the directory containing the Terraform files. It is essential to be in the correct directory for the Terraform commands to execute successfully.

  2. Initialize Terraform:

terraform init

  3. Generate a Terraform plan and save it to a file (e.g., e6.plan):

terraform plan -var-file="terraform.tfvars" -out="e6.plan"

The -var-file flag specifies the input variable file (terraform.tfvars) that contains the configuration values for the deployment.

  4. Review the generated plan.

  5. Apply the changes using the generated plan file:

terraform apply "e6.plan"

This command applies the changes specified in the plan file (e6.plan) to deploy the e6data workspace in your environment.

  6. After the Terraform run completes successfully and the e6data workspace is deployed, retrieve the values of secret, application_id, and tenant_id by running the commands below.

terraform output secret
terraform output application_id
terraform output tenant_id

These commands will display the output values defined in your Terraform configuration. These are the values you need to update in the e6data console.
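If you prefer to capture these values in a script, terraform output can emit machine-readable forms; a sketch, assuming jq is installed for the JSON variant:

```shell
# Print a single string output without quotes
terraform output -raw application_id
terraform output -raw tenant_id

# Or read all outputs as JSON and extract one value (requires jq)
terraform output -json | jq -r '.secret.value'
```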

Deployment Overview and Resource Provisioning

This section provides a comprehensive overview of the resources deployed using the Terraform script for the e6data workspace deployment.

  1. AKS Node Pool: Creates a dedicated node pool in Azure Kubernetes Service (AKS) to host the e6data workspace. The node pool is configured with autoscaling to dynamically adjust the number of nodes based on workload demand, ensuring scalability.

  2. Blob Storage Container: Creates a dedicated storage account and blob container in Azure to store the query results for the e6data workspace. The storage account and container provide a reliable and scalable solution for storing and accessing the query output data.

  3. App registration and client secret: An app registration and its associated service principal are created with read and write access to the Blob Storage container created above, along with the minimum permissions required to connect to the Azure Kubernetes Service (AKS) cluster. A client secret is also generated for secure authentication and authorization with the e6data platform. Together, these allow the application to interact with the Blob Storage container, connect securely to the AKS cluster, and authenticate to the e6data platform.

  4. Managed identity with federated credentials: To establish secure authentication and access control within the AKS cluster, a managed identity is created and associated with federated credentials using the provided OIDC issuer URL. The managed identity is granted "Storage Blob Data Contributor" access to the Blob Storage container created above, enabling read and write operations, and "Storage Blob Data Reader" permission on the containers specified in the tfvars file, which hold the data used by the e6data engine. This lets the AKS cluster read the designated data without the ability to modify or write to it.

  5. Helm chart with service account: A Helm chart deployed to the AKS cluster configures the federated credentials acquired from the user-assigned managed identity. As part of this process, a service account is created within the AKS cluster and associated with those federated credentials, enabling secure, authorized access to the storage resources through the managed identity.
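To make step 4 concrete: the federated credential is what ties a Kubernetes service account to the managed identity via the cluster's OIDC issuer. Schematically, in azurerm HCL (placeholder values, not the exact resource from the e6data scripts):

```hcl
# Hypothetical sketch of a federated identity credential for AKS workload identity
resource "azurerm_federated_identity_credential" "e6data" {
  name                = "e6data-federated-credential"
  resource_group_name = "<resource_group_name>"
  parent_id           = "<user_assigned_managed_identity_id>"
  issuer              = "<aks_oidc_issuer_url>"
  subject             = "system:serviceaccount:<namespace>:<service_account_name>"
  audience            = ["api://AzureADTokenExchange"]
}
```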
