Setup using Terraform in Azure

Deploying e6data Workspace in Microsoft Azure using Terraform

In this documentation, we walk you through the process of deploying e6data on Azure using Terraform.


Terraform is an open-source infrastructure-as-code tool developed by HashiCorp. It allows you to define and manage your infrastructure in a declarative way, making it easier to provision and manage resources across various cloud providers, including Azure.

Prerequisites

Before you begin, ensure that you have the following prerequisites in place:

  1. An Azure account with appropriate permissions to create and manage resources.

  2. A local development environment with Terraform installed. The installation steps are outlined in the next section.

Creating a New Azure AKS Cluster (Skip if You Already Have an AKS Cluster)

  1. Ensure you have the Azure CLI installed and configured on your local machine. If you haven't installed it yet, please follow the instructions at https://docs.microsoft.com/cli/azure/install-azure-cli to set it up.

  2. Open a terminal or command prompt.

  3. Run the following command to create a new AKS cluster:

    az aks create --resource-group [RESOURCE_GROUP] --name [CLUSTER_NAME] --zones 1 2 3 --tier standard --enable-oidc-issuer --enable-aad --aad-admin-group-object-ids [AAD_ADMIN_GROUP_OBJECT_IDS]
  • Replace the placeholders with the appropriate values:

    • --resource-group [RESOURCE_GROUP]: Specify the name of the Azure resource group where you want to create the AKS cluster. Replace [RESOURCE_GROUP] with the desired resource group name.

    • --name [CLUSTER_NAME]: Specify the name for your AKS cluster. Replace [CLUSTER_NAME] with your desired cluster name.

    • --zones 1 2 3: Set the availability zones for your cluster. This parameter ensures that your cluster is distributed across multiple fault domains. In this example, the zones are set to 1, 2, and 3. You can adjust these values based on your requirements.

    • --tier standard: Specify the tier for your AKS cluster. The standard tier provides a balance between cost and functionality.

    • --enable-oidc-issuer: Enable OpenID Connect (OIDC) issuer integration. This allows you to authenticate users using an OIDC provider.

    • --enable-aad: Enable Azure Active Directory (AAD) integration. This enables authentication and authorization using AAD.

    • --aad-admin-group-object-ids [AAD_ADMIN_GROUP_OBJECT_IDS]: Replace this with the object IDs of the AAD groups that will have admin permissions on the AKS cluster.

    • For detailed instructions and more advanced configurations, refer to the official Azure documentation at https://learn.microsoft.com/en-us/azure/aks/learn/quick-kubernetes-deploy-cli.

Note: If you haven't already configured Azure AD groups for AKS RBAC, you can refer to the following link for instructions: Configuring groups for Azure AKS with Azure AD RBAC. This will guide you in setting up and managing Azure AD groups for role-based access control within your AKS cluster.
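The `az aks create` command above can be filled in concretely. A sketch with hypothetical example values (substitute your own resource group, cluster name, and Azure AD group object ID):

```shell
# Hypothetical values -- replace each with your own before running.
az aks create \
  --resource-group e6data-rg \
  --name e6data-aks \
  --zones 1 2 3 \
  --tier standard \
  --enable-oidc-issuer \
  --enable-aad \
  --aad-admin-group-object-ids 00000000-0000-0000-0000-000000000000
```

Cluster creation typically takes several minutes; the command blocks until the cluster is ready.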

  4. Wait for the cluster creation process to complete. This may take some time.

  5. Once the AKS cluster is created, you can retrieve the connection information by running the following command:

az aks get-credentials --resource-group [RESOURCE_GROUP] --name [CLUSTER_NAME]

Replace [RESOURCE_GROUP] and [CLUSTER_NAME] with the appropriate values. This command will configure the kubectl command-line tool to connect to the AKS cluster.

  6. Verify the connection to the AKS cluster by running the following command:

kubectl get nodes

This should display the list of nodes in your AKS cluster.

Congratulations! You have successfully created a new Azure AKS cluster. Using the Terraform script, you can now deploy and manage the e6data workspace on this cluster.

Installing Terraform

To install Terraform on your local machine, you can follow the steps provided in the official HashiCorp Terraform documentation:

  1. Visit the official Terraform website at https://www.terraform.io

  2. Navigate to the "Downloads" page.

  3. Download the appropriate package for your operating system (e.g., Windows, macOS, Linux).

  4. Extract the downloaded package to a directory of your choice.

  5. Add the Terraform executable to your system's PATH environment variable.

  • For Windows:

    • Open the Start menu and search for "Environment Variables."

    • Select "Edit the system environment variables."

    • Click the "Environment Variables" button.

    • Under "System variables," find the "Path" variable and click "Edit."

    • Add the path to the directory where you extracted the Terraform executable (e.g., C:\\terraform) to the list of paths.

    • Click "OK" to save the changes.

  • For macOS and Linux:

    • Open a terminal window.

    • Run the following command, replacing <path_to_extracted_binary> with the path to the directory where you extracted the Terraform executable:

export PATH=$PATH:<path_to_extracted_binary>
  • Optionally, you can add this command to your shell's profile file (e.g., ~/.bash_profile, ~/.bashrc, ~/.zshrc) to make it persistent across terminal sessions.

  6. Verify the installation by opening a new terminal window and running the following command:

terraform version

If Terraform is installed correctly, you should see the version number displayed.
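The PATH setup described above can also be scripted. A minimal sketch, assuming Terraform was extracted to a hypothetical `~/terraform-bin` directory:

```shell
# Hypothetical install directory -- adjust to where you extracted the binary.
TERRAFORM_DIR="$HOME/terraform-bin"

# Make the binary available in the current shell session.
export PATH="$PATH:$TERRAFORM_DIR"

# Confirm the directory is now on PATH.
echo "$PATH" | grep -q "$TERRAFORM_DIR" && echo "PATH updated"
```

To make the change persistent, append the `export` line to your shell's profile file (e.g., `~/.bashrc` or `~/.zshrc`).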

AZURE Terraform Provider for Authentication

The Azure Provider can be used to configure infrastructure in Microsoft Azure using the Azure Resource Manager APIs.

Provider Block

terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "=3.0.0"
    }
  }
}

# Configure the Microsoft Azure Provider
provider "azurerm" {
  features {}
  subscription_id = "<your_subscription_id>" # replace with your Azure subscription ID
}

Authentication and Configuration

Terraform supports a number of different methods for authenticating to Azure:

  • Authenticating to Azure using the Azure CLI: The Azure CLI provides a convenient way to authenticate Terraform to Azure. By running the az login command and following the authentication flow, Terraform can use the credentials provided by the Azure CLI to access Azure resources.

  • Authenticating to Azure using Managed Service Identity: Managed Service Identity (MSI) allows applications or services running on Azure to authenticate without needing explicit credentials. Terraform can leverage MSI to authenticate itself and access Azure resources without additional authentication configuration.

  • Authenticating to Azure using a Service Principal and a Client Certificate: A Service Principal is an identity that applications, services, or automation tools like Terraform can use to access Azure resources. This method involves creating a Service Principal and associating a client certificate with it; Terraform then uses the Service Principal and certificate for authentication.

  • Authenticating to Azure using a Service Principal and a Client Secret: Similar to the previous method, but instead of a client certificate, a client secret is used. The client secret is essentially a password associated with the Service Principal.

  • Authenticating to Azure using a Service Principal and OpenID Connect: OpenID Connect (OIDC) is an authentication protocol that allows clients such as Terraform to verify the identity of users or services. This method involves creating a Service Principal and configuring an OIDC identity provider; Terraform then authenticates using the Service Principal and OIDC to access Azure resources.
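For the Service Principal with client secret method, the azurerm provider can also read credentials from `ARM_*` environment variables, which keeps secrets out of the Terraform files. A minimal sketch with placeholder values (replace each with your own service principal's details):

```shell
# Placeholder values -- substitute your own before running Terraform.
export ARM_CLIENT_ID="00000000-0000-0000-0000-000000000000"       # appId of the service principal
export ARM_CLIENT_SECRET="replace-with-client-secret"             # client secret (password)
export ARM_TENANT_ID="00000000-0000-0000-0000-000000000000"       # Azure AD tenant ID
export ARM_SUBSCRIPTION_ID="00000000-0000-0000-0000-000000000000" # target subscription

# With these set, `terraform plan` and `terraform apply` authenticate
# without any credentials appearing in the provider block.
```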

Specifying Azure Blob Storage for the Terraform State File

Utilizing Azure Blob Storage for Terraform state provides a reliable and scalable solution for managing the infrastructure state on Azure.

To specify an Azure Storage Account for storing the Terraform state when using Azure as the provider, you can add the following configuration to the Terraform script:

terraform {
  backend "azurerm" {
    resource_group_name  = "<Resource_group_of_the_storage_Account>"
    storage_account_name = "<storage_account_name>"
    container_name       = "<container_name>"
    key                  = "terraform.tfstate"
  }
}

We are using the azurerm provider and configuring the backend to use an Azure Storage Account for storing the Terraform state. Replace <Resource_group_of_the_storage_Account> with the resource group that contains the storage account, <storage_account_name> with the name of the Azure Storage Account you want to use, and <container_name> with the name of the container within the storage account where the state file will be stored.

The key parameter specifies the name of the state file within the container. It is set to "terraform.tfstate", but you can adjust it according to your needs.

Ensure that the Azure credentials used for authenticating Terraform have the appropriate permissions to read from and write to the specified Azure Storage Account and container.

Note:

  • Before configuring the backend, make sure you have already created the Azure Storage Account and container in the desired Azure subscription and resource group.

  • For more information and to explore additional backend options, you can refer to the Terraform Backend Configuration documentation.
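If you prefer to keep environment-specific backend values out of version control, Terraform also accepts a partial backend configuration supplied at init time. A sketch with hypothetical resource names:

```shell
# Write backend settings to a separate file so the .tf files stay
# free of environment-specific values (names here are hypothetical).
cat > backend.conf <<'EOF'
resource_group_name  = "tfstate-rg"
storage_account_name = "tfstatestorage"
container_name       = "tfstate"
key                  = "terraform.tfstate"
EOF

# Then initialize with:
#   terraform init -backend-config=backend.conf
```

With this approach, the `backend "azurerm" {}` block in the Terraform files can be left empty and the values injected per environment.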

Deployment Overview and Resource Provisioning

This section provides a comprehensive overview of the resources deployed using the Terraform script for the e6data workspace deployment.

  1. AKS Node Pool: Creates a dedicated node pool in Azure Kubernetes Service (AKS) to host the e6data workspace. The node pool is configured with autoscaling to dynamically adjust the number of nodes based on workload demand, ensuring scalability.

  2. Blob Storage Container: Creates a dedicated storage account and blob container in Azure to store the query results for the e6data workspace. The storage account and container provide a reliable and scalable solution for storing and accessing the query output data.

  3. App Registration and client secret: An app registration and associated service principal are created to grant read and write access to the Blob Storage container created above. The service principal is assigned the minimum permissions needed to establish a secure connection to the AKS cluster, and a client secret is generated for secure authentication and authorization with the e6data platform.

  4. Managed Identity with federated credentials: A managed identity is created and associated with federated credentials using the provided OIDC issuer URL. It is granted "Storage Blob Data Contributor" access to the Blob Storage container created above, enabling read and write operations, and "Storage Blob Data Reader" access to the buckets specified in the tfvars file that contain the data used by the e6data engine. This lets the AKS cluster securely read the designated data without being able to modify it.

  5. Helm chart with Service Account: A Helm chart deployed to the AKS cluster configures the federated credentials acquired from the user-assigned managed identity. As part of this deployment, a service account is created within the cluster and associated with those credentials, enabling secure, authorized access to the storage resources via the managed identity.

Configuration Variables in terraform.tfvars File

The terraform.tfvars file contains the following variables that need to be configured before executing the terraform:

# General configuration
prefix                          = "e6data"                       # Prefix for resources
region                          = "eastus"                       # Azure region
workspace_name                  = "e6workspace"                  # Name of the e6data workspace to be created

# AKS cluster details
subscription_id                 = "abcdh100000-abcd-efgh-0000-000000000"  # Subscription ID of Azure subscription
aks_resource_group_name         = "e6datarg"                    # Resource group name for AKS cluster
aks_cluster_name                = "poc"                         # AKS cluster name
kube_version                    = "1.28"                        # Kubernetes version
kubernetes_namespace            = "e6data"                      # Namespace to deploy e6data workspace
private_cluster_enabled         = "false"                       # Private cluster enabled (true/false)

# Networking
cidr_block                      = ["10.210.0.0/16"]             # CIDR block for the VNet

# Node pool configuration
nodepool_instance_family        = ["D", "F"]                    # Instance families for node pools
priority                        = ["spot"]                      # VM priority (Regular or Spot)

# Application secrets
e6data_app_secret_expiration_time = "2400h"                     # Expiration time for application secret

# Data storage configuration
data_storage_account_name       = "databucket"                  # Storage account name
data_resource_group_name        = "data-rg"                     # Resource group for storage account
list_of_containers              = ["*"]                         # Containers to access in storage account

# Helm chart version
helm_chart_version              = "2.0.8"                       # Helm chart version for e6data workspace

# Cost allocation tags
cost_tags = {
  App = "e6data"
}

# Default Node pool variables
default_node_pool_vm_size       = "Standard_B2s"
default_node_pool_node_count    = 2
default_node_pool_name          = "default"

# Karpenter Variables
karpenter_namespace             = "kube-system"                 # Namespace for Karpenter deployment
karpenter_service_account_name  = "karpenter"                   # Service account name for Karpenter
karpenter_release_version       = "0.5.0"                       # Karpenter release version

Please update the values of these variables in the terraform.tfvars file to match the specific configuration details for your environment:

General Configuration

  • prefix: A prefix used to name various Azure resources. Ensures consistent and easily identifiable naming conventions across all resources.

  • region : Specifies the Azure region where the resources will be deployed.

  • workspace_name : The name assigned to the e6data workspace. This is the primary workspace for your data operations and resource management.

AKS Cluster Details

  • subscription_id: The subscription ID of the Azure subscription in which the e6data resources will be deployed.

  • aks_resource_group_name : The name of the resource group where the AKS (Azure Kubernetes Service) cluster will be deployed. Resource groups help in organizing and managing related resources.

  • aks_cluster_name: The name of your Azure Kubernetes Service (AKS) cluster to deploy e6data workspace.

  • kube_version: The version of Kubernetes to be deployed on the AKS cluster. Ensures compatibility and access to specific Kubernetes features.

  • kubernetes_namespace: The namespace within Kubernetes where the e6data workspace will be deployed. Namespaces provide a mechanism to isolate and manage resources within a Kubernetes cluster.

  • private_cluster_enabled: A boolean setting that determines whether the AKS cluster is private (accessible only within a private network) or public.

Networking

  • cidr_block: The CIDR (Classless Inter-Domain Routing) block for the Virtual Network (VNet) in Azure. Defines the IP address range for the VNet.

Node Pool Configuration

  • nodepool_instance_family: Specifies the Azure VM instance families to be used for the node pools. Helps in selecting appropriate VM types for different workload requirements.

  • priority: Defines the priority for the VMs in the node pool. Options are "Regular" for standard VMs or "Spot" for spot instances, which are more cost-effective but can be preempted.

Application Secrets

  • e6data_app_secret_expiration_time: The expiration time for application secrets. Ensures that secrets are rotated regularly to maintain security.

Data Storage Configuration

  • data_storage_account_name: The name of the Azure Storage Account where data will be stored. Provides a unique identifier for the storage resource.

  • data_resource_group_name: The resource group where the storage account is located. Helps in organizing and managing storage resources.

  • list_of_containers: A list of containers within the storage account that the application will access. Using "*" allows access to all containers.

Helm Chart Version

  • helm_chart_version: Specifies the version of the Helm chart to be used for deploying the e6data workspace. Ensures consistency and compatibility with the deployment.

Cost Allocation Tags

  • cost_tags: Tags used for cost allocation and management. Helps in tracking and optimizing resource costs. Here, the tag "App" is set to "e6data."

Default Node Pool Variables

  • default_node_pool_vm_size: The size of the VMs to be used in the default node pool. Specifies the VM SKU, in this case, "Standard_B2s."

  • default_node_pool_node_count: The number of nodes in the default node pool. Sets the initial scale of the node pool.

  • default_node_pool_name: The name assigned to the default node pool. Helps in identifying the node pool within the AKS cluster.

Karpenter Variables

  • karpenter_namespace: The namespace within Kubernetes where Karpenter, the cluster autoscaler, will be deployed.

  • karpenter_service_account_name: The name of the service account for Karpenter. Provides necessary permissions for Karpenter to manage cluster resources.

  • karpenter_release_version: The version of Karpenter to be deployed. Ensures compatibility and access to specific features of Karpenter.

Execution Commands

Once you have configured the necessary variables in the terraform.tfvars file, you can proceed with the execution of the Terraform script to deploy the e6data workspace. Follow the steps below to initiate the deployment:

  1. Navigate to the directory containing the Terraform files. It is essential to be in the correct directory for the Terraform commands to execute successfully.

  2. Initialize Terraform:

terraform init
  3. Generate a Terraform plan and save it to a file (e.g., e6.plan):

terraform plan -var-file="terraform.tfvars" --out="e6.plan"

The -var-file flag specifies the input variable file (terraform.tfvars) that contains the necessary configuration values for the deployment.

  4. Review the generated plan.

  5. Apply the changes using the generated plan file:

terraform apply "e6.plan"

This command applies the changes specified in the plan file (e6.plan) to deploy e6data workspace in your environment.

  6. After successfully applying the Terraform changes and deploying the e6data workspace, you can retrieve the values of the secret, application_id, and tenant_id by running the commands below.

terraform output secret
terraform output application_id
terraform output tenant_id

These commands will display the output values defined in your Terraform configuration. These are the values you need to update in the e6data console.
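In automation, these outputs can be captured into shell variables with `terraform output -raw`, which prints each value without quotes or a trailing newline. A sketch, assuming `terraform apply` has already succeeded in the current directory:

```shell
# Capture workspace credentials for later use (e.g., CI steps or API calls).
SECRET=$(terraform output -raw secret)
APPLICATION_ID=$(terraform output -raw application_id)
TENANT_ID=$(terraform output -raw tenant_id)

# Avoid echoing the secret itself; confirm only that it was retrieved.
[ -n "$SECRET" ] && echo "outputs retrieved for app $APPLICATION_ID"
```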
