Infrastructure & Permissions for e6data


Required Infrastructure

The following infrastructure is required to run e6data and must be created before setup:

  1. Azure Storage Account and a blob Container

Required Permissions

  1. A user-assigned identity with federated credentials, configured with the necessary permissions to allow the e6data control plane to access the AKS cluster and manage other components.

  2. A managed identity for the e6data engine to access the data being queried.

Azure Storage Account and a blob Container

An Azure Storage Account with a Blob container is required to store data necessary for the operation of the e6data workspace, including service logs, query results, state information, and other operational data.

Steps to create these resources using Azure CLI:

Before proceeding with the commands, define the key variables that will be used throughout the process:

  1. Set up variables:

WORKSPACE_NAME="e6dataworkspace"
RANDOM_STRING=$(openssl rand -hex 4)
RESOURCE_GROUP_NAME="YourResourceGroup"
LOCATION="YourAzureRegion"
CONTAINER_NAME="e6data-workspace-data"
  2. Create the Azure Storage Account:

az storage account create \
    --name "${WORKSPACE_NAME}${RANDOM_STRING}" \
    --resource-group "${RESOURCE_GROUP_NAME}" \
    --location "${LOCATION}" \
    --sku Standard_LRS \
    --kind StorageV2
Command Breakdown
az storage account create: Initiates the creation of a new storage account.

--name "${WORKSPACE_NAME}${RANDOM_STRING}": Specifies the name of the storage account by combining the workspace name with a random string to ensure uniqueness.

--resource-group "${RESOURCE_GROUP_NAME}": Defines the resource group where the storage account will be created.

--location "${LOCATION}": Specifies the Azure region where the storage account will be located.

--sku Standard_LRS: Chooses the SKU (pricing tier) for the storage account, in this case, Standard Locally Redundant Storage (LRS).

--kind StorageV2: Indicates the type of storage account (StorageV2 supports both blob and file storage).
  3. Create the Blob container within the Storage Account:

az storage container create \
    --name "${CONTAINER_NAME}" \
    --account-name "${WORKSPACE_NAME}${RANDOM_STRING}" \
    --auth-mode login \
    --public-access off
Command Breakdown
az storage container create: Creates a new container within the specified storage account.

--name "${CONTAINER_NAME}": Sets the name of the container (e.g., e6data-workspace-data).

--account-name "${WORKSPACE_NAME}${RANDOM_STRING}": Specifies the name of the storage account created in the previous step.

--auth-mode login: Ensures that the command uses the currently authenticated Azure CLI session.

--public-access off: Disables public access to the container, enhancing security by restricting access.

After running these commands:

  • You will have a Storage Account named e6dataworkspace<random_string>.

  • Within this Storage Account, there will be a private Blob container named e6data-workspace-data.

  • This container can be used to store all the necessary data for your e6data workspace operations.
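Optionally, before moving on, you can confirm that both resources exist. A minimal check using the same variables (this assumes your Azure CLI session is still authenticated):

az storage account show \
    --name "${WORKSPACE_NAME}${RANDOM_STRING}" \
    --resource-group "${RESOURCE_GROUP_NAME}" \
    --query "{name:name, sku:sku.name, kind:kind}" -o table

az storage container show \
    --name "${CONTAINER_NAME}" \
    --account-name "${WORKSPACE_NAME}${RANDOM_STRING}" \
    --auth-mode login \
    --query name -o tsv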

Required Permissions

Required Azure Permissions for e6data Control Plane

To enable proper functioning of the e6data control plane with Azure Kubernetes Service (AKS), the following permissions are required:

  1. AKS Cluster Access:

    • "Microsoft.ContainerService/managedClusters/listClusterUserCredential/action"

    • "Microsoft.ContainerService/managedClusters/read"

    These permissions allow secure access to the AKS cluster by obtaining the necessary kubeconfig file.

  2. Network Resource Access:

    • "Microsoft.Network/loadBalancers/read"

    • "Microsoft.Network/publicIPAddresses/read"

    • "Microsoft.Network/networkInterfaces/delete"

    • "Microsoft.Network/networkInterfaces/read"

Implementation

To acquire these permissions, we use a User Assigned Managed Identity with a federated identity credential. This setup allows for secure and controlled access to the necessary Azure resources for e6data operations by leveraging the managed identity’s client ID for authentication.

  1. Create a User-Assigned Managed Identity:

  • Set up variables:

TAGS="environment=dev"
IDENTITY_NAME="${WORKSPACE_NAME}-identity-${RANDOM_STRING}"
  • Create an identity:

IDENTITY_ID=$(az identity create \
  --name $IDENTITY_NAME \
  --resource-group "$RESOURCE_GROUP_NAME" \
  --location "$LOCATION" \
  --tags "$TAGS" \
  --query id -o tsv)
  1. IDENTITY_NAME="${WORKSPACE_NAME}-identity-${RANDOM_STRING}": Sets a variable for the identity name, using the workspace name and a random string for uniqueness.

  2. az identity create: Creates the user-assigned managed identity in Azure.

    • --name $IDENTITY_NAME: Specifies the name of the identity.

    • --resource-group "$RESOURCE_GROUP_NAME": The resource group where the identity will be created.

    • --location "$LOCATION": The Azure region where the identity will be created.

    • --tags "$TAGS": Adds tags for organizational purposes (optional).

    • --query id -o tsv: Retrieves the ID of the created identity in plain text format.
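If you also need the identity's client ID, which the e6data control plane uses for authentication as noted above, it can be fetched the same way (a small sketch):

# Retrieve the client ID of the managed identity created above
IDENTITY_CLIENT_ID=$(az identity show --id $IDENTITY_ID --query clientId -o tsv)
echo "$IDENTITY_CLIENT_ID"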

Create a Federated Identity Credential

To create a federated identity credential with the specified parameters, follow these steps:

FED_CRED_NAME="${WORKSPACE_NAME}-federated-credential-${RANDOM_STRING}"
ISSUER="https://cognito-identity.amazonaws.com"
AUDIENCE="<your-identity-pool-id>"  # Replace with your actual identity pool ID from the e6data console
SUBJECT="<your-identity-id>"  # Replace with your actual identity ID from the e6data console

# Create the federated identity credential
az identity federated-credential create \
  --name $FED_CRED_NAME \
  --identity-name $IDENTITY_NAME \
  --resource-group "$RESOURCE_GROUP_NAME" \
  --audiences "$AUDIENCE" \
  --issuer "$ISSUER" \
  --subject "$SUBJECT"
Command Breakdown
FED_CRED_NAME="${WORKSPACE_NAME}-federated-credential-${RANDOM_STRING}": Defines the name for the federated identity credential, combining the workspace name with a unique random string.

ISSUER="https://cognito-identity.amazonaws.com": Specifies the OIDC issuer URL for AWS Cognito.

AUDIENCE="<your-identity-pool-id>": The audience parameter should be set to your Identity Pool ID from the e6data console.

SUBJECT="<your-identity-id>": The subject parameter should be set to your Identity ID from the e6data console.

az identity federated-credential create: Initiates the creation of the federated identity credential.

--name $FED_CRED_NAME: Sets the name of the federated identity credential.

--identity-name $IDENTITY_NAME: Specifies the user-assigned managed identity to associate with the federated credential.

--resource-group "$RESOURCE_GROUP_NAME": Indicates the resource group containing the managed identity.

--audiences "$AUDIENCE": Sets the audience for the federated credential, matching the Identity Pool ID.

--issuer "$ISSUER": Provides the OIDC issuer URL for AWS Cognito.

--subject "$SUBJECT": Sets the subject to match the Identity ID from the e6data console.

This federated identity credential now links your managed identity to your Cognito configuration, enabling secure authentication without directly storing secrets.
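To confirm the credential was attached, you can optionally list the federated credentials on the identity:

az identity federated-credential list \
    --identity-name $IDENTITY_NAME \
    --resource-group "$RESOURCE_GROUP_NAME" -o table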

Retrieve the Identity's Principal ID:

IDENTITY_PRINCIPAL_ID=$(az identity show --id $IDENTITY_ID --query principalId -o tsv)
  • az identity show: Fetches details of the created identity.

  • --id $IDENTITY_ID: Uses the identity ID from the previous command.

  • --query principalId -o tsv: Extracts the principal ID (the unique identifier for the identity) in plain text format.

Create custom role for AKS credentials:

AKS_CUSTOM_ROLE_NAME="e6data aks custom role ${WORKSPACE_NAME} ${RANDOM_STRING}"
RG_ID=$(az group show --name "$RESOURCE_GROUP_NAME" --query id -o tsv)

az role definition create --role-definition "{
    \"Name\": \"$AKS_CUSTOM_ROLE_NAME\",
    \"Description\": \"Custom role to list the aks cluster credential\",
    \"Actions\": [
        \"Microsoft.ContainerService/managedClusters/listClusterUserCredential/action\",
        \"Microsoft.ContainerService/managedClusters/read\"
    ],
    \"AssignableScopes\": [\"$RG_ID\"]
}"

Create custom role for load balancer and public IP:


LB_CUSTOM_ROLE_NAME="e6data aks custom role ${WORKSPACE_NAME} ${RANDOM_STRING}2"
AKS_MANAGED_RG_NAME=$(az aks show --name "$AKS_CLUSTER_NAME" --resource-group "$RESOURCE_GROUP_NAME" --query nodeResourceGroup -o tsv)
AKS_MANAGED_RG_ID=$(az group show --name "$AKS_MANAGED_RG_NAME" --query id -o tsv)

az role definition create --role-definition "{
    \"Name\": \"$LB_CUSTOM_ROLE_NAME\",
    \"Description\": \"Custom role to read the lb and pip\",
    \"Actions\": [
        \"Microsoft.Network/loadBalancers/read\",
        \"Microsoft.Network/publicIPAddresses/read\",
        \"Microsoft.Network/networkInterfaces/delete\",
        \"Microsoft.Network/networkInterfaces/read\"
    ],
    \"AssignableScopes\": [\"$AKS_MANAGED_RG_ID\"]
}"

Create custom role for Key Vault access:


KEY_VAULT_CUSTOM_ROLE_NAME="e6data aks custom role customer key vault ${WORKSPACE_NAME} ${RANDOM_STRING}"
AKS_MANAGED_RG_NAME=$(az aks show --name "$AKS_CLUSTER_NAME" --resource-group "$RESOURCE_GROUP_NAME" --query nodeResourceGroup -o tsv)
AKS_MANAGED_RG_ID=$(az group show --name "$AKS_MANAGED_RG_NAME" --query id -o tsv)

az role definition create --role-definition "{
    \"Name\": \"$KEY_VAULT_CUSTOM_ROLE_NAME\",
    \"Description\": \"Custom role to access the key vault\",
    \"DataActions\": [
        \"Microsoft.KeyVault/vaults/certificates/read\"
    ],
    \"AssignableScopes\": [\"$AKS_MANAGED_RG_ID\"]
}"

Assign custom AKS role to the managed identity:

AKS_CLUSTER_ID=$(az aks show --name "$AKS_CLUSTER_NAME" --resource-group "$RESOURCE_GROUP_NAME" --query id -o tsv)
AKS_CUSTOM_ROLE_ID=$(az role definition list --name "$AKS_CUSTOM_ROLE_NAME" --query "[].id" -o tsv)

az role assignment create --assignee $IDENTITY_PRINCIPAL_ID --role $AKS_CUSTOM_ROLE_ID --scope $AKS_CLUSTER_ID

Assign custom load balancer role to the managed identity:

LB_CUSTOM_ROLE_ID=$(az role definition list --name "$LB_CUSTOM_ROLE_NAME" --query "[].id" -o tsv)

az role assignment create --assignee $IDENTITY_PRINCIPAL_ID --role $LB_CUSTOM_ROLE_ID --scope $AKS_MANAGED_RG_ID

Remember to replace placeholders like $WORKSPACE_NAME, $RANDOM_STRING, $RESOURCE_GROUP_NAME, and $AKS_CLUSTER_NAME with your actual values.

These steps create a managed identity, a federated credential, custom roles, and the necessary role assignments, mirroring the e6data Terraform configuration. Note that some of these operations, especially custom role creation and assignment, may require elevated permissions in your Azure AD tenant and subscription.
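As an optional sanity check, you can list the role assignments now held by the control plane identity:

az role assignment list --assignee $IDENTITY_PRINCIPAL_ID --all -o table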

Creating a User-Assigned Managed Identity for the e6data Engine's Blob Storage Access

This setup allows the e6data engine to securely access blob storage using Azure's managed identity, without the need for storing credentials within the pods. The Workload Identity feature in AKS facilitates the seamless use of the managed identity by the e6data engine, ensuring secure and efficient data access for querying purposes. The e6data engine requires specific access roles for different storage accounts:

  1. Log Storage Account:

    • Role: "Storage Blob Data Contributor"

    • Purpose: Allows e6data to write and manage logs in the dedicated storage account created to store data required for the operation of the e6data workspace.

  2. Data Storage Accounts:

    • Role: "Storage Blob Data Reader"

    • Purpose: Enables e6data to read data from the storage accounts containing the data on which queries are executed. The steps to set this up are below.

  3. Create a user-assigned managed identity:

IDENTITY_NAME="${WORKSPACE_NAME}-identity-${RANDOM_STRING}"
IDENTITY_ID=$(az identity create \
  --name $IDENTITY_NAME \
  --resource-group "$RESOURCE_GROUP_NAME" \
  --location "$LOCATION" \
  --tags "$TAGS" \
  --query id -o tsv)

IDENTITY_PRINCIPAL_ID=$(az identity show --id $IDENTITY_ID --query principalId -o tsv)
  4. Create a federated identity credential:

FED_CRED_NAME="${WORKSPACE_NAME}-federated-credential-${RANDOM_STRING}"
AKS_OIDC_ISSUER=$(az aks show -n "$AKS_CLUSTER_NAME" -g "$RESOURCE_GROUP_NAME" --query "oidcIssuerProfile.issuerUrl" -o tsv)
KUBERNETES_NAMESPACE="<your-kubernetes-namespace>"  # Replace with the Kubernetes namespace where the e6data engine runs

az identity federated-credential create \
  --name $FED_CRED_NAME \
  --identity-name $IDENTITY_NAME \
  --resource-group "$RESOURCE_GROUP_NAME" \
  --audiences "api://AzureADTokenExchange" \
  --issuer "$AKS_OIDC_ISSUER" \
  --subject "system:serviceaccount:${KUBERNETES_NAMESPACE}:${WORKSPACE_NAME}"
  5. Assign the Storage Blob Data Reader role to the managed identity for each container. Before running these commands, replace the following placeholders with your actual values:

    • DATA_STORAGE_ACCOUNT_NAME

    • DATA_STORAGE_ACCOUNT_RG

    • LIST_OF_CONTAINERS (an array of container names; use "*" to grant access to all containers in the storage account)

DATA_STORAGE_ACCOUNT_ID=$(az storage account show --name "$DATA_STORAGE_ACCOUNT_NAME" --resource-group "$DATA_STORAGE_ACCOUNT_RG" --query id -o tsv)

for CONTAINER in "${LIST_OF_CONTAINERS[@]}"; do
  if [ "$CONTAINER" == "*" ]; then
    SCOPE="$DATA_STORAGE_ACCOUNT_ID"
  else
    SCOPE="$DATA_STORAGE_ACCOUNT_ID/blobServices/default/containers/$CONTAINER"
  fi
  
  az role assignment create \
    --role "Storage Blob Data Reader" \
    --assignee-object-id $IDENTITY_PRINCIPAL_ID \
    --assignee-principal-type ServicePrincipal \
    --scope $SCOPE
done
  6. Assign the Storage Blob Data Contributor role to the managed identity for the e6data managed storage account:

E6DATA_STORAGE_ACCOUNT_ID=$(az storage account show --name "${WORKSPACE_NAME}${RANDOM_STRING}" --resource-group "$RESOURCE_GROUP_NAME" --query id -o tsv)

az role assignment create \
  --role "Storage Blob Data Contributor" \
  --assignee-object-id $IDENTITY_PRINCIPAL_ID \
  --assignee-principal-type ServicePrincipal \
  --scope $E6DATA_STORAGE_ACCOUNT_ID

These steps create a user-assigned managed identity and a federated credential for AKS workload identity, and assign the roles necessary to access the relevant storage accounts and containers.

Key Vault Access for the akv2k8s Tool

To allow the akv2k8s tool to access the Azure Key Vault and retrieve the certificates necessary for TLS and Gateway connectivity, you need to assign the "Key Vault Certificate User" role to your AKS cluster’s kubelet identity. This ensures that the tool can securely fetch the required certificates.

Here are the steps to assign the "Key Vault Certificate User" role to your AKS cluster's kubelet identity using the Azure CLI:

Step-by-Step Guide

1. Log in to Azure:

az login

2. Set your subscription:

Replace <your-subscription-id> with your Azure subscription ID.

az account set --subscription <your-subscription-id>

3. Get the Key Vault ID:

Replace <keyvault-name> and <resource-group-name> with your Key Vault name and resource group name. This will retrieve the Key Vault's ID.

az keyvault show --name <keyvault-name> --resource-group <resource-group-name> --query id --output tsv

Save the output, as you will need it for the --scope parameter.

4. Get the kubelet identity's client ID:

Replace <aks-cluster-name> and <resource-group-name> with your AKS cluster name and resource group name. This will retrieve the kubelet identity's client ID.

az aks show --name <aks-cluster-name> --resource-group <resource-group-name> --query "identityProfile.kubeletidentity.clientId" --output tsv

Save the output, as you will need it for the --assignee parameter.

5. Assign the "Key Vault Certificate User" role:

Replace <keyvault-id> and <principal-id> with the values retrieved in the previous steps.

az role assignment create \
  --role "Key Vault Certificate User" \
  --scope <keyvault-id> \
  --assignee <principal-id>
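6. (Optional) Verify the assignment:

To confirm the role was granted, list the assignments at the Key Vault scope:

az role assignment list --scope <keyvault-id> --role "Key Vault Certificate User" -o table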

These permissions grant read access to load balancers and public IP addresses, which is essential for creating e6data endpoints.
