Infrastructure & Permissions for e6data

Required Infrastructure

The following infrastructure must be created before setting up e6data:

  1. Azure Storage Account and a blob Container

Required Permissions

  1. An app registration with the permissions listed below, allowing the e6data control plane to access the AKS cluster and related network resources in order to manage endpoints (add link for e6data endpoints here).

  2. A user-assigned managed identity allowing the e6data engine to access the data to be queried.

Azure Storage Account and a blob Container

An Azure Storage Account with a Blob container is required to store data necessary for the operation of the e6data workspace, including service logs, query results, state information, and other operational data.

Steps to create these resources using Azure CLI:

  1. Set up variables:

WORKSPACE_NAME="e6dataworkspace"
RANDOM_STRING=$(openssl rand -hex 4)
RESOURCE_GROUP_NAME="YourResourceGroup"
LOCATION="YourAzureRegion"
CONTAINER_NAME="e6data-workspace-data"
  2. Create the Azure Storage Account:

az storage account create \
    --name "${WORKSPACE_NAME}${RANDOM_STRING}" \
    --resource-group "${RESOURCE_GROUP_NAME}" \
    --location "${LOCATION}" \
    --sku Standard_LRS \
    --kind StorageV2
  3. Create the Blob container within the Storage Account:

az storage container create \
    --name "${CONTAINER_NAME}" \
    --account-name "${WORKSPACE_NAME}${RANDOM_STRING}" \
    --auth-mode login \
    --public-access off
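Note: Azure storage account names must be 3–24 characters long and contain only lowercase letters and digits. Because the name above is built by concatenation, a quick pre-flight check can catch an invalid name before the az calls fail. A minimal sketch, using example values in place of the variables from step 1:

```shell
# Pre-flight check: Azure storage account names must be 3-24
# lowercase alphanumeric characters (example values shown).
WORKSPACE_NAME="e6dataworkspace"
RANDOM_STRING="abcd1234"   # example; normally generated with: openssl rand -hex 4
ACCOUNT_NAME="${WORKSPACE_NAME}${RANDOM_STRING}"

if [ "${#ACCOUNT_NAME}" -ge 3 ] && [ "${#ACCOUNT_NAME}" -le 24 ] \
    && printf '%s' "$ACCOUNT_NAME" | grep -Eq '^[a-z0-9]+$'; then
    echo "valid storage account name: $ACCOUNT_NAME"
else
    echo "invalid storage account name: $ACCOUNT_NAME" >&2
fi
```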

After running these commands:

  • You will have a Storage Account named e6dataworkspace<random_string>.

  • Within this Storage Account, there will be a private Blob container named e6data-workspace-data.

  • This container can be used to store all the necessary data for your e6data workspace operations.
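For reference, the container is addressable at the standard Azure public-cloud blob endpoint. A sketch with illustrative values (your real account name includes the generated random suffix):

```shell
WORKSPACE_NAME="e6dataworkspace"
RANDOM_STRING="abcd1234"   # example suffix; yours is random
CONTAINER_NAME="e6data-workspace-data"

# Standard blob endpoint URL format in the Azure public cloud:
BLOB_URL="https://${WORKSPACE_NAME}${RANDOM_STRING}.blob.core.windows.net/${CONTAINER_NAME}"
echo "$BLOB_URL"
```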

Required Permissions

Required Azure Permissions for e6data Control Plane

To enable proper functioning of the e6data control plane with Azure Kubernetes Service (AKS), the following permissions are required:

  1. AKS Cluster Access:

    • "Microsoft.ContainerService/managedClusters/listClusterUserCredential/action"

    • "Microsoft.ContainerService/managedClusters/read"

    These permissions allow secure access to the AKS cluster by obtaining the necessary kubeconfig file.

  2. Network Resource Access:

    • "Microsoft.Network/loadBalancers/read"

    • "Microsoft.Network/publicIPAddresses/read"

    These permissions grant read access to load balancers and public IP addresses, which is essential for creating e6data endpoints.

Implementation: To acquire these permissions, create an Azure App Registration. Use the App Registration's client ID and secret to authenticate and gain the required access. This approach ensures secure and controlled access to the necessary Azure resources for e6data operations.

  1. Create an Azure AD application:

APP_NAME="${WORKSPACE_NAME}-app-${RANDOM_STRING}"
APP_ID=$(az ad app create --display-name "$APP_NAME" --sign-in-audience AzureADMultipleOrgs --query appId -o tsv)
  2. Create an Azure AD application password:

SECRET=$(az ad app credential reset --id $APP_ID --years 2 --query password -o tsv)
  3. Create an Azure AD service principal:

SP_ID=$(az ad sp create --id $APP_ID --query id -o tsv)
  4. Assign the Storage Blob Data Contributor role to the service principal:

STORAGE_ACCOUNT_ID=$(az storage account show --name "${WORKSPACE_NAME}${RANDOM_STRING}" --resource-group "$RESOURCE_GROUP_NAME" --query id -o tsv)
az role assignment create --assignee $SP_ID --role "Storage Blob Data Contributor" --scope $STORAGE_ACCOUNT_ID
  5. Create a custom role for AKS credentials:

AKS_CUSTOM_ROLE_NAME="e6data aks custom role ${WORKSPACE_NAME} ${RANDOM_STRING}"
RG_ID=$(az group show --name "$RESOURCE_GROUP_NAME" --query id -o tsv)

az role definition create --role-definition "{
    \"Name\": \"$AKS_CUSTOM_ROLE_NAME\",
    \"Description\": \"Custom role to list the aks cluster credential\",
    \"Actions\": [
        \"Microsoft.ContainerService/managedClusters/listClusterUserCredential/action\",
        \"Microsoft.ContainerService/managedClusters/read\"
    ],
    \"AssignableScopes\": [\"$RG_ID\"]
}"
  6. Create a custom role for the load balancer and public IP:

LB_CUSTOM_ROLE_NAME="e6data aks custom role ${WORKSPACE_NAME} ${RANDOM_STRING}2"
AKS_MANAGED_RG_NAME=$(az aks show --name "$AKS_CLUSTER_NAME" --resource-group "$RESOURCE_GROUP_NAME" --query nodeResourceGroup -o tsv)
AKS_MANAGED_RG_ID=$(az group show --name "$AKS_MANAGED_RG_NAME" --query id -o tsv)

az role definition create --role-definition "{
    \"Name\": \"$LB_CUSTOM_ROLE_NAME\",
    \"Description\": \"Custom role to read the lb and pip\",
    \"Actions\": [
        \"Microsoft.Network/loadBalancers/read\",
        \"Microsoft.Network/publicIPAddresses/read\"
    ],
    \"AssignableScopes\": [\"$AKS_MANAGED_RG_ID\"]
}"
  7. Assign the custom AKS role to the service principal:

AKS_CLUSTER_ID=$(az aks show --name "$AKS_CLUSTER_NAME" --resource-group "$RESOURCE_GROUP_NAME" --query id -o tsv)
AKS_CUSTOM_ROLE_ID=$(az role definition list --name "$AKS_CUSTOM_ROLE_NAME" --query "[].id" -o tsv)

az role assignment create --assignee $SP_ID --role $AKS_CUSTOM_ROLE_ID --scope $AKS_CLUSTER_ID
  8. Assign the custom load balancer role to the service principal:

LB_CUSTOM_ROLE_ID=$(az role definition list --name "$LB_CUSTOM_ROLE_NAME" --query "[].id" -o tsv)

az role assignment create --assignee $SP_ID --role $LB_CUSTOM_ROLE_ID --scope $AKS_MANAGED_RG_ID

Remember to replace placeholders like $WORKSPACE_NAME, $RANDOM_STRING, $RESOURCE_GROUP_NAME, and $AKS_CLUSTER_NAME with your actual values.

These steps create an Azure AD application, a service principal, custom roles, and the necessary role assignments. Note that some of these operations, especially custom role creation and assignment, may require elevated permissions in your Azure AD tenant and subscription.
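As an alternative to the inline-escaped JSON used in steps 5 and 6, the role definition can be written to a file and passed to az role definition create with its @file syntax, which avoids quoting mistakes. A sketch with an illustrative scope:

```shell
AKS_CUSTOM_ROLE_NAME="e6data aks custom role example"
# Illustrative resource group ID; use your real RG_ID in practice:
RG_ID="/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/example-rg"

# Write the role definition to a file instead of escaping it inline:
cat > /tmp/aks-role.json <<EOF
{
    "Name": "$AKS_CUSTOM_ROLE_NAME",
    "Description": "Custom role to list the AKS cluster credential",
    "Actions": [
        "Microsoft.ContainerService/managedClusters/listClusterUserCredential/action",
        "Microsoft.ContainerService/managedClusters/read"
    ],
    "AssignableScopes": ["$RG_ID"]
}
EOF

# Then create the role from the file:
# az role definition create --role-definition @/tmp/aks-role.json
```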

Creating User-Assigned Managed Identity for e6data Engine's Blob Storage Access

This setup allows the e6data engine to access Blob Storage securely through an Azure managed identity, without storing credentials inside the pods. The AKS Workload Identity feature lets the engine assume the managed identity seamlessly, ensuring secure and efficient access to the data being queried. The engine requires different roles on different storage accounts:

  1. Log Storage Account:

    • Role: "Storage Blob Data Contributor"

    • Purpose: Allows e6data to write and manage logs in the dedicated storage account created to store data required to operate the e6data workspace.

  2. Data Storage Accounts:

    • Role: "Storage Blob Data Reader"

    • Purpose: Enables e6data to read data from the storage accounts containing the data buckets on which queries are executed.

Follow the steps below to set this up:

  1. Create a user-assigned managed identity:

IDENTITY_NAME="${WORKSPACE_NAME}-identity-${RANDOM_STRING}"
IDENTITY_ID=$(az identity create \
  --name $IDENTITY_NAME \
  --resource-group "$RESOURCE_GROUP_NAME" \
  --location "$LOCATION" \
  --tags "$TAGS" \
  --query id -o tsv)

IDENTITY_PRINCIPAL_ID=$(az identity show --id $IDENTITY_ID --query principalId -o tsv)
  2. Create a federated identity credential:

FED_CRED_NAME="${WORKSPACE_NAME}-federated-credential-${RANDOM_STRING}"
AKS_OIDC_ISSUER=$(az aks show -n "$AKS_CLUSTER_NAME" -g "$RESOURCE_GROUP_NAME" --query "oidcIssuerProfile.issuerUrl" -o tsv)

az identity federated-credential create \
  --name $FED_CRED_NAME \
  --identity-name $IDENTITY_NAME \
  --resource-group "$RESOURCE_GROUP_NAME" \
  --audiences "api://AzureADTokenExchange" \
  --issuer "$AKS_OIDC_ISSUER" \
  --subject "system:serviceaccount:${KUBERNETES_NAMESPACE}:${WORKSPACE_NAME}"
  3. Assign the Storage Blob Data Reader role to the managed identity for each container:

DATA_STORAGE_ACCOUNT_ID=$(az storage account show --name "$DATA_STORAGE_ACCOUNT_NAME" --resource-group "$DATA_STORAGE_ACCOUNT_RG" --query id -o tsv)

for CONTAINER in "${LIST_OF_CONTAINERS[@]}"; do
  if [ "$CONTAINER" == "*" ]; then
    SCOPE="$DATA_STORAGE_ACCOUNT_ID"
  else
    SCOPE="$DATA_STORAGE_ACCOUNT_ID/blobServices/default/containers/$CONTAINER"
  fi
  
  az role assignment create \
    --role "Storage Blob Data Reader" \
    --assignee-object-id $IDENTITY_PRINCIPAL_ID \
    --assignee-principal-type ServicePrincipal \
    --scope $SCOPE
done
  4. Assign the Storage Blob Data Contributor role to the managed identity for the e6data managed storage account:

E6DATA_STORAGE_ACCOUNT_ID=$(az storage account show --name "${WORKSPACE_NAME}${RANDOM_STRING}" --resource-group "$RESOURCE_GROUP_NAME" --query id -o tsv)

az role assignment create \
  --role "Storage Blob Data Contributor" \
  --assignee-object-id $IDENTITY_PRINCIPAL_ID \
  --assignee-principal-type ServicePrincipal \
  --scope $E6DATA_STORAGE_ACCOUNT_ID
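The --subject value in step 2 uses the fixed Kubernetes service account identifier format, and must exactly match the namespace and service account name the e6data engine pods run under. A sketch with illustrative values:

```shell
KUBERNETES_NAMESPACE="e6data"      # example namespace
WORKSPACE_NAME="e6dataworkspace"   # service account name, per the steps above

# Federated credential subject format used by AKS workload identity:
SUBJECT="system:serviceaccount:${KUBERNETES_NAMESPACE}:${WORKSPACE_NAME}"
echo "$SUBJECT"
```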

Before running these commands, make sure to replace the following placeholders with your actual values:

  • WORKSPACE_NAME

  • RANDOM_STRING

  • RESOURCE_GROUP_NAME

  • LOCATION

  • TAGS

  • AKS_CLUSTER_NAME

  • KUBERNETES_NAMESPACE

  • DATA_STORAGE_ACCOUNT_NAME

  • DATA_STORAGE_ACCOUNT_RG

  • LIST_OF_CONTAINERS (this should be an array of container names)

These steps create a user-assigned managed identity and a federated credential for AKS workload identity, and assign the roles needed to access the relevant storage accounts and containers.
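LIST_OF_CONTAINERS is a bash array. Per the loop in step 3, a literal "*" entry scopes the role assignment to the whole storage account rather than a single container. Example values:

```shell
# Grant read access to two specific containers (example names):
LIST_OF_CONTAINERS=("raw-data" "curated-data")

# Or grant read access at the storage-account level:
# LIST_OF_CONTAINERS=("*")

echo "${#LIST_OF_CONTAINERS[@]} container(s) configured"
```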

Key Vault Access for the akv2k8s Tool

This is required for the akv2k8s tool to access the Key Vault and retrieve the certificate necessary for TLS and Gateway connectivity.

Here are the steps to assign the "Key Vault Certificate User" role to your AKS cluster's kubelet identity using the Azure CLI:

Step-by-Step Guide

  1. Log in to Azure:

az login
  2. Set your subscription:

Replace <your-subscription-id> with your Azure subscription ID.

az account set --subscription <your-subscription-id>
  3. Get the Key Vault ID:

Replace <keyvault-name> and <resource-group-name> with your Key Vault name and resource group name. This will retrieve the Key Vault's ID.

az keyvault show --name <keyvault-name> --resource-group <resource-group-name> --query id --output tsv

Save the output, as you will need it for the --scope parameter.

  4. Get the kubelet identity's principal ID:

Replace <aks-cluster-name> and <resource-group-name> with your AKS cluster name and resource group name. This will retrieve the kubelet identity's principal ID.

az aks show --name <aks-cluster-name> --resource-group <resource-group-name> --query "identityProfile.kubeletidentity.objectId" --output tsv

Save the output, as you will need it for the --assignee parameter.

  5. Assign the "Key Vault Certificate User" role:

Replace <keyvault-id> and <principal-id> with the values retrieved in the previous steps.

az role assignment create \
  --role "Key Vault Certificate User" \
  --scope <keyvault-id> \
  --assignee <principal-id>
