
Security

Frequently Asked Questions about Security


Last updated 9 months ago

What access controls are in place and how can they be audited?
  • All Terraform scripts used to deploy e6data can be found in the e6data documentation under Creating Workspaces in AWS.

  • The AWS access policies created by Terraform can be viewed on GitHub in the e6data Terraform repository.

  • All permissions are granted using the IAM OIDC method, which binds an IAM role to a specific Kubernetes service account in the isolated e6data Kubernetes namespace of the EKS cluster.
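As a rough illustration of the IAM OIDC method, the sketch below builds the kind of IAM trust policy that lets exactly one Kubernetes service account assume a role. It is not the e6data Terraform output: the function name, account ID, OIDC issuer URL, namespace, and service-account name are placeholders, and the linked Terraform scripts remain the authoritative source for the roles e6data actually creates.

```python
import json


def irsa_trust_policy(account_id: str, oidc_issuer: str,
                      namespace: str, service_account: str) -> dict:
    """Hypothetical helper: build an IAM trust policy that lets only one
    Kubernetes service account assume a role through the EKS OIDC provider."""
    # The provider ID is the issuer URL without the https:// scheme,
    # e.g. "oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE" (placeholder).
    provider = oidc_issuer.removeprefix("https://")
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {
                "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{provider}"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                "StringEquals": {
                    # Only tokens issued to this namespace/service account match.
                    f"{provider}:sub": f"system:serviceaccount:{namespace}:{service_account}",
                    f"{provider}:aud": "sts.amazonaws.com",
                }
            },
        }],
    }


if __name__ == "__main__":
    policy = irsa_trust_policy(
        "123456789012",
        "https://oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE",
        "e6data",          # placeholder namespace
        "example-engine",  # placeholder service account
    )
    print(json.dumps(policy, indent=2))
```

Because the `sub` condition names a single service account, pods running under any other service account, including in other namespaces, cannot assume the role.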

Summary of Required Permissions

  • The e6data query engine cluster requires read-only access to the data being queried (S3 buckets) and to the metastores/catalogs.

    • During workspace deployment, customers are asked to specify which S3 buckets the query engine can access (at bucket or prefix level), depending on the required use cases.

    • These specific lines in the Terraform script define S3 access: https://github.com/e6x-labs/terraform/blob/main/aws/e6data_engine_iam.tf#L19-L50

  • A new S3 bucket for e6data use is created with read-write access in the customer account. It is used to store the following:

    • Schema metadata (table names, column names, number of rows, file paths) for increased performance.

    • If the Query Editor is used, Sheets (SQL query text) are stored.

  • Read-only access to AWS Glue is required so that the engine knows the up-to-date paths to the data, which enables efficient query planning and compute use.
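To make the read-only scope concrete, here is a minimal sketch of an IAM policy that grants object reads and listing on chosen S3 prefixes (or a whole bucket) plus read-only Glue catalog calls. The function name, bucket, and prefixes are hypothetical, and the exact actions e6data grants are defined in its Terraform scripts, not here. Granting Glue access on all resources ("*") is a simplification; in practice you would narrow it to specific catalog, database, and table ARNs.

```python
import json
from typing import List, Optional

# Read-only Glue Data Catalog actions (an assumed, illustrative selection).
GLUE_READ_ACTIONS = [
    "glue:GetDatabase",
    "glue:GetDatabases",
    "glue:GetTable",
    "glue:GetTables",
    "glue:GetPartition",
    "glue:GetPartitions",
]


def read_only_policy(bucket: str, prefixes: Optional[List[str]] = None) -> dict:
    """Hypothetical helper: read-only IAM policy for one S3 bucket,
    optionally limited to prefixes, plus read-only AWS Glue access."""
    list_statement = {
        "Sid": "ListData",
        "Effect": "Allow",
        "Action": ["s3:ListBucket"],
        "Resource": [f"arn:aws:s3:::{bucket}"],
    }
    if prefixes:
        # Prefix-level access: objects and listings only under these prefixes.
        clean = [p.strip("/") for p in prefixes]
        object_arns = [f"arn:aws:s3:::{bucket}/{p}/*" for p in clean]
        list_statement["Condition"] = {
            "StringLike": {"s3:prefix": [v for p in clean for v in (p, f"{p}/*")]}
        }
    else:
        # Bucket-level access: every object in the bucket.
        object_arns = [f"arn:aws:s3:::{bucket}/*"]

    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Sid": "ReadObjects", "Effect": "Allow",
             "Action": ["s3:GetObject"], "Resource": object_arns},
            list_statement,
            {"Sid": "ReadGlueCatalog", "Effect": "Allow",
             "Action": GLUE_READ_ACTIONS, "Resource": ["*"]},
        ],
    }


if __name__ == "__main__":
    print(json.dumps(read_only_policy("example-data-bucket", ["sales/", "events/"]), indent=2))
```

No write actions (such as s3:PutObject or glue:UpdateTable) appear anywhere in the policy, which matches the read-only requirement for the data buckets. The separate e6data-owned bucket described above is the only one that needs read-write access.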

How are e6data personnel restricted from accessing customers' data? How can this be audited?

Limited Access to e6data Control Plane

  • e6data employees do not have access to the e6data production system (e6data Control Plane).

  • e6data employees can only monitor & debug the Control Plane using Grafana.

  • e6data employees can only perform deployments via a CI/CD pipeline secured with multi-factor authentication.

No Access to Data Plane

  • e6data employees have no way of accessing the Data Plane in the customer’s environment.

  • All communication between the Control Plane & Data Plane takes place between applications.

  • Multiple levels of logging are in place to audit inter-planar communication:

    • Kubernetes logging

      • It is recommended to enable the Audit and Authenticator log types in your EKS cluster. These logs help you monitor the operations performed by the e6data Kubernetes user and any IAM authentications to the cluster's Kubernetes entry points.

    • S3 access logging

      • The S3 bucket created during deployment for the e6data query engine's usage will have access logs enabled by default.

      • This lets you track how e6data Control Plane applications access that bucket.

    • Any API call that can cause the query engine to resume, suspend, or scale must pass through the IAM layer as an STS AssumeRole call. Customers can log these calls with AWS CloudTrail to detect suspicious events.

    • EKS Authorized Networks

      • The IP range of the e6data Control Plane can be allowlisted for EKS access, adding a layer of security on top of the IAM and Kubernetes RBAC access controls.
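The CloudTrail and Authorized Networks measures above can be combined into a simple audit pass. The sketch below assumes you have already loaded a CloudTrail log file (a JSON object with a top-level Records array) into a dict; it picks out STS AssumeRole calls for a given role and flags any whose source IP falls outside the CIDR ranges you allowlisted. The function names, role ARN, and CIDR range are placeholders, not values from e6data.

```python
import ipaddress


def assume_role_events(trail_log: dict, role_arn: str) -> list:
    """Return STS AssumeRole calls against role_arn from one CloudTrail log file."""
    hits = []
    for rec in trail_log.get("Records", []):
        if rec.get("eventSource") != "sts.amazonaws.com" or rec.get("eventName") != "AssumeRole":
            continue
        params = rec.get("requestParameters") or {}
        if params.get("roleArn") == role_arn:
            hits.append({
                "time": rec.get("eventTime"),
                "source_ip": rec.get("sourceIPAddress"),
                "session": params.get("roleSessionName"),
            })
    return hits


def outside_allowlist(events: list, allowed_cidrs: list) -> list:
    """Flag events whose source IP is not inside any allowlisted CIDR range."""
    networks = [ipaddress.ip_network(c) for c in allowed_cidrs]
    flagged = []
    for event in events:
        try:
            ip = ipaddress.ip_address(event["source_ip"])
        except ValueError:
            # Missing value or a service name (e.g. "eks.amazonaws.com"): review by hand.
            flagged.append(event)
            continue
        if not any(ip in net for net in networks):
            flagged.append(event)
    return flagged
```

For example, after loading a downloaded log file with json.load, calling outside_allowlist(assume_role_events(log, "arn:aws:iam::123456789012:role/example-role"), ["203.0.113.0/24"]) returns the AssumeRole calls worth reviewing. Events with a non-IP source are flagged rather than silently skipped, so nothing unexpected is dropped from the review.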

What data moves outside the customer's data plane?

What data is stored in the Control Plane?

The only data that moves from the Data Plane to the Control Plane and is stored by e6data is:

  • Usage, logs, and infrastructure metrics

    • Cluster infrastructure metrics

    • Number of pods and how long pods of each component (planners, queue managers, executors) are running

    • How many nodes are active

    • How long nodes are active for

    • CPU, memory, network, and I/O utilization of each pod

  • Cluster query metrics

    • Count of successful, failed & running queries.

    • Masked query (SQL) text

      • Any part that may contain sensitive information (filter by, group by, and where clauses) is masked before it moves to the Control Plane.

    • Hash of queries that are run.

    • The run duration of each query.

    • User who ran the query.

  • Schema information

    • Schemas (table and column names) for Schema Explorer in Query Editor
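The masking and hashing steps can be pictured with a small sketch. It is a hypothetical illustration, not e6data's implementation: it replaces string and numeric literals with ? (whereas the text above says whole filter by, group by, and where clauses may be masked), normalizes whitespace, and hashes the masked text with SHA-256 so that structurally identical queries share a hash. A regex-based masker like this misses cases such as scientific-notation numbers; a production version would work from a SQL parser.

```python
import hashlib
import re

# SQL string literal, allowing doubled quotes inside ('O''Brien').
_STRING_LITERAL = re.compile(r"'(?:[^']|'')*'")
# Standalone integer or decimal literal (digits inside identifiers like col1 are kept).
_NUMBER_LITERAL = re.compile(r"\b\d+(?:\.\d+)?\b")


def mask_sql(sql: str) -> str:
    """Hypothetical masker: replace literal values with '?' and collapse whitespace."""
    masked = _STRING_LITERAL.sub("?", sql)
    masked = _NUMBER_LITERAL.sub("?", masked)
    return " ".join(masked.split())


def query_hash(sql: str) -> str:
    """SHA-256 of the masked query, so queries differing only in literals match."""
    return hashlib.sha256(mask_sql(sql).encode("utf-8")).hexdigest()


if __name__ == "__main__":
    print(mask_sql("select * from users where name = 'bob' and age > 30"))
    print(query_hash("select * from users where name = 'bob' and age > 30"))
```

Hashing after masking means the hash can be used to group repeated queries without carrying the literal values themselves.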

What other data moves outside the Data Plane?

  • If the Query Editor is used, query results will be streamed to the user's browser via the Control Plane.

  • However, query results are only visible in the UI. Data remains in your Data Plane; e6data doesn’t upload, access, or download it.

  • If you use the Query Editor, your query is created on the front end of e6data’s web UI in your browser, behind your firewalls and VPN, and is not accessed by any e6data employees.

  • When you run interactive queries in the Query Editor, for example "select * from users limit 1000", the data from your users table passes through the e6data infrastructure on its way to your browser. This data is not retained on e6data servers beyond your session and is not written to disk, and e6data employees are not able to access those sessions.

Creating Workspaces in AWS
View the policies on GitHub
https://github.com/e6x-labs/terraform/blob/main/aws/e6data_engine_iam.tf#L19-L50
Guide to enabling Kubernetes logging
Guide to enabling S3 access logging
Guide to integrating Cloudtrail
Guide to using Authorized Networks for EKS
Workspace Creation Permissions