Code Samples

Python code snippets to carry out common operations on e6data

Best Practices

Environment Variables

Setting environment variables is encouraged over hardcoding authentication credentials and cluster IPs. The following environment variables can be set:

  • HOST_NAME: IP address or hostname of the cluster to be used.

  • E6_USER: Email address of the user account running the queries.

  • E6_TOKEN: Access token generated by the user in the e6data Console.

import os

HOST_NAME = os.getenv("HOST_NAME")
E6_USER = os.getenv("E6_USER")
E6_TOKEN = os.getenv("E6_TOKEN")

Usage:

from e6data_python_connector import Connection
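
For example, a connection can be created directly from these variables. A minimal sketch, assuming the environment-variable snippet above has run; port 80 follows the later examples on this page, and the database placeholder is yours to fill in:

conn = Connection(
    host=HOST_NAME,         # Cluster IP address or hostname from the environment.
    port=80,                # Port of the e6data engine.
    username=E6_USER,       # User email address from the environment.
    database='<database>',  # Replace with the database to query.
    password=E6_TOKEN       # Access token from the environment.
)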

Create Connection

Sample code for establishing a connection with an e6data cluster.

from e6data_python_connector import Connection

username = '<username>'  # Your e6data Email ID.
password = '<password>'  # Access Token generated in the e6data console.

host = '<host>'  # IP address or hostname of the cluster to be used.
database = '<database>'  # Database to perform the query on.
port = 443  # Port of the e6data engine.
catalog_name = '<catalog_name>'
cluster_uuid = '<cluster_uuid>'  # Specify the cluster UUID when secure=True.
secure = True  # Flag to use a secure channel for data transfer; defaults to False.

conn = Connection(
    host=host,
    port=port,
    username=username,
    database=database,
    password=password,
    cluster_uuid=cluster_uuid,
    secure=secure
)

Switch Database in an Existing Connection

database = '<new_database_name>'  # Replace with the new database name
cursor = conn.cursor(database)
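
Queries executed through the new cursor run against the new database. A short sketch, using the same table placeholder convention as the rest of this page:

query = 'SELECT * FROM <TABLE_NAME>'  # Replace with a table in the new database.
cursor.execute(query)
record = cursor.fetchone()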

Perform Query & Get Results

This code executes a query and fetches all the results.

The cursor also supports the fetchmany and fetchone methods.

query = 'SELECT * FROM <TABLE_NAME>'  # Replace with the query.

cursor = conn.cursor(catalog_name=catalog_name)
query_id = cursor.execute(query)  # The execute function returns a unique query ID, which can be used to abort the query.
all_records = cursor.fetchall()
for row in all_records:
    print(row)

To get all the results:

records = cursor.fetchall()

To get only one result (first row):

record = cursor.fetchone()

To limit the number of results:

limit = 500
records = cursor.fetchmany(limit)
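
fetchmany can also be called in a loop to stream a large result set in fixed-size batches. A minimal sketch, assuming fetchmany returns an empty list once all rows have been consumed:

batch_size = 500
while True:
    batch = cursor.fetchmany(batch_size)
    if not batch:
        break  # No rows left to fetch.
    for row in batch:
        print(row)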

Code Hygiene

As a best practice, clear the cursor, close the cursor, and close the connection after each operation. This improves performance by releasing old result data from memory.

cursor.clear() # Not needed when aborting a query
cursor.close()
connection.close()
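
To guarantee this cleanup even when a query fails, the same steps can be wrapped in a try/finally block. A minimal sketch, assuming conn is an open connection and query and catalog_name are defined as in the earlier examples:

cursor = conn.cursor(catalog_name=catalog_name)
try:
    cursor.execute(query)
    records = cursor.fetchall()
finally:
    # Runs whether or not the query raised an exception.
    cursor.clear()
    cursor.close()
    conn.close()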

Get Row Count

This code executes a query and returns the number of rows in the result set.

Fetching the row count from the query planner is preferred over counting the rows returned by the fetchall method.

import json
# Assumes Connection is imported and HOST_NAME, E6_USER, E6_TOKEN are set as shown under Best Practices.

def get_result_row_count(query, database, catalog_name):
    connection = Connection(host=HOST_NAME,
                            port=80,
                            username=E6_USER,
                            database=database,
                            password=E6_TOKEN)

    cursor = connection.cursor(catalog_name=catalog_name, db_name=database)
    query_id = cursor.execute(query)
    records = cursor.fetchmany(10000)  # Fetch results so that rowcount is populated.
    query_planner = json.loads(cursor.explain_analyse())
    row_count = cursor.rowcount
    cursor.clear()
    cursor.close()
    connection.close()
    return row_count

if __name__ == "__main__":
    get_result_row_count(<QUERY>, <DATABASE>, <CATALOG_NAME>)

Get Query Execution Plan

The following code runs a query and returns the execution plan generated for the query by the e6data engine.

The execution plan data will be returned as a JSON string and should be parsed as JSON data.

import json

def get_query_planner(query, database, catalog_name):
    connection = Connection(host=HOST_NAME,
                            port=80,
                            username=E6_USER,
                            database=database,
                            password=E6_TOKEN)

    cursor = connection.cursor(catalog_name=catalog_name, db_name=database)
    query_id = cursor.execute(query)
    records = cursor.fetchmany(10000)
    query_planner = json.loads(cursor.explain_analyse())  # Parse the JSON string returned by the engine.
    cursor.clear()
    cursor.close()
    connection.close()
    return query_planner

if __name__ == "__main__":
    get_query_planner(<QUERY>, <DATABASE>, <CATALOG_NAME>)

Abort a Query
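
The following code aborts a query, referenced by the input Query ID. Refer to the Get Query ID section below for information on obtaining the Query ID.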

def cancel_query(query_id, database, catalog_name):
    connection = Connection(host=HOST_NAME,
                            port=80,
                            username=E6_USER,
                            database=database,
                            password=E6_TOKEN)

    cursor = connection.cursor(catalog_name=catalog_name, db_name=database)
    cursor.cancel(query_id)  # Abort the query referenced by the query ID.
    cursor.close()
    connection.close()
    print("Query Cancelled")

if __name__ == "__main__":
    cancel_query(<QUERY_ID>, <DATABASE>, <CATALOG_NAME>)

Get Query Time Metrics

The following code runs a query and returns:

  • amount of time taken to execute the query in seconds

  • amount of time the query spent in the queue awaiting execution in seconds

  • amount of time taken to parse the query in seconds

import json

def get_e6_time_outputs(query, database, catalog_name):
    connection = Connection(host=HOST_NAME,
                            port=80,
                            username=E6_USER,
                            database=database,
                            password=E6_TOKEN)

    cursor = connection.cursor(catalog_name=catalog_name, db_name=database)
    query_id = cursor.execute(query)  # The execute function returns a query ID, which can be used to abort the query.
    all_records = cursor.fetchall()
    explain_response = cursor.explain_analyse()
    query_planner = json.loads(explain_response.get('planner'))

    execution_time = query_planner.get("total_query_time") / 1000  # Convert milliseconds to seconds.
    queue_time = query_planner.get("executionQueueingTime") / 1000  # Convert milliseconds to seconds.
    parsing_time = query_planner.get("parsingTime") / 1000  # Convert milliseconds to seconds.
    cursor.clear()
    cursor.close()
    connection.close()
    return execution_time, queue_time, parsing_time

if __name__ == "__main__":
    get_e6_time_outputs(<QUERY>, <DATABASE>, <CATALOG_NAME>)

Get Query ID

This code executes a query and returns the query ID.

def get_query_id(query, database, catalog_name):
    connection = Connection(host=HOST_NAME,
                            port=80,
                            username=E6_USER,
                            database=database,
                            password=E6_TOKEN)

    cursor = connection.cursor(catalog_name=catalog_name, db_name=database)
    query_id = cursor.execute(query)  # execute returns the unique query ID.
    all_records = cursor.fetchall()
    cursor.clear()
    cursor.close()
    connection.close()
    return query_id

if __name__ == "__main__":
    get_query_id(<QUERY>, <DATABASE>, <CATALOG_NAME>)

Combine Multiple Functions

The following example combines several of the operations shown above to return:

  • Number of rows

  • The time taken to execute the query

  • Query results

from e6data_python_connector import Connection
import json

username = '<username>'  # Your e6data Email ID.
password = '<password>'  # Access Token generated in the e6data console.

host = '<host>'  # IP address or hostname of the cluster to be used.
database = '<database>'  # Database to perform the query on.
port = 80  # Port of the e6data engine.

sql_query = 'SELECT * FROM <TABLE_NAME>'  # Replace with the actual query.

catalog_name = '<catalog_name>'  # Replace with the actual catalog name.

conn = Connection(
    host=host,
    port=port,
    username=username,
    database=database,
    password=password
)

cursor = conn.cursor(db_name=database, catalog_name=catalog_name)
query_id = cursor.execute(sql_query)
all_records = cursor.fetchall()
explain_response = cursor.explain_analyse()
planner_result = json.loads(explain_response.get('planner'))
execution_time = planner_result.get("total_query_time") / 1000  # Converting into seconds.
row_count = cursor.rowcount
columns = [col[0] for col in cursor.description]  # Get the column names and merge them with the results.
results = []
for row in all_records:
    row = dict(zip(columns, row))
    results.append(row)
    print(row)
print('Total row count {}, Execution Time (seconds): {}'.format(row_count, execution_time))
cursor.clear()
cursor.close()
conn.close()

Auto Resume

To enable Auto Resume while establishing a connection using the Python Connector, set auto_resume=True in the connection configuration. This ensures that the cluster resumes automatically when a query is executed, eliminating the need for manual intervention.

Note: Auto Resume must also be enabled at the cluster level to take effect.

conn = Connection(
    host=host,
    port=port,
    username=username,
    database=database,
    password=password,
    auto_resume=True
)
