Code Samples

Python code snippets to carry out common operations on e6data

Best Practices

Environment Variables

Setting environment variables is encouraged over hardcoding authentication credentials or cluster details. The following environment variables can be configured before initializing the connection:

  • HOST_NAME — Hostname or IP address of the e6data cluster.

  • E6_USER — Username (email address) associated with your e6data account.

  • E6_TOKEN — Access token generated from the e6data Console under your user profile.

These environment variables are fetched dynamically using os.getenv() during connection setup.

import os
from e6data_python_connector import Connection

HOST_NAME = os.getenv("HOST_NAME")
E6_USER = os.getenv("E6_USER")
E6_TOKEN = os.getenv("E6_TOKEN")
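If a variable is unset, os.getenv() silently returns None, which later surfaces as a confusing authentication error. A small helper along these lines (hypothetical, not part of the connector) can fail fast instead:

```python
import os

def require_env(name):
    """Return the value of an environment variable, raising
    immediately if it is unset or empty (hypothetical helper,
    not part of the e6data connector)."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Required environment variable {name} is not set")
    return value

# Demo: set a variable, then read it back through the helper.
os.environ["E6_USER"] = "user@example.com"
print(require_env("E6_USER"))  # → user@example.com
```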


Create Connection

Sample code for establishing a connection with an e6data cluster.

host = HOST_NAME or "<host>"
username = E6_USER or "<username>"
password = E6_TOKEN or "<access_token>"
database = "<database>"
catalog_name = "<catalog_name>"

conn = Connection(
    host=host,
    port=443,
    username=username,
    password=password,
    database=database,
    cluster_name='production-cluster',  # or your cluster
    secure=True,
    auto_resume=True  # enable auto-resume if supported
)

Switch Database in an Existing Connection

database = '<new_database_name>'  # Replace with the new database
cursor = conn.cursor(catalog_name=catalog_name, database=database)

Perform Query & Get Results

Setting up the cursor allows you to execute queries and fetch results. The cursor supports:

  • fetchall() — fetch all results.

  • fetchmany(limit) — fetch a limited number of rows.

  • fetchone() — fetch a single row.

  • fetchall_buffer() — memory-efficient streaming for large result sets.

cursor = conn.cursor(catalog_name=catalog_name)

query = "SELECT * FROM <TABLE_NAME>"
query_id = cursor.execute(query)

# Fetch all results
all_records = cursor.fetchall()
for row in all_records:
    print(row)

# OR fetch a single row (the first row)
record = cursor.fetchone()

# OR fetch a limited number of rows
limit = 500
records = cursor.fetchmany(limit)

# OR stream large result sets in a memory-efficient way
for item in cursor.fetchall_buffer():
    print(item)

Code Hygiene

As a best practice, clear the cursor, close the cursor, and close the connection after each operation. This frees memory held by old result data.

cursor.clear()  # not needed when aborting a query
cursor.close()
conn.close()
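To guarantee this cleanup even when execution or fetching raises, the calls can be wrapped in try/finally. A sketch, assuming the cursor API shown above; conn is any established Connection, and the function name is illustrative:

```python
def run_query_with_cleanup(conn, query, catalog_name):
    """Execute a query and always release the cursor and close the
    connection, even if execution or fetching raises."""
    cursor = conn.cursor(catalog_name=catalog_name)
    try:
        cursor.execute(query)
        return cursor.fetchall()
    finally:
        # Runs on success and on error alike
        cursor.clear()
        cursor.close()
        conn.close()
```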

Get Row Count

This code executes a query and returns the number of rows in the result set. Fetching rowcount directly from the cursor is preferred over using fetchall() for performance and memory efficiency.

def get_result_row_count(query, database, catalog_name):
    conn = Connection(
        host=HOST_NAME or "<host>",
        port=443,
        username=E6_USER or "<username>",
        password=E6_TOKEN or "<access_token>",
        database=database,
        catalog_name=catalog_name,
        cluster_name='production-cluster',  # or your cluster
        secure=True,
        auto_resume=True
    )

    cursor = conn.cursor(catalog_name=catalog_name, database=database)
    query_id = cursor.execute(query)
    # Fetch a limited number of records if needed
    records = cursor.fetchmany(10000)
    row_count = cursor.rowcount

    cursor.clear()
    cursor.close()
    conn.close()
    return row_count

if __name__ == "__main__":
    get_result_row_count("<QUERY>", "<DATABASE>", "<CATALOG_NAME>")

Get Query Execution Plan

The following code runs a query and returns the execution plan generated for the query by the e6data engine.

The execution plan is returned as a JSON string and should be parsed with json.loads().

import json

def get_query_planner(query, database, catalog_name):
    conn = Connection(
        host=HOST_NAME or "<host>",
        port=443,
        username=E6_USER or "<username>",
        password=E6_TOKEN or "<access_token>",
        database=database,
        catalog_name=catalog_name,
        cluster_name='production-cluster',  # or your cluster
        secure=True,
        auto_resume=True
    )

    cursor = conn.cursor(catalog_name=catalog_name, database=database)
    query_id = cursor.execute(query)
    # Fetch a limited number of records if needed
    records = cursor.fetchmany(10000)
    explain_response = cursor.explain_analyse()
    query_planner = json.loads(explain_response.get('planner'))

    cursor.clear()
    cursor.close()
    conn.close()
    return query_planner

if __name__ == "__main__":
    get_query_planner("<QUERY>", "<DATABASE>", "<CATALOG_NAME>")

Abort a Query

The following code aborts a running query, referenced by the input Query ID. Refer to Get Query ID for information on obtaining the Query ID.

def abort_query(query_id, database, catalog_name):
    conn = Connection(
        host=HOST_NAME or "<host>",
        port=443,
        username=E6_USER or "<username>",
        password=E6_TOKEN or "<access_token>",
        database=database,
        catalog_name=catalog_name,
        cluster_name='production-cluster',  # or your cluster
        secure=True,
        auto_resume=True
    )

    cursor = conn.cursor(catalog_name=catalog_name, database=database)
    # Cancel the query identified by query_id
    cursor.cancel(query_id)

    cursor.close()  # cursor.clear() is not needed when aborting a query
    conn.close()

if __name__ == "__main__":
    abort_query("<QUERY_ID>", "<DATABASE>", "<CATALOG_NAME>")

Get Query Time Metrics

The following code runs a query and returns:

  • Execution time of the query, in seconds

  • Time the query spent queued awaiting execution, in seconds

  • Time taken to parse the query, in seconds

# query_planner is the parsed execution plan returned by
# cursor.explain_analyse() (see Get Query Execution Plan)
execution_time = query_planner.get("total_query_time") / 1000  # convert ms to seconds
queue_time = query_planner.get("executionQueueingTime") / 1000
parsing_time = query_planner.get("parsingTime") / 1000
row_count = cursor.rowcount
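The snippet above can be wrapped into a small helper. A sketch, assuming the key names shown above (they come from the planner JSON and may vary by engine version); the function name and sample payload are illustrative:

```python
def extract_time_metrics(query_planner):
    """Convert the millisecond timings in the parsed execution
    plan to seconds. Key names follow the snippet above."""
    return {
        "execution_time_s": query_planner.get("total_query_time", 0) / 1000,
        "queue_time_s": query_planner.get("executionQueueingTime", 0) / 1000,
        "parsing_time_s": query_planner.get("parsingTime", 0) / 1000,
    }

# Example with a hypothetical plan payload (values in milliseconds):
sample = {"total_query_time": 2500, "executionQueueingTime": 150, "parsingTime": 40}
print(extract_time_metrics(sample))
# → {'execution_time_s': 2.5, 'queue_time_s': 0.15, 'parsing_time_s': 0.04}
```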

Get Query ID

This code executes a query and returns the query ID.

from e6data_python_connector import Connection
import os

HOST_NAME = os.getenv("HOST_NAME")
E6_USER = os.getenv("E6_USER")
E6_TOKEN = os.getenv("E6_TOKEN")

def get_query_id(query, database, catalog_name):
    conn = Connection(
        host=HOST_NAME or "<host>",
        port=443,
        username=E6_USER or "<username>",
        password=E6_TOKEN or "<access_token>",
        database=database,
        catalog_name=catalog_name,
        secure=True,
        auto_resume=True
    )

    cursor = conn.cursor(catalog_name=catalog_name, database=database)
    query_id = cursor.execute(query)
    all_records = cursor.fetchall()
    
    cursor.clear()
    cursor.close()
    conn.close()
    
    return query_id

if __name__ == "__main__":
    get_query_id("<QUERY>", "<DATABASE>", "<CATALOG_NAME>")

Combine Multiple Functions

The following code is an example using multiple functions defined above to return:

  • Number of rows

  • The time taken to execute the query

  • Query results

cursor = conn.cursor(catalog_name=catalog_name)
query_id = cursor.execute(sql_query)
all_records = cursor.fetchall()
explain_response = cursor.explain_analyse()
planner_result = json.loads(explain_response.get('planner'))
execution_time = planner_result.get("total_query_time") / 1000
row_count = cursor.rowcount

columns = [col[0] for col in cursor.description]
results = [dict(zip(columns, row)) for row in all_records]

for row in results:
    print(row)

print(f"Total row count {row_count}, Execution Time (seconds): {execution_time}")

cursor.clear()
cursor.close()
conn.close()

Auto Resume

To enable Auto Resume while establishing a connection using the Python Connector, set auto_resume=True in the connection configuration. This ensures that the cluster resumes automatically when a query is executed, eliminating the need for manual intervention.

Note: Auto Resume must also be enabled at the cluster level to take effect.

conn = Connection(
    host=host,
    port=443,
    username=username,
    password=password,
    database=database,
    cluster_name='production-cluster',
    secure=True,
    auto_resume=True
)
