Performance and Integration Guide
Last updated
Last updated
The e6Data Python Connector is a DB-API 2.0 interface for interacting with e6Data clusters via gRPC. It supports secure connections, query execution, metadata retrieval, and robust error handling, making it easy to integrate into Python applications.
This document covers key features, components, execution lifecycle, gRPC configuration, query parameterization, error handling, performance testing, and best practices for using Python 3.12 or later.
Database Connection: Connect to e6Data clusters securely using SSL, with configurable connection parameters.
Query Execution: Support for prepared and parameterized queries using pyformat style.
Result Handling: Fetch results via fetchone(), fetchmany(), fetchall(), and memory-efficient fetchall_buffer().
Metadata Access: Retrieve schemas, tables, and columns dynamically; support explain_analyse for query diagnostics.
Error Handling & Authentication: Automatic retries on authentication failure and connection errors.
Testing: Comprehensive unit and integration tests ensure reliability.
Represents a live database connection.
host
string
IP or hostname of the e6Data cluster
port
int
gRPC service port (e.g., 80, 443)
username
string
User email for authentication
password
string
Personal Access Token from e6Data dashboard
catalog
string
Catalog name for metadata access
database
string
Database name
secure
bool
Use SSL-secured gRPC channel (default False)
grpc_options
dict
Advanced gRPC settings (timeouts, message sizes)
Used to execute queries and fetch results. Tied to a Connection.
execute(sql, params)
Execute a SQL statement with parameters
fetchone()
Fetch a single row
fetchmany(size=N)
Fetch N rows
fetchall()
Fetch all rows (small datasets)
fetchall_buffer()
Fetch all rows (optimized for large data)
explain_analyse()
Retrieve runtime query statistics
get_tables()
List tables in current database
get columns()
List columns metadata for a table
clear()
Clear cached results to free memory
Initialization: Establishes gRPC channel with optional SSL and connection options.
Authentication: Authenticates using username and Personal Access Token.
Prepare Statement: Queries compiled via prepare Statement API.
Execution: Executes prepared queries, collecting optional metrics (parsing, queuing time).
Fetching: Results streamed in batches for optimized memory use.
Enable gRPC query caching.
Limit result sets with SQL LIMIT clauses.
Use connection pooling to reduce overhead.
The grpc_options dictionary allows tuning of gRPC behaviors for performance and stability.
grpc_prepare_timeout
600000 ms (10 min)
Max wait time for query preparation
keepalive_timeout_ms
900000 ms
Idle time before sending keepalive ping
keepalive_time_ms
30000 ms (30 sec)
Interval between keepalive pings
keepalive_permit_without_calls
1
Allow keepalive even without active RPC calls
max_receive_message_length
-1 (unlimited)
Max size of incoming messages
max_send_message_length
300 MB
Max size of outgoing messages
http2.max_pings_without_data
0 (unlimited)
Max pings sent without data
http2.min_time_between_pings_ms
15000 ms (15 sec)
Minimum interval between pings
http2.min_ping_interval_without_data_ms
15000 ms (15 sec)
Interval between pings without data exchange
Large Queries:
Connection Stability:
Start with defaults unless tuning is required.
Monitor network and resource usage to adjust settings.
Configure message size limits cautiously for large datasets.
During connection setup, gRPC channels are created with options passed in to configure behavior.
For SSL-secured connections:
User-defined grpc_options override defaults during channel creation.
pyformat is a PEP-249 compliant parameter style using named placeholders (%(param)s) to write secure, readable SQL queries.
Prevents SQL injection by safely escaping parameters.
Enhances query readability.
Allows flexible, dynamic parameter passing.
KeyError: 'placeholder_name'
Missing parameter key in dictionary
Ensure all placeholders have matching keys
TypeError: %d format
Parameter type mismatch
Match parameter types to placeholders
Programming Error: syntax error
Incorrect SQL or placeholder usage
Verify query syntax and placeholder format
Always use parameterized queries.
Validate parameter types and inputs.
Log queries cautiously (avoid sensitive data).
Handle execution errors with try-except blocks.
Robust error handling with retry logic on common gRPC errors:
Authentication
Invalid/expired credentials
Retry authentication up to 5 times
Connection
Timeout or server unavailability
Graceful retries on _InactiveRpcError
Validate connection using environment variables.
Execute various queries to measure latency and correctness.
Test with caching enabled/disabled.
Use explain_analyse to verify query plans.
Faster startup and module import times.
Improved iteration speed and memory management.
Enhanced garbage collection for short-lived objects.
Support for the perf module for profiling.
Better error tracing for debugging.
Typing enhancements for cleaner code.
Use built-in functions for better efficiency (len(), max(), min()).
Enable adaptive specialization via python3.12 -X adaptive.
Utilize async improvements for gRPC async calls.
Upgrade Python environment with virtual env and reinstall dependencies.