# Performance and Integration Guide

### Overview

The e6Data Python Connector is a [PEP-249 compliant](https://peps.python.org/pep-0249/) DB-API 2.0 interface for interacting with e6Data clusters via gRPC. It supports secure connections, query execution, metadata retrieval, and robust error handling, making it easy to integrate into Python applications.

This document covers key features, components, execution lifecycle, gRPC configuration, query parameterization, error handling, performance testing, and best practices for using Python 3.12 or later.

### System Features:

* **Database Connection:**\
  Connect to e6Data clusters securely using SSL, with configurable connection parameters.
* **Query Execution:**\
  Support for prepared and parameterized queries using pyformat style.
* **Result Handling:**\
  Fetch results via fetchone(), fetchmany(), fetchall(), and memory-efficient fetchall\_buffer().
* **Metadata Access:**\
  Retrieve schemas, tables, and columns dynamically; support explain\_analyse for query diagnostics.
* **Error Handling & Authentication:**\
  Automatic retries on authentication failure and connection errors.
* **Testing:**\
  Comprehensive unit and integration tests ensure reliability.

### Components and Interfaces:

#### Connection Object:

Represents a live database connection.

| Parameter     | Type   | Description                                      |
| ------------- | ------ | ------------------------------------------------ |
| host          | string | IP or hostname of the e6Data cluster             |
| port          | int    | gRPC service port (e.g., 80, 443)                |
| username      | string | User email for authentication                    |
| password      | string | Personal Access Token from e6Data dashboard      |
| catalog       | string | Catalog name for metadata access                 |
| database      | string | Database name                                    |
| secure        | bool   | Use SSL-secured gRPC channel (default False)     |
| grpc\_options | dict   | Advanced gRPC settings (timeouts, message sizes) |

#### Cursor Object

Used to execute queries and fetch results. Tied to a Connection.

| Method               | Description                               |
| -------------------- | ----------------------------------------- |
| execute(sql, params) | Execute a SQL statement with parameters   |
| fetchone()           | Fetch a single row                        |
| fetchmany(size=N)    | Fetch N rows                              |
| fetchall()           | Fetch all rows (small datasets)           |
| fetchall\_buffer()   | Fetch all rows (optimized for large data) |
| explain\_analyse()   | Retrieve runtime query statistics         |
| get\_tables()        | List tables in current database           |
| get columns()        | List columns metadata for a table         |
| clear()              | Clear cached results to free memory       |

### Execution Details and Timings

#### Execution Lifecycle

1. **Initialization:** Establishes gRPC channel with optional SSL and connection options.
2. **Authentication:** Authenticates using username and Personal Access Token.
3. **Prepare Statement:** Queries compiled via prepare Statement API.
4. **Execution:** Executes prepared queries, collecting optional metrics (parsing, queuing time).
5. **Fetching:** Results streamed in batches for optimized memory use.

#### Best Practices to Minimize Execution Time

* Enable gRPC query caching.
* Limit result sets with SQL LIMIT clauses.
* Use connection pooling to reduce overhead.

### gRPC Options Configuration

The grpc\_options dictionary allows tuning of gRPC behaviors for performance and stability.

| Option                                       | Default            | Description                                   |
| -------------------------------------------- | ------------------ | --------------------------------------------- |
| grpc\_prepare\_timeout                       | 600000 ms (10 min) | Max wait time for query preparation           |
| keepalive\_timeout\_ms                       | 900000 ms          | Idle time before sending keepalive ping       |
| keepalive\_time\_ms                          | 30000 ms (30 sec)  | Interval between keepalive pings              |
| keepalive\_permit\_without\_calls            | 1                  | Allow keepalive even without active RPC calls |
| max\_receive\_message\_length                | -1 (unlimited)     | Max size of incoming messages                 |
| max\_send\_message\_length                   | 300 MB             | Max size of outgoing messages                 |
| http2.max\_pings\_without\_data              | 0 (unlimited)      | Max pings sent without data                   |
| http2.min\_time\_between\_pings\_ms          | 15000 ms (15 sec)  | Minimum interval between pings                |
| http2.min\_ping\_interval\_without\_data\_ms | 15000 ms (15 sec)  | Interval between pings without data exchange  |

#### Example Use Cases

* **Large Queries:**

```
grpc_options = {
    'max_receive_message_length': 500 * 1024 * 1024,
    'max_send_message_length': 500 * 1024 * 1024,
}
 
```

**Connection Stability:**

```
grpc_options = {
    'keepalive_time_ms': 10000,
    'keepalive_timeout_ms': 20000,
    'keepalive_permit_without_calls': 1,
}
```

#### Best Practices

* Start with defaults unless tuning is required.
* Monitor network and resource usage to adjust settings.
* Configure message size limits cautiously for large datasets.

### Internal Working of the Connector

During connection setup, gRPC channels are created with options passed in to configure behavior.

```
self._channel = grpc.insecure_channel(
    target=f"{self._host}:{self._port}",
    options=self._get_grpc_options
)
 
```

For SSL-secured connections:

```
self._channel = grpc.secure_channel(
    target=f"{self._host}:{self._port}",
    options=self._get_grpc_options,
    credentials=grpc.ssl_channel_credentials()
)
```

User-defined grpc\_options override defaults during channel creation.

### Using pyformat for Parameterized Queries

pyformat is a PEP-249 compliant parameter style using named placeholders (%(param)s) to write secure, readable SQL queries.

#### Example

```
sql = "SELECT * FROM users WHERE age > %(age)s"
params = {"age": 25}
cursor.execute(sql, params)
```

#### Benefits

* Prevents SQL injection by safely escaping parameters.
* Enhances query readability.
* Allows flexible, dynamic parameter passing.

#### Common Errors and Fixes

| KeyError: 'placeholder\_name'   | Missing parameter key in dictionary | Ensure all placeholders have matching keys |
| ------------------------------- | ----------------------------------- | ------------------------------------------ |
| TypeError: %d format            | Parameter type mismatch             | Match parameter types to placeholders      |
| Programming Error: syntax error | Incorrect SQL or placeholder usage  | Verify query syntax and placeholder format |

#### Best Practices

* Always use parameterized queries.
* Validate parameter types and inputs.
* Log queries cautiously (avoid sensitive data).
* Handle execution errors with try-except blocks.

### Error Handling and Recovery

Robust error handling with retry logic on common gRPC errors:

| Authentication | Invalid/expired credentials      | Retry authentication up to 5 times     |
| -------------- | -------------------------------- | -------------------------------------- |
| Connection     | Timeout or server unavailability | Graceful retries on \_InactiveRpcError |

### Performance Testing

#### Workflow

* Validate connection using environment variables.
* Execute various queries to measure latency and correctness.
* Test with caching enabled/disabled.
* Use explain\_analyse to verify query plans.

### Leveraging Python 3.12 or Later for Improved Performance

#### Key Benefits

* Faster startup and module import times.
* Improved iteration speed and memory management.
* Enhanced garbage collection for short-lived objects.
* Support for the perf module for profiling.
* Better error tracing for debugging.
* Typing enhancements for cleaner code.

#### Best Practices

* Use built-in functions for better efficiency (len(), max(), min()).
* Enable adaptive specialization via python3.12 -X adaptive.
* Utilize async improvements for gRPC async calls.
* Upgrade Python environment with virtual env and reinstall dependencies.

#### Transition Recommendations

```
python3.12 -m venv venv
source venv/bin/activate
pip install --upgrade .
```

#### Performance Testing Example

```
import time

start = time.perf_counter()
data = cursor.fetchall()
end = time.perf_counter()
print(f"Fetch Time: {end - start} seconds")

```
