# Hybrid Data Lakehouse

### Overview <a href="#overview" id="overview"></a>

e6data’s Hybrid Data Lakehouse enables enterprises to query data across clouds, regions, and on-premises environments without migrating it. Powered by a federated SQL engine and a hybrid cluster architecture, it supports **data autonomy**, **regulatory compliance**, and **cost efficiency**, all while presenting a **unified query interface** to the user.

{% hint style="info" %}
**This feature is currently in active development and is available only on request.**
{% endhint %}

<figure><img src="https://3484040590-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FeVBYKZm1xFKFFVzS0lRJ%2Fuploads%2FeQIryaDYn92mEwYbsFCl%2FScreenshot%202025-08-04%20at%202.40.12%E2%80%AFPM.png?alt=media&#x26;token=15f568b0-3693-484f-8f93-ebd56fac3783" alt=""><figcaption></figcaption></figure>

### What It Does <a href="#what-it-does" id="what-it-does"></a>

* Connects heterogeneous data sources across clouds, regions, or ownership boundaries.
* Provides a federated SQL interface for seamless querying.
* Minimizes data movement by **pushing compute to where the data lives**.
* Enhances governance, security, and cost efficiency in distributed environments.
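To illustrate why pushing compute to the data reduces data movement, here is a minimal sketch in Python. It is not e6data code: the region names, the sample rows, and the helper functions `local_partial_sum` and `merge_partials` are all hypothetical. Each location aggregates its own rows, and only the small partial results travel to the coordinating side.

```python
from collections import Counter

# Hypothetical sample data: (country, amount) rows stored in two locations.
REGION_ROWS = {
    "aws-us-east-1": [("US", 10.0), ("US", 5.0), ("CA", 2.5)],
    "azure-westeurope": [("DE", 4.0), ("FR", 6.0), ("DE", 1.0)],
}


def local_partial_sum(rows):
    """Runs next to the data: collapse raw rows into one total per key."""
    totals = Counter()
    for key, amount in rows:
        totals[key] += amount
    return dict(totals)


def merge_partials(partials):
    """Runs on the coordinating side: combine the per-location results."""
    merged = Counter()
    for partial in partials:
        for key, amount in partial.items():
            merged[key] += amount
    return dict(merged)


# Each location computes its partial result locally.
partials = [local_partial_sum(rows) for rows in REGION_ROWS.values()]
result = merge_partials(partials)

# Compare what would cross the network in each approach.
rows_if_copied = sum(len(rows) for rows in REGION_ROWS.values())
rows_shipped = sum(len(partial) for partial in partials)
```

In this toy case four partial rows cross the network instead of six raw rows. The gap grows with the ratio of raw rows to distinct keys, which is where the savings on cross-region or cross-cloud transfer would come from.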

### How It Works <a href="#how-it-works" id="how-it-works"></a>

The hybrid lakehouse is built on a **hybrid cluster architecture**:

| Component                 | Description                                                            |
| ------------------------- | ---------------------------------------------------------------------- |
| **Main Cluster**          | Acts as the entry point for all client queries.                        |
| **Ancillary Clusters**    | Deployed close to data sources and execute tasks locally.              |
| **Routing Layer**         | The main cluster routes compute to the appropriate ancillary cluster.  |
| **Secure Gateway**        | Ensures encrypted communication between clusters.                      |
| **Federated SQL Engine**  | Provides a single query interface for users across sources and clouds. |
| **Metastore Flexibility** | Supports both centralized and federated metastores.                    |

Although data and compute are spread across separate clusters and locations, the **user always experiences a single, unified query layer**.
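The routing idea in the table above can be sketched as follows. This is an illustrative Python model under stated assumptions, not e6data's routing implementation: the `AncillaryCluster` class, the `CLUSTERS` and `TABLE_LOCATIONS` mappings, and the `route_tasks` function are hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AncillaryCluster:
    """Hypothetical description of a cluster deployed near a data source."""
    name: str
    location: str


# Assumed mapping: data location -> ancillary cluster deployed there.
CLUSTERS = {
    "aws-us-east-1": AncillaryCluster("ancillary-aws-use1", "aws-us-east-1"),
    "on-prem-frankfurt": AncillaryCluster("ancillary-onprem-fra", "on-prem-frankfurt"),
}

# Assumed mapping: table -> location where its data lives.
TABLE_LOCATIONS = {
    "sales.orders": "aws-us-east-1",
    "sales.customers": "aws-us-east-1",
    "hr.employees": "on-prem-frankfurt",
}


def route_tasks(tables):
    """Group the tables a query touches by the cluster closest to each one."""
    plan = {}
    for table in tables:
        location = TABLE_LOCATIONS.get(table)
        if location is None or location not in CLUSTERS:
            raise LookupError(f"no ancillary cluster known for table {table!r}")
        plan.setdefault(CLUSTERS[location].name, []).append(table)
    return plan


# The main cluster receives one query and splits the work by data location.
plan = route_tasks(["sales.orders", "hr.employees", "sales.customers"])
```

Here the client submits a single query to the main cluster, and the resulting `plan` assigns the two sales tables to one ancillary cluster and the HR table to the other. A real routing layer would also account for the secure gateway, cluster load, and metastore lookups, which this sketch leaves out.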

### How to Set It Up <a href="#how-to-set-it-up" id="how-to-set-it-up"></a>

1. **Identify data sources and federation rules.**
2. **Perform capacity planning** for each region or cloud.
3. **Choose metastore model:**
   * Centralized (common metastore)
   * Federated (per region/cloud)
4. **Run e6data’s automated deployment scripts** to:
   * Provision hybrid clusters
   * Configure auto-scaling
   * Wire all clients to the main cluster

The entire setup is **automated** and requires **minimal customer input**.
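Before running the deployment scripts, it can help to write down the decisions from steps 1 to 3 in a structured form. The sketch below is only a planning aid in Python and does not reflect the input format of e6data's deployment scripts: the `SitePlan` and `HybridPlan` classes, their fields (including `max_executors` as a capacity unit), and the `validate` function are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List

# The two metastore models named in step 3.
METASTORE_MODELS = {"centralized", "federated"}


@dataclass
class SitePlan:
    """Hypothetical record of one region or cloud that hosts data."""
    location: str
    data_sources: List[str]
    max_executors: int  # assumed capacity unit, for illustration only


@dataclass
class HybridPlan:
    """Hypothetical record of the overall setup decisions."""
    main_cluster_location: str
    metastore_model: str
    sites: List[SitePlan] = field(default_factory=list)


def validate(plan):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    if plan.metastore_model not in METASTORE_MODELS:
        problems.append(f"unknown metastore model: {plan.metastore_model!r}")
    if not plan.sites:
        problems.append("no sites defined")
    for site in plan.sites:
        if not site.data_sources:
            problems.append(f"{site.location}: no data sources identified")
        if site.max_executors <= 0:
            problems.append(f"{site.location}: capacity must be positive")
    return problems


good_plan = HybridPlan(
    main_cluster_location="aws-us-east-1",
    metastore_model="federated",
    sites=[
        SitePlan("aws-us-east-1", ["s3://example-bucket/sales"], 8),
        SitePlan("azure-westeurope", ["abfss://example@account/hr"], 4),
    ],
)
bad_plan = HybridPlan("aws-us-east-1", "shared", [SitePlan("gcp-us-central1", [], 0)])

good_problems = validate(good_plan)
bad_problems = validate(bad_plan)
```

For `good_plan` the check finds nothing to report, while `bad_plan` is flagged for its unrecognized metastore model, its missing data sources, and its zero capacity. The storage URIs are placeholders, not real locations.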

### Supported Platforms & Features <a href="#supported-platforms-and-features" id="supported-platforms-and-features"></a>

* Multi-cloud (AWS, Azure, GCP)
* Object storage (S3, ADLS, GCS)
* Metastores (Hive, AWS Glue, etc.)
* Full compatibility with e6data SQL workloads

### Limitations and Future Enhancements <a href="#future-enhancements" id="future-enhancements"></a>

* Access control policies must be defined centrally at the main cluster.
* Latency may be higher if most data resides away from the main cluster.
* Only statically defined main clusters are currently supported; dynamic routing is in progress.

### Use Cases <a href="#use-cases" id="use-cases"></a>

* Global enterprises with **compliance requirements** in multiple geographies
* Organizations seeking to **reduce cross-region/cloud egress fees**
* Businesses wanting to avoid **vendor lock-in** and promote **cloud neutrality**
* Teams managing **data silos** across departments or subsidiaries


