# Hybrid Data Lakehouse

### Overview <a href="#overview" id="overview"></a>

e6data’s Hybrid Data Lakehouse lets enterprises query data across clouds, regions, and on-premises environments without migrating it. Powered by a federated SQL engine and a hybrid cluster architecture, it supports **data autonomy**, **regulatory compliance**, and **cost efficiency**, all while presenting a **unified query interface** to the user.

{% hint style="info" %}
**This feature is currently in active development and is available only on request.**
{% endhint %}

<figure><img src="https://3484040590-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FeVBYKZm1xFKFFVzS0lRJ%2Fuploads%2FeQIryaDYn92mEwYbsFCl%2FScreenshot%202025-08-04%20at%202.40.12%E2%80%AFPM.png?alt=media&#x26;token=15f568b0-3693-484f-8f93-ebd56fac3783" alt=""><figcaption></figcaption></figure>

### What It Does <a href="#what-it-does" id="what-it-does"></a>

* Connects heterogeneous data sources across clouds, regions, or ownership boundaries.
* Provides a federated SQL interface for seamless querying.
* Minimizes data movement by **pushing compute to where the data lives**.
* Enhances governance, security, and cost efficiency in distributed environments.
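
To make "pushing compute to where the data lives" concrete, here is a minimal Python sketch. It is only an illustration of the general idea, not e6data's implementation: the region names, the sample rows, and the functions `local_partial_sum` and `merge_partials` are all hypothetical. Each region aggregates its own rows, and only the small partial results travel to a central merge step.

```python
from collections import defaultdict

# Hypothetical order rows held in two regions: (country, amount).
REGION_ROWS = {
    "eu-west": [("DE", 10.0), ("FR", 5.0), ("DE", 2.5)],
    "us-east": [("US", 7.0), ("US", 3.0), ("DE", 1.0)],
}


def local_partial_sum(rows):
    """Runs next to the data: collapse raw rows into one subtotal per key."""
    totals = defaultdict(float)
    for key, amount in rows:
        totals[key] += amount
    return dict(totals)


def merge_partials(partials):
    """Runs centrally: combine the small per-region subtotals."""
    merged = defaultdict(float)
    for partial in partials:
        for key, subtotal in partial.items():
            merged[key] += subtotal
    return dict(merged)


# Each region computes its own partial result locally.
partials = [local_partial_sum(rows) for rows in REGION_ROWS.values()]
result = merge_partials(partials)

# Compare how many rows would cross region boundaries in each approach.
rows_if_centralized = sum(len(rows) for rows in REGION_ROWS.values())
rows_shipped = sum(len(partial) for partial in partials)

print(result)
print(f"rows moved: {rows_shipped} (pushed down) vs {rows_if_centralized} (centralized)")
```

With realistic data volumes the gap between raw rows and partial aggregates is usually far larger than in this toy dataset, which is where the savings in cross-region transfer come from.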

### How It Works <a href="#how-it-works" id="how-it-works"></a>

The hybrid lakehouse is built using a **hybrid cluster architecture**:

| Component                 | Description                                                            |
| ------------------------- | ---------------------------------------------------------------------- |
| **Main Cluster**          | Acts as the entry point for all client queries.                        |
| **Ancillary Clusters**    | Deployed close to data sources and execute tasks locally.              |
| **Routing Layer**         | The main cluster routes compute to the appropriate ancillary cluster.  |
| **Secure Gateway**        | Ensures encrypted communication between clusters.                      |
| **Federated SQL Engine**  | Provides a single query interface for users across sources and clouds. |
| **Metastore Flexibility** | Supports both centralized and federated metastores.                    |

Although data and compute are spread across separate clusters and locations, the **user always experiences a single, unified query layer**.
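
The routing idea in the table can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the product's actual routing logic: the `Cluster` type, the `TABLE_LOCATIONS` catalog, the table names, and the `route_scans` function are hypothetical and exist only to show how a main cluster might group work by the ancillary cluster closest to each table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    name: str
    location: str  # e.g. a cloud region or an on-premises site


# Hypothetical catalog: which ancillary cluster sits next to each table.
TABLE_LOCATIONS = {
    "sales.orders_eu": Cluster("ancillary-eu", "aws:eu-west-1"),
    "sales.orders_us": Cluster("ancillary-us", "azure:eastus"),
}


def route_scans(tables):
    """Group table scans by the ancillary cluster that is local to each table."""
    plan = {}
    for table in tables:
        cluster = TABLE_LOCATIONS.get(table)
        if cluster is None:
            raise KeyError(f"no ancillary cluster registered for {table!r}")
        plan.setdefault(cluster.name, []).append(table)
    return plan


# The main cluster receives one query that touches both tables and
# splits the scan work so each table is read by its local cluster.
plan = route_scans(["sales.orders_eu", "sales.orders_us"])
print(plan)
```

The client only ever talks to the main cluster; the split into per-cluster work stays internal, which is what keeps the query interface unified.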

### How to Set It Up <a href="#how-to-set-it-up" id="how-to-set-it-up"></a>

1. **Identify data sources and federation rules.**
2. **Perform capacity planning** for each region or cloud.
3. **Choose metastore model:**
   * Centralized (common metastore)
   * Federated (per region/cloud)
4. Use e6data’s automated deployment scripts to:
   * Provision hybrid clusters
   * Configure auto-scaling
   * Wire all clients to the main cluster

The entire setup is **automated** and requires **minimal customer input**.
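
Before running any deployment, it can help to write down the decisions from steps 1 to 3 in one place and sanity-check them. The Python sketch below is one hypothetical way to do that. It does not reflect the format of e6data's deployment scripts: the `HybridPlan` and `SiteCapacity` classes, their field names, the "executors" capacity unit, and the site identifiers are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

# The two metastore models named in the setup steps.
VALID_METASTORE_MODELS = {"centralized", "federated"}


@dataclass
class SiteCapacity:
    """Capacity-planning result for one region or cloud (hypothetical shape)."""
    site: str
    min_executors: int
    max_executors: int


@dataclass
class HybridPlan:
    """Collected setup decisions (hypothetical shape)."""
    main_cluster_site: str
    metastore_model: str
    sites: list = field(default_factory=list)

    def validate(self):
        """Return a list of human-readable problems; empty means the plan is consistent."""
        errors = []
        if self.metastore_model not in VALID_METASTORE_MODELS:
            errors.append(f"unknown metastore model: {self.metastore_model!r}")
        if self.main_cluster_site not in [s.site for s in self.sites]:
            errors.append(f"main cluster site {self.main_cluster_site!r} has no capacity entry")
        for s in self.sites:
            if not 0 < s.min_executors <= s.max_executors:
                errors.append(f"invalid auto-scaling range for {s.site!r}")
        return errors


plan = HybridPlan(
    main_cluster_site="aws:us-east-1",
    metastore_model="federated",
    sites=[
        SiteCapacity("aws:us-east-1", min_executors=2, max_executors=8),
        SiteCapacity("onprem:dc1", min_executors=1, max_executors=4),
    ],
)
print(plan.validate())
```

A check like this catches inconsistent inputs, such as a main cluster site with no capacity plan, before any infrastructure is provisioned.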

### Supported Platforms & Features <a href="#supported-platforms-and-features" id="supported-platforms-and-features"></a>

* Multi-cloud (AWS, Azure, GCP)
* Object storage (S3, ADLS, GCS)
* Metastores (Hive, AWS Glue, etc.)
* Full compatibility with e6data SQL workloads

### Current Limitations & Future Enhancements <a href="#future-enhancements" id="future-enhancements"></a>

* Access control policies must be defined centrally at the main cluster.
* Latency may be higher if most data resides away from the main cluster.
* Only statically defined main clusters are currently supported; dynamic routing is in progress.

### Use Cases <a href="#use-cases" id="use-cases"></a>

* Global enterprises with **compliance requirements** in multiple geographies
* Organizations seeking to **reduce cross-region/cloud egress fees**
* Businesses wanting to avoid **vendor lock-in** and promote **cloud neutrality**
* Teams managing **data silos** across departments or subsidiaries
