Catalogs

Understanding Catalogs in e6data

Most analytical data is stored in cloud object stores such as Amazon S3, GCS, or Azure Blob Storage. Structural metadata, such as table names, schemas, and partitions, is managed separately by metastores such as Hive, AWS Glue, Dataproc Metastore, Unity Catalog, or Apache Polaris.

In e6data, a Catalog connects to these metastores to provide the metadata needed for querying data stored in object stores efficiently.
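The split between metadata and data can be illustrated with a minimal sketch. It is not e6data's implementation or API: `TableMetadata`, `InMemoryCatalog`, `plan_scan`, the table name, and the bucket paths are all hypothetical names chosen for illustration. It only shows how partition metadata held in a metastore lets a query engine pick the object-store prefixes it needs instead of listing the whole bucket.

```python
from dataclasses import dataclass, field


@dataclass
class TableMetadata:
    """Hypothetical shape of what a metastore records about one table."""
    name: str
    schema: dict[str, str]      # column name -> column type
    partition_column: str
    # partition value -> object-store prefix that holds that partition's files
    partitions: dict[str, str] = field(default_factory=dict)


class InMemoryCatalog:
    """Stand-in for a metastore client (assumption: not an e6data API)."""

    def __init__(self) -> None:
        self._tables: dict[str, TableMetadata] = {}

    def register(self, table: TableMetadata) -> None:
        self._tables[table.name] = table

    def get_table(self, name: str) -> TableMetadata:
        return self._tables[name]


def plan_scan(catalog: InMemoryCatalog, table_name: str,
              partition_values: list[str]) -> list[str]:
    """Return only the object-store prefixes matching the requested partitions."""
    table = catalog.get_table(table_name)
    return [table.partitions[v] for v in partition_values if v in table.partitions]


# Example metadata; the bucket and table names are made up.
catalog = InMemoryCatalog()
catalog.register(TableMetadata(
    name="sales.orders",
    schema={"order_id": "bigint", "amount": "double", "order_date": "date"},
    partition_column="order_date",
    partitions={
        "2024-01-01": "s3://example-bucket/orders/order_date=2024-01-01/",
        "2024-01-02": "s3://example-bucket/orders/order_date=2024-01-02/",
        "2024-01-03": "s3://example-bucket/orders/order_date=2024-01-03/",
    },
))

# A filter on order_date = '2024-01-02' resolves to a single prefix.
prefixes = plan_scan(catalog, "sales.orders", ["2024-01-02"])
print(prefixes)
```

In this sketch the engine reads one prefix out of three. A real metastore exposes the same kind of information (schema, partition list, storage location) through its own API.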

| Catalog Service | Description |
| --- | --- |
| Hive Metastore | Traditional metastore widely used with Hadoop and Spark. |
| AWS Glue | AWS-managed metastore with schema versioning and S3 integration. |
| Unity Catalog | Databricks’ unified metadata and governance layer. |
| Apache Polaris | REST-based Iceberg catalog for scalable metadata management. |

Cross-account support requires specific IAM configuration depending on the cloud provider.
