PySpark Compatibility Layer

Overview

e6-spark-compat is a drop-in compatibility library that lets you run existing PySpark and Apache Sedona code on e6data. Update your import statements, configure the e6data connection, and your Spark code works as-is — no rewrites needed.

DataFrame operations are lazily evaluated. Transformations build a query plan tree, and when an action (collect, show, count) is called, the plan is translated into optimized SQL using SQLGlot and executed on e6data.
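The mechanism above can be illustrated with a self-contained sketch. This is a simplified model of lazy evaluation, not the library's internals: transformations only append nodes to a plan, and the "action" is what compiles the plan into a SQL string.

```python
# Simplified illustration of lazy evaluation (not e6-spark-compat internals):
# transformations build up a plan; the action turns the plan into SQL.
class PlanFrame:
    def __init__(self, table, plan=None):
        self.table = table
        self.plan = plan or []

    def filter(self, cond):
        # Transformation: nothing is executed, the plan just grows.
        return PlanFrame(self.table, self.plan + [("WHERE", cond)])

    def select(self, *cols):
        # Transformation: records the projection, still no execution.
        return PlanFrame(self.table, self.plan + [("SELECT", ", ".join(cols))])

    def to_sql(self):
        # Action: walk the accumulated plan and emit a SQL statement.
        projection = next((v for k, v in self.plan if k == "SELECT"), "*")
        predicates = [v for k, v in self.plan if k == "WHERE"]
        sql = f"SELECT {projection} FROM {self.table}"
        if predicates:
            sql += " WHERE " + " AND ".join(predicates)
        return sql

df = PlanFrame("orders").filter("amount > 100").select("id", "amount")
print(df.to_sql())  # SELECT id, amount FROM orders WHERE amount > 100
```

In the real library the plan is translated through SQLGlot rather than by string concatenation, but the shape is the same: no query leaves the client until an action forces the plan to be compiled and run.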

Key Capabilities

  • Full PySpark DataFrame API — select, filter, join, groupBy, orderBy, union, pivot, and more

  • 130+ SQL functions — string, math, aggregate, date/time, window, conditional

  • Window functions with complete Window specification API

  • 70+ Apache Sedona-compatible spatial functions (ST_*)

  • File format support — Parquet, ORC, CSV, JSON, GeoParquet, Delta

  • Read and write operations
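As a concrete taste of that API surface, the fragment below is ordinary PySpark code touching several of the capabilities listed above (filter, select, window functions). It assumes a session and an `orders` DataFrame already exist; under the compatibility layer, only the import statements would change.

```
# Standard PySpark code; `orders` is assumed to be an existing DataFrame.
from pyspark.sql import functions as F, Window

w = Window.partitionBy("region").orderBy(F.col("amount").desc())

top_orders = (orders
    .filter(F.col("status") == "shipped")
    .withColumn("rank", F.row_number().over(w))
    .filter(F.col("rank") <= 3)
    .select("region", "id", "amount"))

top_orders.show()  # action: the plan is translated to SQL and run on e6data
```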

Installation

# Install from PyPI
pip install e6data-spark-compatibility

# With spatial support
pip install e6data-spark-compatibility[spatial]

# Install from GitHub
pip install git+https://github.com/e6data/e6-spark-compat.git

Prerequisites

  • An active e6data workspace and cluster

  • A Personal Access Token from the e6data console (User Settings > Personal Access Tokens)

  • Python 3.8+

Quick Example
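A hypothetical end-to-end sketch is shown below. The import path, builder calls, and configuration keys are placeholders for illustration only, not the library's documented API; consult the package documentation for the actual connection parameters.

```
# Hypothetical sketch -- import path and config keys are placeholders,
# not the documented e6-spark-compat API.
from e6_spark_compat import SparkSession  # placeholder import path

spark = (SparkSession.builder
         .config("host", "<cluster-host>")          # from your e6data console
         .config("token", "<personal-access-token>")
         .getOrCreate())

df = spark.read.parquet("s3://bucket/path/")
df.filter(df.amount > 100).groupBy("region").count().show()
```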
