Getting started

Description: Migrate your PySpark application to e6data in three steps: update imports, configure the connection, and run.

Step 1: Update Imports

Replace your PySpark imports with e6-spark-compat equivalents. The API is identical.

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, upper, count, sum, row_number
from pyspark.sql.window import Window

For spatial (Sedona) operations:

from sedona.register import SedonaRegistrator
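Sedona's spatial SQL functions must be registered on the session before use. A minimal sketch using the standard Sedona API; whether e6-spark-compat requires this explicit call is an assumption here:

```python
from sedona.register import SedonaRegistrator

# Register Sedona's spatial functions (ST_Point, ST_Contains, ...) on the
# session. `spark` is the SparkSession configured in Step 2.
SedonaRegistrator.registerAll(spark)
```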

Step 2: Configure the Connection

Create a SparkSession pointing to your e6data cluster.

spark = SparkSession.builder \
    .appName("MyApp") \
    .config("spark.e6data.host", "<cluster-host>") \
    .config("spark.e6data.username", "<username>") \
    .config("spark.e6data.password", "<access-token>") \
    .config("spark.e6data.database", "<database>") \
    .config("spark.e6data.catalog", "<catalog>") \
    .config("spark.e6data.cluster", "<cluster-name>") \
    .config("spark.e6data.secure", True) \
    .getOrCreate()

Configuration Parameters

| Parameter | Description | Required |
| --- | --- | --- |
| `spark.e6data.host` | Cluster hostname or IP address | Yes |
| `spark.e6data.username` | e6data account email | Yes |
| `spark.e6data.password` | Personal Access Token from the e6data console | Yes |
| `spark.e6data.database` | Target database name | Yes |
| `spark.e6data.catalog` | Catalog name | Yes |
| `spark.e6data.cluster` | Cluster name | Yes |
| `spark.e6data.secure` | Use TLS for the connection (`True` or `False`); default `True` | No |

Note: You can find your cluster hostname and connection details in the e6data console under Clusters > Connection Info.

Using Environment Variables

To avoid committing credentials to source control, supply the connection parameters through environment variables instead of hardcoding them.
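For instance, the connection settings can be pulled from the environment at startup. The variable names below (`E6DATA_HOST`, etc.) are illustrative, not an official convention:

```python
import os

# Read e6data connection settings from environment variables so that
# access tokens never appear in source code. Variable names are illustrative.
def e6data_config_from_env():
    return {
        "spark.e6data.host": os.environ["E6DATA_HOST"],
        "spark.e6data.username": os.environ["E6DATA_USERNAME"],
        "spark.e6data.password": os.environ["E6DATA_TOKEN"],
        "spark.e6data.database": os.environ["E6DATA_DATABASE"],
        "spark.e6data.catalog": os.environ["E6DATA_CATALOG"],
        "spark.e6data.cluster": os.environ["E6DATA_CLUSTER"],
        # TLS is on by default; set E6DATA_SECURE=false to disable
        "spark.e6data.secure": os.environ.get("E6DATA_SECURE", "true").lower() == "true",
    }
```

The resulting dictionary can then be applied to the builder in a loop, e.g. `for k, v in e6data_config_from_env().items(): builder = builder.config(k, v)`.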

Step 3: Run Your Code

Your existing PySpark logic works without modification.
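For example, a typical transformation pipeline runs unchanged against the session from Step 2. The `orders` table and its columns are illustrative, not part of any e6data sample data:

```python
from pyspark.sql.functions import col, upper, count

# `spark` is the session from Step 2. Read a table, filter, and aggregate --
# the same DataFrame API calls you already use against vanilla Spark.
orders = spark.table("orders")

top_regions = (
    orders
    .filter(col("amount") > 100)
    .groupBy(upper(col("region")).alias("region"))
    .agg(count("*").alias("order_count"))
    .orderBy(col("order_count").desc())
)

top_regions.show()
```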

Catalog Operations

Discover databases, tables, and columns programmatically.
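A sketch using the standard `spark.catalog` API, on the assumption that e6-spark-compat implements it; the table name is illustrative:

```python
# List databases visible through the configured catalog
for db in spark.catalog.listDatabases():
    print(db.name)

# List tables in the current database
for table in spark.catalog.listTables():
    print(table.name, table.tableType)

# Inspect the columns of a specific table
for column in spark.catalog.listColumns("orders"):
    print(column.name, column.dataType)
```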

Closing the Session
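Stop the session when you are finished so resources are released. This is the standard PySpark call; that e6-spark-compat also closes its connection to the cluster on `stop()` is an assumption here:

```python
# Release the session and its underlying connection
spark.stop()
```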
