Getting started
Description: Migrate your PySpark application to e6data in three steps: update imports, configure the connection, and run.
Step 1: Update Imports
# Before: PySpark imports
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, upper, count, sum, row_number
from pyspark.sql.window import Window

# After: e6data imports
from e6_spark_compat import SparkSession
from e6_spark_compat.sql.functions import col, upper, count, sum, row_number
from e6_spark_compat.sql.window import Window

# Before: Sedona registrator
from sedona.register import SedonaRegistrator

# After: Sedona registrator
from e6_spark_compat.sedona import SedonaRegistrator

Step 2: Configure the Connection
spark = SparkSession.builder \
    .appName("MyApp") \
    .config("spark.e6data.host", "<cluster-host>") \
    .config("spark.e6data.username", "<username>") \
    .config("spark.e6data.password", "<access-token>") \
    .config("spark.e6data.database", "<database>") \
    .config("spark.e6data.catalog", "<catalog>") \
    .config("spark.e6data.cluster", "<cluster-name>") \
    .config("spark.e6data.secure", True) \
    .getOrCreate()

Configuration Parameters
Parameter
Description
Required
Using Environment Variables
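One way to keep credentials out of source code is to read the Step 2 settings from the environment and apply them to the builder. The sketch below is illustrative only. The variable names (E6DATA_HOST and so on) and the helper functions are hypothetical and do not come from e6data. It also assumes all six connection settings are required, which this page does not confirm.

```python
import os

# Hypothetical environment variable names mapped to the config keys from Step 2.
ENV_TO_CONFIG = {
    "E6DATA_HOST": "spark.e6data.host",
    "E6DATA_USERNAME": "spark.e6data.username",
    "E6DATA_PASSWORD": "spark.e6data.password",
    "E6DATA_DATABASE": "spark.e6data.database",
    "E6DATA_CATALOG": "spark.e6data.catalog",
    "E6DATA_CLUSTER": "spark.e6data.cluster",
}


def e6data_config_from_env(environ=None):
    """Build a {config key: value} dict from environment variables."""
    environ = os.environ if environ is None else environ

    # Assumption: every connection setting is required, so fail early.
    missing = [name for name in ENV_TO_CONFIG if not environ.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))

    settings = {key: environ[name] for name, key in ENV_TO_CONFIG.items()}

    # E6DATA_SECURE (hypothetical) is optional; default to True as in Step 2.
    secure_text = environ.get("E6DATA_SECURE", "true").strip().lower()
    settings["spark.e6data.secure"] = secure_text != "false"
    return settings


def apply_config(builder, settings):
    """Call builder.config(key, value) for each setting and return the builder."""
    for key, value in settings.items():
        builder = builder.config(key, value)
    return builder
```

With these helpers, the session from Step 2 could be created as apply_config(SparkSession.builder.appName("MyApp"), e6data_config_from_env()).getOrCreate(). Passing a plain dict instead of os.environ makes the helper easy to exercise in tests.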
Step 3: Run Your Code
Catalog Operations
Closing the Session
