Code samples

Description: End-to-end code samples demonstrating common PySpark workflows on e6data.


Basic Analytics Pipeline

from e6_spark_compat import SparkSession
from e6_spark_compat.sql.functions import col, count, sum, avg, upper

spark = SparkSession.builder \
    .appName("SalesAnalytics") \
    .config("spark.e6data.host", "<cluster-host>") \
    .config("spark.e6data.username", "<username>") \
    .config("spark.e6data.password", "<access-token>") \
    .config("spark.e6data.database", "sales_db") \
    .config("spark.e6data.catalog", "main") \
    .config("spark.e6data.cluster", "<cluster-name>") \
    .config("spark.e6data.secure", True) \
    .getOrCreate()

# Read and analyze
orders = spark.read.parquet("s3://data-lake/orders/")

summary = orders \
    .filter(col("status") == "completed") \
    .groupBy("region") \
    .agg(
        count("*").alias("order_count"),
        sum("amount").alias("total_revenue"),
        avg("amount").alias("avg_order_value")
    ) \
    .orderBy(col("total_revenue").desc())

summary.show()
spark.stop()

Window Functions: Running Totals and Rankings
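A hedged sketch of per-region rankings and running totals, assuming e6_spark_compat mirrors PySpark's window API (the `sql.window` module path and the `order_date`, `amount`, and `region` columns are assumptions; verify against your schema):

```python
from e6_spark_compat import SparkSession
from e6_spark_compat.sql.functions import col, sum, row_number
from e6_spark_compat.sql.window import Window  # assumed to mirror pyspark.sql.window

# Add the e6data connection configs shown in the basic pipeline above
spark = SparkSession.builder.appName("WindowDemo").getOrCreate()
orders = spark.read.parquet("s3://data-lake/orders/")

# Rank orders by amount within each region
w_rank = Window.partitionBy("region").orderBy(col("amount").desc())

# Running revenue per region, accumulated in date order
w_running = Window.partitionBy("region").orderBy("order_date") \
    .rowsBetween(Window.unboundedPreceding, Window.currentRow)

ranked = orders \
    .withColumn("rank_in_region", row_number().over(w_rank)) \
    .withColumn("running_revenue", sum("amount").over(w_running))

# Keep only the top 3 orders per region
ranked.filter(col("rank_in_region") <= 3).show()
```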

Multi-Table Join
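A sketch of a three-way join, assuming `customers` and `products` tables exist alongside `orders` and share `customer_id` / `product_id` keys (table paths and column names are illustrative):

```python
from e6_spark_compat import SparkSession

# Add the e6data connection configs shown in the basic pipeline above
spark = SparkSession.builder.appName("JoinDemo").getOrCreate()

orders = spark.read.parquet("s3://data-lake/orders/")
customers = spark.read.parquet("s3://data-lake/customers/")
products = spark.read.parquet("s3://data-lake/products/")

# Left join keeps orders even when the customer record is missing;
# inner join drops orders whose product is unknown
enriched = orders \
    .join(customers, on="customer_id", how="left") \
    .join(products, on="product_id", how="inner") \
    .select("order_id", "customer_name", "product_name", "amount")

enriched.show()
```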

CASE WHEN Logic
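Conditional columns can be built with chained `when(...).otherwise(...)` calls, the DataFrame equivalent of SQL `CASE WHEN`. A minimal sketch, assuming the shim exposes `when` as PySpark does and `orders` has an `amount` column:

```python
from e6_spark_compat import SparkSession
from e6_spark_compat.sql.functions import col, when

# Add the e6data connection configs shown in the basic pipeline above
spark = SparkSession.builder.appName("CaseWhenDemo").getOrCreate()
orders = spark.read.parquet("s3://data-lake/orders/")

# Conditions are evaluated top-down; the first match wins,
# so the >= 1000 branch must come before >= 100
tiered = orders.withColumn(
    "order_tier",
    when(col("amount") >= 1000, "large")
    .when(col("amount") >= 100, "medium")
    .otherwise("small")
)

tiered.groupBy("order_tier").count().show()
```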

Spatial Analysis: Points in Polygons
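A point-in-polygon sketch run through `spark.sql`. The `ST_CONTAINS`, `ST_GEOMFROMTEXT`, and `ST_POINT` names assume the cluster exposes OGC-style spatial functions; check the e6data function reference for the exact names available. The table layouts (a WKT polygon column on `zones`, lon/lat columns on `stores`) are illustrative:

```python
from e6_spark_compat import SparkSession

# Add the e6data connection configs shown in the basic pipeline above
spark = SparkSession.builder.appName("SpatialDemo").getOrCreate()

spark.read.parquet("s3://data-lake/stores/").createOrReplaceTempView("stores")
spark.read.parquet("s3://data-lake/zones/").createOrReplaceTempView("zones")

# For each store, find the delivery zone whose polygon contains it
matched = spark.sql("""
    SELECT s.store_id, z.zone_name
    FROM stores s
    JOIN zones z
      ON ST_CONTAINS(ST_GEOMFROMTEXT(z.boundary_wkt),
                     ST_POINT(s.lon, s.lat))
""")

matched.show()
```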

Working with Temp Views and SQL
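Registering a DataFrame as a temp view lets you mix SQL with the DataFrame API in one session. A sketch reusing the orders data from the basic pipeline (column names assumed):

```python
from e6_spark_compat import SparkSession

# Add the e6data connection configs shown in the basic pipeline above
spark = SparkSession.builder.appName("TempViewDemo").getOrCreate()

orders = spark.read.parquet("s3://data-lake/orders/")
orders.createOrReplaceTempView("orders")

# The view lives only for this session; it is not written to the catalog
top_regions = spark.sql("""
    SELECT region, SUM(amount) AS total_revenue
    FROM orders
    WHERE status = 'completed'
    GROUP BY region
    ORDER BY total_revenue DESC
    LIMIT 10
""")

top_regions.show()
```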

Export to Pandas
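`toPandas()` collects the full result to the driver, so cap the row count first for anything large. A sketch, assuming the shim supports `toPandas()` as PySpark does and building on the `summary` DataFrame from the basic pipeline:

```python
# Continuing from the basic pipeline above, where `summary` holds
# per-region aggregates. limit() bounds driver memory usage.
summary_pdf = summary.limit(10000).toPandas()

# From here on it is a regular pandas DataFrame
print(summary_pdf.describe())
summary_pdf.to_csv("regional_summary.csv", index=False)
```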

Writing Results
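A sketch of writing results back to the lake, assuming the shim supports the standard DataFrame writer API; the output path and partition column are illustrative:

```python
# Continuing from the basic pipeline above, where `summary` holds
# per-region aggregates. "overwrite" replaces any existing data at
# the target path; partitioning by region speeds up region filters.
summary.write \
    .mode("overwrite") \
    .partitionBy("region") \
    .parquet("s3://data-lake/reports/regional_summary/")
```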
