Code samples
Description: End-to-end code samples demonstrating common PySpark workflows on e6data.
Basic Analytics Pipeline
```python
from e6_spark_compat import SparkSession
from e6_spark_compat.sql.functions import col, count, sum, avg

spark = SparkSession.builder \
    .appName("SalesAnalytics") \
    .config("spark.e6data.host", "<cluster-host>") \
    .config("spark.e6data.username", "<username>") \
    .config("spark.e6data.password", "<access-token>") \
    .config("spark.e6data.database", "sales_db") \
    .config("spark.e6data.catalog", "main") \
    .config("spark.e6data.cluster", "<cluster-name>") \
    .config("spark.e6data.secure", True) \
    .getOrCreate()

# Read the orders dataset and summarize completed orders by region
orders = spark.read.parquet("s3://data-lake/orders/")
summary = orders \
    .filter(col("status") == "completed") \
    .groupBy("region") \
    .agg(
        count("*").alias("order_count"),
        sum("amount").alias("total_revenue"),
        avg("amount").alias("avg_order_value")
    ) \
    .orderBy(col("total_revenue").desc())

summary.show()
spark.stop()
```

Window Functions: Running Totals and Rankings
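A sketch of running totals and per-region rankings, assuming an active `spark` session and the `orders` DataFrame from the pipeline above (before `spark.stop()`). It also assumes the compatibility layer exposes `Window` at `e6_spark_compat.sql`, mirroring `pyspark.sql.Window`; column names such as `order_date` are illustrative.

```python
from e6_spark_compat.sql import Window
from e6_spark_compat.sql.functions import col, sum, row_number

# Aggregate revenue per region per day (column names are assumptions)
daily = orders.groupBy("region", "order_date") \
    .agg(sum("amount").alias("daily_revenue"))

# Cumulative window: everything from the start of the partition to the current row
running = Window.partitionBy("region").orderBy("order_date") \
    .rowsBetween(Window.unboundedPreceding, Window.currentRow)

# Ranking window: best revenue days first within each region
ranking = Window.partitionBy("region").orderBy(col("daily_revenue").desc())

result = daily \
    .withColumn("running_total", sum("daily_revenue").over(running)) \
    .withColumn("rank_in_region", row_number().over(ranking))

result.show()
```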
Multi-Table Join
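A minimal multi-table join sketch, assuming the `spark` session configured above and hypothetical `customers` and `products` datasets alongside `orders`; all paths and column names are assumptions for illustration.

```python
# Load the three tables (paths are hypothetical)
orders = spark.read.parquet("s3://data-lake/orders/")
customers = spark.read.parquet("s3://data-lake/customers/")
products = spark.read.parquet("s3://data-lake/products/")

# Inner join on customer, left join on product so orders without a
# matching product record are kept
enriched = orders \
    .join(customers, on="customer_id", how="inner") \
    .join(products, on="product_id", how="left") \
    .select("order_id", "customer_name", "product_name", "amount")

enriched.show()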
CASE WHEN Logic
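Conditional bucketing can be expressed with `when`/`otherwise`, the DataFrame equivalent of SQL `CASE WHEN`. This sketch assumes the `orders` DataFrame from the basic pipeline; the tier thresholds are illustrative.

```python
from e6_spark_compat.sql.functions import col, when

# Assign each order to a tier based on its amount (thresholds are examples)
tiered = orders.withColumn(
    "order_tier",
    when(col("amount") >= 1000, "large")
    .when(col("amount") >= 100, "medium")
    .otherwise("small"),
)

tiered.groupBy("order_tier").count().show()
```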
Spatial Analysis: Points in Polygons
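A point-in-polygon sketch via SQL, assuming the engine exposes standard `ST_` geospatial functions (`ST_CONTAINS`, `ST_GEOMFROMTEXT`, `ST_POINT`); the function names, table names, and columns here are all assumptions, so check your deployment's geospatial support before relying on them.

```python
# Count event points falling inside each store's boundary polygon.
# Tables, columns, and ST_ function availability are assumptions.
spark.sql("""
    SELECT s.store_id, COUNT(*) AS points_inside
    FROM stores s
    JOIN events e
      ON ST_CONTAINS(ST_GEOMFROMTEXT(s.boundary_wkt),
                     ST_POINT(e.longitude, e.latitude))
    GROUP BY s.store_id
""").show()
```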
Working with Temp Views and SQL
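Registering a DataFrame as a temporary view lets you query it with SQL through the same session. This sketch assumes the `orders` DataFrame from the basic pipeline; the view name is arbitrary.

```python
# Register the DataFrame so it is addressable from SQL
orders.createOrReplaceTempView("orders_view")

top_regions = spark.sql("""
    SELECT region, SUM(amount) AS total_revenue
    FROM orders_view
    WHERE status = 'completed'
    GROUP BY region
    ORDER BY total_revenue DESC
    LIMIT 10
""")

top_regions.show()
```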
Export to Pandas
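Small result sets can be pulled to the driver as a pandas DataFrame with `toPandas()`. This sketch assumes the `summary` DataFrame from the basic pipeline; keep in mind the whole result is materialized in driver memory, so filter or aggregate first.

```python
# Collect the aggregated summary locally (summary comes from the pipeline above)
pdf = summary.toPandas()

print(pdf.head())
pdf.to_csv("regional_summary.csv", index=False)
```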
Writing Results
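A sketch of writing results back to the data lake, assuming the `summary` DataFrame from the basic pipeline and that the compatibility layer supports the standard DataFrame writer API; the output paths are hypothetical.

```python
# Overwrite a single Parquet output (path is an assumption)
summary.write.mode("overwrite") \
    .parquet("s3://data-lake/reports/regional_summary/")

# Or partition the output by region for cheaper downstream filtering
summary.write.mode("overwrite") \
    .partitionBy("region") \
    .parquet("s3://data-lake/reports/by_region/")
```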