SQL Functions

SQL Functions Reference

All functions are imported from e6_spark_compat.sql.functions.

from e6_spark_compat.sql.functions import col, lit, when, upper, count, sum

Column References and Literals

Function

Description

Example

col(name)

Reference a column

col("age")

lit(value)

Create a literal value

lit(100)

expr(sql)

Raw SQL expression

expr("price * quantity")

String Functions

Function

Description

upper(col)

Convert to uppercase

lower(col)

Convert to lowercase

trim(col)

Remove leading and trailing spaces

ltrim(col)

Remove leading spaces

rtrim(col)

Remove trailing spaces

length(col)

String length

substring(col, pos, len)

Extract substring

concat(*cols)

Concatenate strings

concat_ws(sep, *cols)

Concatenate with separator

split(col, pattern)

Split string into array

regexp_extract(col, pattern, idx)

Extract regex match

regexp_replace(col, pattern, rep)

Replace regex match

translate(col, matching, replace)

Character-level replacement

lpad(col, len, pad)

Left-pad to length

rpad(col, len, pad)

Right-pad to length

repeat(col, n)

Repeat string n times

reverse(col)

Reverse string

format_string(fmt, *cols)

Printf-style formatting

format_number(col, d)

Format number with d decimals

Example

df.select(
    upper(col("name")).alias("name_upper"),
    concat_ws(", ", col("city"), col("state")).alias("location"),
    substring(col("phone"), 1, 3).alias("area_code")
)

Math Functions

Function

Description

abs(col)

Absolute value

round(col, scale)

Round to scale decimals

floor(col)

Round down

ceil(col)

Round up

sqrt(col)

Square root

pow(col, exp)

Raise to power

log(col)

Natural logarithm

log10(col)

Base-10 logarithm

exp(col)

Exponential (e^x)

sin(col)

Sine

cos(col)

Cosine

tan(col)

Tangent

asin(col)

Arc sine

acos(col)

Arc cosine

atan(col)

Arc tangent

atan2(col1, col2)

Two-argument arc tangent

greatest(*cols)

Maximum of given columns

least(*cols)

Minimum of given columns

Aggregate Functions

Function

Description

count(col)

Count of rows

countDistinct(*cols)

Count of distinct values

sum(col)

Sum of values

avg(col)

Average of values

min(col)

Minimum value

max(col)

Maximum value

first(col)

First value in group

last(col)

Last value in group

collect_list(col)

Collect values into a list

collect_set(col)

Collect distinct values into a set

stddev(col)

Standard deviation

variance(col)

Variance

approx_percentile(col, pct)

Approximate percentile

Example

df.groupBy("department").agg(
    count("*").alias("headcount"),
    avg("salary").alias("avg_salary"),
    sum("salary").alias("total_salary"),
    countDistinct("title").alias("unique_titles")
)

Date and Time Functions

Function

Description

current_date()

Current date

current_timestamp()

Current timestamp

year(col)

Extract year

month(col)

Extract month

day(col) / dayofmonth(col)

Extract day of month

hour(col)

Extract hour

minute(col)

Extract minute

second(col)

Extract second

date_add(col, days)

Add days to date

date_sub(col, days)

Subtract days from date

datediff(end, start)

Difference in days

to_date(col, fmt)

Convert to date

to_timestamp(col, fmt)

Convert to timestamp

from_unixtime(col, fmt)

Unix timestamp to string

unix_timestamp(col, fmt)

String to Unix timestamp

Example

df.select(
    col("order_date"),
    year(col("order_date")).alias("order_year"),
    date_add(col("order_date"), 30).alias("due_date"),
    datediff(current_date(), col("order_date")).alias("days_ago")
)

Conditional Functions

Function

Description

when(condition, value)

CASE WHEN expression

coalesce(*cols)

First non-null value

isnull(col)

Check if null

isnan(col)

Check if NaN

if_(condition, true, false)

Simple IF expression

Example

df.select(
    col("name"),
    when(col("age") >= 18, "adult")
        .when(col("age") >= 13, "teen")
        .otherwise("child")
        .alias("category"),
    coalesce(col("nickname"), col("name")).alias("display_name")
)

JSON Functions

Function

Description

to_json(col)

Convert struct/map to JSON

from_json(col, schema)

Parse JSON string

get_json_object(col, path)

Extract field from JSON string

Array Functions

Function

Description

explode(col)

Flatten array to rows

Other Functions

Function

Description

broadcast(df)

Broadcast hint for joins (pass-through)

cast(col, type)

Cast column to a different data type

PreviousDataFrame Operations NextSecurity & Trust

Last updated 12 hours ago

hashtagSQL Functions Reference

hashtagColumn References and Literals

hashtagString Functions

hashtagExample

hashtagMath Functions

hashtagAggregate Functions

hashtagExample

hashtagDate and Time Functions

hashtagExample

hashtagConditional Functions

hashtagExample

hashtagJSON Functions

hashtagArray Functions

hashtagOther Functions

SQL Functions Reference

Column References and Literals

String Functions

Example

Math Functions

Aggregate Functions

Example

Date and Time Functions

Example

Conditional Functions

Example

JSON Functions

Array Functions

Other Functions