> For the complete documentation index, see [llms.txt](https://docs.e6data.com/product-documentation/llms.txt). Markdown versions of documentation pages are available by appending `.md` to page URLs; this page is available as [Markdown](https://docs.e6data.com/product-documentation/pyspark-compatibility/sql-functions.md).

# SQL Functions

All functions are imported from `e6_spark_compat.sql.functions`.

```python
from e6_spark_compat.sql.functions import col, lit, when, upper, count, sum
```

### Column References and Literals

| Function     | Description            | Example                    |
| ------------ | ---------------------- | -------------------------- |
| `col(name)`  | Reference a column     | `col("age")`               |
| `lit(value)` | Create a literal value | `lit(100)`                 |
| `expr(sql)`  | Raw SQL expression     | `expr("price * quantity")` |

### String Functions

| Function                            | Description                        |
| ----------------------------------- | ---------------------------------- |
| `upper(col)`                        | Convert to uppercase               |
| `lower(col)`                        | Convert to lowercase               |
| `trim(col)`                         | Remove leading and trailing spaces |
| `ltrim(col)`                        | Remove leading spaces              |
| `rtrim(col)`                        | Remove trailing spaces             |
| `length(col)`                       | String length                      |
| `substring(col, pos, len)`          | Extract substring                  |
| `concat(*cols)`                     | Concatenate strings                |
| `concat_ws(sep, *cols)`             | Concatenate with separator         |
| `split(col, pattern)`               | Split string into array            |
| `regexp_extract(col, pattern, idx)` | Extract regex match                |
| `regexp_replace(col, pattern, rep)` | Replace regex match                |
| `translate(col, matching, replace)` | Character-level replacement        |
| `lpad(col, len, pad)`               | Left-pad to length                 |
| `rpad(col, len, pad)`               | Right-pad to length                |
| `repeat(col, n)`                    | Repeat string n times              |
| `reverse(col)`                      | Reverse string                     |
| `format_string(fmt, *cols)`         | Printf-style formatting            |
| `format_number(col, d)`             | Format number with d decimals      |

#### Example

```python
df.select(
    upper(col("name")).alias("name_upper"),
    concat_ws(", ", col("city"), col("state")).alias("location"),
    substring(col("phone"), 1, 3).alias("area_code")
)
```

### Math Functions

| Function            | Description              |
| ------------------- | ------------------------ |
| `abs(col)`          | Absolute value           |
| `round(col, scale)` | Round to scale decimals  |
| `floor(col)`        | Round down               |
| `ceil(col)`         | Round up                 |
| `sqrt(col)`         | Square root              |
| `pow(col, exp)`     | Raise to power           |
| `log(col)`          | Natural logarithm        |
| `log10(col)`        | Base-10 logarithm        |
| `exp(col)`          | Exponential (e^x)        |
| `sin(col)`          | Sine                     |
| `cos(col)`          | Cosine                   |
| `tan(col)`          | Tangent                  |
| `asin(col)`         | Arc sine                 |
| `acos(col)`         | Arc cosine               |
| `atan(col)`         | Arc tangent              |
| `atan2(col1, col2)` | Two-argument arc tangent |
| `greatest(*cols)`   | Maximum of given columns |
| `least(*cols)`      | Minimum of given columns |

### Aggregate Functions

| Function                      | Description                        |
| ----------------------------- | ---------------------------------- |
| `count(col)`                  | Count of rows                      |
| `countDistinct(*cols)`        | Count of distinct values           |
| `sum(col)`                    | Sum of values                      |
| `avg(col)`                    | Average of values                  |
| `min(col)`                    | Minimum value                      |
| `max(col)`                    | Maximum value                      |
| `first(col)`                  | First value in group               |
| `last(col)`                   | Last value in group                |
| `collect_list(col)`           | Collect values into a list         |
| `collect_set(col)`            | Collect distinct values into a set |
| `stddev(col)`                 | Standard deviation                 |
| `variance(col)`               | Variance                           |
| `approx_percentile(col, pct)` | Approximate percentile             |

#### Example

```python
df.groupBy("department").agg(
    count("*").alias("headcount"),
    avg("salary").alias("avg_salary"),
    sum("salary").alias("total_salary"),
    countDistinct("title").alias("unique_titles")
)
```

### Date and Time Functions

| Function                       | Description              |
| ------------------------------ | ------------------------ |
| `current_date()`               | Current date             |
| `current_timestamp()`          | Current timestamp        |
| `year(col)`                    | Extract year             |
| `month(col)`                   | Extract month            |
| `day(col)` / `dayofmonth(col)` | Extract day of month     |
| `hour(col)`                    | Extract hour             |
| `minute(col)`                  | Extract minute           |
| `second(col)`                  | Extract second           |
| `date_add(col, days)`          | Add days to date         |
| `date_sub(col, days)`          | Subtract days from date  |
| `datediff(end, start)`         | Difference in days       |
| `to_date(col, fmt)`            | Convert to date          |
| `to_timestamp(col, fmt)`       | Convert to timestamp     |
| `from_unixtime(col, fmt)`      | Unix timestamp to string |
| `unix_timestamp(col, fmt)`     | String to Unix timestamp |

#### Example

```python
df.select(
    col("order_date"),
    year(col("order_date")).alias("order_year"),
    date_add(col("order_date"), 30).alias("due_date"),
    datediff(current_date(), col("order_date")).alias("days_ago")
)
```

### Conditional Functions

| Function                      | Description          |
| ----------------------------- | -------------------- |
| `when(condition, value)`      | CASE WHEN expression |
| `coalesce(*cols)`             | First non-null value |
| `isnull(col)`                 | Check if null        |
| `isnan(col)`                  | Check if NaN         |
| `if_(condition, true, false)` | Simple IF expression |

#### Example

```python
df.select(
    col("name"),
    when(col("age") >= 18, "adult")
        .when(col("age") >= 13, "teen")
        .otherwise("child")
        .alias("category"),
    coalesce(col("nickname"), col("name")).alias("display_name")
)
```

### JSON Functions

| Function                     | Description                    |
| ---------------------------- | ------------------------------ |
| `to_json(col)`               | Convert struct/map to JSON     |
| `from_json(col, schema)`     | Parse JSON string              |
| `get_json_object(col, path)` | Extract field from JSON string |

### Array Functions

| Function       | Description           |
| -------------- | --------------------- |
| `explode(col)` | Flatten array to rows |

### Other Functions

| Function          | Description                             |
| ----------------- | --------------------------------------- |
| `broadcast(df)`   | Broadcast hint for joins (pass-through) |
| `cast(col, type)` | Cast column to a different data type    |


---

# Agent Instructions
This documentation is published with GitBook. GitBook is the documentation platform designed so that both humans and AI agents can read, navigate, and reason over technical content effectively. Learn more at gitbook.com.

## Querying This Documentation
If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter, and the optional `goal` query parameter:

```
GET https://docs.e6data.com/product-documentation/pyspark-compatibility/sql-functions.md?ask=<question>&goal=<endgoal>
```

`ask` is the immediate question: it should be specific, self-contained, and written in natural language.
`goal` is optional and describes the broader end goal you are ultimately trying to accomplish on behalf of the user. GitBook uses it to tailor the answer towards what is most useful for that goal.

The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
