# SQL Functions

All functions are imported from `e6_spark_compat.sql.functions`.

```python
from e6_spark_compat.sql.functions import col, lit, when, upper, count, sum
```

### Column References and Literals

| Function     | Description            | Example                    |
| ------------ | ---------------------- | -------------------------- |
| `col(name)`  | Reference a column     | `col("age")`               |
| `lit(value)` | Create a literal value | `lit(100)`                 |
| `expr(sql)`  | Raw SQL expression     | `expr("price * quantity")` |

### String Functions

| Function                            | Description                        |
| ----------------------------------- | ---------------------------------- |
| `upper(col)`                        | Convert to uppercase               |
| `lower(col)`                        | Convert to lowercase               |
| `trim(col)`                         | Remove leading and trailing spaces |
| `ltrim(col)`                        | Remove leading spaces              |
| `rtrim(col)`                        | Remove trailing spaces             |
| `length(col)`                       | String length                      |
| `substring(col, pos, len)`          | Extract substring                  |
| `concat(*cols)`                     | Concatenate strings                |
| `concat_ws(sep, *cols)`             | Concatenate with separator         |
| `split(col, pattern)`               | Split string into array            |
| `regexp_extract(col, pattern, idx)` | Extract regex match                |
| `regexp_replace(col, pattern, rep)` | Replace regex match                |
| `translate(col, matching, replace)` | Character-level replacement        |
| `lpad(col, len, pad)`               | Left-pad to length                 |
| `rpad(col, len, pad)`               | Right-pad to length                |
| `repeat(col, n)`                    | Repeat string n times              |
| `reverse(col)`                      | Reverse string                     |
| `format_string(fmt, *cols)`         | Printf-style formatting            |
| `format_number(col, d)`             | Format number with d decimals      |

#### Example

```python
df.select(
    upper(col("name")).alias("name_upper"),
    concat_ws(", ", col("city"), col("state")).alias("location"),
    substring(col("phone"), 1, 3).alias("area_code")
)
```

### Math Functions

| Function            | Description              |
| ------------------- | ------------------------ |
| `abs(col)`          | Absolute value           |
| `round(col, scale)` | Round to scale decimals  |
| `floor(col)`        | Round down               |
| `ceil(col)`         | Round up                 |
| `sqrt(col)`         | Square root              |
| `pow(col, exp)`     | Raise to power           |
| `log(col)`          | Natural logarithm        |
| `log10(col)`        | Base-10 logarithm        |
| `exp(col)`          | Exponential (e^x)        |
| `sin(col)`          | Sine                     |
| `cos(col)`          | Cosine                   |
| `tan(col)`          | Tangent                  |
| `asin(col)`         | Arc sine                 |
| `acos(col)`         | Arc cosine               |
| `atan(col)`         | Arc tangent              |
| `atan2(col1, col2)` | Two-argument arc tangent |
| `greatest(*cols)`   | Maximum of given columns |
| `least(*cols)`      | Minimum of given columns |

### Aggregate Functions

| Function                      | Description                        |
| ----------------------------- | ---------------------------------- |
| `count(col)`                  | Count of rows                      |
| `countDistinct(*cols)`        | Count of distinct values           |
| `sum(col)`                    | Sum of values                      |
| `avg(col)`                    | Average of values                  |
| `min(col)`                    | Minimum value                      |
| `max(col)`                    | Maximum value                      |
| `first(col)`                  | First value in group               |
| `last(col)`                   | Last value in group                |
| `collect_list(col)`           | Collect values into a list         |
| `collect_set(col)`            | Collect distinct values into a set |
| `stddev(col)`                 | Standard deviation                 |
| `variance(col)`               | Variance                           |
| `approx_percentile(col, pct)` | Approximate percentile             |

#### Example

```python
df.groupBy("department").agg(
    count("*").alias("headcount"),
    avg("salary").alias("avg_salary"),
    sum("salary").alias("total_salary"),
    countDistinct("title").alias("unique_titles")
)
```

### Date and Time Functions

| Function                       | Description              |
| ------------------------------ | ------------------------ |
| `current_date()`               | Current date             |
| `current_timestamp()`          | Current timestamp        |
| `year(col)`                    | Extract year             |
| `month(col)`                   | Extract month            |
| `day(col)` / `dayofmonth(col)` | Extract day of month     |
| `hour(col)`                    | Extract hour             |
| `minute(col)`                  | Extract minute           |
| `second(col)`                  | Extract second           |
| `date_add(col, days)`          | Add days to date         |
| `date_sub(col, days)`          | Subtract days from date  |
| `datediff(end, start)`         | Difference in days       |
| `to_date(col, fmt)`            | Convert to date          |
| `to_timestamp(col, fmt)`       | Convert to timestamp     |
| `from_unixtime(col, fmt)`      | Unix timestamp to string |
| `unix_timestamp(col, fmt)`     | String to Unix timestamp |

#### Example

```python
df.select(
    col("order_date"),
    year(col("order_date")).alias("order_year"),
    date_add(col("order_date"), 30).alias("due_date"),
    datediff(current_date(), col("order_date")).alias("days_ago")
)
```

### Conditional Functions

| Function                      | Description          |
| ----------------------------- | -------------------- |
| `when(condition, value)`      | CASE WHEN expression |
| `coalesce(*cols)`             | First non-null value |
| `isnull(col)`                 | Check if null        |
| `isnan(col)`                  | Check if NaN         |
| `if_(condition, true, false)` | Simple IF expression |

#### Example

```python
df.select(
    col("name"),
    when(col("age") >= 18, "adult")
        .when(col("age") >= 13, "teen")
        .otherwise("child")
        .alias("category"),
    coalesce(col("nickname"), col("name")).alias("display_name")
)
```

### JSON Functions

| Function                     | Description                    |
| ---------------------------- | ------------------------------ |
| `to_json(col)`               | Convert struct/map to JSON     |
| `from_json(col, schema)`     | Parse JSON string              |
| `get_json_object(col, path)` | Extract field from JSON string |

### Array Functions

| Function       | Description           |
| -------------- | --------------------- |
| `explode(col)` | Flatten array to rows |

### Other Functions

| Function          | Description                             |
| ----------------- | --------------------------------------- |
| `broadcast(df)`   | Broadcast hint for joins (pass-through) |
| `cast(col, type)` | Cast column to a different data type    |
