Skip to content

fenic.api.functions

Functions for working with DataFrame columns.

Functions:

  • array

    Creates a new array column from multiple input columns.

  • array_agg

    Alias for collect_list().

  • array_contains

    Checks if array column contains a specific value.

  • array_size

    Returns the number of elements in an array column.

  • asc

    Creates a Column expression representing an ascending sort order.

  • asc_nulls_first

    Creates a Column expression representing an ascending sort order with nulls first.

  • asc_nulls_last

    Creates a Column expression representing an ascending sort order with nulls last.

  • avg

    Aggregate function: returns the average (mean) of all values in the specified column.

  • coalesce

    Returns the first non-null value from the given columns for each row.

  • col

    Creates a Column expression referencing a column in the DataFrame.

  • collect_list

    Aggregate function: collects all values from the specified column into a list.

  • count

    Aggregate function: returns the count of non-null values in the specified column.

  • desc

    Creates a Column expression representing a descending sort order.

  • desc_nulls_first

    Creates a Column expression representing a descending sort order with nulls first.

  • desc_nulls_last

    Creates a Column expression representing a descending sort order with nulls last.

  • lit

    Creates a Column expression representing a literal value.

  • max

    Aggregate function: returns the maximum value in the specified column.

  • mean

    Aggregate function: returns the mean (average) of all values in the specified column.

  • min

    Aggregate function: returns the minimum value in the specified column.

  • struct

    Creates a new struct column from multiple input columns.

  • sum

    Aggregate function: returns the sum of all values in the specified column.

  • udf

    A decorator or function for creating user-defined functions (UDFs) that can be applied to DataFrame rows.

  • when

    Evaluates a condition and returns a value if true.

array

array(*args: Union[ColumnOrName, List[ColumnOrName], Tuple[ColumnOrName, ...]]) -> Column

Creates a new array column from multiple input columns.

Parameters:

  • *args (Union[ColumnOrName, List[ColumnOrName], Tuple[ColumnOrName, ...]], default: () ) –

    Columns or column names to combine into an array. Can be:

    • Individual arguments
    • Lists of columns/column names
    • Tuples of columns/column names

Returns:

  • Column

    A Column expression representing an array containing values from the input columns

Raises:

  • TypeError

    If any argument is not a Column, string, or collection of Columns/strings

Source code in src/fenic/api/functions/builtin.py
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
@validate_call(config=ConfigDict(strict=True, arbitrary_types_allowed=True))
def array(
    *args: Union[ColumnOrName, List[ColumnOrName], Tuple[ColumnOrName, ...]]
) -> Column:
    """Creates a new array column from multiple input columns.

    Args:
        *args: Columns or column names to combine into an array. Can be:

            - Individual arguments
            - Lists of columns/column names
            - Tuples of columns/column names

    Returns:
        A Column expression representing an array containing values from the input columns

    Raises:
        TypeError: If any argument is not a Column, string, or collection of
            Columns/strings
    """
    flattened_args = []
    for arg in args:
        if isinstance(arg, (list, tuple)):
            flattened_args.extend(arg)
        else:
            flattened_args.append(arg)

    expr_columns = [Column._from_col_or_name(c)._logical_expr for c in flattened_args]

    return Column._from_logical_expr(ArrayExpr(expr_columns))

array_agg

array_agg(column: ColumnOrName) -> Column

Alias for collect_list().

Source code in src/fenic/api/functions/builtin.py
160
161
162
163
@validate_call(config=ConfigDict(strict=True, arbitrary_types_allowed=True))
def array_agg(column: ColumnOrName) -> Column:
    """Alias for collect_list()."""
    return collect_list(column)

array_contains

array_contains(column: ColumnOrName, value: Union[str, int, float, bool, Column]) -> Column

Checks if array column contains a specific value.

This function returns True if the array in the specified column contains the given value, and False otherwise. Returns False if the array is None.

Parameters:

  • column (ColumnOrName) –

    Column or column name containing the arrays to check.

  • value (Union[str, int, float, bool, Column]) –

    Value to search for in the arrays. Can be: - A literal value (string, number, boolean) - A Column expression

Returns:

  • Column

    A boolean Column expression (True if value is found, False otherwise).

Raises:

  • TypeError

    If value type is incompatible with the array element type.

  • TypeError

    If the column does not contain array data.

Check for values in arrays
# Check if 'python' exists in arrays in the 'tags' column
df.select(array_contains("tags", "python"))

# Check using a value from another column
df.select(array_contains("tags", col("search_term")))
Source code in src/fenic/api/functions/builtin.py
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
@validate_call(config=ConfigDict(strict=True, arbitrary_types_allowed=True))
def array_contains(
    column: ColumnOrName, value: Union[str, int, float, bool, Column]
) -> Column:
    """Checks if array column contains a specific value.

    This function returns True if the array in the specified column contains the given value,
    and False otherwise. Returns False if the array is None.

    Args:
        column: Column or column name containing the arrays to check.

        value: Value to search for in the arrays. Can be:
            - A literal value (string, number, boolean)
            - A Column expression

    Returns:
        A boolean Column expression (True if value is found, False otherwise).

    Raises:
        TypeError: If value type is incompatible with the array element type.
        TypeError: If the column does not contain array data.

    Example: Check for values in arrays
        ```python
        # Check if 'python' exists in arrays in the 'tags' column
        df.select(array_contains("tags", "python"))

        # Check using a value from another column
        df.select(array_contains("tags", col("search_term")))
        ```
    """
    value_column = None
    if isinstance(value, Column):
        value_column = value
    else:
        value_column = lit(value)
    return Column._from_logical_expr(
        ArrayContainsExpr(
            Column._from_col_or_name(column)._logical_expr, value_column._logical_expr
        )
    )

array_size

array_size(column: ColumnOrName) -> Column

Returns the number of elements in an array column.

This function computes the length of arrays stored in the specified column. Returns None for None arrays.

Parameters:

  • column (ColumnOrName) –

    Column or column name containing arrays whose length to compute.

Returns:

  • Column

    A Column expression representing the array length.

Raises:

  • TypeError

    If the column does not contain array data.

Get array sizes
# Get the size of arrays in 'tags' column
df.select(array_size("tags"))

# Use with column reference
df.select(array_size(col("tags")))
Source code in src/fenic/api/functions/builtin.py
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
@validate_call(config=ConfigDict(strict=True, arbitrary_types_allowed=True))
def array_size(column: ColumnOrName) -> Column:
    """Returns the number of elements in an array column.

    This function computes the length of arrays stored in the specified column.
    Returns None for None arrays.

    Args:
        column: Column or column name containing arrays whose length to compute.

    Returns:
        A Column expression representing the array length.

    Raises:
        TypeError: If the column does not contain array data.

    Example: Get array sizes
        ```python
        # Get the size of arrays in 'tags' column
        df.select(array_size("tags"))

        # Use with column reference
        df.select(array_size(col("tags")))
        ```
    """
    return Column._from_logical_expr(
        ArrayLengthExpr(Column._from_col_or_name(column)._logical_expr)
    )

asc

asc(column: ColumnOrName) -> Column

Creates a Column expression representing an ascending sort order.

Parameters:

  • column (ColumnOrName) –

    The column to apply the ascending ordering to.

Returns:

  • Column

    A Column expression representing the column and the ascending sort order.

Raises:

  • ValueError

    If the type of the column cannot be inferred.

  • Error

    If this expression is passed to a dataframe operation besides sort() and order_by().

Source code in src/fenic/api/functions/builtin.py
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
@validate_call(config=ConfigDict(strict=True, arbitrary_types_allowed=True))
def asc(column: ColumnOrName) -> Column:
    """Creates a Column expression representing an ascending sort order.

    Args:
        column: The column to apply the ascending ordering to.

    Returns:
        A Column expression representing the column and the ascending sort order.

    Raises:
        ValueError: If the type of the column cannot be inferred.
        Error: If this expression is passed to a dataframe operation besides sort() and order_by().
    """
    return Column._from_col_or_name(column).asc()

asc_nulls_first

asc_nulls_first(column: ColumnOrName) -> Column

Creates a Column expression representing an ascending sort order with nulls first.

Parameters:

  • column (ColumnOrName) –

    The column to apply the ascending ordering to.

Returns:

  • Column

    A Column expression representing the column and the ascending sort order with nulls first.

Raises:

  • ValueError

    If the type of the column cannot be inferred.

  • Error

    If this expression is passed to a dataframe operation besides sort() and order_by().

Source code in src/fenic/api/functions/builtin.py
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
@validate_call(config=ConfigDict(strict=True, arbitrary_types_allowed=True))
def asc_nulls_first(column: ColumnOrName) -> Column:
    """Creates a Column expression representing an ascending sort order with nulls first.

    Args:
        column: The column to apply the ascending ordering to.

    Returns:
        A Column expression representing the column and the ascending sort order with nulls first.

    Raises:
        ValueError: If the type of the column cannot be inferred.
        Error: If this expression is passed to a dataframe operation besides sort() and order_by().
    """
    return Column._from_col_or_name(column).asc_nulls_first()

asc_nulls_last

asc_nulls_last(column: ColumnOrName) -> Column

Creates a Column expression representing an ascending sort order with nulls last.

Parameters:

  • column (ColumnOrName) –

    The column to apply the ascending ordering to.

Returns:

  • Column

    A Column expression representing the column and the ascending sort order with nulls last.

Raises:

  • ValueError

    If the type of the column cannot be inferred.

  • Error

    If this expression is passed to a dataframe operation besides sort() and order_by().

Source code in src/fenic/api/functions/builtin.py
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
@validate_call(config=ConfigDict(strict=True, arbitrary_types_allowed=True))
def asc_nulls_last(column: ColumnOrName) -> Column:
    """Creates a Column expression representing an ascending sort order with nulls last.

    Args:
        column: The column to apply the ascending ordering to.

    Returns:
        A Column expression representing the column and the ascending sort order with nulls last.

    Raises:
        ValueError: If the type of the column cannot be inferred.
        Error: If this expression is passed to a dataframe operation besides sort() and order_by().
    """
    return Column._from_col_or_name(column).asc_nulls_last()

avg

avg(column: ColumnOrName) -> Column

Aggregate function: returns the average (mean) of all values in the specified column.

Parameters:

  • column (ColumnOrName) –

    Column or column name to compute the average of

Returns:

  • Column

    A Column expression representing the average aggregation

Raises:

  • TypeError

    If column is not a Column or string

Source code in src/fenic/api/functions/builtin.py
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
@validate_call(config=ConfigDict(strict=True, arbitrary_types_allowed=True))
def avg(column: ColumnOrName) -> Column:
    """Aggregate function: returns the average (mean) of all values in the specified column.

    Args:
        column: Column or column name to compute the average of

    Returns:
        A Column expression representing the average aggregation

    Raises:
        TypeError: If column is not a Column or string
    """
    return Column._from_logical_expr(
        AvgExpr(Column._from_col_or_name(column)._logical_expr)
    )

coalesce

coalesce(*cols: ColumnOrName) -> Column

Returns the first non-null value from the given columns for each row.

This function mimics the behavior of SQL's COALESCE function. It evaluates the input columns in order and returns the first non-null value encountered. If all values are null, returns null.

Parameters:

  • *cols (ColumnOrName, default: () ) –

    Column expressions or column names to evaluate. Can be:

    • Individual arguments
    • Lists of columns/column names
    • Tuples of columns/column names

Returns:

  • Column

    A Column expression containing the first non-null value from the input columns.

Raises:

  • ValueError

    If no columns are provided.

Basic coalesce usage
# Basic usage
df.select(coalesce("col1", "col2", "col3"))

# With nested collections
df.select(coalesce(["col1", "col2"], "col3"))
Source code in src/fenic/api/functions/builtin.py
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
@validate_call(config=ConfigDict(strict=True, arbitrary_types_allowed=True))
def coalesce(*cols: ColumnOrName) -> Column:
    """Returns the first non-null value from the given columns for each row.

    This function mimics the behavior of SQL's COALESCE function. It evaluates the input columns
    in order and returns the first non-null value encountered. If all values are null, returns null.

    Args:
        *cols: Column expressions or column names to evaluate. Can be:

            - Individual arguments
            - Lists of columns/column names
            - Tuples of columns/column names

    Returns:
        A Column expression containing the first non-null value from the input columns.

    Raises:
        ValueError: If no columns are provided.

    Example: Basic coalesce usage
        ```python
        # Basic usage
        df.select(coalesce("col1", "col2", "col3"))

        # With nested collections
        df.select(coalesce(["col1", "col2"], "col3"))
        ```
    """
    if not cols:
        raise ValueError("At least one column must be provided to coalesce method")

    flattened_args = []
    for arg in cols:
        if isinstance(arg, (list, tuple)):
            flattened_args.extend(arg)
        else:
            flattened_args.append(arg)

    flattened_exprs = [
        Column._from_col_or_name(c)._logical_expr for c in flattened_args
    ]
    return Column._from_logical_expr(CoalesceExpr(flattened_exprs))

col

col(col_name: str) -> Column

Creates a Column expression referencing a column in the DataFrame.

Parameters:

  • col_name (str) –

    Name of the column to reference

Returns:

  • Column

    A Column expression for the specified column

Raises:

  • TypeError

    If colName is not a string

Source code in src/fenic/api/functions/core.py
16
17
18
19
20
21
22
23
24
25
26
27
28
29
@validate_call(config=ConfigDict(strict=True))
def col(col_name: str) -> Column:
    """Creates a Column expression referencing a column in the DataFrame.

    Args:
        col_name: Name of the column to reference

    Returns:
        A Column expression for the specified column

    Raises:
        TypeError: If colName is not a string
    """
    return Column._from_column_name(col_name)

collect_list

collect_list(column: ColumnOrName) -> Column

Aggregate function: collects all values from the specified column into a list.

Parameters:

  • column (ColumnOrName) –

    Column or column name to collect values from

Returns:

  • Column

    A Column expression representing the list aggregation

Raises:

  • TypeError

    If column is not a Column or string

Source code in src/fenic/api/functions/builtin.py
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
@validate_call(config=ConfigDict(strict=True, arbitrary_types_allowed=True))
def collect_list(column: ColumnOrName) -> Column:
    """Aggregate function: collects all values from the specified column into a list.

    Args:
        column: Column or column name to collect values from

    Returns:
        A Column expression representing the list aggregation

    Raises:
        TypeError: If column is not a Column or string
    """
    return Column._from_logical_expr(
        ListExpr(Column._from_col_or_name(column)._logical_expr)
    )

count

count(column: ColumnOrName) -> Column

Aggregate function: returns the count of non-null values in the specified column.

Parameters:

  • column (ColumnOrName) –

    Column or column name to count values in

Returns:

  • Column

    A Column expression representing the count aggregation

Raises:

  • TypeError

    If column is not a Column or string

Source code in src/fenic/api/functions/builtin.py
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
@validate_call(config=ConfigDict(strict=True, arbitrary_types_allowed=True))
def count(column: ColumnOrName) -> Column:
    """Aggregate function: returns the count of non-null values in the specified column.

    Args:
        column: Column or column name to count values in

    Returns:
        A Column expression representing the count aggregation

    Raises:
        TypeError: If column is not a Column or string
    """
    if isinstance(column, str) and column == "*":
        return Column._from_logical_expr(CountExpr(lit("*")._logical_expr))
    return Column._from_logical_expr(
        CountExpr(Column._from_col_or_name(column)._logical_expr)
    )

desc

desc(column: ColumnOrName) -> Column

Creates a Column expression representing a descending sort order.

Parameters:

  • column (ColumnOrName) –

    The column to apply the descending ordering to.

Returns:

  • Column

    A Column expression representing the column and the descending sort order.

Raises:

  • ValueError

    If the type of the column cannot be inferred.

  • Error

    If this expression is passed to a dataframe operation besides sort() and order_by().

Source code in src/fenic/api/functions/builtin.py
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
@validate_call(config=ConfigDict(strict=True, arbitrary_types_allowed=True))
def desc(column: ColumnOrName) -> Column:
    """Creates a Column expression representing a descending sort order.

    Args:
        column: The column to apply the descending ordering to.

    Returns:
        A Column expression representing the column and the descending sort order.

    Raises:
        ValueError: If the type of the column cannot be inferred.
        Error: If this expression is passed to a dataframe operation besides sort() and order_by().
    """
    return Column._from_col_or_name(column).desc()

desc_nulls_first

desc_nulls_first(column: ColumnOrName) -> Column

Creates a Column expression representing a descending sort order with nulls first.

Parameters:

  • column (ColumnOrName) –

    The column to apply the descending ordering to.

Returns:

  • Column

    A Column expression representing the column and the descending sort order with nulls first.

Raises:

  • ValueError

    If the type of the column cannot be inferred.

  • Error

    If this expression is passed to a dataframe operation besides sort() and order_by().

Source code in src/fenic/api/functions/builtin.py
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
@validate_call(config=ConfigDict(strict=True, arbitrary_types_allowed=True))
def desc_nulls_first(column: ColumnOrName) -> Column:
    """Creates a Column expression representing a descending sort order with nulls first.

    Args:
        column: The column to apply the descending ordering to.

    Returns:
        A Column expression representing the column and the descending sort order with nulls first.

    Raises:
        ValueError: If the type of the column cannot be inferred.
        Error: If this expression is passed to a dataframe operation besides sort() and order_by().
    """
    return Column._from_col_or_name(column).desc_nulls_first()

desc_nulls_last

desc_nulls_last(column: ColumnOrName) -> Column

Creates a Column expression representing a descending sort order with nulls last.

Parameters:

  • column (ColumnOrName) –

    The column to apply the descending ordering to.

Returns:

  • Column

    A Column expression representing the column and the descending sort order with nulls last.

Raises:

  • ValueError

    If the type of the column cannot be inferred.

  • Error

    If this expression is passed to a dataframe operation besides sort() and order_by().

Source code in src/fenic/api/functions/builtin.py
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
@validate_call(config=ConfigDict(strict=True, arbitrary_types_allowed=True))
def desc_nulls_last(column: ColumnOrName) -> Column:
    """Creates a Column expression representing a descending sort order with nulls last.

    Args:
        column: The column to apply the descending ordering to.

    Returns:
        A Column expression representing the column and the descending sort order with nulls last.

    Raises:
        ValueError: If the type of the column cannot be inferred.
        Error: If this expression is passed to a dataframe operation besides sort() and order_by().
    """
    return Column._from_col_or_name(column).desc_nulls_last()

lit

lit(value: Any) -> Column

Creates a Column expression representing a literal value.

Parameters:

  • value (Any) –

    The literal value to create a column for

Returns:

  • Column

    A Column expression representing the literal value

Raises:

  • ValueError

    If the type of the value cannot be inferred

Source code in src/fenic/api/functions/core.py
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
def lit(value: Any) -> Column:
    """Creates a Column expression representing a literal value.

    Args:
        value: The literal value to create a column for

    Returns:
        A Column expression representing the literal value

    Raises:
        ValueError: If the type of the value cannot be inferred
    """
    try:
        inferred_type = infer_dtype_from_pyobj(value)
    except TypeInferenceError as e:
        raise ValidationError(f"`lit` failed to infer type for value `{value}`") from e
    literal_expr = LiteralExpr(value, inferred_type)
    return Column._from_logical_expr(literal_expr)

max

max(column: ColumnOrName) -> Column

Aggregate function: returns the maximum value in the specified column.

Parameters:

  • column (ColumnOrName) –

    Column or column name to compute the maximum of

Returns:

  • Column

    A Column expression representing the maximum aggregation

Raises:

  • TypeError

    If column is not a Column or string

Source code in src/fenic/api/functions/builtin.py
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
@validate_call(config=ConfigDict(strict=True, arbitrary_types_allowed=True))
def max(column: ColumnOrName) -> Column:
    """Aggregate function: returns the maximum value in the specified column.

    Args:
        column: Column or column name to compute the maximum of

    Returns:
        A Column expression representing the maximum aggregation

    Raises:
        TypeError: If column is not a Column or string
    """
    return Column._from_logical_expr(
        MaxExpr(Column._from_col_or_name(column)._logical_expr)
    )

mean

mean(column: ColumnOrName) -> Column

Aggregate function: returns the mean (average) of all values in the specified column.

Alias for avg().

Parameters:

  • column (ColumnOrName) –

    Column or column name to compute the mean of

Returns:

  • Column

    A Column expression representing the mean aggregation

Raises:

  • TypeError

    If column is not a Column or string

Source code in src/fenic/api/functions/builtin.py
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
@validate_call(config=ConfigDict(strict=True, arbitrary_types_allowed=True))
def mean(column: ColumnOrName) -> Column:
    """Aggregate function: returns the mean (average) of all values in the specified column.

    Alias for avg().

    Args:
        column: Column or column name to compute the mean of

    Returns:
        A Column expression representing the mean aggregation

    Raises:
        TypeError: If column is not a Column or string
    """
    return Column._from_logical_expr(
        AvgExpr(Column._from_col_or_name(column)._logical_expr)
    )

min

min(column: ColumnOrName) -> Column

Aggregate function: returns the minimum value in the specified column.

Parameters:

  • column (ColumnOrName) –

    Column or column name to compute the minimum of

Returns:

  • Column

    A Column expression representing the minimum aggregation

Raises:

  • TypeError

    If column is not a Column or string

Source code in src/fenic/api/functions/builtin.py
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
@validate_call(config=ConfigDict(strict=True, arbitrary_types_allowed=True))
def min(column: ColumnOrName) -> Column:
    """Aggregate function: returns the minimum value in the specified column.

    Args:
        column: Column or column name to compute the minimum of

    Returns:
        A Column expression representing the minimum aggregation

    Raises:
        TypeError: If column is not a Column or string
    """
    return Column._from_logical_expr(
        MinExpr(Column._from_col_or_name(column)._logical_expr)
    )

struct

struct(*args: Union[ColumnOrName, List[ColumnOrName], Tuple[ColumnOrName, ...]]) -> Column

Creates a new struct column from multiple input columns.

Parameters:

  • *args (Union[ColumnOrName, List[ColumnOrName], Tuple[ColumnOrName, ...]], default: () ) –

    Columns or column names to combine into a struct. Can be:

    • Individual arguments
    • Lists of columns/column names
    • Tuples of columns/column names

Returns:

  • Column

    A Column expression representing a struct containing the input columns

Raises:

  • TypeError

    If any argument is not a Column, string, or collection of Columns/strings

Source code in src/fenic/api/functions/builtin.py
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
@validate_call(config=ConfigDict(strict=True, arbitrary_types_allowed=True))
def struct(
    *args: Union[ColumnOrName, List[ColumnOrName], Tuple[ColumnOrName, ...]]
) -> Column:
    """Creates a new struct column from multiple input columns.

    Args:
        *args: Columns or column names to combine into a struct. Can be:

            - Individual arguments
            - Lists of columns/column names
            - Tuples of columns/column names

    Returns:
        A Column expression representing a struct containing the input columns

    Raises:
        TypeError: If any argument is not a Column, string, or collection of
            Columns/strings
    """
    flattened_args = []
    for arg in args:
        if isinstance(arg, (list, tuple)):
            flattened_args.extend(arg)
        else:
            flattened_args.append(arg)

    expr_columns = [Column._from_col_or_name(c)._logical_expr for c in flattened_args]

    return Column._from_logical_expr(StructExpr(expr_columns))

sum

sum(column: ColumnOrName) -> Column

Aggregate function: returns the sum of all values in the specified column.

Parameters:

  • column (ColumnOrName) –

    Column or column name to compute the sum of

Returns:

  • Column

    A Column expression representing the sum aggregation

Raises:

  • TypeError

    If column is not a Column or string

Source code in src/fenic/api/functions/builtin.py
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
@validate_call(config=ConfigDict(strict=True, arbitrary_types_allowed=True))
def sum(column: ColumnOrName) -> Column:
    """Aggregate function: returns the sum of all values in the specified column.

    Args:
        column: Column or column name to compute the sum of

    Returns:
        A Column expression representing the sum aggregation

    Raises:
        TypeError: If column is not a Column or string
    """
    return Column._from_logical_expr(
        SumExpr(Column._from_col_or_name(column)._logical_expr)
    )

udf

udf(f: Optional[Callable] = None, *, return_type: DataType)

A decorator or function for creating user-defined functions (UDFs) that can be applied to DataFrame rows.

When applied, UDFs will: - Access StructType columns as Python dictionaries (dict[str, Any]). - Access ArrayType columns as Python lists (list[Any]). - Access primitive types (e.g., int, float, str) as their respective Python types.

Parameters:

  • f (Optional[Callable], default: None ) –

    Python function to convert to UDF

  • return_type (DataType) –

    Expected return type of the UDF. Required parameter.

UDF with primitive types
# UDF with primitive types
@udf(return_type=IntegerType)
def add_one(x: int):
    return x + 1

# Or
add_one = udf(lambda x: x + 1, return_type=IntegerType)
UDF with nested types
# UDF with nested types
@udf(return_type=StructType([StructField("value1", IntegerType), StructField("value2", IntegerType)]))
def example_udf(x: dict[str, int], y: list[int]):
    return {
        "value1": x["value1"] + x["value2"] + y[0],
        "value2": x["value1"] + x["value2"] + y[1],
    }
Source code in src/fenic/api/functions/builtin.py
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
@validate_call(config=ConfigDict(strict=True, arbitrary_types_allowed=True))
def udf(f: Optional[Callable] = None, *, return_type: DataType):
    """A decorator or function for creating user-defined functions (UDFs) that can be applied to DataFrame rows.

    When applied, UDFs will:
    - Access `StructType` columns as Python dictionaries (`dict[str, Any]`).
    - Access `ArrayType` columns as Python lists (`list[Any]`).
    - Access primitive types (e.g., `int`, `float`, `str`) as their respective Python types.

    Args:
        f: Python function to convert to UDF

        return_type: Expected return type of the UDF. Required parameter.

    Example: UDF with primitive types
        ```python
        # UDF with primitive types
        @udf(return_type=IntegerType)
        def add_one(x: int):
            return x + 1

        # Or
        add_one = udf(lambda x: x + 1, return_type=IntegerType)
        ```

    Example: UDF with nested types
        ```python
        # UDF with nested types
        @udf(return_type=StructType([StructField("value1", IntegerType), StructField("value2", IntegerType)]))
        def example_udf(x: dict[str, int], y: list[int]):
            return {
                "value1": x["value1"] + x["value2"] + y[0],
                "value2": x["value1"] + x["value2"] + y[1],
            }
        ```
    """

    def _create_udf(func: Callable) -> Callable:
        @wraps(func)
        def _udf_wrapper(*cols: ColumnOrName) -> Column:
            col_exprs = [Column._from_col_or_name(c)._logical_expr for c in cols]
            return Column._from_logical_expr(UDFExpr(func, col_exprs, return_type))

        return _udf_wrapper

    if f is not None:
        return _create_udf(f)
    return _create_udf

when

when(condition: Column, value: Column) -> Column

Evaluates a condition and returns a value if true.

This function is used to create conditional expressions. If Column.otherwise() is not invoked, None is returned for unmatched conditions.

Parameters:

  • condition (Column) –

    A boolean Column expression to evaluate.

  • value (Column) –

    A Column expression to return if the condition is true.

Returns:

  • Column

    A Column expression that evaluates the condition and returns the specified value when true,

  • Column

    and None otherwise.

Raises:

  • TypeError

    If the condition is not a boolean Column expression.

Basic conditional expression
# Basic usage
df.select(when(col("age") > 18, lit("adult")))

# With otherwise
df.select(when(col("age") > 18, lit("adult")).otherwise(lit("minor")))
Source code in src/fenic/api/functions/builtin.py
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
@validate_call(config=ConfigDict(strict=True, arbitrary_types_allowed=True))
def when(condition: Column, value: Column) -> Column:
    """Evaluates a condition and returns a value if true.

    This function is used to create conditional expressions. If Column.otherwise() is not invoked,
    None is returned for unmatched conditions.

    Args:
        condition: A boolean Column expression to evaluate.

        value: A Column expression to return if the condition is true.

    Returns:
        A Column expression that evaluates the condition and returns the specified value when true,
        and None otherwise.

    Raises:
        TypeError: If the condition is not a boolean Column expression.

    Example: Basic conditional expression
        ```python
        # Basic usage
        df.select(when(col("age") > 18, lit("adult")))

        # With otherwise
        df.select(when(col("age") > 18, lit("adult")).otherwise(lit("minor")))
        ```
    """
    return Column._from_logical_expr(
        WhenExpr(None, condition._logical_expr, value._logical_expr)
    )