fenic.api.functions.builtin

Built-in functions for Fenic DataFrames.

Functions:

  • array

    Creates a new array column from multiple input columns.

  • array_agg

    Alias for collect_list().

  • array_contains

    Checks if array column contains a specific value.

  • array_size

    Returns the number of elements in an array column.

  • asc

    Creates a Column expression representing an ascending sort order.

  • asc_nulls_first

    Creates a Column expression representing an ascending sort order with nulls first.

  • asc_nulls_last

    Creates a Column expression representing an ascending sort order with nulls last.

  • avg

    Aggregate function: returns the average (mean) of all values in the specified column.

  • coalesce

    Returns the first non-null value from the given columns for each row.

  • collect_list

    Aggregate function: collects all values from the specified column into a list.

  • count

    Aggregate function: returns the count of non-null values in the specified column.

  • desc

    Creates a Column expression representing a descending sort order.

  • desc_nulls_first

    Creates a Column expression representing a descending sort order with nulls first.

  • desc_nulls_last

    Creates a Column expression representing a descending sort order with nulls last.

  • first

    Aggregate function: returns the first non-null value in the specified column.

  • max

    Aggregate function: returns the maximum value in the specified column.

  • mean

    Aggregate function: returns the mean (average) of all values in the specified column.

  • min

    Aggregate function: returns the minimum value in the specified column.

  • stddev

    Aggregate function: returns the sample standard deviation of the specified column.

  • struct

    Creates a new struct column from multiple input columns.

  • sum

    Aggregate function: returns the sum of all values in the specified column.

  • udf

    A decorator or function for creating user-defined functions (UDFs) that can be applied to DataFrame rows.

  • when

    Evaluates a condition and returns a value if true.

array

array(*args: Union[ColumnOrName, List[ColumnOrName], Tuple[ColumnOrName, ...]]) -> Column

Creates a new array column from multiple input columns.

Parameters:

  • *args (Union[ColumnOrName, List[ColumnOrName], Tuple[ColumnOrName, ...]], default: () ) –

    Columns or column names to combine into an array. Can be:

    • Individual arguments
    • Lists of columns/column names
    • Tuples of columns/column names

Returns:

  • Column

    A Column expression representing an array containing values from the input columns

Raises:

  • TypeError

    If any argument is not a Column, string, or collection of Columns/strings
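
The way mixed `*args` are flattened can be pictured without fenic. The sketch below mirrors the flattening loop visible in the source listing for this function, using plain strings in place of columns. `flatten_args` is a hypothetical helper name chosen for illustration and is not part of the fenic API.

```python
from typing import Any, List


def flatten_args(*args: Any) -> List[Any]:
    """Flatten one level of list/tuple arguments (hypothetical helper, not fenic API)."""
    flattened: List[Any] = []
    for arg in args:
        if isinstance(arg, (list, tuple)):
            # A list or tuple contributes each of its items, in order.
            flattened.extend(arg)
        else:
            # Anything else is kept as a single item.
            flattened.append(arg)
    return flattened


print(flatten_args("a", ["b", "c"], ("d",)))  # ['a', 'b', 'c', 'd']
```

Only one level is flattened in this sketch, so a list nested inside another list is passed through as a single item.
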

Source code in src/fenic/api/functions/builtin.py
@validate_call(config=ConfigDict(strict=True, arbitrary_types_allowed=True))
def array(
    *args: Union[ColumnOrName, List[ColumnOrName], Tuple[ColumnOrName, ...]]
) -> Column:
    """Creates a new array column from multiple input columns.

    Args:
        *args: Columns or column names to combine into an array. Can be:

            - Individual arguments
            - Lists of columns/column names
            - Tuples of columns/column names

    Returns:
        A Column expression representing an array containing values from the input columns

    Raises:
        TypeError: If any argument is not a Column, string, or collection of
            Columns/strings
    """
    flattened_args = []
    for arg in args:
        if isinstance(arg, (list, tuple)):
            flattened_args.extend(arg)
        else:
            flattened_args.append(arg)

    expr_columns = [Column._from_col_or_name(c)._logical_expr for c in flattened_args]

    return Column._from_logical_expr(ArrayExpr(expr_columns))

array_agg

array_agg(column: ColumnOrName) -> Column

Alias for collect_list().

Source code in src/fenic/api/functions/builtin.py
@validate_call(config=ConfigDict(strict=True, arbitrary_types_allowed=True))
def array_agg(column: ColumnOrName) -> Column:
    """Alias for collect_list()."""
    return collect_list(column)

array_contains

array_contains(column: ColumnOrName, value: Union[str, int, float, bool, Column]) -> Column

Checks if array column contains a specific value.

This function returns True if the array in the specified column contains the given value, and False otherwise. Returns False if the array is None.

Parameters:

  • column (ColumnOrName) –

    Column or column name containing the arrays to check.

  • value (Union[str, int, float, bool, Column]) –

    Value to search for in the arrays. Can be:

    • A literal value (string, number, boolean)
    • A Column expression

Returns:

  • Column

    A boolean Column expression (True if value is found, False otherwise).

Raises:

  • TypeError

    If value type is incompatible with the array element type.

  • TypeError

    If the column does not contain array data.

Example: Check for values in arrays

# Check if 'python' exists in arrays in the 'tags' column
df.select(array_contains("tags", "python"))

# Check using a value from another column
df.select(array_contains("tags", col("search_term")))

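
The documented per-row result, including the None case, can be modelled in plain Python. This is only a sketch of the described behaviour, not fenic code: `contains` is a hypothetical helper, and Python lists and `None` stand in for array values and nulls.

```python
from typing import Any, List, Optional


def contains(array: Optional[List[Any]], value: Any) -> bool:
    """Plain-Python model of the documented result for one row (not fenic code)."""
    if array is None:
        # The documentation states that a None array yields False.
        return False
    return value in array


rows = [["python", "sql"], ["rust"], None]
print([contains(tags, "python") for tags in rows])  # [True, False, False]
```
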
Source code in src/fenic/api/functions/builtin.py
@validate_call(config=ConfigDict(strict=True, arbitrary_types_allowed=True))
def array_contains(
    column: ColumnOrName, value: Union[str, int, float, bool, Column]
) -> Column:
    """Checks if array column contains a specific value.

    This function returns True if the array in the specified column contains the given value,
    and False otherwise. Returns False if the array is None.

    Args:
        column: Column or column name containing the arrays to check.

        value: Value to search for in the arrays. Can be:
            - A literal value (string, number, boolean)
            - A Column expression

    Returns:
        A boolean Column expression (True if value is found, False otherwise).

    Raises:
        TypeError: If value type is incompatible with the array element type.
        TypeError: If the column does not contain array data.

    Example: Check for values in arrays
        ```python
        # Check if 'python' exists in arrays in the 'tags' column
        df.select(array_contains("tags", "python"))

        # Check using a value from another column
        df.select(array_contains("tags", col("search_term")))
        ```
    """
    value_column = None
    if isinstance(value, Column):
        value_column = value
    else:
        value_column = lit(value)
    return Column._from_logical_expr(
        ArrayContainsExpr(
            Column._from_col_or_name(column)._logical_expr, value_column._logical_expr
        )
    )

array_size

array_size(column: ColumnOrName) -> Column

Returns the number of elements in an array column.

This function computes the length of arrays stored in the specified column. Returns None for None arrays.

Parameters:

  • column (ColumnOrName) –

    Column or column name containing arrays whose length to compute.

Returns:

  • Column

    A Column expression representing the array length.

Raises:

  • TypeError

    If the column does not contain array data.

Example: Get array sizes

# Get the size of arrays in 'tags' column
df.select(array_size("tags"))

# Use with column reference
df.select(array_size(col("tags")))

Source code in src/fenic/api/functions/builtin.py
@validate_call(config=ConfigDict(strict=True, arbitrary_types_allowed=True))
def array_size(column: ColumnOrName) -> Column:
    """Returns the number of elements in an array column.

    This function computes the length of arrays stored in the specified column.
    Returns None for None arrays.

    Args:
        column: Column or column name containing arrays whose length to compute.

    Returns:
        A Column expression representing the array length.

    Raises:
        TypeError: If the column does not contain array data.

    Example: Get array sizes
        ```python
        # Get the size of arrays in 'tags' column
        df.select(array_size("tags"))

        # Use with column reference
        df.select(array_size(col("tags")))
        ```
    """
    return Column._from_logical_expr(
        ArrayLengthExpr(Column._from_col_or_name(column)._logical_expr)
    )

asc

asc(column: ColumnOrName) -> Column

Creates a Column expression representing an ascending sort order.

Parameters:

  • column (ColumnOrName) –

    The column to apply the ascending ordering to.

Returns:

  • Column

    A Column expression representing the column and the ascending sort order.

Raises:

  • ValueError

    If the type of the column cannot be inferred.

  • Error

    If this expression is passed to a dataframe operation besides sort() and order_by().

Source code in src/fenic/api/functions/builtin.py
@validate_call(config=ConfigDict(strict=True, arbitrary_types_allowed=True))
def asc(column: ColumnOrName) -> Column:
    """Creates a Column expression representing an ascending sort order.

    Args:
        column: The column to apply the ascending ordering to.

    Returns:
        A Column expression representing the column and the ascending sort order.

    Raises:
        ValueError: If the type of the column cannot be inferred.
        Error: If this expression is passed to a dataframe operation besides sort() and order_by().
    """
    return Column._from_col_or_name(column).asc()

asc_nulls_first

asc_nulls_first(column: ColumnOrName) -> Column

Creates a Column expression representing an ascending sort order with nulls first.

Parameters:

  • column (ColumnOrName) –

    The column to apply the ascending ordering to.

Returns:

  • Column

    A Column expression representing the column and the ascending sort order with nulls first.

Raises:

  • ValueError

    If the type of the column cannot be inferred.

  • Error

    If this expression is passed to a dataframe operation besides sort() and order_by().

Source code in src/fenic/api/functions/builtin.py
@validate_call(config=ConfigDict(strict=True, arbitrary_types_allowed=True))
def asc_nulls_first(column: ColumnOrName) -> Column:
    """Creates a Column expression representing an ascending sort order with nulls first.

    Args:
        column: The column to apply the ascending ordering to.

    Returns:
        A Column expression representing the column and the ascending sort order with nulls first.

    Raises:
        ValueError: If the type of the column cannot be inferred.
        Error: If this expression is passed to a dataframe operation besides sort() and order_by().
    """
    return Column._from_col_or_name(column).asc_nulls_first()

asc_nulls_last

asc_nulls_last(column: ColumnOrName) -> Column

Creates a Column expression representing an ascending sort order with nulls last.

Parameters:

  • column (ColumnOrName) –

    The column to apply the ascending ordering to.

Returns:

  • Column

    A Column expression representing the column and the ascending sort order with nulls last.

Raises:

  • ValueError

    If the type of the column cannot be inferred.

  • Error

    If this expression is passed to a dataframe operation besides sort() and order_by().
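
What "nulls first" and "nulls last" mean for an ascending order can be shown with a small plain-Python sketch. It assumes `None` stands in for null and uses a hypothetical helper, `sort_ascending`. It illustrates the resulting order only and says nothing about how fenic executes a sort.

```python
from typing import List, Optional


def sort_ascending(values: List[Optional[int]], nulls_first: bool) -> List[Optional[int]]:
    """Sort non-null values ascending and place nulls at the requested end."""
    non_null = sorted(v for v in values if v is not None)
    nulls = [v for v in values if v is None]
    return nulls + non_null if nulls_first else non_null + nulls


data = [3, None, 1, 2]
print(sort_ascending(data, nulls_first=True))   # [None, 1, 2, 3]
print(sort_ascending(data, nulls_first=False))  # [1, 2, 3, None]
```
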

Source code in src/fenic/api/functions/builtin.py
@validate_call(config=ConfigDict(strict=True, arbitrary_types_allowed=True))
def asc_nulls_last(column: ColumnOrName) -> Column:
    """Creates a Column expression representing an ascending sort order with nulls last.

    Args:
        column: The column to apply the ascending ordering to.

    Returns:
        A Column expression representing the column and the ascending sort order with nulls last.

    Raises:
        ValueError: If the type of the column cannot be inferred.
        Error: If this expression is passed to a dataframe operation besides sort() and order_by().
    """
    return Column._from_col_or_name(column).asc_nulls_last()

avg

avg(column: ColumnOrName) -> Column

Aggregate function: returns the average (mean) of all values in the specified column.

Parameters:

  • column (ColumnOrName) –

    Column or column name to compute the average of

Returns:

  • Column

    A Column expression representing the average aggregation

Raises:

  • TypeError

    If column is not a Column or string

Source code in src/fenic/api/functions/builtin.py
@validate_call(config=ConfigDict(strict=True, arbitrary_types_allowed=True))
def avg(column: ColumnOrName) -> Column:
    """Aggregate function: returns the average (mean) of all values in the specified column.

    Args:
        column: Column or column name to compute the average of

    Returns:
        A Column expression representing the average aggregation

    Raises:
        TypeError: If column is not a Column or string
    """
    return Column._from_logical_expr(
        AvgExpr(Column._from_col_or_name(column)._logical_expr)
    )

coalesce

coalesce(*cols: ColumnOrName) -> Column

Returns the first non-null value from the given columns for each row.

This function mimics the behavior of SQL's COALESCE function. It evaluates the input columns in order and returns the first non-null value encountered. If all values are null, returns null.

Parameters:

  • *cols (ColumnOrName, default: () ) –

    Column expressions or column names to evaluate. Can be:

    • Individual arguments
    • Lists of columns/column names
    • Tuples of columns/column names

Returns:

  • Column

    A Column expression containing the first non-null value from the input columns.

Raises:

  • ValueError

    If no columns are provided.

Example: Basic coalesce usage

# Basic usage
df.select(coalesce("col1", "col2", "col3"))

# With nested collections
df.select(coalesce(["col1", "col2"], "col3"))

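
The row-wise rule described above (first non-null value in argument order, null if every value is null) can be sketched in plain Python. `coalesce_row` is a hypothetical helper used only for illustration, with `None` standing in for null.

```python
from typing import Any, Optional


def coalesce_row(*values: Any) -> Optional[Any]:
    """Return the first value that is not None, or None if all are None."""
    for value in values:
        if value is not None:
            return value
    return None


rows = [(None, "b", "c"), (None, None, "z"), (None, None, None)]
print([coalesce_row(*row) for row in rows])  # ['b', 'z', None]
```

In this sketch only `None` counts as null, so falsy values such as `0` or an empty string are returned as-is.
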
Source code in src/fenic/api/functions/builtin.py
@validate_call(config=ConfigDict(strict=True, arbitrary_types_allowed=True))
def coalesce(*cols: ColumnOrName) -> Column:
    """Returns the first non-null value from the given columns for each row.

    This function mimics the behavior of SQL's COALESCE function. It evaluates the input columns
    in order and returns the first non-null value encountered. If all values are null, returns null.

    Args:
        *cols: Column expressions or column names to evaluate. Can be:

            - Individual arguments
            - Lists of columns/column names
            - Tuples of columns/column names

    Returns:
        A Column expression containing the first non-null value from the input columns.

    Raises:
        ValueError: If no columns are provided.

    Example: Basic coalesce usage
        ```python
        # Basic usage
        df.select(coalesce("col1", "col2", "col3"))

        # With nested collections
        df.select(coalesce(["col1", "col2"], "col3"))
        ```
    """
    if not cols:
        raise ValueError("At least one column must be provided to coalesce method")

    flattened_args = []
    for arg in cols:
        if isinstance(arg, (list, tuple)):
            flattened_args.extend(arg)
        else:
            flattened_args.append(arg)

    flattened_exprs = [
        Column._from_col_or_name(c)._logical_expr for c in flattened_args
    ]
    return Column._from_logical_expr(CoalesceExpr(flattened_exprs))

collect_list

collect_list(column: ColumnOrName) -> Column

Aggregate function: collects all values from the specified column into a list.

Parameters:

  • column (ColumnOrName) –

    Column or column name to collect values from

Returns:

  • Column

    A Column expression representing the list aggregation

Raises:

  • TypeError

    If column is not a Column or string
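
As a rough picture of a list aggregation, the plain-Python sketch below gathers the values of each group into a list. `collect_by_key` is a hypothetical helper. How fenic orders the collected values, and how it treats nulls, is not stated on this page and is not modelled here.

```python
from collections import defaultdict
from typing import Any, Dict, Hashable, Iterable, List, Tuple


def collect_by_key(rows: Iterable[Tuple[Hashable, Any]]) -> Dict[Hashable, List[Any]]:
    """Group (key, value) pairs and collect each group's values into a list."""
    groups: Dict[Hashable, List[Any]] = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    return dict(groups)


rows = [("a", 1), ("b", 2), ("a", 3)]
print(collect_by_key(rows))  # {'a': [1, 3], 'b': [2]}
```
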

Source code in src/fenic/api/functions/builtin.py
@validate_call(config=ConfigDict(strict=True, arbitrary_types_allowed=True))
def collect_list(column: ColumnOrName) -> Column:
    """Aggregate function: collects all values from the specified column into a list.

    Args:
        column: Column or column name to collect values from

    Returns:
        A Column expression representing the list aggregation

    Raises:
        TypeError: If column is not a Column or string
    """
    return Column._from_logical_expr(
        ListExpr(Column._from_col_or_name(column)._logical_expr)
    )

count

count(column: ColumnOrName) -> Column

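Aggregate function: returns the count of non-null values in the specified column. As the source listing for this function shows, the string "*" is special-cased: it is wrapped as a literal instead of being resolved as a column name.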

Parameters:

  • column (ColumnOrName) –

    Column or column name to count values in

Returns:

  • Column

    A Column expression representing the count aggregation

Raises:

  • TypeError

    If column is not a Column or string
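
Counting non-null values differs from counting rows. A minimal plain-Python sketch, assuming `None` stands in for null and using a hypothetical helper `count_non_null`:

```python
from typing import Any, Iterable


def count_non_null(values: Iterable[Any]) -> int:
    """Count the values that are not None."""
    return sum(1 for value in values if value is not None)


column = [1, None, 3, None]
print(count_non_null(column))  # 2
print(len(column))             # 4 rows in total
```
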

Source code in src/fenic/api/functions/builtin.py
@validate_call(config=ConfigDict(strict=True, arbitrary_types_allowed=True))
def count(column: ColumnOrName) -> Column:
    """Aggregate function: returns the count of non-null values in the specified column.

    Args:
        column: Column or column name to count values in

    Returns:
        A Column expression representing the count aggregation

    Raises:
        TypeError: If column is not a Column or string
    """
    if isinstance(column, str) and column == "*":
        return Column._from_logical_expr(CountExpr(lit("*")._logical_expr))
    return Column._from_logical_expr(
        CountExpr(Column._from_col_or_name(column)._logical_expr)
    )

desc

desc(column: ColumnOrName) -> Column

Creates a Column expression representing a descending sort order.

Parameters:

  • column (ColumnOrName) –

    The column to apply the descending ordering to.

Returns:

  • Column

    A Column expression representing the column and the descending sort order.

Raises:

  • ValueError

    If the type of the column cannot be inferred.

  • Error

    If this expression is passed to a dataframe operation besides sort() and order_by().

Source code in src/fenic/api/functions/builtin.py
@validate_call(config=ConfigDict(strict=True, arbitrary_types_allowed=True))
def desc(column: ColumnOrName) -> Column:
    """Creates a Column expression representing a descending sort order.

    Args:
        column: The column to apply the descending ordering to.

    Returns:
        A Column expression representing the column and the descending sort order.

    Raises:
        ValueError: If the type of the column cannot be inferred.
        Error: If this expression is passed to a dataframe operation besides sort() and order_by().
    """
    return Column._from_col_or_name(column).desc()

desc_nulls_first

desc_nulls_first(column: ColumnOrName) -> Column

Creates a Column expression representing a descending sort order with nulls first.

Parameters:

  • column (ColumnOrName) –

    The column to apply the descending ordering to.

Returns:

  • Column

    A Column expression representing the column and the descending sort order with nulls first.

Raises:

  • ValueError

    If the type of the column cannot be inferred.

  • Error

    If this expression is passed to a dataframe operation besides sort() and order_by().

Source code in src/fenic/api/functions/builtin.py
@validate_call(config=ConfigDict(strict=True, arbitrary_types_allowed=True))
def desc_nulls_first(column: ColumnOrName) -> Column:
    """Creates a Column expression representing a descending sort order with nulls first.

    Args:
        column: The column to apply the descending ordering to.

    Returns:
        A Column expression representing the column and the descending sort order with nulls first.

    Raises:
        ValueError: If the type of the column cannot be inferred.
        Error: If this expression is passed to a dataframe operation besides sort() and order_by().
    """
    return Column._from_col_or_name(column).desc_nulls_first()

desc_nulls_last

desc_nulls_last(column: ColumnOrName) -> Column

Creates a Column expression representing a descending sort order with nulls last.

Parameters:

  • column (ColumnOrName) –

    The column to apply the descending ordering to.

Returns:

  • Column

    A Column expression representing the column and the descending sort order with nulls last.

Raises:

  • ValueError

    If the type of the column cannot be inferred.

  • Error

    If this expression is passed to a dataframe operation besides sort() and order_by().
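
For a descending order, null placement is a separate choice from sort direction. The plain-Python sketch below expresses both in a single sort key. It assumes numeric values, uses `None` for null, and `desc_key` is a hypothetical helper, not part of fenic.

```python
from typing import Optional, Tuple


def desc_key(value: Optional[int], nulls_first: bool) -> Tuple[bool, int]:
    """Sort key for descending order with explicit null placement."""
    is_null = value is None
    # False sorts before True, so this flag puts nulls at the chosen end.
    null_rank = (not is_null) if nulls_first else is_null
    # Negating a number turns Python's ascending sort into a descending one.
    return (null_rank, 0 if is_null else -value)


data = [3, None, 1, 2]
print(sorted(data, key=lambda v: desc_key(v, nulls_first=True)))   # [None, 3, 2, 1]
print(sorted(data, key=lambda v: desc_key(v, nulls_first=False)))  # [3, 2, 1, None]
```
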

Source code in src/fenic/api/functions/builtin.py
@validate_call(config=ConfigDict(strict=True, arbitrary_types_allowed=True))
def desc_nulls_last(column: ColumnOrName) -> Column:
    """Creates a Column expression representing a descending sort order with nulls last.

    Args:
        column: The column to apply the descending ordering to.

    Returns:
        A Column expression representing the column and the descending sort order with nulls last.

    Raises:
        ValueError: If the type of the column cannot be inferred.
        Error: If this expression is passed to a dataframe operation besides sort() and order_by().
    """
    return Column._from_col_or_name(column).desc_nulls_last()

first

first(column: ColumnOrName) -> Column

Aggregate function: returns the first non-null value in the specified column.

Typically used in aggregations to select the first observed value per group.

Parameters:

  • column (ColumnOrName) –

    Column or column name.

Returns:

  • Column

    Column expression for the first value.
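
"First non-null value per group" can be pictured with the plain-Python sketch below. `first_non_null_by_key` is a hypothetical helper, and "first" here means first in the order of the input list. The row order fenic observes within a group is not specified on this page.

```python
from typing import Any, Dict, Hashable, Iterable, Tuple


def first_non_null_by_key(rows: Iterable[Tuple[Hashable, Any]]) -> Dict[Hashable, Any]:
    """For each key, keep the first value that is not None (None if there is none)."""
    result: Dict[Hashable, Any] = {}
    for key, value in rows:
        # Fill the slot when the key is new or has only seen None so far.
        if key not in result or result[key] is None:
            result[key] = value
    return result


rows = [("a", None), ("a", 1), ("a", 2), ("b", 5), ("c", None)]
print(first_non_null_by_key(rows))  # {'a': 1, 'b': 5, 'c': None}
```
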

Source code in src/fenic/api/functions/builtin.py
@validate_call(config=ConfigDict(strict=True, arbitrary_types_allowed=True))
def first(column: ColumnOrName) -> Column:
    """Aggregate function: returns the first non-null value in the specified column.

    Typically used in aggregations to select the first observed value per group.

    Args:
        column: Column or column name.

    Returns:
        Column expression for the first value.
    """
    return Column._from_logical_expr(
        FirstExpr(Column._from_col_or_name(column)._logical_expr)
    )

max

max(column: ColumnOrName) -> Column

Aggregate function: returns the maximum value in the specified column.

Parameters:

  • column (ColumnOrName) –

    Column or column name to compute the maximum of

Returns:

  • Column

    A Column expression representing the maximum aggregation

Raises:

  • TypeError

    If column is not a Column or string

Source code in src/fenic/api/functions/builtin.py
@validate_call(config=ConfigDict(strict=True, arbitrary_types_allowed=True))
def max(column: ColumnOrName) -> Column:
    """Aggregate function: returns the maximum value in the specified column.

    Args:
        column: Column or column name to compute the maximum of

    Returns:
        A Column expression representing the maximum aggregation

    Raises:
        TypeError: If column is not a Column or string
    """
    return Column._from_logical_expr(
        MaxExpr(Column._from_col_or_name(column)._logical_expr)
    )

mean

mean(column: ColumnOrName) -> Column

Aggregate function: returns the mean (average) of all values in the specified column.

Alias for avg().

Parameters:

  • column (ColumnOrName) –

    Column or column name to compute the mean of

Returns:

  • Column

    A Column expression representing the mean aggregation

Raises:

  • TypeError

    If column is not a Column or string

Source code in src/fenic/api/functions/builtin.py
@validate_call(config=ConfigDict(strict=True, arbitrary_types_allowed=True))
def mean(column: ColumnOrName) -> Column:
    """Aggregate function: returns the mean (average) of all values in the specified column.

    Alias for avg().

    Args:
        column: Column or column name to compute the mean of

    Returns:
        A Column expression representing the mean aggregation

    Raises:
        TypeError: If column is not a Column or string
    """
    return Column._from_logical_expr(
        AvgExpr(Column._from_col_or_name(column)._logical_expr)
    )

min

min(column: ColumnOrName) -> Column

Aggregate function: returns the minimum value in the specified column.

Parameters:

  • column (ColumnOrName) –

    Column or column name to compute the minimum of

Returns:

  • Column

    A Column expression representing the minimum aggregation

Raises:

  • TypeError

    If column is not a Column or string

Source code in src/fenic/api/functions/builtin.py
@validate_call(config=ConfigDict(strict=True, arbitrary_types_allowed=True))
def min(column: ColumnOrName) -> Column:
    """Aggregate function: returns the minimum value in the specified column.

    Args:
        column: Column or column name to compute the minimum of

    Returns:
        A Column expression representing the minimum aggregation

    Raises:
        TypeError: If column is not a Column or string
    """
    return Column._from_logical_expr(
        MinExpr(Column._from_col_or_name(column)._logical_expr)
    )

stddev

stddev(column: ColumnOrName) -> Column

Aggregate function: returns the sample standard deviation of the specified column.

Parameters:

  • column (ColumnOrName) –

    Column or column name.

Returns:

  • Column

    Column expression for sample standard deviation.
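
"Sample" standard deviation divides the sum of squared deviations by n - 1 rather than n. The sketch below computes it by hand and checks the result against Python's `statistics.stdev`. `sample_stddev` is a hypothetical helper, and raising `ValueError` for fewer than two values is a choice made for this sketch, not documented fenic behaviour.

```python
import math
import statistics
from typing import Sequence


def sample_stddev(values: Sequence[float]) -> float:
    """Sample standard deviation: sqrt(sum((x - mean)^2) / (n - 1))."""
    n = len(values)
    if n < 2:
        # Sketch-only choice: the n - 1 divisor is undefined for n < 2.
        raise ValueError("sample standard deviation needs at least two values")
    mean = sum(values) / n
    squared_deviations = sum((v - mean) ** 2 for v in values)
    return math.sqrt(squared_deviations / (n - 1))


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(math.isclose(sample_stddev(data), statistics.stdev(data)))  # True
```
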

Source code in src/fenic/api/functions/builtin.py
@validate_call(config=ConfigDict(strict=True, arbitrary_types_allowed=True))
def stddev(column: ColumnOrName) -> Column:
    """Aggregate function: returns the sample standard deviation of the specified column.

    Args:
        column: Column or column name.

    Returns:
        Column expression for sample standard deviation.
    """
    return Column._from_logical_expr(
        StdDevExpr(Column._from_col_or_name(column)._logical_expr)
    )

struct

struct(*args: Union[ColumnOrName, List[ColumnOrName], Tuple[ColumnOrName, ...]]) -> Column

Creates a new struct column from multiple input columns.

Parameters:

  • *args (Union[ColumnOrName, List[ColumnOrName], Tuple[ColumnOrName, ...]], default: () ) –

    Columns or column names to combine into a struct. Can be:

    • Individual arguments
    • Lists of columns/column names
    • Tuples of columns/column names

Returns:

  • Column

    A Column expression representing a struct containing the input columns

Raises:

  • TypeError

    If any argument is not a Column, string, or collection of Columns/strings

Source code in src/fenic/api/functions/builtin.py
@validate_call(config=ConfigDict(strict=True, arbitrary_types_allowed=True))
def struct(
    *args: Union[ColumnOrName, List[ColumnOrName], Tuple[ColumnOrName, ...]]
) -> Column:
    """Creates a new struct column from multiple input columns.

    Args:
        *args: Columns or column names to combine into a struct. Can be:

            - Individual arguments
            - Lists of columns/column names
            - Tuples of columns/column names

    Returns:
        A Column expression representing a struct containing the input columns

    Raises:
        TypeError: If any argument is not a Column, string, or collection of
            Columns/strings
    """
    flattened_args = []
    for arg in args:
        if isinstance(arg, (list, tuple)):
            flattened_args.extend(arg)
        else:
            flattened_args.append(arg)

    expr_columns = [Column._from_col_or_name(c)._logical_expr for c in flattened_args]

    return Column._from_logical_expr(StructExpr(expr_columns))

sum

sum(column: ColumnOrName) -> Column

Aggregate function: returns the sum of all values in the specified column.

Parameters:

  • column (ColumnOrName) –

    Column or column name to compute the sum of

Returns:

  • Column

    A Column expression representing the sum aggregation

Raises:

  • TypeError

    If column is not a Column or string

Source code in src/fenic/api/functions/builtin.py
@validate_call(config=ConfigDict(strict=True, arbitrary_types_allowed=True))
def sum(column: ColumnOrName) -> Column:
    """Aggregate function: returns the sum of all values in the specified column.

    Args:
        column: Column or column name to compute the sum of

    Returns:
        A Column expression representing the sum aggregation

    Raises:
        TypeError: If column is not a Column or string
    """
    return Column._from_logical_expr(
        SumExpr(Column._from_col_or_name(column)._logical_expr)
    )

udf

udf(f: Optional[Callable] = None, *, return_type: DataType)

A decorator or function for creating user-defined functions (UDFs) that can be applied to DataFrame rows.

When applied, UDFs will:

  • Access StructType columns as Python dictionaries (dict[str, Any]).
  • Access ArrayType columns as Python lists (list[Any]).
  • Access primitive types (e.g., int, float, str) as their respective Python types.

Parameters:

  • f (Optional[Callable], default: None ) –

    Python function to convert to UDF

  • return_type (DataType) –

    Expected return type of the UDF. Required parameter.

Example: UDF with primitive types

# UDF with primitive types
@udf(return_type=IntegerType)
def add_one(x: int):
    return x + 1

# Or
add_one = udf(lambda x: x + 1, return_type=IntegerType)

Example: UDF with nested types

# UDF with nested types
@udf(return_type=StructType([StructField("value1", IntegerType), StructField("value2", IntegerType)]))
def example_udf(x: dict[str, int], y: list[int]):
    return {
        "value1": x["value1"] + x["value2"] + y[0],
        "value2": x["value1"] + x["value2"] + y[1],
    }

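
Both call styles above work because of the optional first positional argument combined with a keyword-only option. The same pattern can be reproduced in plain Python. In the sketch below, `tagged` and `label` are hypothetical names that stand in for `udf` and `return_type`, and the wrapper returns a tuple instead of a Column expression.

```python
from functools import wraps
from typing import Any, Callable, Optional


def tagged(f: Optional[Callable] = None, *, label: str):
    """Hypothetical decorator usable as @tagged(label=...) or tagged(func, label=...)."""

    def _decorate(func: Callable) -> Callable:
        @wraps(func)
        def _wrapper(*args: Any) -> tuple:
            # Pair the label with the wrapped function's result.
            return (label, func(*args))

        return _wrapper

    if f is not None:
        # Called as tagged(func, label=...): wrap immediately.
        return _decorate(f)
    # Called as @tagged(label=...): return the decorator to apply next.
    return _decorate


@tagged(label="int")
def add_one(x: int) -> int:
    return x + 1


double = tagged(lambda x: x * 2, label="int")

print(add_one(1))  # ('int', 2)
print(double(4))   # ('int', 8)
```
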
Source code in src/fenic/api/functions/builtin.py
@validate_call(config=ConfigDict(strict=True, arbitrary_types_allowed=True))
def udf(f: Optional[Callable] = None, *, return_type: DataType):
    """A decorator or function for creating user-defined functions (UDFs) that can be applied to DataFrame rows.

    When applied, UDFs will:
    - Access `StructType` columns as Python dictionaries (`dict[str, Any]`).
    - Access `ArrayType` columns as Python lists (`list[Any]`).
    - Access primitive types (e.g., `int`, `float`, `str`) as their respective Python types.

    Args:
        f: Python function to convert to UDF

        return_type: Expected return type of the UDF. Required parameter.

    Example: UDF with primitive types
        ```python
        # UDF with primitive types
        @udf(return_type=IntegerType)
        def add_one(x: int):
            return x + 1

        # Or
        add_one = udf(lambda x: x + 1, return_type=IntegerType)
        ```

    Example: UDF with nested types
        ```python
        # UDF with nested types
        @udf(return_type=StructType([StructField("value1", IntegerType), StructField("value2", IntegerType)]))
        def example_udf(x: dict[str, int], y: list[int]):
            return {
                "value1": x["value1"] + x["value2"] + y[0],
                "value2": x["value1"] + x["value2"] + y[1],
            }
        ```
    """

    def _create_udf(func: Callable) -> Callable:
        @wraps(func)
        def _udf_wrapper(*cols: ColumnOrName) -> Column:
            col_exprs = [Column._from_col_or_name(c)._logical_expr for c in cols]
            return Column._from_logical_expr(UDFExpr(func, col_exprs, return_type))

        return _udf_wrapper

    if f is not None:
        return _create_udf(f)
    return _create_udf

when

when(condition: Column, value: Column) -> Column

Evaluates a condition and returns a value if true.

This function is used to create conditional expressions. If Column.otherwise() is not invoked, None is returned for unmatched conditions.

Parameters:

  • condition (Column) –

    A boolean Column expression to evaluate.

  • value (Column) –

    A Column expression to return if the condition is true.

Returns:

  • Column

    A Column expression that evaluates the condition and returns the specified value when true, and None otherwise.

Raises:

  • TypeError

    If the condition is not a boolean Column expression.

Example: Basic conditional expression

# Basic usage
df.select(when(col("age") > 18, lit("adult")))

# With otherwise
df.select(when(col("age") > 18, lit("adult")).otherwise(lit("minor")))

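
The difference between the two calls above is what unmatched rows receive. A plain-Python sketch of the per-row result, using a hypothetical helper `when_otherwise` and `None` for null:

```python
from typing import Any, Optional


def when_otherwise(condition: bool, value: Any, otherwise: Optional[Any] = None) -> Any:
    """Return value when the condition holds, else the fallback (None by default)."""
    return value if condition else otherwise


ages = [25, 12]
print([when_otherwise(age > 18, "adult") for age in ages])           # ['adult', None]
print([when_otherwise(age > 18, "adult", "minor") for age in ages])  # ['adult', 'minor']
```
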
Source code in src/fenic/api/functions/builtin.py
@validate_call(config=ConfigDict(strict=True, arbitrary_types_allowed=True))
def when(condition: Column, value: Column) -> Column:
    """Evaluates a condition and returns a value if true.

    This function is used to create conditional expressions. If Column.otherwise() is not invoked,
    None is returned for unmatched conditions.

    Args:
        condition: A boolean Column expression to evaluate.

        value: A Column expression to return if the condition is true.

    Returns:
        A Column expression that evaluates the condition and returns the specified value when true,
        and None otherwise.

    Raises:
        TypeError: If the condition is not a boolean Column expression.

    Example: Basic conditional expression
        ```python
        # Basic usage
        df.select(when(col("age") > 18, lit("adult")))

        # With otherwise
        df.select(when(col("age") > 18, lit("adult")).otherwise(lit("minor")))
        ```
    """
    return Column._from_logical_expr(
        WhenExpr(None, condition._logical_expr, value._logical_expr)
    )