fenic.core.types

Schema module for defining and manipulating DataFrame schemas.

Classes:

  • ArrayType

    A type representing a homogeneous variable-length array (list) of elements.

  • ClassDefinition

    Definition of a classification class with optional description.

  • ClassifyExample

    A single semantic example for classification operations.

  • ClassifyExampleCollection

    Collection of text-to-category examples for classification operations.

  • ColumnField

    Represents a typed column in a DataFrame schema.

  • DataType

    Base class for all data types.

  • DocumentPathType

    Represents a string containing a document's local (file system) or remote (URL) path.

  • EmbeddingType

    A type representing a fixed-length embedding vector.

  • JoinExample

    A single semantic example for semantic join operations.

  • JoinExampleCollection

    Collection of comparison examples for semantic join operations.

  • KeyPoints

    Summary as a concise bulleted list.

  • MapExample

    A single semantic example for semantic mapping operations.

  • MapExampleCollection

    Collection of input-output examples for semantic map operations.

  • Paragraph

    Summary as a cohesive narrative.

  • PredicateExample

    A single semantic example for semantic predicate operations.

  • PredicateExampleCollection

    Collection of input-to-boolean examples for predicate operations.

  • QueryResult

    Container for query execution results and associated metadata.

  • Schema

    Represents the schema of a DataFrame.

  • StructField

    A field in a StructType. Fields are nullable.

  • StructType

    A type representing a struct (record) with named fields.

  • TranscriptType

    Represents a string containing a transcript in a specific format.

Attributes:

  • BooleanType

    Represents a boolean value (True/False).

  • BranchSide

    Type alias representing the side of a branch in a lineage graph.

  • DataLike

    Union type representing any supported data format for both input and output operations.

  • DataLikeType

    String literal type for specifying data output formats.

  • DoubleType

    Represents a 64-bit floating-point number.

  • FloatType

    Represents a 32-bit floating-point number.

  • FuzzySimilarityMethod

    Type alias representing the supported fuzzy string similarity algorithms.

  • HtmlType

    Represents a string containing raw HTML markup.

  • IntegerType

    Represents a signed integer value.

  • JsonType

    Represents a string containing JSON data.

  • MarkdownType

    Represents a string containing Markdown-formatted text.

  • SemanticSimilarityMetric

    Type alias representing supported semantic similarity metrics.

  • StringType

    Represents a UTF-8 encoded string value.

BooleanType module-attribute

BooleanType = _BooleanType()

Represents a boolean value (True/False).

BranchSide module-attribute

BranchSide = Literal['left', 'right']

Type alias representing the side of a branch in a lineage graph.

Valid values:

  • "left": The left branch of a join.
  • "right": The right branch of a join.

DataLike module-attribute

DataLike = Union[pl.DataFrame, pd.DataFrame, Dict[str, List[Any]], List[Dict[str, Any]], pa.Table]

Union type representing any supported data format for both input and output operations.

This type encompasses all possible data structures that can be:

  1. Used as input when creating DataFrames
  2. Returned as output from query results

Supported formats:
  • pl.DataFrame: Native Polars DataFrame with efficient columnar storage
  • pd.DataFrame: Pandas DataFrame, optionally with PyArrow extension arrays
  • Dict[str, List[Any]]: Column-oriented dictionary where:
    • Keys are column names (str)
    • Values are lists containing all values for that column
  • List[Dict[str, Any]]: Row-oriented list where:
    • Each element is a dictionary representing one row
    • Dictionary keys are column names, values are cell values
  • pa.Table: Apache Arrow Table with columnar memory layout
Usage:
  • Input: Used in create_dataframe() to accept data in various formats
  • Output: Used in QueryResult.data to return results in requested format

The specific type returned depends on the DataLikeType format specified when collecting query results.
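
Illustration of the two plain-Python formats (a sketch; the column names and values are invented, and the commented create_dataframe() call assumes a session object exposing create_dataframe() as described above)
# Column-oriented (pydict): keys are column names, values are whole columns
pydict_data = {"name": ["Alice", "Bob"], "age": [30, 25]}

# Row-oriented (pylist): each dict represents one row
pylist_data = [{"name": "Alice", "age": 30}, {"name": "Bob", "age": 25}]

# Any DataLike value (including a Polars/Pandas DataFrame or an Arrow Table)
# can be passed to create_dataframe():
# df = session.create_dataframe(pydict_data)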

DataLikeType module-attribute

DataLikeType = Literal['polars', 'pandas', 'pydict', 'pylist', 'arrow']

String literal type for specifying data output formats.

Valid values:
  • "polars": Native Polars DataFrame format
  • "pandas": Pandas DataFrame with PyArrow extension arrays
  • "pydict": Python dictionary with column names as keys, lists as values
  • "pylist": Python list of dictionaries, each representing one row
  • "arrow": Apache Arrow Table format

Used as input parameter for methods that can return data in multiple formats.

DoubleType module-attribute

DoubleType = _DoubleType()

Represents a 64-bit floating-point number.

FloatType module-attribute

FloatType = _FloatType()

Represents a 32-bit floating-point number.

FuzzySimilarityMethod module-attribute

FuzzySimilarityMethod = Literal['indel', 'levenshtein', 'damerau_levenshtein', 'jaro_winkler', 'jaro', 'hamming']

Type alias representing the supported fuzzy string similarity algorithms.

These algorithms quantify the similarity or difference between two strings using various distance or similarity metrics:

  • "indel": Computes the Indel (Insertion-Deletion) distance, which counts only insertions and deletions needed to transform one string into another, excluding substitutions. This is equivalent to the Longest Common Subsequence (LCS) problem. Useful when character substitutions should not be considered as valid operations (e.g., DNA sequence alignment where only insertions/deletions occur).
  • "levenshtein": Computes the Levenshtein distance, which is the minimum number of single-character edits (insertions, deletions, or substitutions) required to transform one string into another. Suitable for general-purpose fuzzy matching where transpositions do not matter.
  • "damerau_levenshtein": An extension of Levenshtein distance that also accounts for transpositions of adjacent characters (e.g., "ab" → "ba"). This metric is more accurate for real-world typos and keyboard errors.
  • "jaro": Measures similarity based on the number and order of common characters between two strings. It is particularly effective for short strings such as names. Returns a normalized score between 0 (no similarity) and 1 (exact match).
  • "jaro_winkler": A variant of the Jaro distance that gives more weight to common prefixes. Designed to improve accuracy on strings with shared beginnings (e.g., first names, surnames).
  • "hamming": Measures the number of differing characters between two strings of equal length. Only valid when both strings are the same length. It does not support insertions or deletions—only substitutions.

Choose the method based on the type of expected variation (e.g., typos, transpositions, or structural changes).
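
Illustrative pure-Python sketch (not part of fenic) showing why the choice of method matters: an adjacent transposition costs 2 under both Hamming and Levenshtein, while Damerau-Levenshtein would count it as a single edit
def hamming(a: str, b: str) -> int:
    # Number of differing positions; defined only for equal-length strings.
    if len(a) != len(b):
        raise ValueError("Hamming distance requires equal-length strings")
    return sum(x != y for x, y in zip(a, b))

def levenshtein(a: str, b: str) -> int:
    # Minimum number of single-character insertions, deletions, and substitutions.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution
            ))
        prev = curr
    return prev[-1]

print(hamming("ab", "ba"))      # 2 (two substitutions)
print(levenshtein("ab", "ba"))  # 2 (no transposition operation)
# Damerau-Levenshtein treats the adjacent swap "ab" -> "ba" as one edit.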

HtmlType module-attribute

HtmlType = _HtmlType()

Represents a string containing raw HTML markup.

IntegerType module-attribute

IntegerType = _IntegerType()

Represents a signed integer value.

JsonType module-attribute

JsonType = _JsonType()

Represents a string containing JSON data.

MarkdownType module-attribute

MarkdownType = _MarkdownType()

Represents a string containing Markdown-formatted text.

SemanticSimilarityMetric module-attribute

SemanticSimilarityMetric = Literal['cosine', 'l2', 'dot']

Type alias representing supported semantic similarity metrics.

Valid values:

  • "cosine": Cosine similarity, measures the cosine of the angle between two vectors.
  • "l2": Euclidean (L2) distance, measures the straight-line distance between two vectors.
  • "dot": Dot product similarity, the raw inner product of two vectors.

These metrics are commonly used for comparing embedding vectors in semantic search and other similarity-based applications.
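
Illustrative NumPy sketch (not part of fenic) computing the three metrics on two small vectors
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])

dot = float(a @ b)                # raw inner product: 28.0
l2 = float(np.linalg.norm(a - b)) # Euclidean distance: ~3.74 (0.0 means identical)
cosine = dot / float(np.linalg.norm(a) * np.linalg.norm(b))  # 1.0 (same direction)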

StringType module-attribute

StringType = _StringType()

Represents a UTF-8 encoded string value.

ArrayType

Bases: DataType

A type representing a homogeneous variable-length array (list) of elements.

Attributes:

  • element_type (DataType) –

    The data type of each element in the array.

Create an array of strings
ArrayType(StringType)
ArrayType(element_type=StringType)

ClassDefinition

Bases: BaseModel

Definition of a classification class with optional description.

Used to define the available classes for semantic classification operations. The description helps the LLM understand what each class represents.
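
Define classes for a support-ticket classifier (a sketch; the field names label and description are assumed here, and the class names are invented)
classes = [
    ClassDefinition(label="billing", description="Questions about invoices, charges, or refunds"),
    ClassDefinition(label="technical", description="Bug reports or errors encountered while using the product"),
]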

ClassifyExample

Bases: BaseModel

A single semantic example for classification operations.

Classify examples demonstrate the classification of an input string into a specific category string, used in a semantic.classify operation.

ClassifyExampleCollection

ClassifyExampleCollection(examples: List[ExampleType] = None)

Bases: BaseExampleCollection[ClassifyExample]

Collection of text-to-category examples for classification operations.

Stores examples showing which category each input text should be assigned to. Each example contains an input string and its corresponding category label.

Methods:

  • from_polars

    Create collection from a Polars DataFrame. Must have an 'output' column and an 'input' column.

Source code in src/fenic/core/types/semantic_examples.py
def __init__(self, examples: List[ExampleType] = None):
    """Initialize a collection of semantic examples.

    Args:
        examples: Optional list of examples to add to the collection. Each example
            will be processed through create_example() to ensure proper formatting
            and validation.

    Note:
        The examples list is initialized as empty if no examples are provided.
        Each example in the provided list will be processed through create_example()
        to ensure proper formatting and validation.
    """
    self.examples: List[ExampleType] = []
    if examples:
        for example in examples:
            self.create_example(example)

from_polars classmethod

from_polars(df: DataFrame) -> ClassifyExampleCollection

Create collection from a Polars DataFrame. Must have an 'output' column and an 'input' column.

Source code in src/fenic/core/types/semantic_examples.py
@classmethod
def from_polars(cls, df: pl.DataFrame) -> ClassifyExampleCollection:
    """Create collection from a Polars DataFrame. Must have an 'output' column and an 'input' column."""
    collection = cls()

    if EXAMPLE_INPUT_KEY not in df.columns:
        raise InvalidExampleCollectionError(
            f"Classify Examples DataFrame missing required '{EXAMPLE_INPUT_KEY}' column"
        )
    if EXAMPLE_OUTPUT_KEY not in df.columns:
        raise InvalidExampleCollectionError(
            f"Classify Examples DataFrame missing required '{EXAMPLE_OUTPUT_KEY}' column"
        )

    for row in df.iter_rows(named=True):
        if row[EXAMPLE_INPUT_KEY] is None:
            raise InvalidExampleCollectionError(
                f"Classify Examples DataFrame contains null values in '{EXAMPLE_INPUT_KEY}' column"
            )
        if row[EXAMPLE_OUTPUT_KEY] is None:
            raise InvalidExampleCollectionError(
                f"Classify Examples DataFrame contains null values in '{EXAMPLE_OUTPUT_KEY}' column"
            )

        example = ClassifyExample(
            input=row[EXAMPLE_INPUT_KEY],
            output=row[EXAMPLE_OUTPUT_KEY],
        )
        collection.create_example(example)

    return collection
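
Build a collection directly (a sketch; the input texts and category labels are invented)
examples = ClassifyExampleCollection(
    examples=[
        ClassifyExample(input="I was charged twice this month", output="billing"),
        ClassifyExample(input="The app crashes when I upload a file", output="technical"),
    ]
)

# Equivalent, starting from a Polars DataFrame with 'input' and 'output' columns:
# examples = ClassifyExampleCollection.from_polars(df)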

ColumnField

Represents a typed column in a DataFrame schema.

A ColumnField defines the structure of a single column by specifying its name and data type. This is used as a building block for DataFrame schemas.

Attributes:

  • name (str) –

    The name of the column.

  • data_type (DataType) –

    The data type of the column, as a DataType instance.

DataType

Bases: ABC

Base class for all data types.

You won't instantiate this class directly. Instead, use one of the concrete types like StringType, ArrayType, or StructType.

Used for casting, type validation, and schema inference in the DataFrame API.

DocumentPathType

Bases: _LogicalType

Represents a string containing a document's local (file system) or remote (URL) path.

EmbeddingType

Bases: _LogicalType

A type representing a fixed-length embedding vector.

Attributes:

  • dimensions (int) –

    The number of dimensions in the embedding vector.

  • embedding_model (str) –

    Name of the model used to generate the embedding.

Create an embedding type for text-embedding-3-small
EmbeddingType(384, embedding_model="text-embedding-3-small")

JoinExample

Bases: BaseModel

A single semantic example for semantic join operations.

Join examples demonstrate the evaluation of two input variables across different datasets against a specific condition, used in a semantic.join operation.

JoinExampleCollection

JoinExampleCollection(examples: List[JoinExample] = None)

Bases: BaseExampleCollection[JoinExample]

Collection of comparison examples for semantic join operations.

Stores examples showing which pairs of values should be considered matches for joining data. Each example contains a left value, right value, and boolean output indicating whether they match.

Initialize a collection of semantic join examples.

Parameters:

  • examples (List[JoinExample], default: None ) –

    List of examples to add to the collection. Each example will be processed through create_example() to ensure proper formatting and validation.

Methods:

  • create_example

    Create an example in the collection with type validation.

  • from_polars

    Create collection from a Polars DataFrame. Must have 'left_on', 'right_on', and 'output' columns.

Source code in src/fenic/core/types/semantic_examples.py
def __init__(self, examples: List[JoinExample] = None):
    """Initialize a collection of semantic join examples.

    Args:
        examples: List of examples to add to the collection. Each example
            will be processed through create_example() to ensure proper formatting
            and validation.
    """
    self._type_validator = _ExampleTypeValidator()
    super().__init__(examples)

create_example

create_example(example: JoinExample) -> JoinExampleCollection

Create an example in the collection with type validation.

Validates that left_on and right_on values have consistent types across examples. The first example establishes the types and cannot have None values. Subsequent examples must have matching types but can have None values.

Parameters:

  • example (JoinExample) –

    The JoinExample to add.

Returns:

  • JoinExampleCollection

    Self for method chaining.

Raises:

  • InvalidExampleCollectionError

    If the example type is wrong, if the first example contains None values, or if subsequent examples have type mismatches.

Source code in src/fenic/core/types/semantic_examples.py
def create_example(self, example: JoinExample) -> JoinExampleCollection:
    """Create an example in the collection with type validation.

    Validates that left_on and right_on values have consistent types across
    examples. The first example establishes the types and cannot have None values.
    Subsequent examples must have matching types but can have None values.

    Args:
        example: The JoinExample to add.

    Returns:
        Self for method chaining.

    Raises:
        InvalidExampleCollectionError: If the example type is wrong, if the
            first example contains None values, or if subsequent examples
            have type mismatches.
    """
    if not isinstance(example, JoinExample):
        raise InvalidExampleCollectionError(
            f"Expected example of type {JoinExample.__name__}, got {type(example).__name__}"
        )

    # Convert to dict format for validation
    example_dict = {
        LEFT_ON_KEY: example.left_on,
        RIGHT_ON_KEY: example.right_on
    }

    example_num = len(self.examples) + 1
    self._type_validator.process_example(example_dict, example_num)

    self.examples.append(example)
    return self

from_polars classmethod

from_polars(df: DataFrame) -> JoinExampleCollection

Create collection from a Polars DataFrame. Must have 'left_on', 'right_on', and 'output' columns.

Source code in src/fenic/core/types/semantic_examples.py
@classmethod
def from_polars(cls, df: pl.DataFrame) -> JoinExampleCollection:
    """Create collection from a Polars DataFrame. Must have 'left_on', 'right_on', and 'output' columns."""
    collection = cls()

    required_columns = [
        LEFT_ON_KEY,
        RIGHT_ON_KEY,
        EXAMPLE_OUTPUT_KEY,
    ]
    for col in required_columns:
        if col not in df.columns:
            raise InvalidExampleCollectionError(
                f"Join Examples DataFrame missing required '{col}' column"
            )

    for row in df.iter_rows(named=True):
        for col in required_columns:
            if row[col] is None:
                raise InvalidExampleCollectionError(
                    f"Join Examples DataFrame contains null values in '{col}' column"
                )

        example = JoinExample(
            left_on=row[LEFT_ON_KEY],
            right_on=row[RIGHT_ON_KEY],
            output=row[EXAMPLE_OUTPUT_KEY],
        )
        collection.create_example(example)

    return collection
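
Build a collection directly (a sketch; the left/right values are invented)
examples = JoinExampleCollection(
    examples=[
        JoinExample(
            left_on="Senior Python Developer",
            right_on="Backend engineer with 5+ years of Python",
            output=True,
        ),
        JoinExample(
            left_on="Senior Python Developer",
            right_on="Graphic designer",
            output=False,
        ),
    ]
)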

KeyPoints

Bases: BaseModel

Summary as a concise bulleted list.

Each bullet should capture a distinct and essential idea, with a maximum number of points specified.

Attributes:

  • max_points (int) –

    The maximum number of key points to include in the summary.

Methods:

  • max_tokens

    Calculate the maximum number of tokens for the summary based on the number of key points.

max_tokens

max_tokens() -> int

Calculate the maximum number of tokens for the summary based on the number of key points.

Source code in src/fenic/core/types/summarize.py
def max_tokens(self) -> int:
    """Calculate the maximum number of tokens for the summary based on the number of key points."""
    return self.max_points * 75
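
For example, assuming max_points is passed as a keyword argument
summary_format = KeyPoints(max_points=5)
summary_format.max_tokens()  # 375 (5 * 75)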

MapExample

Bases: BaseModel

A single semantic example for semantic mapping operations.

Map examples demonstrate the transformation of input variables to a specific output string or structured model, used in a semantic.map operation.

MapExampleCollection

MapExampleCollection(examples: List[MapExample] = None)

Bases: BaseExampleCollection[MapExample]

Collection of input-output examples for semantic map operations.

Stores examples that demonstrate how input data should be transformed into output text or structured data. Each example shows the expected output for a given set of input fields.

Initialize a collection of semantic map examples.

Parameters:

  • examples (List[MapExample], default: None ) –

    List of examples to add to the collection. Each example will be processed through create_example() to ensure proper formatting and validation.

Methods:

  • create_example

    Create an example in the collection with output and input type validation.

  • from_polars

    Create collection from a Polars DataFrame. Must have an 'output' column and at least one input column.

Source code in src/fenic/core/types/semantic_examples.py
def __init__(self, examples: List[MapExample] = None):
    """Initialize a collection of semantic map examples.

    Args:
        examples: List of examples to add to the collection. Each example
            will be processed through create_example() to ensure proper formatting
            and validation.
    """
    self._type_validator = _ExampleTypeValidator()
    super().__init__(examples)

create_example

create_example(example: MapExample) -> MapExampleCollection

Create an example in the collection with output and input type validation.

Ensures all examples in the collection have consistent output types (either all strings or all BaseModel instances) and validates that input fields have consistent types across examples.

For input validation:

  • The first example establishes the schema and cannot have None values
  • Subsequent examples must have the same fields but can have None values
  • Non-None values must match the established type for each field

Parameters:

  • example (MapExample) –

    The MapExample to add.

Returns:

  • MapExampleCollection

    Self for method chaining.

Raises:

  • InvalidExampleCollectionError

    If the example output type doesn't match the existing examples in the collection, if the first example contains None values, or if subsequent examples have type mismatches.

Source code in src/fenic/core/types/semantic_examples.py
def create_example(self, example: MapExample) -> MapExampleCollection:
    """Create an example in the collection with output and input type validation.

    Ensures all examples in the collection have consistent output types
    (either all strings or all BaseModel instances) and validates that input
    fields have consistent types across examples.

    For input validation:
    - The first example establishes the schema and cannot have None values
    - Subsequent examples must have the same fields but can have None values
    - Non-None values must match the established type for each field

    Args:
        example: The MapExample to add.

    Returns:
        Self for method chaining.

    Raises:
        InvalidExampleCollectionError: If the example output type doesn't match
            the existing examples in the collection, if the first example contains
            None values, or if subsequent examples have type mismatches.
    """
    if not isinstance(example, MapExample):
        raise InvalidExampleCollectionError(
            f"Expected example of type {MapExample.__name__}, got {type(example).__name__}"
        )

    # Validate output type consistency
    self._validate_single_example_output_type(example)

    # Validate input types
    example_num = len(self.examples) + 1
    self._type_validator.process_example(example.input, example_num)

    self.examples.append(example)
    return self

from_polars classmethod

from_polars(df: DataFrame) -> MapExampleCollection

Create collection from a Polars DataFrame. Must have an 'output' column and at least one input column.

Source code in src/fenic/core/types/semantic_examples.py
@classmethod
def from_polars(cls, df: pl.DataFrame) -> MapExampleCollection:
    """Create collection from a Polars DataFrame. Must have an 'output' column and at least one input column."""
    collection = cls()

    if EXAMPLE_OUTPUT_KEY not in df.columns:
        raise ValueError(
            f"Map Examples DataFrame missing required '{EXAMPLE_OUTPUT_KEY}' column"
        )

    input_cols = [col for col in df.columns if col != EXAMPLE_OUTPUT_KEY]

    if not input_cols:
        raise ValueError(
            "Map Examples DataFrame must have at least one input column"
        )

    for row in df.iter_rows(named=True):
        input_dict = {col: row[col] for col in input_cols}
        example = MapExample(input=input_dict, output=row[EXAMPLE_OUTPUT_KEY])
        collection.create_example(example)

    return collection
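
Build a collection directly (a sketch; the input fields and outputs are invented)
examples = MapExampleCollection(
    examples=[
        MapExample(
            input={"product": "Wireless Mouse", "review": "Battery died after two days"},
            output="Negative review citing battery life",
        ),
        MapExample(
            input={"product": "USB-C Hub", "review": "Works perfectly with my laptop"},
            output="Positive review praising compatibility",
        ),
    ]
)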

Paragraph

Bases: BaseModel

Summary as a cohesive narrative.

The summary should flow naturally and not exceed a specified maximum word count.

Attributes:

  • max_words (int) –

    The maximum number of words allowed in the summary.

Methods:

  • max_tokens

    Calculate the maximum number of tokens for the summary based on the number of words.

max_tokens

max_tokens() -> int

Calculate the maximum number of tokens for the summary based on the number of words.

Source code in src/fenic/core/types/summarize.py
def max_tokens(self) -> int:
    """Calculate the maximum number of tokens for the summary based on the number of words."""
    return int(self.max_words * 1.5)
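
For example, assuming max_words is passed as a keyword argument
summary_format = Paragraph(max_words=100)
summary_format.max_tokens()  # 150 (int(100 * 1.5))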

PredicateExample

Bases: BaseModel

A single semantic example for semantic predicate operations.

Predicate examples demonstrate the evaluation of input variables against a specific condition, used in a semantic.predicate operation.

PredicateExampleCollection

PredicateExampleCollection(examples: List[PredicateExample] = None)

Bases: BaseExampleCollection[PredicateExample]

Collection of input-to-boolean examples for predicate operations.

Stores examples showing which inputs should evaluate to True or False based on some condition. Each example contains input fields and a boolean output indicating whether the condition holds.

Initialize a collection of semantic predicate examples.

Parameters:

  • examples (List[PredicateExample], default: None ) –

    List of examples to add to the collection. Each example will be processed through create_example() to ensure proper formatting and validation.

Methods:

  • create_example

    Create an example in the collection with input type validation.

  • from_polars

    Create collection from a Polars DataFrame. Must have an 'output' column and at least one input column.

Source code in src/fenic/core/types/semantic_examples.py
def __init__(self, examples: List[PredicateExample] = None):
    """Initialize a collection of semantic predicate examples.

    Args:
        examples: List of examples to add to the collection. Each example
            will be processed through create_example() to ensure proper formatting
            and validation.
    """
    self._type_validator = _ExampleTypeValidator()
    super().__init__(examples)

create_example

create_example(example: PredicateExample) -> PredicateExampleCollection

Create an example in the collection with input type validation.

Validates that input fields have consistent types across examples. The first example establishes the schema and cannot have None values. Subsequent examples must have the same fields but can have None values.

Parameters:

  • example (PredicateExample) –

    The PredicateExample to add.

Returns:

  • PredicateExampleCollection

    Self for method chaining.

Raises:

  • InvalidExampleCollectionError

    If the example type is wrong, if the first example contains None values, or if subsequent examples have type mismatches.

Source code in src/fenic/core/types/semantic_examples.py
def create_example(self, example: PredicateExample) -> PredicateExampleCollection:
    """Create an example in the collection with input type validation.

    Validates that input fields have consistent types across examples.
    The first example establishes the schema and cannot have None values.
    Subsequent examples must have the same fields but can have None values.

    Args:
        example: The PredicateExample to add.

    Returns:
        Self for method chaining.

    Raises:
        InvalidExampleCollectionError: If the example type is wrong, if the
            first example contains None values, or if subsequent examples
            have type mismatches.
    """
    if not isinstance(example, PredicateExample):
        raise InvalidExampleCollectionError(
            f"Expected example of type {PredicateExample.__name__}, got {type(example).__name__}"
        )

    # Validate input types
    example_num = len(self.examples) + 1
    self._type_validator.process_example(example.input, example_num)

    self.examples.append(example)
    return self

from_polars classmethod

from_polars(df: DataFrame) -> PredicateExampleCollection

Create collection from a Polars DataFrame. Must have an 'output' column and at least one input column.

Source code in src/fenic/core/types/semantic_examples.py
@classmethod
def from_polars(cls, df: pl.DataFrame) -> PredicateExampleCollection:
    """Create collection from a Polars DataFrame."""
    collection = cls()

    # Validate output column exists
    if EXAMPLE_OUTPUT_KEY not in df.columns:
        raise InvalidExampleCollectionError(
            f"Predicate Examples DataFrame missing required '{EXAMPLE_OUTPUT_KEY}' column"
        )

    input_cols = [col for col in df.columns if col != EXAMPLE_OUTPUT_KEY]

    if not input_cols:
        raise InvalidExampleCollectionError(
            "Predicate Examples DataFrame must have at least one input column"
        )

    for row in df.iter_rows(named=True):
        if row[EXAMPLE_OUTPUT_KEY] is None:
            raise InvalidExampleCollectionError(
                f"Predicate Examples DataFrame contains null values in '{EXAMPLE_OUTPUT_KEY}' column"
            )

        input_dict = {col: row[col] for col in input_cols if row[col] is not None}

        example = PredicateExample(input=input_dict, output=row[EXAMPLE_OUTPUT_KEY])
        collection.create_example(example)

    return collection
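
Build a collection directly (a sketch; the input field and texts are invented)
examples = PredicateExampleCollection(
    examples=[
        PredicateExample(input={"ticket": "I want a refund for my last order"}, output=True),
        PredicateExample(input={"ticket": "How do I change my profile picture?"}, output=False),
    ]
)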

QueryResult dataclass

QueryResult(data: DataLike, metrics: QueryMetrics)

Container for query execution results and associated metadata.

This dataclass bundles together the materialized data from a query execution along with metrics about the execution process. It provides a unified interface for accessing both the computed results and performance information.

Attributes:

  • data (DataLike) –

    The materialized query results in the requested format. Can be any of the supported data types (Polars/Pandas DataFrame, Arrow Table, or Python dict/list structures).

  • metrics (QueryMetrics) –

    Execution metadata including timing information, memory usage, rows processed, and other performance metrics collected during query execution.

Access query results and metrics
# Execute query and get results with metrics
result = df.filter(col("age") > 25).collect("pandas")
pandas_df = result.data  # Access the Pandas DataFrame
print(result.metrics.execution_time)  # Access execution metrics
print(result.metrics.rows_processed)  # Access row count
Work with different data formats
# Get results in different formats
polars_result = df.collect("polars")
arrow_result = df.collect("arrow")
dict_result = df.collect("pydict")

# All contain the same data, different formats
print(type(polars_result.data))  # <class 'polars.DataFrame'>
print(type(arrow_result.data))   # <class 'pyarrow.lib.Table'>
print(type(dict_result.data))    # <class 'dict'>
Note

The actual type of the data attribute depends on the format requested during collection. Use type checking or isinstance() if you need to handle the data differently based on its format.

Schema

Represents the schema of a DataFrame.

A Schema defines the structure of a DataFrame by specifying an ordered collection of column fields. Each column field defines the name and data type of a column in the DataFrame.

Attributes:

  • column_fields (List[ColumnField]) –

    An ordered list of ColumnField objects that define the structure of the DataFrame.

Methods:

  • column_names

    Get a list of all column names in the schema.

column_names

column_names() -> List[str]

Get a list of all column names in the schema.

Returns:

  • List[str]

    A list of strings containing the names of all columns in the schema.

Source code in src/fenic/core/types/schema.py
def column_names(self) -> List[str]:
    """Get a list of all column names in the schema.

    Returns:
        A list of strings containing the names of all columns in the schema.
    """
    return [field.name for field in self.column_fields]
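
Construct a schema and list its column names (a sketch; keyword construction of Schema and ColumnField is assumed here)
schema = Schema(column_fields=[
    ColumnField(name="id", data_type=IntegerType),
    ColumnField(name="title", data_type=StringType),
])
schema.column_names()  # ['id', 'title']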

StructField

A field in a StructType. Fields are nullable.

Attributes:

  • name (str) –

    The name of the field.

  • data_type (DataType) –

    The data type of the field.

StructType

Bases: DataType

A type representing a struct (record) with named fields.

Attributes:

  • fields

    List of field definitions.

Create a struct with name and age fields
StructType([
    StructField("name", StringType),
    StructField("age", IntegerType),
])

TranscriptType

Bases: _LogicalType

Represents a string containing a transcript in a specific format.