fenic.core.types
Schema module for defining and manipulating DataFrame schemas.
Classes:

- ArrayType – A type representing a homogeneous variable-length array (list) of elements.
- ClassDefinition – Definition of a classification class with optional description.
- ClassifyExample – A single semantic example for classification operations.
- ClassifyExampleCollection – Collection of text-to-category examples for classification operations.
- ColumnField – Represents a typed column in a DataFrame schema.
- DataType – Base class for all data types.
- DatasetMetadata – Metadata for a dataset (table or view).
- DocumentPathType – Represents a string containing a document's local (file system) or remote (URL) path.
- EmbeddingType – A type representing a fixed-length embedding vector.
- JoinExample – A single semantic example for semantic join operations.
- JoinExampleCollection – Collection of comparison examples for semantic join operations.
- KeyPoints – Summary as a concise bulleted list.
- MapExample – A single semantic example for semantic mapping operations.
- MapExampleCollection – Collection of input-output examples for semantic map operations.
- Paragraph – Summary as a cohesive narrative.
- PredicateExample – A single semantic example for semantic predicate operations.
- PredicateExampleCollection – Collection of input-to-boolean examples for predicate operations.
- QueryResult – Container for query execution results and associated metadata.
- Schema – Represents the schema of a DataFrame.
- StructField – A field in a StructType. Fields are nullable.
- StructType – A type representing a struct (record) with named fields.
- TranscriptType – Represents a string containing a transcript in a specific format.
Attributes:

- BooleanType – Represents a boolean value (True/False).
- BranchSide – Type alias representing the side of a branch in a lineage graph.
- DataLike – Union type representing any supported data format for both input and output operations.
- DataLikeType – String literal type for specifying data output formats.
- DoubleType – Represents a 64-bit floating-point number.
- FloatType – Represents a 32-bit floating-point number.
- FuzzySimilarityMethod – Type alias representing the supported fuzzy string similarity algorithms.
- HtmlType – Represents a string containing raw HTML markup.
- IntegerType – Represents a signed integer value.
- JsonType – Represents a string containing JSON data.
- MarkdownType – Represents a string containing Markdown-formatted text.
- SemanticSimilarityMetric – Type alias representing supported semantic similarity metrics.
- StringType – Represents a UTF-8 encoded string value.
BooleanType
module-attribute
BooleanType = _BooleanType()
Represents a boolean value (True/False).
BranchSide
module-attribute
BranchSide = Literal['left', 'right']
Type alias representing the side of a branch in a lineage graph.
Valid values:
- "left": The left branch of a join.
- "right": The right branch of a join.
DataLike
module-attribute
DataLike = Union[pl.DataFrame, pd.DataFrame, Dict[str, List[Any]], List[Dict[str, Any]], pa.Table]
Union type representing any supported data format for both input and output operations.
This type encompasses all possible data structures that can be:

1. Used as input when creating DataFrames
2. Returned as output from query results

Supported formats:
- pl.DataFrame: Native Polars DataFrame with efficient columnar storage
- pd.DataFrame: Pandas DataFrame, optionally with PyArrow extension arrays
- Dict[str, List[Any]]: Column-oriented dictionary where:
- Keys are column names (str)
- Values are lists containing all values for that column
- List[Dict[str, Any]]: Row-oriented list where:
- Each element is a dictionary representing one row
- Dictionary keys are column names, values are cell values
- pa.Table: Apache Arrow Table with columnar memory layout
Usage:
- Input: Used in create_dataframe() to accept data in various formats
- Output: Used in QueryResult.data to return results in requested format
The specific type returned depends on the DataLikeType format specified when collecting query results.
DataLikeType
module-attribute
DataLikeType = Literal['polars', 'pandas', 'pydict', 'pylist', 'arrow']
String literal type for specifying data output formats.
Valid values:
- "polars": Native Polars DataFrame format
- "pandas": Pandas DataFrame with PyArrow extension arrays
- "pydict": Python dictionary with column names as keys, lists as values
- "pylist": Python list of dictionaries, each representing one row
- "arrow": Apache Arrow Table format
Used as input parameter for methods that can return data in multiple formats.
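A minimal sketch of how the input and output sides fit together, assuming a fenic session object that exposes the create_dataframe() method referenced above (how the session is obtained is not shown on this page):

```python
# Hypothetical: `session` stands in for however your application obtains
# a fenic session; create_dataframe() accepts any DataLike input.

# Column-oriented dict (Dict[str, List[Any]]) as input.
data = {"name": ["Alice", "Bob"], "age": [30, 25]}

df = session.create_dataframe(data)

# The same rows can be collected in any DataLikeType format:
rows = df.collect("pylist").data    # List[Dict[str, Any]]
table = df.collect("arrow").data    # pa.Table
print(rows)  # [{'name': 'Alice', 'age': 30}, {'name': 'Bob', 'age': 25}]
```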
DoubleType
module-attribute
DoubleType = _DoubleType()
Represents a 64-bit floating-point number.
FloatType
module-attribute
FloatType = _FloatType()
Represents a 32-bit floating-point number.
FuzzySimilarityMethod
module-attribute
FuzzySimilarityMethod = Literal['indel', 'levenshtein', 'damerau_levenshtein', 'jaro_winkler', 'jaro', 'hamming']
Type alias representing the supported fuzzy string similarity algorithms.
These algorithms quantify the similarity or difference between two strings using various distance or similarity metrics:
- "indel": Computes the Indel (Insertion-Deletion) distance, which counts only insertions and deletions needed to transform one string into another, excluding substitutions. This is equivalent to the Longest Common Subsequence (LCS) problem. Useful when character substitutions should not be considered as valid operations (e.g., DNA sequence alignment where only insertions/deletions occur).
- "levenshtein": Computes the Levenshtein distance, which is the minimum number of single-character edits (insertions, deletions, or substitutions) required to transform one string into another. Suitable for general-purpose fuzzy matching where transpositions do not matter.
- "damerau_levenshtein": An extension of Levenshtein distance that also accounts for transpositions of adjacent characters (e.g., "ab" → "ba"). This metric is more accurate for real-world typos and keyboard errors.
- "jaro": Measures similarity based on the number and order of common characters between two strings. It is particularly effective for short strings such as names. Returns a normalized score between 0 (no similarity) and 1 (exact match).
- "jaro_winkler": A variant of the Jaro distance that gives more weight to common prefixes. Designed to improve accuracy on strings with shared beginnings (e.g., first names, surnames).
- "hamming": Measures the number of differing characters between two strings of equal length. Only valid when both strings are the same length. It does not support insertions or deletions—only substitutions.
Choose the method based on the type of expected variation (e.g., typos, transpositions, or structural changes).
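The fenic column function that consumes this alias is not shown on this page. As a standalone illustration of how the metrics differ, here is a sketch using the third-party rapidfuzz library, which implements the same algorithms:

```python
from rapidfuzz.distance import DamerauLevenshtein, Indel, JaroWinkler, Levenshtein

# A transposition ("ab" -> "ba") costs 2 edits under Levenshtein
# (two substitutions) but only 1 under Damerau-Levenshtein.
print(Levenshtein.distance("ab", "ba"))         # 2
print(DamerauLevenshtein.distance("ab", "ba"))  # 1

# Indel never substitutes: changing 'a' to 'u' is one deletion plus one insertion.
print(Indel.distance("cat", "cut"))             # 2

# Jaro-Winkler boosts strings that share a prefix; scores fall in [0, 1].
print(JaroWinkler.similarity("martha", "marhta"))  # ~0.961
```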
HtmlType
module-attribute
HtmlType = _HtmlType()
Represents a string containing raw HTML markup.
IntegerType
module-attribute
IntegerType = _IntegerType()
Represents a signed integer value.
JsonType
module-attribute
JsonType = _JsonType()
Represents a string containing JSON data.
MarkdownType
module-attribute
MarkdownType = _MarkdownType()
Represents a string containing Markdown-formatted text.
SemanticSimilarityMetric
module-attribute
SemanticSimilarityMetric = Literal['cosine', 'l2', 'dot']
Type alias representing supported semantic similarity metrics.
Valid values:
- "cosine": Cosine similarity, measures the cosine of the angle between two vectors.
- "l2": Euclidean (L2) distance, measures the straight-line distance between two vectors.
- "dot": Dot product similarity, the raw inner product of two vectors.
These metrics are commonly used for comparing embedding vectors in semantic search and other similarity-based applications.
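For reference, a minimal NumPy sketch of the three computations (note that "l2" is a distance, so smaller means more similar, while "cosine" and "dot" are similarities, so larger means more similar):

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])

dot = float(a @ b)                                      # raw inner product -> 28.0
cosine = dot / (np.linalg.norm(a) * np.linalg.norm(b))  # scale-invariant -> 1.0
l2 = float(np.linalg.norm(a - b))                       # straight-line distance -> ~3.74
```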
StringType
module-attribute
StringType = _StringType()
Represents a UTF-8 encoded string value.
ArrayType
A type representing a homogeneous variable-length array (list) of elements.
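A sketch of constructing the type; the element_type parameter name is an assumption, as the constructor signature is not shown on this page:

```python
# Hypothetical: a column type for variable-length lists of strings.
ArrayType(element_type=StringType)
```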
ClassDefinition
Bases: BaseModel
Definition of a classification class with optional description.
Used to define the available classes for semantic classification operations. The description helps the LLM understand what each class represents.
ClassifyExample
Bases: BaseModel
A single semantic example for classification operations.
Classify examples demonstrate the classification of an input string into a specific category string, used in a semantic.classify operation.
ClassifyExampleCollection
ClassifyExampleCollection(examples: List[ExampleType] = None)
Bases: BaseExampleCollection[ClassifyExample]
Collection of text-to-category examples for classification operations.
Stores examples showing which category each input text should be assigned to. Each example contains an input string and its corresponding category label.
Methods:

- from_polars – Create collection from a Polars DataFrame. Must have an 'output' column and an 'input' column.
from_polars
classmethod
from_polars(df: DataFrame) -> ClassifyExampleCollection
Create collection from a Polars DataFrame. Must have an 'output' column and an 'input' column.
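A minimal sketch of loading classification examples, using the 'input' and 'output' columns required above:

```python
import polars as pl

df = pl.DataFrame({
    "input": [
        "The delivery was late and the box was damaged.",
        "Arrived early, great packaging!",
    ],
    "output": ["complaint", "praise"],
})

examples = ClassifyExampleCollection.from_polars(df)
```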
ColumnField
Represents a typed column in a DataFrame schema.
A ColumnField defines the structure of a single column by specifying its name and data type. This is used as a building block for DataFrame schemas.
Attributes:

- name (str) – The name of the column.
- data_type (DataType) – The data type of the column, as a DataType instance.
DataType
Bases: ABC
Base class for all data types.
You won't instantiate this class directly. Instead, use one of the concrete types like StringType, ArrayType, or StructType.
Used for casting, type validation, and schema inference in the DataFrame API.
DatasetMetadata
Metadata for a dataset (table or view).
Attributes:

- schema (Schema) – The schema of the dataset.
- description (Optional[str]) – The natural language description of the dataset's contents.
DocumentPathType
Bases: _LogicalType
Represents a string containing a document's local (file system) or remote (URL) path.
EmbeddingType
Bases: _LogicalType
A type representing a fixed-length embedding vector.
Attributes:

- dimensions (int) – The number of dimensions in the embedding vector.
- embedding_model (str) – Name of the model used to generate the embedding.

Create an embedding type for text-embedding-3-small:

EmbeddingType(384, embedding_model="text-embedding-3-small")
JoinExample
Bases: BaseModel
A single semantic example for semantic join operations.
Join examples demonstrate the evaluation of two input variables across different datasets against a specific condition, used in a semantic.join operation.
JoinExampleCollection
JoinExampleCollection(examples: List[JoinExample] = None)
Bases: BaseExampleCollection[JoinExample]
Collection of comparison examples for semantic join operations.
Stores examples showing which pairs of values should be considered matches for joining data. Each example contains a left value, right value, and boolean output indicating whether they match.
Initialize a collection of semantic join examples.
Parameters:

- examples (List[JoinExample], default: None) – List of examples to add to the collection. Each example will be processed through create_example() to ensure proper formatting and validation.

Methods:

- create_example – Create an example in the collection with type validation.
- from_polars – Create collection from a Polars DataFrame. Must have 'left_on', 'right_on', and 'output' columns.
create_example
create_example(example: JoinExample) -> JoinExampleCollection
Create an example in the collection with type validation.
Validates that left_on and right_on values have consistent types across examples. The first example establishes the types and cannot have None values. Subsequent examples must have matching types but can have None values.
Parameters:

- example (JoinExample) – The JoinExample to add.

Returns:

- JoinExampleCollection – Self for method chaining.

Raises:

- InvalidExampleCollectionError – If the example type is wrong, if the first example contains None values, or if subsequent examples have type mismatches.
from_polars
classmethod
from_polars(df: DataFrame) -> JoinExampleCollection
Create collection from a Polars DataFrame. Must have 'left_on', 'right_on', and 'output' columns.
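A sketch of both construction paths; the JoinExample field names (left_on, right_on, output) are inferred from the required DataFrame columns above rather than documented on this page:

```python
import polars as pl

# Build the collection one example at a time (create_example returns self):
collection = (
    JoinExampleCollection()
    .create_example(JoinExample(left_on="SF", right_on="San Francisco", output=True))
    .create_example(JoinExample(left_on="SF", right_on="Chicago", output=False))
)

# Or load the same examples from a DataFrame:
df = pl.DataFrame({
    "left_on": ["SF", "SF"],
    "right_on": ["San Francisco", "Chicago"],
    "output": [True, False],
})
collection = JoinExampleCollection.from_polars(df)
```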
KeyPoints
Bases: BaseModel
Summary as a concise bulleted list.
Each bullet should capture a distinct and essential idea, with a maximum number of points specified.
Attributes:

- max_points (int) – The maximum number of key points to include in the summary.

Methods:

- max_tokens – Calculate the maximum number of tokens for the summary based on the number of key points.
max_tokens
max_tokens() -> int
Calculate the maximum number of tokens for the summary based on the number of key points.
MapExample
Bases: BaseModel
A single semantic example for semantic mapping operations.
Map examples demonstrate the transformation of input variables to a specific output string or structured model, used in a semantic.map operation.
MapExampleCollection
MapExampleCollection(examples: List[MapExample] = None)
Bases: BaseExampleCollection[MapExample]
Collection of input-output examples for semantic map operations.
Stores examples that demonstrate how input data should be transformed into output text or structured data. Each example shows the expected output for a given set of input fields.
Initialize a collection of semantic map examples.
Parameters:

- examples (List[MapExample], default: None) – List of examples to add to the collection. Each example will be processed through create_example() to ensure proper formatting and validation.

Methods:

- create_example – Create an example in the collection with output and input type validation.
- from_polars – Create collection from a Polars DataFrame. Must have an 'output' column and at least one input column.
create_example
create_example(example: MapExample) -> MapExampleCollection
Create an example in the collection with output and input type validation.
Ensures all examples in the collection have consistent output types (either all strings or all BaseModel instances) and validates that input fields have consistent types across examples.
For input validation:

- The first example establishes the schema and cannot have None values
- Subsequent examples must have the same fields but can have None values
- Non-None values must match the established type for each field
Parameters:

- example (MapExample) – The MapExample to add.

Returns:

- MapExampleCollection – Self for method chaining.

Raises:

- InvalidExampleCollectionError – If the example output type doesn't match the existing examples in the collection, if the first example contains None values, or if subsequent examples have type mismatches.
from_polars
classmethod
from_polars(df: DataFrame) -> MapExampleCollection
Create collection from a Polars DataFrame. Must have an 'output' column and at least one input column.
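A minimal sketch of loading map examples; the 'ticket' column name is illustrative, since any column other than 'output' is treated as an input field per the requirement above:

```python
import polars as pl

df = pl.DataFrame({
    "ticket": [
        "App crashes when I tap save.",
        "How do I change my billing address?",
    ],
    "output": ["bug report", "billing question"],
})

examples = MapExampleCollection.from_polars(df)
```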
Paragraph
Bases: BaseModel
Summary as a cohesive narrative.
The summary should flow naturally and not exceed a specified maximum word count.
Attributes:

- max_words (int) – The maximum number of words allowed in the summary.

Methods:

- max_tokens – Calculate the maximum number of tokens for the summary based on the number of words.
max_tokens
max_tokens() -> int
Calculate the maximum number of tokens for the summary based on the number of words.
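A sketch of how the two summary formats are parameterized, using the attributes documented above (the exact token budgets returned by max_tokens() are internal to the library):

```python
bullets = KeyPoints(max_points=5)     # summary as at most five bullet points
narrative = Paragraph(max_words=120)  # summary as prose of at most 120 words

# Each format derives a token budget for the model call from its size limit:
print(bullets.max_tokens(), narrative.max_tokens())
```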
PredicateExample
Bases: BaseModel
A single semantic example for semantic predicate operations.
Predicate examples demonstrate the evaluation of input variables against a specific condition, used in a semantic.predicate operation.
PredicateExampleCollection
PredicateExampleCollection(examples: List[PredicateExample] = None)
Bases: BaseExampleCollection[PredicateExample]
Collection of input-to-boolean examples for predicate operations.
Stores examples showing which inputs should evaluate to True or False based on some condition. Each example contains input fields and a boolean output indicating whether the condition holds.
Initialize a collection of semantic predicate examples.
Parameters:

- examples (List[PredicateExample], default: None) – List of examples to add to the collection. Each example will be processed through create_example() to ensure proper formatting and validation.

Methods:

- create_example – Create an example in the collection with input type validation.
- from_polars – Create collection from a Polars DataFrame.
create_example
create_example(example: PredicateExample) -> PredicateExampleCollection
Create an example in the collection with input type validation.
Validates that input fields have consistent types across examples. The first example establishes the schema and cannot have None values. Subsequent examples must have the same fields but can have None values.
Parameters:

- example (PredicateExample) – The PredicateExample to add.

Returns:

- PredicateExampleCollection – Self for method chaining.

Raises:

- InvalidExampleCollectionError – If the example type is wrong, if the first example contains None values, or if subsequent examples have type mismatches.
from_polars
classmethod
from_polars(df: DataFrame) -> PredicateExampleCollection
Create collection from a Polars DataFrame.
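A sketch by analogy with the other collections; this page does not spell out the required columns, so the snippet assumes input columns plus a boolean 'output' column, mirroring MapExampleCollection:

```python
import polars as pl

df = pl.DataFrame({
    "review": [
        "Would buy again in a heartbeat.",
        "Stopped working after two days.",
    ],
    "output": [True, False],  # does the review express satisfaction?
})

examples = PredicateExampleCollection.from_polars(df)
```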
QueryResult
dataclass
QueryResult(data: DataLike, metrics: QueryMetrics)
Container for query execution results and associated metadata.
This dataclass bundles together the materialized data from a query execution along with metrics about the execution process. It provides a unified interface for accessing both the computed results and performance information.
Attributes:

- data (DataLike) – The materialized query results in the requested format. Can be any of the supported data types (Polars/Pandas DataFrame, Arrow Table, or Python dict/list structures).
- metrics (QueryMetrics) – Execution metadata including timing information, memory usage, rows processed, and other performance metrics collected during query execution.
Access query results and metrics:
# Execute query and get results with metrics
result = df.filter(col("age") > 25).collect("pandas")
pandas_df = result.data # Access the Pandas DataFrame
print(result.metrics.execution_time) # Access execution metrics
print(result.metrics.rows_processed) # Access row count
Work with different data formats:
# Get results in different formats
polars_result = df.collect("polars")
arrow_result = df.collect("arrow")
dict_result = df.collect("pydict")
# All contain the same data, different formats
print(type(polars_result.data)) # <class 'polars.DataFrame'>
print(type(arrow_result.data)) # <class 'pyarrow.lib.Table'>
print(type(dict_result.data)) # <class 'dict'>
Note
The actual type of the data attribute depends on the format requested during collection. Use type checking or isinstance() if you need to handle the data differently based on its format.
Schema
Represents the schema of a DataFrame.
A Schema defines the structure of a DataFrame by specifying an ordered collection of column fields. Each column field defines the name and data type of a column in the DataFrame.
Attributes:

- column_fields (List[ColumnField]) – An ordered list of ColumnField objects that define the structure of the DataFrame.

Methods:

- column_names – Get a list of all column names in the schema.
column_names
column_names() -> List[str]
Get a list of all column names in the schema.
Returns:

- List[str] – A list of strings containing the names of all columns in the schema.
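A sketch of assembling a schema and reading back its column names, assuming Schema accepts its column_fields list positionally and ColumnField takes name and data_type as documented above:

```python
schema = Schema([
    ColumnField(name="name", data_type=StringType),
    ColumnField(name="age", data_type=IntegerType),
])

print(schema.column_names())  # ['name', 'age']
```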
StructField
A field in a StructType. Fields are nullable.
Attributes:

- name (str) – The name of the field.
- data_type (DataType) – The data type of the field.
StructType
Bases: DataType
A type representing a struct (record) with named fields.
Attributes:

- fields – List of field definitions.

Create a struct with name and age fields:
StructType([
StructField("name", StringType),
StructField("age", IntegerType),
])
TranscriptType
Bases: _LogicalType
Represents a string containing a transcript in a specific format.