fenic.core
Core module for Fenic.
Classes:
-
ArrayType
–A type representing a homogeneous variable-length array (list) of elements.
-
ClassifyExample
–A single semantic example for classification operations.
-
ClassifyExampleCollection
–Collection of examples for semantic classification operations.
-
ColumnField
–Represents a typed column in a DataFrame schema.
-
DataType
–Base class for all data types.
-
DocumentPathType
–Represents a string containing a document's local (file system) or remote (URL) path.
-
EmbeddingType
–A type representing a fixed-length embedding vector.
-
ExtractSchema
–Represents a structured extraction schema.
-
ExtractSchemaField
–Represents a field within a structured extraction schema.
-
ExtractSchemaList
–Represents a list data type for structured extraction schema definitions.
-
JoinExample
–A single semantic example for semantic join operations.
-
JoinExampleCollection
–Collection of examples for semantic join operations.
-
LMMetrics
–Tracks language model usage metrics including token counts and costs.
-
MapExample
–A single semantic example for semantic mapping operations.
-
MapExampleCollection
–Collection of examples for semantic mapping operations.
-
OperatorMetrics
–Metrics for a single operator in the query execution plan.
-
PredicateExample
–A single semantic example for semantic predicate operations.
-
PredicateExampleCollection
–Collection of examples for semantic predicate operations.
-
QueryMetrics
–Comprehensive metrics for an executed query.
-
QueryResult
–Container for query execution results and associated metadata.
-
RMMetrics
–Tracks embedding model usage metrics including token counts and costs.
-
Schema
–Represents the schema of a DataFrame.
-
StructField
–A field in a StructType. Fields are nullable.
-
StructType
–A type representing a struct (record) with named fields.
-
TranscriptType
–Represents a string containing a transcript in a specific format.
Attributes:
-
BooleanType
–Represents a boolean value (True/False).
-
BranchSide
–Type alias representing the side of a branch in a lineage graph.
-
DataLike
–Union type representing any supported data format for both input and output operations.
-
DataLikeType
–String literal type for specifying data output formats.
-
DoubleType
–Represents a 64-bit floating-point number.
-
FloatType
–Represents a 32-bit floating-point number.
-
HtmlType
–Represents a string containing raw HTML markup.
-
IntegerType
–Represents a signed integer value.
-
JsonType
–Represents a string containing JSON data.
-
MarkdownType
–Represents a string containing Markdown-formatted text.
-
SemanticSimilarityMetric
–Type alias representing supported semantic similarity metrics.
-
StringType
–Represents a UTF-8 encoded string value.
BooleanType
module-attribute
BooleanType = _BooleanType()
Represents a boolean value (True/False).
BranchSide
module-attribute
BranchSide = Literal['left', 'right']
Type alias representing the side of a branch in a lineage graph.
Valid values:
- "left": The left branch of a join.
- "right": The right branch of a join.
DataLike
module-attribute
DataLike = Union[pl.DataFrame, pd.DataFrame, Dict[str, List[Any]], List[Dict[str, Any]], pa.Table]
Union type representing any supported data format for both input and output operations.
This type encompasses all data structures that can be:
1. Used as input when creating DataFrames
2. Returned as output from query results
Supported formats
- pl.DataFrame: Native Polars DataFrame with efficient columnar storage
- pd.DataFrame: Pandas DataFrame, optionally with PyArrow extension arrays
- Dict[str, List[Any]]: Column-oriented dictionary where:
  - Keys are column names (str)
  - Values are lists containing all values for that column
- List[Dict[str, Any]]: Row-oriented list where:
  - Each element is a dictionary representing one row
  - Dictionary keys are column names, values are cell values
- pa.Table: Apache Arrow Table with columnar memory layout
Usage
- Input: Used in create_dataframe() to accept data in various formats
- Output: Used in QueryResult.data to return results in requested format
The specific type returned depends on the DataLikeType format specified when collecting query results.
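The two plain-Python layouts listed above carry the same information and can be converted into each other. The following is a minimal sketch using only the standard library; the helper names `to_pylist` and `to_pydict` are illustrative and are not part of Fenic.

```python
from typing import Any, Dict, List


def to_pylist(columns: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """Convert a column-oriented dict into a row-oriented list of dicts."""
    names = list(columns)
    # zip(*values) walks all columns in lockstep, yielding one tuple per row.
    return [dict(zip(names, row)) for row in zip(*columns.values())]


def to_pydict(rows: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Convert a row-oriented list of dicts into a column-oriented dict."""
    if not rows:
        return {}
    # Assumes every row has the same keys as the first row.
    return {name: [row[name] for row in rows] for name in rows[0]}


columns = {"name": ["Alice", "Bob"], "age": [30, 25]}
rows = to_pylist(columns)
print(rows)
print(to_pydict(rows) == columns)
```

Either layout matches one of the DataLike members, so both should be acceptable wherever the documentation above says DataLike input is accepted.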
DataLikeType
module-attribute
DataLikeType = Literal['polars', 'pandas', 'pydict', 'pylist', 'arrow']
String literal type for specifying data output formats.
Valid values
- "polars": Native Polars DataFrame format
- "pandas": Pandas DataFrame with PyArrow extension arrays
- "pydict": Python dictionary with column names as keys, lists as values
- "pylist": Python list of dictionaries, each representing one row
- "arrow": Apache Arrow Table format
Used as input parameter for methods that can return data in multiple formats.
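Because the alias is a `Literal`, the set of valid strings can be read back at runtime with `typing.get_args`. The sketch below redefines the alias locally so that it runs without Fenic installed; `validate_format` is a hypothetical helper, not a Fenic function.

```python
from typing import Literal, get_args

# Local copy of the alias as documented above (an assumption made so the
# sketch is self-contained; in real code, import it from Fenic instead).
DataLikeType = Literal["polars", "pandas", "pydict", "pylist", "arrow"]


def validate_format(fmt: str) -> str:
    """Return fmt unchanged if it is a documented format, else raise ValueError."""
    valid = get_args(DataLikeType)
    if fmt not in valid:
        raise ValueError(f"Unknown format {fmt!r}; expected one of {valid}")
    return fmt


print(validate_format("arrow"))
```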
DoubleType
module-attribute
DoubleType = _DoubleType()
Represents a 64-bit floating-point number.
FloatType
module-attribute
FloatType = _FloatType()
Represents a 32-bit floating-point number.
HtmlType
module-attribute
HtmlType = _HtmlType()
Represents a string containing raw HTML markup.
IntegerType
module-attribute
IntegerType = _IntegerType()
Represents a signed integer value.
JsonType
module-attribute
JsonType = _JsonType()
Represents a string containing JSON data.
MarkdownType
module-attribute
MarkdownType = _MarkdownType()
Represents a string containing Markdown-formatted text.
SemanticSimilarityMetric
module-attribute
SemanticSimilarityMetric = Literal['cosine', 'l2', 'dot']
Type alias representing supported semantic similarity metrics.
Valid values:
- "cosine": Cosine similarity, measures the cosine of the angle between two vectors.
- "l2": Euclidean (L2) distance, measures the straight-line distance between two vectors.
- "dot": Dot product similarity, the raw inner product of two vectors.
These metrics are commonly used for comparing embedding vectors in semantic search and other similarity-based applications.
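For reference, the three metrics can be written out in a few lines of plain Python. This is a sketch of the mathematical definitions only; how Fenic computes them internally (for example, whether vectors are normalized first) is not stated here, and the function names are illustrative.

```python
import math
from typing import Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Raw inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean (straight-line) distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


print(dot([1, 2], [3, 4]))     # 11
print(l2([0, 0], [3, 4]))      # 5.0
print(cosine([1, 0], [0, 1]))  # 0.0
```

Note that "l2" is a distance (smaller means more similar), while "cosine" and "dot" are similarities (larger means more similar).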
StringType
module-attribute
StringType = _StringType()
Represents a UTF-8 encoded string value.
ArrayType
ClassifyExample
Bases: BaseModel
A single semantic example for classification operations.
Classify examples demonstrate the classification of an input string into a specific category string, used in a semantic.classify operation.
ClassifyExampleCollection
ClassifyExampleCollection(examples: List[ExampleType] = None)
Bases: BaseExampleCollection[ClassifyExample]
Collection of examples for semantic classification operations.
Classification operations categorize input text into predefined classes. This collection manages examples that demonstrate the expected classification results for different inputs.
Examples in this collection have a single input string and an output string representing the classification result.
Methods:
-
from_polars
–Create collection from a Polars DataFrame. Must have an 'output' column and an 'input' column.
Source code in src/fenic/core/types/semantic_examples.py
from_polars
classmethod
from_polars(df: DataFrame) -> ClassifyExampleCollection
Create collection from a Polars DataFrame. Must have an 'output' column and an 'input' column.
Source code in src/fenic/core/types/semantic_examples.py
ColumnField
Represents a typed column in a DataFrame schema.
A ColumnField defines the structure of a single column by specifying its name and data type. This is used as a building block for DataFrame schemas.
Attributes:
-
name
(str
) –The name of the column.
-
data_type
(DataType
) –The data type of the column, as a DataType instance.
DataType
Bases: ABC
Base class for all data types.
You won't instantiate this class directly. Instead, use one of the concrete types like StringType, ArrayType, or StructType.
Used for casting, type validation, and schema inference in the DataFrame API.
DocumentPathType
Bases: _StringBackedType
Represents a string containing a document's local (file system) or remote (URL) path.
EmbeddingType
Bases: DataType
A type representing a fixed-length embedding vector.
Attributes:
-
dimensions
(int
) –The number of dimensions in the embedding vector.
-
embedding_model
(str
) –Name of the model used to generate the embedding.
Create an embedding type for text-embedding-3-small
EmbeddingType(384, embedding_model="text-embedding-3-small")
ExtractSchema
Represents a structured extraction schema.
An extract schema contains a collection of named fields with descriptions that define what information should be extracted into each field.
Methods:
-
field_names
–Get a list of all field names in the schema.
field_names
field_names() -> List[str]
Get a list of all field names in the schema.
Returns:
-
List[str]
–A list of strings containing the names of all fields in the schema.
Source code in src/fenic/core/types/extract_schema.py
ExtractSchemaField
ExtractSchemaField(name: str, data_type: Union[DataType, ExtractSchemaList, ExtractSchema], description: str)
Represents a field within a structured extraction schema.
An extract schema field has a name, a data type, and a required description that explains what information should be extracted into this field.
Initialize an ExtractSchemaField.
Parameters:
-
name
(str
) –The name of the field.
-
data_type
(Union[DataType, ExtractSchemaList, ExtractSchema]
) –The data type of the field. Must be either a primitive DataType, ExtractSchemaList, or ExtractSchema.
-
description
(str
) –A description of what information should be extracted into this field.
Raises:
-
ValueError
–If data_type is a non-primitive DataType.
Source code in src/fenic/core/types/extract_schema.py
ExtractSchemaList
ExtractSchemaList(element_type: Union[DataType, ExtractSchema])
Represents a list data type for structured extraction schema definitions.
A schema list contains elements of a specific data type and is used for defining array-like structures in structured extraction schemas.
Initialize an ExtractSchemaList.
Parameters:
-
element_type
(Union[DataType, ExtractSchema]
) –The data type of elements in the list. Must be either a primitive DataType or another ExtractSchema.
Raises:
-
ValueError
–If element_type is a non-primitive DataType.
Source code in src/fenic/core/types/extract_schema.py
JoinExample
Bases: BaseModel
A single semantic example for semantic join operations.
Join examples demonstrate the evaluation of two input strings across different datasets against a specific condition, used in a semantic.join operation.
JoinExampleCollection
JoinExampleCollection(examples: List[ExampleType] = None)
Bases: BaseExampleCollection[JoinExample]
Collection of examples for semantic join operations.
Methods:
-
from_polars
–Create collection from a Polars DataFrame. Must have 'left', 'right', and 'output' columns.
Source code in src/fenic/core/types/semantic_examples.py
from_polars
classmethod
from_polars(df: DataFrame) -> JoinExampleCollection
Create collection from a Polars DataFrame. Must have 'left', 'right', and 'output' columns.
Source code in src/fenic/core/types/semantic_examples.py
LMMetrics
dataclass
LMMetrics(num_uncached_input_tokens: int = 0, num_cached_input_tokens: int = 0, num_output_tokens: int = 0, cost: float = 0.0, num_requests: int = 0)
Tracks language model usage metrics including token counts and costs.
Attributes:
-
num_uncached_input_tokens
(int
) –Number of uncached tokens in the prompt/input
-
num_cached_input_tokens
(int
) –Number of cached tokens in the prompt/input
-
num_output_tokens
(int
) –Number of tokens in the completion/output
-
cost
(float
) –Total cost in USD for the LM API call
MapExample
Bases: BaseModel
A single semantic example for semantic mapping operations.
Map examples demonstrate the transformation of input variables to a specific output string used in a semantic.map operation.
MapExampleCollection
MapExampleCollection(examples: List[ExampleType] = None)
Bases: BaseExampleCollection[MapExample]
Collection of examples for semantic mapping operations.
Map operations transform input variables into a text output according to specified instructions. This collection manages examples that demonstrate the expected transformations for different inputs.
Examples in this collection can have multiple input variables, each mapped to their respective values, with a single output string representing the expected transformation result.
Methods:
-
from_polars
–Create collection from a Polars DataFrame. Must have an 'output' column and at least one input column.
Source code in src/fenic/core/types/semantic_examples.py
from_polars
classmethod
from_polars(df: DataFrame) -> MapExampleCollection
Create collection from a Polars DataFrame. Must have an 'output' column and at least one input column.
Source code in src/fenic/core/types/semantic_examples.py
OperatorMetrics
dataclass
OperatorMetrics(operator_id: str, num_output_rows: int = 0, execution_time_ms: float = 0.0, lm_metrics: LMMetrics = LMMetrics(), rm_metrics: RMMetrics = RMMetrics(), is_cache_hit: bool = False)
Metrics for a single operator in the query execution plan.
Attributes:
-
operator_id
(str
) –Unique identifier for the operator
-
num_output_rows
(int
) –Number of rows output by this operator
-
execution_time_ms
(float
) –Execution time in milliseconds
-
lm_metrics
(LMMetrics
) –Language model usage metrics for this operator
-
is_cache_hit
(bool
) –Whether results were retrieved from cache
PredicateExample
Bases: BaseModel
A single semantic example for semantic predicate operations.
Predicate examples demonstrate the evaluation of input variables against a specific condition, used in a semantic.predicate operation.
PredicateExampleCollection
PredicateExampleCollection(examples: List[ExampleType] = None)
Bases: BaseExampleCollection[PredicateExample]
Collection of examples for semantic predicate operations.
Predicate operations evaluate conditions on input variables to produce boolean (True/False) results. This collection manages examples that demonstrate the expected boolean outcomes for different inputs.
Examples in this collection can have multiple input variables, each mapped to their respective values, with a single boolean output representing the evaluation result of the predicate.
Methods:
-
from_polars
–Create collection from a Polars DataFrame.
Source code in src/fenic/core/types/semantic_examples.py
from_polars
classmethod
from_polars(df: DataFrame) -> PredicateExampleCollection
Create collection from a Polars DataFrame.
Source code in src/fenic/core/types/semantic_examples.py
QueryMetrics
dataclass
QueryMetrics(execution_time_ms: float = 0.0, num_output_rows: int = 0, total_lm_metrics: LMMetrics = LMMetrics(), total_rm_metrics: RMMetrics = RMMetrics(), _operator_metrics: Dict[str, OperatorMetrics] = dict(), _plan_repr: PhysicalPlanRepr = lambda: PhysicalPlanRepr(operator_id='empty')())
Comprehensive metrics for an executed query.
Includes overall statistics and detailed metrics for each operator in the execution plan.
Attributes:
-
execution_time_ms
(float
) –Total query execution time in milliseconds
-
num_output_rows
(int
) –Total number of rows returned by the query
-
total_lm_metrics
(LMMetrics
) –Aggregated language model metrics across all operators
Methods:
-
get_execution_plan_details
–Generate a formatted execution plan with detailed metrics.
-
get_summary
–Summarize the query metrics in a single line.
get_execution_plan_details
get_execution_plan_details() -> str
Generate a formatted execution plan with detailed metrics.
Produces a hierarchical representation of the query execution plan, including performance metrics and language model usage for each operator.
Returns:
-
str
–A formatted string showing the execution plan with metrics.
Source code in src/fenic/core/metrics.py
get_summary
get_summary() -> str
Summarize the query metrics in a single line.
Returns:
-
str
–A concise summary of execution time, row count, and LM cost.
Source code in src/fenic/core/metrics.py
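To make the relationship between per-operator metrics and the query-level totals concrete, the sketch below sums language model metrics across operators. It uses a stand-in dataclass, `LMMetricsSketch`, that copies the fields from the documented LMMetrics signature so the code runs without Fenic installed. The field-by-field summation is an assumption about what "aggregated" means; the library's actual aggregation logic is not shown on this page.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class LMMetricsSketch:
    """Stand-in with the same fields as the documented LMMetrics signature."""

    num_uncached_input_tokens: int = 0
    num_cached_input_tokens: int = 0
    num_output_tokens: int = 0
    cost: float = 0.0
    num_requests: int = 0


def total_lm_metrics(per_operator: Iterable[LMMetricsSketch]) -> LMMetricsSketch:
    """Sum every field across operators (assumed aggregation rule)."""
    total = LMMetricsSketch()
    for m in per_operator:
        total.num_uncached_input_tokens += m.num_uncached_input_tokens
        total.num_cached_input_tokens += m.num_cached_input_tokens
        total.num_output_tokens += m.num_output_tokens
        total.cost += m.cost
        total.num_requests += m.num_requests
    return total


ops = [
    LMMetricsSketch(num_uncached_input_tokens=100, num_output_tokens=20,
                    cost=0.25, num_requests=1),
    LMMetricsSketch(num_uncached_input_tokens=50, num_cached_input_tokens=30,
                    num_output_tokens=10, cost=0.5, num_requests=2),
]
total = total_lm_metrics(ops)
print(total)
```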
QueryResult
dataclass
QueryResult(data: DataLike, metrics: QueryMetrics)
Container for query execution results and associated metadata.
This dataclass bundles together the materialized data from a query execution along with metrics about the execution process. It provides a unified interface for accessing both the computed results and performance information.
Attributes:
-
data
(DataLike
) –The materialized query results in the requested format. Can be any of the supported data types (Polars/Pandas DataFrame, Arrow Table, or Python dict/list structures).
-
metrics
(QueryMetrics
) –Execution metadata including total execution time, number of output rows, and language model usage collected during query execution.
Access query results and metrics
# Execute query and get results with metrics
result = df.filter(col("age") > 25).collect("pandas")
pandas_df = result.data # Access the Pandas DataFrame
print(result.metrics.execution_time_ms) # Access execution time in milliseconds
print(result.metrics.num_output_rows) # Access output row count
Work with different data formats
# Get results in different formats
polars_result = df.collect("polars")
arrow_result = df.collect("arrow")
dict_result = df.collect("pydict")
# All contain the same data, different formats
print(type(polars_result.data)) # <class 'polars.dataframe.frame.DataFrame'>
print(type(arrow_result.data)) # <class 'pyarrow.lib.Table'>
print(type(dict_result.data)) # <class 'dict'>
Note
The actual type of the data attribute depends on the format requested during collection. Use type checking or isinstance() if you need to handle the data differently based on its format.
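As a sketch of the isinstance() approach mentioned in the note, the function below counts rows for the two plain-Python formats ("pydict" and "pylist"). It deliberately ignores the Polars, Pandas, and Arrow cases so that it runs with the standard library alone; `count_rows` is an illustrative helper, not part of Fenic.

```python
from typing import Any, Dict, List, Union

PlainData = Union[Dict[str, List[Any]], List[Dict[str, Any]]]


def count_rows(data: PlainData) -> int:
    """Count rows in "pydict"- or "pylist"-style data."""
    if isinstance(data, dict):
        # Column-oriented: every column list holds one entry per row.
        return len(next(iter(data.values()), []))
    if isinstance(data, list):
        # Row-oriented: one dict per row.
        return len(data)
    raise TypeError(f"Unsupported data type: {type(data).__name__}")


print(count_rows({"age": [30, 25, 41]}))
print(count_rows([{"age": 30}, {"age": 25}]))
```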
RMMetrics
dataclass
RMMetrics(num_input_tokens: int = 0, num_requests: int = 0, cost: float = 0.0)
Tracks embedding model usage metrics including token counts and costs.
Attributes:
-
num_input_tokens
(int
) –Number of tokens to embed
-
cost
(float
) –Total cost in USD to embed the tokens
Schema
Represents the schema of a DataFrame.
A Schema defines the structure of a DataFrame by specifying an ordered collection of column fields. Each column field defines the name and data type of a column in the DataFrame.
Attributes:
-
column_fields
(List[ColumnField]
) –An ordered list of ColumnField objects that define the structure of the DataFrame.
Methods:
-
column_names
–Get a list of all column names in the schema.
column_names
column_names() -> List[str]
Get a list of all column names in the schema.
Returns:
-
List[str]
–A list of strings containing the names of all columns in the schema.
Source code in src/fenic/core/types/schema.py
StructField
A field in a StructType. Fields are nullable.
Attributes:
-
name
(str
) –The name of the field.
-
data_type
(DataType
) –The data type of the field.
StructType
Bases: DataType
A type representing a struct (record) with named fields.
Attributes:
-
fields
–List of field definitions.
Create a struct with name and age fields
StructType([
    StructField("name", StringType),
    StructField("age", IntegerType),
])
TranscriptType
Bases: _StringBackedType
Represents a string containing a transcript in a specific format.