fenic.api.session
Session module for managing query execution context and state.
Classes:

- `AnthropicModelConfig` – Configuration for Anthropic models.
- `CloudConfig` – Configuration for cloud-based execution.
- `CloudExecutorSize` – Enum defining available cloud executor sizes.
- `GoogleGLAModelConfig` – Configuration for Google Generative Language (GLA) models.
- `GoogleVertexModelConfig` – Configuration for Google Vertex models.
- `OpenAIModelConfig` – Configuration for OpenAI models.
- `SemanticConfig` – Configuration for semantic language and embedding models.
- `Session` – The entry point to programming with the DataFrame API. Similar to PySpark's SparkSession.
- `SessionConfig` – Configuration for a user session.
AnthropicModelConfig
Bases: BaseModel
Configuration for Anthropic models.
This class defines the configuration settings for Anthropic language models, including model selection and separate rate limiting parameters for input and output tokens.
Attributes:

- `model_name` (`ANTHROPIC_AVAILABLE_LANGUAGE_MODELS`) – The name of the Anthropic model to use.
- `rpm` (`int`) – Requests per minute limit; must be greater than 0.
- `input_tpm` (`int`) – Input tokens per minute limit; must be greater than 0.
- `output_tpm` (`int`) – Output tokens per minute limit; must be greater than 0.
Examples:

Configuring an Anthropic model with separate input/output rate limits:

```python
config = AnthropicModelConfig(
    model_name="claude-3-5-haiku-latest",
    rpm=100,
    input_tpm=100,
    output_tpm=100
)
```
CloudConfig
Bases: BaseModel
Configuration for cloud-based execution.
This class defines settings for running operations in a cloud environment, allowing for scalable and distributed processing of language model operations.
Attributes:

- `size` (`Optional[CloudExecutorSize]`) – Size of the cloud executor instance. If None, the default size will be used.
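A minimal sketch of constructing a `CloudConfig` with an explicit executor size (any of the `CloudExecutorSize` values documented below may be used):

```python
# Illustrative only: request a medium-sized cloud executor.
cloud = CloudConfig(size=CloudExecutorSize.MEDIUM)
```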
CloudExecutorSize
Bases: str, Enum
Enum defining available cloud executor sizes.
This enum represents the different size options available for cloud-based execution environments.
Attributes:

- `SMALL` – Small instance size.
- `MEDIUM` – Medium instance size.
- `LARGE` – Large instance size.
- `XLARGE` – Extra large instance size.
GoogleGLAModelConfig
Bases: BaseModel
Configuration for Google Generative Language (GLA) models.
This class defines the configuration settings for models available in Google Developer AI Studio, including model selection and rate limiting parameters. These models are accessible using a GEMINI_API_KEY environment variable.
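No attribute table is shown for this class; assuming it follows the same pattern as the other model configs (`model_name`, `rpm`, `tpm`), a configuration might look like this (the model name and rate limits are illustrative):

```python
# Sketch under the assumption that GoogleGLAModelConfig mirrors OpenAIModelConfig.
# Requires GEMINI_API_KEY to be set in the environment at session runtime.
config = GoogleGLAModelConfig(
    model_name="gemini-2.0-flash",  # illustrative; use any supported GLA model
    rpm=100,
    tpm=1000,
)
```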
GoogleVertexModelConfig
Bases: BaseModel
Configuration for Google Vertex models.
This class defines the configuration settings for models available in Google Vertex AI, including model selection and rate limiting parameters. To use these models, you must have a Google Cloud service account, or use the gcloud CLI tool to authenticate your local environment.
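As with the GLA config, no attribute table is shown here; assuming the same fields as the other model configs (`model_name`, `rpm`, `tpm`), a sketch might be:

```python
# Illustrative sketch; assumes Vertex AI credentials are already configured
# (service account or `gcloud auth application-default login`).
config = GoogleVertexModelConfig(
    model_name="gemini-2.0-flash",  # illustrative model name
    rpm=100,
    tpm=1000,
)
```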
OpenAIModelConfig
Bases: BaseModel
Configuration for OpenAI models.
This class defines the configuration settings for OpenAI language and embedding models, including model selection and rate limiting parameters.
Attributes:

- `model_name` (`Union[OPENAI_AVAILABLE_LANGUAGE_MODELS, OPENAI_AVAILABLE_EMBEDDING_MODELS]`) – The name of the OpenAI model to use.
- `rpm` (`int`) – Requests per minute limit; must be greater than 0.
- `tpm` (`int`) – Tokens per minute limit; must be greater than 0.
Examples:

Configuring an OpenAI language model with rate limits:

```python
config = OpenAIModelConfig(model_name="gpt-4.1-nano", rpm=100, tpm=100)
```

Configuring an OpenAI embedding model with rate limits:

```python
config = OpenAIModelConfig(model_name="text-embedding-3-small", rpm=100, tpm=100)
```
SemanticConfig
Bases: BaseModel
Configuration for semantic language and embedding models.
This class defines the configuration for both language models and optional embedding models used in semantic operations. It ensures that all configured models are valid and supported by their respective providers.
Attributes:

- `language_models` (`dict[str, ModelConfig]`) – Mapping of model aliases to language model configurations.
- `default_language_model` (`Optional[str]`) – The alias of the default language model to use for semantic operations. Not required if only one language model is configured.
- `embedding_models` (`Optional[dict[str, ModelConfig]]`) – Optional mapping of model aliases to embedding model configurations.
- `default_embedding_model` (`Optional[str]`) – The alias of the default embedding model to use for semantic operations.
Note
The embedding model is optional and only required for operations that need semantic search or embedding capabilities.
Methods:

- `model_post_init` – Post-initialization hook to set defaults.
- `validate_models` – Validates that the selected models are supported by the system.
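Putting the attributes above together, a minimal `SemanticConfig` might look like this (the alias and rate-limit values are illustrative):

```python
semantic = SemanticConfig(
    language_models={
        "nano": OpenAIModelConfig(model_name="gpt-4.1-nano", rpm=100, tpm=100),
    },
    # default_language_model may be omitted here: with only one language model
    # configured, model_post_init sets it as the default.
)
```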
model_post_init
model_post_init(__context) -> None
Post initialization hook to set defaults.
This hook runs after the model is initialized and validated. It sets the default language and embedding models if they are not set and there is only one model available.
Source code in src/fenic/api/session/config.py
validate_models
validate_models() -> SemanticConfig
Validates that the selected models are supported by the system.
This validator checks that both the language model and embedding model (if provided) are valid and supported by their respective providers.
Returns:

- `SemanticConfig` – The validated SemanticConfig instance.

Raises:

- `ConfigurationError` – If any of the models are not supported.
Source code in src/fenic/api/session/config.py
Session
The entry point to programming with the DataFrame API. Similar to PySpark's SparkSession.
Create a session with default configuration:

```python
session = Session.get_or_create(SessionConfig(app_name="my_app"))
```

Create a session with cloud configuration:

```python
config = SessionConfig(
    app_name="my_app",
    cloud=True,
    api_key="your_api_key"
)
session = Session.get_or_create(config)
```
Methods:

- `create_dataframe` – Create a DataFrame from a variety of Python-native data formats.
- `get_or_create` – Gets an existing Session or creates a new one with the configured settings.
- `sql` – Execute a read-only SQL query against one or more DataFrames using named placeholders.
- `stop` – Stops the session and closes all connections.
- `table` – Returns the specified table as a DataFrame.

Attributes:

- `catalog` (`Catalog`) – Interface for catalog operations on the Session.
- `read` (`DataFrameReader`) – Returns a DataFrameReader that can be used to read data in as a DataFrame.
catalog
property
catalog: Catalog
Interface for catalog operations on the Session.
read
property
read: DataFrameReader
Returns a DataFrameReader that can be used to read data in as a DataFrame.
Returns:

- `DataFrameReader` – A reader interface to read data into a DataFrame.

Raises:

- `RuntimeError` – If the session has been stopped.
create_dataframe
create_dataframe(data: DataLike) -> DataFrame
Create a DataFrame from a variety of Python-native data formats.
Parameters:

- `data` (`DataLike`) – Input data. Must be one of:
    - Polars DataFrame
    - Pandas DataFrame
    - dict of column_name -> list of values
    - list of dicts (each dict representing a row)
    - pyarrow Table

Returns:

- `DataFrame` – A new DataFrame instance.

Raises:

- `ValueError` – If the input format is unsupported or inconsistent with provided column names.
Create from a Polars DataFrame:

```python
import polars as pl
df = pl.DataFrame({"col1": [1, 2], "col2": ["a", "b"]})
session.create_dataframe(df)
```

Create from a Pandas DataFrame:

```python
import pandas as pd
df = pd.DataFrame({"col1": [1, 2], "col2": ["a", "b"]})
session.create_dataframe(df)
```

Create from a dictionary:

```python
session.create_dataframe({"col1": [1, 2], "col2": ["a", "b"]})
```

Create from a list of dictionaries:

```python
session.create_dataframe([
    {"col1": 1, "col2": "a"},
    {"col1": 2, "col2": "b"}
])
```

Create from a pyarrow Table:

```python
import pyarrow as pa
table = pa.Table.from_pydict({"col1": [1, 2], "col2": ["a", "b"]})
session.create_dataframe(table)
```
Source code in src/fenic/api/session/session.py
get_or_create
classmethod
get_or_create(config: SessionConfig) -> Session
Gets an existing Session or creates a new one with the configured settings.
Returns:

- `Session` – A Session instance configured with the provided settings.
Source code in src/fenic/api/session/session.py
sql
sql(query: str, /, **tables: DataFrame) -> DataFrame
Execute a read-only SQL query against one or more DataFrames using named placeholders.
This allows you to execute ad hoc SQL queries using familiar syntax when it's more convenient than the DataFrame API.

Placeholders in the SQL string (e.g. `{df}`) should correspond to keyword arguments (e.g. `df=my_dataframe`).

For supported SQL syntax and functions, refer to the DuckDB SQL documentation: https://duckdb.org/docs/sql/introduction.
Parameters:

- `query` (`str`) – A SQL query string with placeholders like `{df}`.
- `**tables` (`DataFrame`, default: `{}`) – Keyword arguments mapping placeholder names to DataFrames.
Returns:

- `DataFrame` – A lazy DataFrame representing the result of the SQL query.

Raises:

- `ValidationError` – If a placeholder is used in the query but not passed as a keyword argument.
Simple join between two DataFrames:

```python
df1 = session.create_dataframe({"id": [1, 2]})
df2 = session.create_dataframe({"id": [2, 3]})
result = session.sql(
    "SELECT * FROM {df1} JOIN {df2} USING (id)",
    df1=df1,
    df2=df2
)
```

Complex query with multiple DataFrames:

```python
users = session.create_dataframe({"user_id": [1, 2], "name": ["Alice", "Bob"]})
orders = session.create_dataframe({"order_id": [1, 2], "user_id": [1, 2], "product_id": [1, 2]})
products = session.create_dataframe({"product_id": [1, 2], "name": ["Widget", "Gadget"]})
result = session.sql("""
    SELECT u.name, p.name AS product
    FROM {users} u
    JOIN {orders} o ON u.user_id = o.user_id
    JOIN {products} p ON o.product_id = p.product_id
""", users=users, orders=orders, products=products)
```
Source code in src/fenic/api/session/session.py
stop
stop()
Stops the session and closes all connections.
Source code in src/fenic/api/session/session.py
table
table(table_name: str) -> DataFrame
Returns the specified table as a DataFrame.
Parameters:

- `table_name` (`str`) – Name of the table.

Returns:

- `DataFrame` – Table as a DataFrame.

Raises:

- `ValueError` – If the table does not exist.
Load an existing table:

```python
df = session.table("my_table")
```
Source code in src/fenic/api/session/session.py
SessionConfig
Bases: BaseModel
Configuration for a user session.
This class defines the complete configuration for a user session, including application settings, model configurations, and optional cloud settings. It serves as the central configuration object for all language model operations.
Attributes:

- `app_name` (`str`) – Name of the application using this session. Defaults to "default_app".
- `db_path` (`Optional[Path]`) – Optional path to a local database file for persistent storage.
- `semantic` (`SemanticConfig`) – Configuration for semantic models (required).
- `cloud` (`Optional[CloudConfig]`) – Optional configuration for cloud execution.
Note
The semantic configuration is required as it defines the language models that will be used for processing. The cloud configuration is optional and only needed for distributed processing.
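As a sketch combining the pieces above, a complete local session configuration might look like this (the model alias and rate limits are illustrative):

```python
# Illustrative end-to-end configuration: a required semantic config with one
# language model, no cloud config, so execution stays local.
config = SessionConfig(
    app_name="my_app",
    semantic=SemanticConfig(
        language_models={
            "nano": OpenAIModelConfig(model_name="gpt-4.1-nano", rpm=100, tpm=100),
        },
    ),
)
session = Session.get_or_create(config)
```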