
fenic: the dataframe (re)built for LLM inference
fenic is an opinionated, PySpark-inspired DataFrame framework from typedef.ai for building AI and agentic applications. It transforms unstructured and structured data into insights using familiar DataFrame operations enhanced with semantic intelligence, with first-class support for markdown, transcripts, and semantic operators, plus efficient batch inference across any model provider.
Install
fenic supports Python 3.10, 3.11, and 3.12.
```bash
pip install fenic
```
LLM Provider Setup
fenic requires an API key from at least one LLM provider. Set the appropriate environment variable for your chosen provider:
```bash
# For OpenAI
export OPENAI_API_KEY="your-openai-api-key"

# For Anthropic
export ANTHROPIC_API_KEY="your-anthropic-api-key"

# For Google
export GEMINI_API_KEY="your-google-api-key"
```
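Once a key is exported, a session binds it to a model alias. Below is a minimal sketch of that setup; the exact config classes and fields should be checked against the Hello World example, and `rpm`/`tpm` are the assumed rate-limit knobs:

```python
import fenic as fc

# The OPENAI_API_KEY environment variable set above is picked up automatically.
config = fc.SessionConfig(
    app_name="quickstart",
    semantic=fc.SemanticConfig(
        language_models={
            "mini": fc.OpenAILanguageModel(
                model_name="gpt-4o-mini",
                rpm=100,      # requests per minute
                tpm=100_000,  # tokens per minute
            ),
        },
    ),
)
session = fc.Session.get_or_create(config)
```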
Quickstart
The fastest way to learn about fenic is by checking the examples.
Below is a quick list of the examples in this repo:
| Example | Description |
|---|---|
| Hello World! | Introduction to semantic extraction and classification using fenic's core operators through error log analysis. |
| Enrichment | Multi-stage DataFrames with template-based text extraction, joins, and LLM-powered transformations demonstrated via log enrichment. |
| Meeting Transcript Processing | Native transcript parsing, Pydantic schema integration, and complex aggregations shown through meeting analysis. |
| News Analysis | Analyze and extract insights from news articles using semantic operators and structured data processing. |
| Podcast Summarization | Process and summarize podcast transcripts with speaker-aware analysis and key point extraction. |
| Semantic Join | Instead of simple fuzzy matching, use fenic's powerful semantic join functionality to match data across tables. |
| Named Entity Recognition | Extract and classify named entities from text using semantic extraction and classification. |
| Markdown Processing | Process and transform markdown documents with structured data extraction and formatting. |
| JSON Processing | Handle complex JSON data structures with semantic operations and schema validation. |
| Feedback Clustering | Group and analyze feedback using semantic similarity and clustering operations. |
| Document Extraction | Extract structured information from various document formats using semantic operators. |
(Feel free to click any example above to jump right to its folder.)
Why use fenic?
fenic is an opinionated, PySpark-inspired DataFrame framework for building production AI and agentic applications. Unlike traditional data tools retrofitted for LLMs, fenic's query engine is built from the ground up with inference in mind, bringing the reliability of traditional data pipelines to AI workloads.
Key Features
Purpose-Built for LLM Inference
- Query engine designed from scratch for AI workloads, not retrofitted
- Automatic batch optimization for API calls
- Built-in retry logic and rate limiting
- Token counting and cost tracking
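In practice this means callers express work as ordinary DataFrame operations while the engine handles batching, retries, and throttling within the per-model limits declared at session setup. A minimal sketch, reusing the `session` from the setup sketch above (column and alias names are illustrative, and the exact `semantic` namespace should be verified against the examples):

```python
import fenic as fc

# Reuses the `session` built in the setup sketch above.
reviews = session.create_dataframe(
    {"review": ["Great product!", "Broken on arrival.", "It's fine."]}
)

labeled = reviews.select(
    fc.semantic.analyze_sentiment(fc.col("review")).alias("sentiment")
)

# One logical query: the engine batches the underlying API calls, retries
# transient failures, and stays within the rpm/tpm budgets on the model.
labeled.show()
```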
Semantic Operators as First-Class Citizens
- `semantic.analyze_sentiment` - Built-in sentiment analysis
- `semantic.classify` - Categorize text with few-shot examples
- `semantic.extract` - Transform unstructured text into structured data with schemas
- `semantic.group_by` - Group data by semantic similarity
- `semantic.join` - Join DataFrames on meaning, not just values
- `semantic.map` - Apply natural language transformations
- `semantic.predicate` - Create predicates using natural language to filter rows
- `semantic.reduce` - Aggregate grouped data with LLM operations
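For example, `semantic.extract` pairs a text column with a Pydantic model and returns structured rows. A minimal sketch, again reusing the session from the setup above (the column names and exact call signature are assumptions; the Document Extraction example shows the real thing):

```python
import fenic as fc
from pydantic import BaseModel, Field

class Issue(BaseModel):
    product: str = Field(description="Product the report is about")
    severity: str = Field(description="One of: low, medium, high")

reports = session.create_dataframe(
    {"body": ["Checkout page crashes when applying a coupon. Happens every time."]}
)

# Assumed signature: semantic.extract(column, pydantic_schema).
structured = reports.select(
    fc.semantic.extract(fc.col("body"), Issue).alias("issue")
)
structured.show()
```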
Native Unstructured Data Support
Goes beyond typical multimodal data types (audio, images) by creating specialized types for text-heavy workloads:
- Markdown parsing and extraction as a first-class data type
- Transcript processing (SRT, generic formats) with speaker and timestamp awareness
- JSON manipulation with JQ expressions for nested data
- Automatic text chunking with configurable overlap for long documents
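A sketch of how these types might compose, inferred from the bullets above (`MarkdownType`, `JsonType`, and the `json.jq` helper are assumptions; check the Markdown and JSON Processing examples for the actual names and casts):

```python
import fenic as fc

docs = session.create_dataframe({
    "md": ["# Release Notes\n\n## Fixes\n- Patched login bug"],
    "payload": ['{"user": {"name": "Ada"}, "events": [1, 2, 3]}'],
})

parsed = docs.select(
    # Assumed: casting a string column to the markdown-aware type.
    fc.col("md").cast(fc.MarkdownType).alias("doc"),
    # Assumed: JQ expressions over a JSON column for nested access.
    fc.json.jq(fc.col("payload").cast(fc.JsonType), ".user.name").alias("user_name"),
)
parsed.show()
```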
Production-Ready Infrastructure
- Multi-provider support (OpenAI, Anthropic, Gemini)
- Local and cloud execution backends
- Comprehensive error handling and logging
- Pydantic integration for type safety
Familiar DataFrame API
- PySpark-compatible operations
- Lazy evaluation and query optimization
- SQL support for complex queries
- Seamless integration with existing data pipelines
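Because evaluation is lazy, a chain like the one below is planned and optimized as a single query before any rows move. The `session.sql` placeholder syntax is an assumption drawn from the SQL-support bullet; treat it as a sketch, not the definitive API:

```python
import fenic as fc

logs = session.create_dataframe({
    "level": ["ERROR", "INFO", "ERROR"],
    "service": ["api", "api", "worker"],
    "message": ["timeout", "ok", "oom"],
})

# Familiar PySpark-style operations; nothing executes until show()/collect().
errors = logs.filter(fc.col("level") == "ERROR").select("service", "message")

# Assumed SQL entry point with DataFrame placeholders.
per_service = session.sql(
    "SELECT service, COUNT(*) AS n FROM {errors} GROUP BY service",
    errors=errors,
)
per_service.show()
```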
Why DataFrames for LLM and Agentic Applications?
AI and agentic applications are fundamentally pipelines and workflows - exactly what DataFrame APIs were designed to handle. Rather than reinventing patterns for data transformation, filtering, and aggregation, fenic leverages decades of proven engineering practices.
Decoupled Architecture for Better Agents
fenic creates a clear separation between heavy inference tasks and real-time agent interactions. By moving batch processing out of the agent runtime, you get:
- More predictable and responsive agents
- Better resource utilization with batched LLM calls
- Cleaner separation between planning/orchestration and execution
Built for All Engineers
DataFrames aren't just for data practitioners. The fluent, composable API design makes it accessible to any engineer:
- Chain operations naturally: `df.filter(...).semantic.group_by(...)`
- Mix imperative and declarative styles seamlessly
- Get started quickly with familiar patterns from pandas/PySpark or SQL
Support
Join our community on Discord where you can connect with other users, ask questions, and get help with your fenic projects. Our community is always happy to welcome newcomers!
If you find fenic useful, consider giving us a ⭐ at the top of our repository. Your support helps us grow and improve the framework for everyone!
Contributing
We welcome contributions of all kinds! Whether you're interested in writing code, improving documentation, testing features, or proposing new ideas, your help is valuable to us.
For developers planning to submit code changes, we encourage you to first open an issue to discuss your ideas before creating a Pull Request. This helps ensure alignment with the project's direction and prevents duplicate efforts.
Please refer to our contribution guidelines for detailed information about the development process and project setup.