
fenic: the dataframe (re)built for LLM inference
fenic is an opinionated, PySpark-inspired DataFrame framework from typedef.ai for building AI and agentic applications. It transforms unstructured and structured data into insights using familiar DataFrame operations enhanced with semantic intelligence, with first-class support for markdown, transcripts, and semantic operators, plus efficient batch inference across any model provider.
Install
fenic supports Python 3.10, 3.11, and 3.12.
```bash
pip install fenic
```
LLM Provider Setup
fenic requires an API key from at least one LLM provider. Set the appropriate environment variable for your chosen provider:
```bash
# For OpenAI
export OPENAI_API_KEY="your-openai-api-key"

# For Anthropic
export ANTHROPIC_API_KEY="your-anthropic-api-key"

# For Google
export GEMINI_API_KEY="your-google-api-key"
```
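Once a key is exported, a session binds it to a model alias. Below is a minimal sketch of that setup; the exact config classes and fields should be checked against the Hello World example, and `rpm`/`tpm` are the assumed rate-limit knobs:

```python
import fenic as fc

# The OPENAI_API_KEY environment variable set above is picked up automatically.
config = fc.SessionConfig(
    app_name="quickstart",
    semantic=fc.SemanticConfig(
        language_models={
            "mini": fc.OpenAILanguageModel(
                model_name="gpt-4o-mini",
                rpm=100,      # requests per minute
                tpm=100_000,  # tokens per minute
            ),
        },
    ),
)
session = fc.Session.get_or_create(config)
```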
Quickstart
The fastest way to learn about fenic is by checking the examples.
Below is a quick list of the examples in this repo:
| Example | Description |
|---|---|
| Hello World! | Introduction to semantic extraction and classification using fenic's core operators through error log analysis. |
| Enrichment | Multi-stage DataFrames with template-based text extraction, joins, and LLM-powered transformations demonstrated via log enrichment. |
| Meeting Transcript Processing | Native transcript parsing, Pydantic schema integration, and complex aggregations shown through meeting analysis. |
| News Analysis | Analyze and extract insights from news articles using semantic operators and structured data processing. |
| Podcast Summarization | Process and summarize podcast transcripts with speaker-aware analysis and key point extraction. |
| Semantic Join | Instead of simple fuzzy matching, use fenic's powerful semantic join functionality to match data across tables. |
| Named Entity Recognition | Extract and classify named entities from text using semantic extraction and classification. |
| Markdown Processing | Process and transform markdown documents with structured data extraction and formatting. |
| JSON Processing | Handle complex JSON data structures with semantic operations and schema validation. |
| Feedback Clustering | Group and analyze feedback using semantic similarity and clustering operations. |
| Document Extraction | Extract structured information from various document formats using semantic operators. |
(Feel free to click any example above to jump right to its folder.)
Why use fenic?
fenic is an opinionated, PySpark-inspired DataFrame framework for building production AI and agentic applications. Unlike traditional data tools retrofitted for LLMs, fenic's query engine is built from the ground up with inference in mind, bringing the reliability of traditional data pipelines to AI workloads.
Key Features
Purpose-Built for LLM Inference
- Query engine designed from scratch for AI workloads, not retrofitted
- Automatic batch optimization for API calls
- Built-in retry logic and rate limiting
- Token counting and cost tracking
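In practice this means callers express work as ordinary DataFrame operations while the engine handles batching, retries, and throttling within the per-model limits declared at session setup. A minimal sketch, reusing the `session` from the setup sketch above (column and alias names are illustrative, and the exact `semantic` namespace should be verified against the examples):

```python
import fenic as fc

# Reuses the `session` built in the setup sketch above.
reviews = session.create_dataframe(
    {"review": ["Great product!", "Broken on arrival.", "It's fine."]}
)

labeled = reviews.select(
    fc.semantic.analyze_sentiment(fc.col("review")).alias("sentiment")
)

# One logical query: the engine batches the underlying API calls, retries
# transient failures, and stays within the rpm/tpm budgets on the model.
labeled.show()
```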
Semantic Operators as First-Class Citizens
- `semantic.analyze_sentiment` - Built-in sentiment analysis
- `semantic.classify` - Categorize text with few-shot examples
- `semantic.extract` - Transform unstructured text into structured data with schemas
- `semantic.group_by` - Group data by semantic similarity
- `semantic.join` - Join DataFrames on meaning, not just values
- `semantic.map` - Apply natural language transformations
- `semantic.predicate` - Create predicates using natural language to filter rows
- `semantic.reduce` - Aggregate grouped data with LLM operations
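For example, `semantic.extract` pairs a text column with a Pydantic model and returns structured rows. A minimal sketch, again reusing the session from the setup above (the column names and exact call signature are assumptions; the Document Extraction example shows the real thing):

```python
import fenic as fc
from pydantic import BaseModel, Field

class Issue(BaseModel):
    product: str = Field(description="Product the report is about")
    severity: str = Field(description="One of: low, medium, high")

reports = session.create_dataframe(
    {"body": ["Checkout page crashes when applying a coupon. Happens every time."]}
)

# Assumed signature: semantic.extract(column, pydantic_schema).
structured = reports.select(
    fc.semantic.extract(fc.col("body"), Issue).alias("issue")
)
structured.show()
```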
Native Unstructured Data Support
Goes beyond typical multimodal data types (audio, images) by creating specialized types for text-heavy workloads:
- Markdown parsing and extraction as a first-class data type
- Transcript processing (SRT, generic formats) with speaker and timestamp awareness
- JSON manipulation with JQ expressions for nested data
- Automatic text chunking with configurable overlap for long documents
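A sketch of how these types might compose, inferred from the bullets above (`MarkdownType`, `JsonType`, and the `json.jq` helper are assumptions; check the Markdown and JSON Processing examples for the actual names and casts):

```python
import fenic as fc

docs = session.create_dataframe({
    "md": ["# Release Notes\n\n## Fixes\n- Patched login bug"],
    "payload": ['{"user": {"name": "Ada"}, "events": [1, 2, 3]}'],
})

parsed = docs.select(
    # Assumed: casting a string column to the markdown-aware type.
    fc.col("md").cast(fc.MarkdownType).alias("doc"),
    # Assumed: JQ expressions over a JSON column for nested access.
    fc.json.jq(fc.col("payload").cast(fc.JsonType), ".user.name").alias("user_name"),
)
parsed.show()
```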
Production-Ready Infrastructure
- Multi-provider support (OpenAI, Anthropic, Gemini)
- Local and cloud execution backends
- Comprehensive error handling and logging
- Pydantic integration for type safety
Familiar DataFrame API
- PySpark-compatible operations
- Lazy evaluation and query optimization
- SQL support for complex queries
- Seamless integration with existing data pipelines
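Because evaluation is lazy, a chain like the one below is planned and optimized as a single query before any rows move. The `session.sql` placeholder syntax is an assumption drawn from the SQL-support bullet; treat it as a sketch, not the definitive API:

```python
import fenic as fc

logs = session.create_dataframe({
    "level": ["ERROR", "INFO", "ERROR"],
    "service": ["api", "api", "worker"],
    "message": ["timeout", "ok", "oom"],
})

# Familiar PySpark-style operations; nothing executes until show()/collect().
errors = logs.filter(fc.col("level") == "ERROR").select("service", "message")

# Assumed SQL entry point with DataFrame placeholders.
per_service = session.sql(
    "SELECT service, COUNT(*) AS n FROM {errors} GROUP BY service",
    errors=errors,
)
per_service.show()
```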
Why DataFrames for LLM and Agentic Applications?
AI and agentic applications are fundamentally pipelines and workflows - exactly what DataFrame APIs were designed to handle. Rather than reinventing patterns for data transformation, filtering, and aggregation, fenic leverages decades of proven engineering practices.
Decoupled Architecture for Better Agents
fenic creates a clear separation between heavy inference tasks and real-time agent interactions. By moving batch processing out of the agent runtime, you get:
- More predictable and responsive agents
- Better resource utilization with batched LLM calls
- Cleaner separation between planning/orchestration and execution
Built for All Engineers
DataFrames aren't just for data practitioners. The fluent, composable API design makes it accessible to any engineer:
- Chain operations naturally: `df.filter(...).semantic.group_by(...)`
- Mix imperative and declarative styles seamlessly
- Get started quickly with familiar patterns from pandas/PySpark or SQL
Support
Join our community on Discord where you can connect with other users, ask questions, and get help with your fenic projects. Our community is always happy to welcome newcomers!
If you find fenic useful, consider giving us a ⭐ at the top of our repository. Your support helps us grow and improve the framework for everyone!
Contributing
We welcome contributions of all kinds! Whether you're interested in writing code, improving documentation, testing features, or proposing new ideas, your help is valuable to us.
For developers planning to submit code changes, we encourage you to first open an issue to discuss your ideas before creating a Pull Request. This helps ensure alignment with the project's direction and prevents duplicate efforts.
Please refer to our contribution guidelines for detailed information about the development process and project setup.