Skip to content

Semantic Joins with Fenic

View in Github

This example demonstrates how to use Fenic's semantic joins to perform LLM-powered data matching based on natural language reasoning rather than exact equality or similarity scores.

Overview

Semantic joins enable you to join DataFrames using natural language predicates that are evaluated by language models. Unlike traditional joins that require exact matches or embedding-based similarity joins, semantic joins can understand complex relationships and make intelligent connections based on meaning and context.

This example showcases two practical use cases:

  • Content Recommendation: Matching user interests to relevant articles
  • Product Recommendations: Suggesting complementary products based on purchase history

Key Features Demonstrated

  • Natural Language Predicates: Using human-readable join conditions
  • LLM-Powered Reasoning: Leveraging GPT models for intelligent matching
  • Cross-Domain Understanding: Connecting concepts across different contexts
  • Zero-Shot Matching: No training data or examples required

How Semantic Joins Work

Basic Syntax

left_df.semantic.join(
    right_df,
    join_instruction="Natural language predicate with {column:left} and {column:right}"
)

Join Instruction Format

  • Must reference exactly two columns: one from each DataFrame
  • Use :left and :right suffixes to indicate which DataFrame each column comes from
  • Written as a boolean predicate that the LLM evaluates as True/False
  • Should be clear and unambiguous for consistent results

Example 1: Content Recommendation

Data Setup

User Profiles:

  • Sarah: "I love cooking Italian food and trying new pasta recipes"
  • Mike: "I enjoy working on cars and fixing engines in my spare time"
  • Emily: "Gardening is my passion, especially growing vegetables and flowers"
  • David: "I'm interested in learning about car maintenance and automotive repair"

Articles:

  • Cooking Pasta Recipes
  • Car Engine Maintenance
  • Gardening for Beginners
  • Advanced Automotive Repair

Semantic Join Implementation

users_df.semantic.join(
    articles_df,
    join_instruction="A person with interests '{interests:left}' would be interested in reading about '{description:right}'"
)

Matching Results

  • Sarah → Cooking Pasta Recipes ✅
  • Mike → Car Engine Maintenance + Advanced Automotive Repair ✅
  • Emily → Gardening for Beginners ✅
  • David → Car Engine Maintenance + Advanced Automotive Repair ✅

Example 2: Product Recommendations

Sample Data

Customer Purchases:

  • Alice: Professional DSLR Camera
  • Bob: Gaming Laptop
  • Carol: Yoga Mat
  • Dan: Coffee Maker

Product Catalog:

  • Camera Lens Kit, Tripod Stand (Photography)
  • Gaming Mouse, Mechanical Keyboard (Gaming)
  • Yoga Blocks, Exercise Resistance Bands (Fitness)
  • Coffee Beans, French Press (Food & Beverage)

Recommendation Logic

purchases_df.semantic.join(
    products_df,
    join_instruction="A customer who bought '{purchased_product:left}' would also be interested in '{product_name:right}'"
)

Recommendation Results

  • Alice (DSLR Camera) → Camera Lens Kit + Tripod Stand ✅
  • Bob (Gaming Laptop) → Gaming Mouse + Mechanical Keyboard ✅
  • Carol (Yoga Mat) → Yoga Blocks + Exercise Resistance Bands ✅
  • Dan (Coffee Maker) → Coffee Beans + French Press ✅

Technical Details

Session Configuration

config = fc.SessionConfig(
    app_name="semantic_joins",
    semantic=fc.SemanticConfig(
        language_models={
            "mini": fc.OpenAIModelConfig(
                model_name="gpt-4o-mini",
                rpm=500,
                tpm=200_000,
            )
        }
    ),
)

Performance Characteristics

  • Complexity: O(m × n) where m and n are the sizes of the DataFrames
  • LLM Calls: One API call per potential row pair
  • Rate Limiting: Respects RPM/TPM limits configured in session
  • Batching: Efficiently batches requests to optimize API usage

When to Use Semantic Joins

Ideal Use Cases:

  • Content personalization and recommendation systems
  • Product cross-selling and upselling
  • Skill-job matching in recruitment
  • Entity resolution across different data sources
  • Question-answer pairing for knowledge bases
  • Customer-service matching based on needs

Advantages:

  • No training data required (zero-shot)
  • Handles complex reasoning and context
  • Understands domain-specific relationships
  • Works with natural language descriptions
  • Flexible and interpretable join conditions

Considerations:

  • Higher latency than traditional joins
  • API costs for LLM usage
  • Rate limiting for large datasets
  • Best for moderate-sized datasets (hundreds to low thousands of rows)

Usage

# Ensure you have OpenAI API key configured
export OPENAI_API_KEY="your-api-key"

# Run the semantic joins example
python semantic_joins.py

Expected Output

The script demonstrates both use cases with clear before/after data views:

  1. User-Article Matching: Shows how semantic understanding connects user interests to relevant content
  2. Product Recommendations: Demonstrates intelligent product relationship detection for cross-selling

Learning Outcomes

This example teaches:

  • How to construct effective natural language join predicates
  • When semantic joins are preferable to traditional or similarity-based joins
  • Practical applications in recommendation systems and personalization
  • Understanding the trade-offs between accuracy, performance, and cost

Perfect for understanding how to leverage LLM reasoning capabilities for intelligent data joining scenarios that go beyond simple keyword matching or embedding similarity.