Customer Feedback Clustering & Analysis with Fenic


This example demonstrates how to use Fenic's semantic.with_cluster_labels() and semantic.reduce() to automatically cluster customer feedback into themes and generate intelligent summaries for each discovered category.

Overview

Customer feedback analysis is a critical business process that traditionally requires manual categorization and analysis. This example shows how semantic clustering can automatically:

Discover hidden themes in unstructured feedback without predefined categories
Group similar feedback based on semantic meaning rather than keywords
Generate actionable insights for each theme using AI-powered summarization
Prioritize issues based on sentiment and frequency

Key Features Demonstrated

Semantic Clustering: Using semantic.with_cluster_labels() for embedding-based clustering
AI Summarization: Using semantic.reduce() for intelligent theme analysis
Automatic Theme Discovery: No manual categorization required
Sentiment Analysis: Understanding positive vs negative feedback patterns
Business Intelligence: Actionable insights for product teams

How It Works

Step 1: Data Preparation

Load customer feedback with ratings and metadata:

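import fenic as fc

feedback_data = [
    {
        "feedback_id": "fb_001",
        "customer_name": "Alice Johnson",
        "feedback": "The mobile app crashes every time I try to upload a photo. Very frustrating!",
        "rating": 1,
        "timestamp": "2024-01-15"
    },
    # ... more feedback
]

# Load the records into a Fenic DataFrame; `session` is created from the
# SessionConfig shown under Technical Architecture.
feedback_df = session.create_dataframe(feedback_data)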
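Real feedback usually arrives as an export rather than a hand-written list. The sketch below parses CSV text into the same list-of-dicts shape and converts ratings to integers. The CSV content, the `load_feedback` helper, and the 1-5 rating range are assumptions for illustration, not part of the original example.

```python
import csv
import io

# Hypothetical CSV export; column names mirror the feedback_data keys above.
RAW_CSV = """feedback_id,customer_name,feedback,rating,timestamp
fb_001,Alice Johnson,"The mobile app crashes every time I try to upload a photo. Very frustrating!",1,2024-01-15
"""


def load_feedback(text: str) -> list[dict]:
    """Parse CSV text into a list of feedback records (hypothetical helper)."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        rating = int(row["rating"])  # CSV values arrive as strings
        # Assumed 1-5 star scale; reject anything outside it.
        if not 1 <= rating <= 5:
            raise ValueError(f"{row['feedback_id']}: rating {rating} outside 1-5")
        row["rating"] = rating
        records.append(row)
    return records


feedback_data = load_feedback(RAW_CSV)
```

The resulting list has the same shape as `feedback_data` above, so the rest of the pipeline does not need to change.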

Step 2: Embedding Creation

Generate semantic embeddings from feedback text:

feedback_with_embeddings = feedback_df.select(
    "*",
    fc.semantic.embed(fc.col("feedback")).alias("feedback_embeddings")
)
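Embeddings are what allow grouping by meaning rather than keywords: each feedback string becomes a vector, and texts with related meaning tend to produce vectors pointing in similar directions. Cosine similarity is a common way to compare such vectors. The toy 3-dimensional vectors below are made-up values for illustration, not real output of `text-embedding-3-small`, and this is not a description of Fenic's internals.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("a zero vector has no direction")
    return dot / (norm_a * norm_b)


# Hypothetical toy "embeddings" for three pieces of feedback.
crash_report = [0.9, 0.1, 0.0]   # "The app crashes on upload"
freeze_report = [0.8, 0.2, 0.1]  # "The app keeps freezing"
praise = [0.0, 0.2, 0.9]         # "Love the dark mode"

print(cosine_similarity(crash_report, freeze_report))  # high: similar meaning
print(cosine_similarity(crash_report, praise))         # low: different meaning
```

The two complaints share no keywords, yet their vectors are close, which is why a clustering step over embeddings can put them in the same group.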

Step 3: Semantic Clustering & Summarization

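Assign each row a cluster label, then group by that label and compute counts, average ratings, customer names, and an LLM-generated theme summary in a single aggregation: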

feedback_clusters = feedback_with_embeddings.semantic.with_cluster_labels(
    fc.col("feedback_embeddings"),
    4  # Number of clusters - expecting themes like bugs, performance, features, praise
).group_by(
    "cluster_label"
).agg(
    fc.count("*").alias("feedback_count"),
    fc.avg("rating").alias("avg_rating"),
    fc.collect_list("customer_name").alias("customer_names"),
    fc.semantic.reduce(
        "Analyze this cluster of customer feedback and provide a concise summary of the main theme, common issues, and sentiment. Feedback: {feedback}"
    ).alias("theme_summary")
)
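To make the aggregation concrete, here is a plain-Python sketch of what the `group_by("cluster_label").agg(...)` step computes per cluster. The rows (apart from Alice Johnson's feedback), the `aggregate_clusters` helper, and the newline-joined prompt rendering are assumptions; in particular, how Fenic actually fills the `{feedback}` placeholder and calls the model is not shown here, so the LLM call is replaced by the prompt it would receive.

```python
from collections import defaultdict

# Hypothetical rows after clustering: each already carries a cluster_label.
rows = [
    {"cluster_label": 2, "customer_name": "Alice Johnson", "rating": 1,
     "feedback": "The mobile app crashes every time I try to upload a photo."},
    {"cluster_label": 2, "customer_name": "Bob", "rating": 2,
     "feedback": "Pages take forever to load."},
    {"cluster_label": 0, "customer_name": "Carol", "rating": 5,
     "feedback": "Love the new dark mode!"},
]

PROMPT = ("Analyze this cluster of customer feedback and provide a concise summary "
          "of the main theme, common issues, and sentiment. Feedback: {feedback}")


def aggregate_clusters(rows):
    """Group rows by cluster_label and compute per-cluster aggregates."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["cluster_label"]].append(row)
    result = {}
    for label, members in sorted(groups.items()):
        result[label] = {
            "feedback_count": len(members),
            "avg_rating": sum(m["rating"] for m in members) / len(members),
            "customer_names": [m["customer_name"] for m in members],
            # Stand-in for semantic.reduce: the prompt a model would receive.
            "reduce_prompt": PROMPT.format(
                feedback="\n".join(m["feedback"] for m in members)),
        }
    return result


summary = aggregate_clusters(rows)
print(summary[2]["feedback_count"], summary[2]["avg_rating"])
```

Every aggregate except the summary is ordinary arithmetic; only the last column needs a language model.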

Sample Results

From 12 feedback entries, clustering discovered the four themes below. Cluster numbers are arbitrary identifiers assigned by K-means and may differ between runs.

Cluster 0: Positive Features & Support (4.75★)

Theme: Praise for specific features and excellent customer support
Key Points: Dark mode feature, helpful support team, effective search functionality
Sentiment: Predominantly positive with some feature enhancement requests

Cluster 1: UI/UX Design Issues (2.0★)

Theme: Design consistency and professional appearance concerns
Key Points: Inconsistent button layouts across screens
Sentiment: Negative due to unprofessional user experience

Cluster 2: Technical Performance Problems (1.75★)

Theme: Critical technical issues affecting core functionality
Key Points: App crashes, slow loading times, frequent freezes
Sentiment: Very negative with high frustration levels

Cluster 3: Usability & Feature Gaps (2.0★)

Theme: Process complexity and missing functionality
Key Points: Confusing checkout, need for offline mode
Sentiment: Negative about functionality limitations

Value

Automated Insights

Identifies themes without manual categorization
Provides consistent analysis across all feedback
Scales to thousands of feedback entries

Actionable Intelligence

Priority 1: Fix technical crashes and performance (Cluster 2)
Priority 2: Improve design consistency (Cluster 1)
Priority 3: Simplify user workflows (Cluster 3)
Maintain: Continue excellent support and features (Cluster 0)
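One simple way to turn cluster statistics into a priority list is to rank clusters by ascending average rating. The average ratings below come from the sample results above; the ranking rule, the tie-break on cluster label, and the 4.0 "maintain" threshold are assumptions of this sketch, not necessarily how the ordering above was produced.

```python
# Average ratings from the sample results; the ranking rule is an assumption.
clusters = [
    {"label": 0, "theme": "Positive Features & Support", "avg_rating": 4.75},
    {"label": 1, "theme": "UI/UX Design Issues", "avg_rating": 2.0},
    {"label": 2, "theme": "Technical Performance Problems", "avg_rating": 1.75},
    {"label": 3, "theme": "Usability & Feature Gaps", "avg_rating": 2.0},
]


def prioritize(clusters, maintain_threshold=4.0):
    """Split clusters into fix-first (lowest rating first) and maintain groups."""
    ranked = sorted(clusters, key=lambda c: (c["avg_rating"], c["label"]))
    fix = [c for c in ranked if c["avg_rating"] < maintain_threshold]
    maintain = [c for c in ranked if c["avg_rating"] >= maintain_threshold]
    return fix, maintain


fix, maintain = prioritize(clusters)
for rank, c in enumerate(fix, start=1):
    print(f"Priority {rank}: {c['theme']} (Cluster {c['label']})")
for c in maintain:
    print(f"Maintain: {c['theme']} (Cluster {c['label']})")
```

To weigh frequency as well as sentiment, the sort key could include the cluster's `feedback_count`, for example `(c["avg_rating"], -c["feedback_count"])`, so that larger unhappy clusters rise to the top.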

Resource Optimization

Reduces manual analysis time from hours to minutes
Enables real-time feedback monitoring
Focuses development efforts on highest-impact issues

Technical Architecture

Session Configuration

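import fenic as fc

config = fc.SessionConfig(
    app_name="feedback_clustering",
    semantic=fc.SemanticConfig(
        language_models={
            # Used by semantic.reduce; rpm/tpm are requests- and tokens-per-minute limits
            "mini": fc.OpenAIModelConfig(
                model_name="gpt-4o-mini",
                rpm=500,
                tpm=200_000,
            )
        },
        embedding_models={
            # Used by semantic.embed
            "small": fc.OpenAIModelConfig(
                model_name="text-embedding-3-small",
                rpm=3000,
                tpm=1_000_000,
            )
        },
    ),
)

# Create (or reuse) the session that runs the pipeline
session = fc.Session.get_or_create(config)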
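The `rpm` and `tpm` values are per-model rate limits on requests and tokens per minute. A quick back-of-envelope check shows which limit bounds a run: whichever is exhausted first sets a floor on wall-clock time. The `min_minutes` helper and the workload numbers are hypothetical, and the sketch ignores retries, batching, and however Fenic actually schedules requests.

```python
def min_minutes(num_requests: int, tokens_per_request: int, rpm: int, tpm: int) -> float:
    """Lower bound on wall-clock minutes imposed by request and token rate limits."""
    if rpm <= 0 or tpm <= 0:
        raise ValueError("rate limits must be positive")
    by_requests = num_requests / rpm
    by_tokens = num_requests * tokens_per_request / tpm
    return max(by_requests, by_tokens)


# Hypothetical workload: 10,000 entries of ~50 tokens, pessimistically one
# embedding request per entry under the "small" limits (rpm=3000, tpm=1_000_000).
print(min_minutes(10_000, 50, rpm=3000, tpm=1_000_000))  # request-bound

# Hypothetical: 100 summary prompts of ~1,000 tokens under the "mini" limits.
print(min_minutes(100, 1000, rpm=500, tpm=200_000))  # token-bound
```

Short embedding inputs tend to hit the request limit first, while long summarization prompts tend to hit the token limit, so the two models may need different limits.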

Key Operations

semantic.with_cluster_labels(embedding_column, num_clusters)

Uses K-means clustering on embedding vectors
Adds a cluster_label column holding each row's cluster index
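To show what K-means does with the embedding vectors, here is a minimal, dependency-free version of Lloyd's algorithm: alternately assign each point to its nearest centroid, then move each centroid to the mean of its members, until assignments stop changing. Random initialization with a fixed seed is an assumption of this sketch; Fenic's own implementation, initialization strategy, and defaults are not shown here.

```python
import math
import random


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's k-means; returns one integer cluster label per point."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: nearest centroid by Euclidean distance.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:  # converged: nothing moved
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


# Toy 2-D points standing in for embeddings: two well-separated groups.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
print(kmeans(points, k=2))
```

Note that the label numbers themselves carry no meaning; swapping them yields the same clustering, which is why cluster numbers can differ between runs.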

semantic.reduce(instruction)

Aggregation function that summarizes multiple texts
Uses LLM to analyze and synthesize insights
Generates human-readable theme descriptions
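A reduce-style aggregation must fit many texts into a model's context window. One common strategy, assumed here for illustration and not taken from Fenic's source, is hierarchical summarization: summarize texts in batches, then summarize the summaries until one remains. The `tree_reduce` helper is hypothetical, and `summarize` is a stand-in for an LLM call.

```python
from typing import Callable, List


def tree_reduce(texts: List[str], summarize: Callable[[List[str]], str],
                batch_size: int = 4) -> str:
    """Summarize texts in batches, then summarize the summaries, until one remains."""
    if not texts:
        raise ValueError("need at least one text")
    if batch_size < 2:
        raise ValueError("batch_size must be at least 2 so each round shrinks the list")
    level = list(texts)
    while True:
        level = [summarize(level[i:i + batch_size])
                 for i in range(0, len(level), batch_size)]
        if len(level) == 1:
            return level[0]


# Stub "model" that records batch sizes instead of calling an LLM.
calls = []


def fake_llm(batch: List[str]) -> str:
    calls.append(len(batch))
    return f"summary of {len(batch)}"


result = tree_reduce([f"feedback {i}" for i in range(10)], fake_llm)
# 10 texts -> batches of 4, 4, 2 -> 3 summaries -> 1 final summary
print(result, calls)
```

Each round shrinks the list by roughly a factor of `batch_size`, so the number of model calls grows roughly linearly with the number of texts while no single prompt exceeds one batch.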

Usage

# Ensure you have an OpenAI API key configured
export OPENAI_API_KEY="your-api-key"

# Run the feedback clustering analysis
python feedback_clustering.py

Expected Output

The script shows:

Raw Feedback Data: Customer names, feedback text, and ratings
Clustering Progress: Embedding generation and clustering status
Theme Analysis: Detailed summaries for each discovered cluster
Business Insights: Actionable themes ranked by priority

Use Cases

Product Development

Identify the most requested features
Understand user pain points
Prioritize bug fixes and improvements

Customer Success

Monitor satisfaction trends
Identify at-risk customer segments
Improve support processes

Marketing Intelligence

Understand customer sentiment
Identify product strengths for messaging
Track competitive advantages

Learning Outcomes

This example teaches:

How to combine embedding-based clustering with AI summarization
When to use semantic operations for business intelligence
Patterns for automated text analysis and insight generation
Integration of multiple semantic operations in data pipelines