Kura is a tool for analysing and visualising chat data
Kura: Procedural API for Chat Data Analysis
Your AI assistant handles thousands of conversations daily. But do you know what users actually need?
Kura is an open-source library for understanding chat data through machine learning, inspired by Anthropic's CLIO. It automatically clusters conversations to reveal patterns, pain points, and opportunities hidden in your data.
The Hidden Cost of Not Understanding Your Users
Every day, your AI assistant or chatbot has thousands of conversations. Within this data lies critical intelligence:
- 80% of support tickets might stem from the same 5 unclear features
- Key feature requests repeated by hundreds of users in different ways
- Revenue opportunities from unmet needs you didn't know existed
- Critical failures affecting user trust that go unreported
Manually reviewing conversations doesn't scale. Traditional analytics miss semantic meaning. Kura bridges this gap.
What Kura Does
Kura transforms unstructured conversation data into structured insights:
10,000 conversations → AI Analysis → 20 clear patterns
- Automatic Intent Discovery: Find what users actually want (not what they say)
- Failure Pattern Detection: Identify where your AI falls short before users complain
- Feature Priority Insights: See which missing features impact the most users
- Semantic Clustering: Group by meaning, not keywords
- Privacy-First Design: Analyze patterns without exposing individual conversations
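The "meaning, not keywords" point is the core idea: conversations are compared in embedding space rather than by shared words. Here is a minimal sketch of that comparison, using made-up vectors as stand-ins for real sentence embeddings (this is an illustration of the concept, not Kura's internal code):

```python
import numpy as np

# Toy stand-ins for neural sentence embeddings; a real pipeline would
# compute these with an embedding model.
texts = {
    "can't sign in to my account": np.array([0.90, 0.10, 0.00]),
    "login keeps failing":         np.array([0.88, 0.12, 0.05]),
    "sign here for the package":   np.array([0.05, 0.10, 0.90]),
}

def cosine(a, b):
    # Cosine similarity: 1.0 means identical direction, 0.0 means unrelated.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# The first two sentences share no keywords but are close in embedding
# space; the third shares the word "sign" yet is far away.
same_meaning = cosine(texts["can't sign in to my account"], texts["login keeps failing"])
shared_keyword = cosine(texts["can't sign in to my account"], texts["sign here for the package"])
print(same_meaning > shared_keyword)
```

This is why semantic clustering surfaces groups that keyword search misses: similarity is measured in the embedding space, not over surface tokens.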
Real-World Impact
E-commerce Support Bot
- Challenge: 50,000 weekly conversations, unknown pain points
- Discovery: 35% of conversations about shipping clustered into 3 issues
- Result: Fixed root causes, reduced support volume by 40%
Developer Documentation Assistant
- Challenge: Users struggling but not reporting specific issues
- Discovery: 2,000+ conversations revealed 5 consistently confusing APIs
- Result: Targeted doc improvements, 60% reduction in those queries
SaaS Onboarding Bot
- Challenge: 30% of trials not converting, unclear why
- Discovery: Clustering revealed 3 missing integration requests
- Result: Built integrations, trial conversion increased 18%
Installation
uv pip install kura
When to Use Kura
Kura is perfect when you have:
- 100+ conversations to analyze (scales to millions)
- A need to understand user patterns, not individual conversations
- Unstructured conversation data from chatbots, support systems, or AI assistants
- Questions like "What are users struggling with?" or "What features are they requesting?"
Kura might not be the best fit if:
- You have fewer than 100 conversations (manual review might be faster)
- You need real-time analysis (Kura is designed for batch processing)
- You only need keyword-based search (use traditional search tools instead)
- You require conversation-level sentiment analysis (Kura focuses on patterns)
Common Use Cases
Product Teams
- Feature Discovery: Find the features users ask for in their own words
- Pain Point Analysis: Identify friction in user journeys
- Roadmap Prioritization: Quantify impact of potential improvements
Customer Success
- Support Deflection: Find common issues to create better docs/FAQs
- Escalation Patterns: Identify conversations that lead to churn
- Success Patterns: Discover what makes users successful
AI/ML Teams
- Prompt Engineering: Find where prompts fail or confuse users
- Model Evaluation: Understand model performance beyond metrics
- Training Data: Identify gaps in model knowledge
Analytics Teams
- Behavioral Insights: Understand user segments by conversation patterns
- Trend Analysis: Track how user needs evolve over time
- ROI Measurement: Connect conversation patterns to business outcomes
Quick Start
From Zero to Insights in 5 Minutes
import asyncio

from rich.console import Console

from kura.cache import DiskCacheStrategy
from kura.summarisation import summarise_conversations, SummaryModel
from kura.cluster import generate_base_clusters_from_conversation_summaries, ClusterDescriptionModel
from kura.meta_cluster import reduce_clusters_from_base_clusters, MetaClusterModel
from kura.dimensionality import reduce_dimensionality_from_clusters, HDBUMAP
from kura.visualization import visualise_pipeline_results
from kura.types import Conversation
from kura.checkpoints import JSONLCheckpointManager


async def main():
    console = Console()

    # Define models
    summary_model = SummaryModel(
        console=console,
        cache=DiskCacheStrategy(cache_dir="./.summary"),  # Uses disk-based caching
    )
    cluster_model = ClusterDescriptionModel(console=console)  # Uses K-means by default
    meta_cluster_model = MetaClusterModel(console=console)
    dimensionality_model = HDBUMAP()

    # Define checkpoints
    checkpoint_manager = JSONLCheckpointManager("./checkpoints", enabled=True)

    # Load conversations from a Hugging Face dataset
    conversations = Conversation.from_hf_dataset(
        "ivanleomk/synthetic-gemini-conversations", split="train"
    )

    # Process through the pipeline step by step
    summaries = await summarise_conversations(
        conversations, model=summary_model, checkpoint_manager=checkpoint_manager
    )
    clusters = await generate_base_clusters_from_conversation_summaries(
        summaries, model=cluster_model, checkpoint_manager=checkpoint_manager
    )
    reduced_clusters = await reduce_clusters_from_base_clusters(
        clusters, model=meta_cluster_model, checkpoint_manager=checkpoint_manager
    )
    projected_clusters = await reduce_dimensionality_from_clusters(
        reduced_clusters,
        model=dimensionality_model,
        checkpoint_manager=checkpoint_manager,
    )

    # Visualize the results
    visualise_pipeline_results(projected_clusters, style="basic")


if __name__ == "__main__":
    asyncio.run(main())
What This Example Does
- Loads 190 real programming conversations from Hugging Face
- Summarizes each conversation into a concise task description (with caching!)
- Clusters similar conversations using MiniBatch K-means for speed
- Organizes clusters into a hierarchy for easy navigation
- Visualizes the results in your terminal
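The clustering step above can be sketched in miniature. The snippet below is not Kura's internal code; it just shows MiniBatch K-means (the same family of algorithm the example uses for speed) grouping toy embedding vectors with scikit-learn:

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

# Toy 2-D stand-ins for the embeddings of four conversation summaries.
embeddings = np.array([
    [0.90, 0.10],  # "debug a pandas groupby"
    [0.85, 0.15],  # "merge two DataFrames"
    [0.10, 0.90],  # "center a div with CSS"
    [0.12, 0.88],  # "responsive flexbox layout"
])

# MiniBatchKMeans fits on random subsets of the data per iteration,
# which is what makes it fast on large conversation sets.
labels = MiniBatchKMeans(n_clusters=2, random_state=0, n_init=10).fit_predict(embeddings)

# The two data-analysis questions should share one cluster and the two
# CSS questions the other.
print(labels[0] == labels[1], labels[2] == labels[3], labels[0] != labels[2])
```

In the real pipeline the cluster model then writes a human-readable description for each group, which is what produces the labeled tree below.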
Expected Output
Programming Assistance (190 conversations)
├── Data Analysis & Visualization (38 conversations)
│   ├── R Programming for statistical analysis (12 conversations)
│   ├── Tableau dashboard creation (10 conversations)
│   └── Python data manipulation with pandas (16 conversations)
├── Web Development (45 conversations)
│   ├── React component development (20 conversations)
│   ├── API integration issues (15 conversations)
│   └── CSS styling and responsive design (10 conversations)
├── Machine Learning (32 conversations)
│   ├── Model training with TensorFlow (18 conversations)
│   └── Data preprocessing challenges (14 conversations)
└── ... (more clusters)
Total processing time: 21.9s (2.1s with cache!)
Checkpoints saved to: ./checkpoints/
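Because JSONLCheckpointManager writes checkpoints as JSONL (one JSON object per line), intermediate results can be inspected with nothing but the standard library. The filename and record fields below are synthetic, chosen only to keep the snippet self-contained; the actual checkpoint schema depends on the Kura version you run:

```python
import json
from pathlib import Path

def read_jsonl(path):
    # Parse one JSON object per non-empty line.
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]

# Synthetic checkpoint file so the example runs on its own.
demo = Path("demo_checkpoint.jsonl")
demo.write_text(
    '{"chat_id": "a1", "summary": "Help resetting a password"}\n'
    '{"chat_id": "a2", "summary": "Question about order status"}\n'
)

records = read_jsonl(demo)
print(len(records), records[0]["summary"])
```

This makes it easy to spot-check summaries or cluster assignments between pipeline stages without re-running the models.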
Development
See CONTRIBUTING.md for development setup, testing, and contribution guidelines.
About
Kura is under active development. If you face any issues or have suggestions, please feel free to open an issue or a PR. For more details on the technical implementation, check out this walkthrough of the code.