
A framework for building high-quality, verifiable evaluation datasets for LLMs.


Kushim: A Framework for Verifiable, Self-Optimizing LLM Evaluation Datasets

Kushim is a framework for generating high-quality, verifiable Question & Answer datasets. In an era of generative models, creating reliable evaluation data is one of the hardest problems in working with LLMs. Kushim addresses this with an end-to-end workflow built on two core principles: verifiability by design and self-optimizing quality.

It's not just about generating data; it's about generating trustworthy data that gets better on its own.

Kushim Illustration

The Kushim Philosophy: Core Concepts

1. Verifiable by Design

The biggest risk with synthetic data is factual inconsistency. Kushim is built to mitigate this risk. Every single question-answer pair generated by the pipeline is subjected to a strict validation step. An LLM-based validator checks if the generated answer is factually and unambiguously supported by the original source text. If a pair fails this check, it's discarded.

This ensures that your final dataset isn't just a collection of plausible-sounding questions, but a set of verifiable facts grounded in a source of truth.
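The validation gate can be pictured as a simple filter over candidate pairs. The sketch below is illustrative, not Kushim's actual implementation: `judge` stands in for the LLM-based validator, which in practice would be a model call returning whether the answer is unambiguously supported by the source chunk. The `naive_judge` shown here is a trivial stand-in.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class QAPair:
    question: str
    answer: str
    source_chunk: str

def filter_verifiable(pairs: list[QAPair],
                      judge: Callable[[QAPair], bool]) -> list[QAPair]:
    """Keep only pairs whose answer the judge deems supported by the source."""
    return [p for p in pairs if judge(p)]

# Trivial stand-in judge: require the answer to appear verbatim in the chunk.
# A real validator would ask an LLM for a supported/unsupported verdict.
def naive_judge(pair: QAPair) -> bool:
    return pair.answer.lower() in pair.source_chunk.lower()
```

The important design point is that the judge sees the original source chunk, not just the Q&A pair, so every kept pair is grounded in a specific span of the source text.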

2. Self-Optimizing Quality with DSPy

A static, one-size-fits-all prompt is rarely optimal: the best way to phrase a question depends on the source material. Kushim leverages the power of DSPy to create a self-improving pipeline.

Instead of just running a prompt, Kushim can "compile" it. It uses DSPy's optimizers (teleprompters) to:

  1. Generate a small training set from your source documents.
  2. Test multiple variations of prompts to see which ones produce the highest-quality, most verifiable Q&A pairs for your specific data.
  3. Save this "compiled" program, which contains the optimized, high-performance prompts.

This means Kushim learns from your data to improve its own performance, leading to a significantly higher-quality final dataset.
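The core idea behind this "compile" step can be sketched as a small search over candidate prompts, scored by how many of their generated pairs survive validation. This is a schematic analogue of what DSPy's optimizers do, not Kushim's or DSPy's actual code; `generate` and `validate` are stand-ins for model calls.

```python
def score_prompt(template: str, train_set: list[dict],
                 generate, validate) -> float:
    """Fraction of generated Q&A pairs that pass validation."""
    passed = 0
    for example in train_set:
        qa = generate(template, example["chunk"])
        if validate(qa, example["chunk"]):
            passed += 1
    return passed / len(train_set)

def compile_best_prompt(templates: list[str], train_set: list[dict],
                        generate, validate) -> str:
    """Return the template with the highest validation pass rate."""
    return max(templates,
               key=lambda t: score_prompt(t, train_set, generate, validate))
```

In the real pipeline, DSPy's teleprompters explore a richer space than a fixed template list (few-shot demonstrations, instruction rewrites), but the objective is the same: maximize the rate of verifiable pairs on your specific data.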

How It Works: The Kushim Pipeline Workflow

The Kushim pipeline integrates these concepts into an efficient, streaming workflow that proceeds in the following stages:

  1. Source & Fetch: The process begins by fetching raw documents from a designated Source, such as a Wikipedia article or a local file directory.

  2. Chunking: The fetched documents are broken down into smaller, manageable text chunks. This is a standard practice in RAG-style pipelines and prepares the data for the generation models.
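A minimal word-count chunker with overlap illustrates this stage; Kushim's actual chunking strategy and parameters may differ.

```python
def chunk_text(text: str, max_words: int = 200, overlap: int = 20) -> list[str]:
    """Split text into word-bounded chunks with a small overlap,
    so a fact spanning a boundary appears whole in at least one chunk."""
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunk = " ".join(words[start:start + max_words])
        if chunk:
            chunks.append(chunk)
    return chunks
```

The overlap matters for the validation step later: a Q&A pair can only be verified against its own chunk, so chunks must be large enough to contain complete facts.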

  3. Self-Optimization (A One-Time "Compile" Step): This is the heart of Kushim's quality assurance process. Instead of using a static prompt, the pipeline:

    • Generates a Training Set: It takes a small sample of the chunks to create a temporary training set.
    • Optimizes Prompts: It uses DSPy to test multiple prompt variations, identifying the one that produces the highest-quality, most verifiable Q&A pairs for your specific data.
    • Saves the Compiled Program: The resulting "compiled" program, containing the optimized prompts, is saved and reused for the main generation task.
  4. Generation: Using the high-performance prompts from the compilation step, the pipeline generates question-answer pairs from all of the text chunks.

  5. Validation & Filtering: Each generated Q&A pair is rigorously validated. An LLM checks if the answer is factually supported by its original source chunk. Pairs that pass validation proceed to the final dataset; those that fail are discarded.

This multi-stage process ensures that the final output is not only relevant but also verifiable and of the highest possible quality.
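The five stages compose naturally. The sketch below wires them together schematically; `fetch`, `generate`, and `validate` are stubs standing in for the real source fetchers and model calls, and this is not Kushim's internal structure.

```python
def run_pipeline(fetch, chunk, generate, validate) -> list:
    """Schematic composition of the Kushim stages:
    fetch -> chunk -> generate -> validate/filter."""
    dataset = []
    for document in fetch():          # 1. Source & Fetch
        for piece in chunk(document): # 2. Chunking
            qa = generate(piece)      # 4. Generation (post-compilation)
            if validate(qa, piece):   # 5. Validation & Filtering
                dataset.append(qa)
    return dataset
```

The one-time compile step (stage 3) is absent here because it happens before generation: its output is the optimized `generate` function that this loop then applies to every chunk.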

Getting Started

After installing Kushim (uv add kushim or pip install kushim), you can use its core components directly. The key is to instantiate the pipeline and run it. The optimization is handled for you—the first run compiles and saves the best prompts, and subsequent runs are fast.

# A conceptual example of using the Kushim pipeline
from kushim import pipeline, config, source

# 1. Choose your source and model
data_source = source.WikipediaSource()
pipeline_config = config.KushimConfig(
    model_name="openrouter/openai/gpt-4.1",
    fetch_kwargs={"mode": "search", "query": "History of coffee"}
)

# 2. Instantiate the pipeline
kushim_pipeline = pipeline.KushimPipeline(
    source=data_source,
    config=pipeline_config
)

# 3. Run it!
# This will automatically handle compiling and saving the optimized
# generator to a .json file for you on the first run.
validated_dataset, _ = kushim_pipeline.run(
    optimize=True,
    compiled_generator_path="compiled_coffee_generator.json"
)

print(validated_dataset)

For complete, runnable scripts demonstrating the full dataset creation lifecycle (merging, encryption, and pushing to the Hugging Face Hub), please see the examples/ directory in the GitHub repository.
