chunkbench-rag

A chunk-source-agnostic evaluation harness for RAG chunking strategies

Maintainers

ghassenov

Project description

chunkbench

Supported Python: 3.12, 3.13. Type-checked with mypy --strict.

Every RAG tutorial picks a chunk size, shrugs, and moves on. chunkbench is what happens after you stop shrugging.

The problem, in one sentence

You split your documents into chunks somehow — fixed size, paragraphs, a chunking library's default recipe, vibes — and that one decision quietly determines whether your retriever can ever find the right answer. Most teams never measure it. They just ship the first thing that seemed to work on three test queries and hope.

chunkbench replaces the hoping with a number. Feed it a corpus, a handful of chunking strategies, and a set of real questions with known-correct answers, and it tells you — with recall, precision, and cost figures side by side — which strategy actually retrieves the right information, instead of which one merely feels right.

It does not chunk your documents. It does not pick your embedding model. It does not talk you into using its favorite LLM. Those are your calls, made with your tools — chunkbench just tells you, honestly, whether the call you made was any good. Think of it less as a library and more as the friend who actually reads the whole receipt before saying "yeah, that seems fair."

Full design rationale — why golden questions live at the section level, what each metric actually measures, and the exact list of things chunkbench deliberately refuses to do — lives in docs/chunkbench.md.

Install

pip install chunkbench-rag

(The PyPI distribution is chunkbench-rag — chunkbench alone was too close to an existing project's name — but the import and the CLI command are both still plain chunkbench.)

Core install is three dependencies deep (pydantic, pyyaml, numpy) — no embedding SDK, no LLM SDK, no chunking library, because chunkbench isn't going to make that decision for you. The one shipped convenience extra:

pip install chunkbench-rag[openai]   # adds chunkbench.embedding.providers.openai
                                      # and chunkbench.generation.providers.openai

Using something else (chonkie, Gemini, Cohere, a model you trained in your garage)? See Bring your own everything below. No extra is required; it's a ~15-line function either way.

60-second quickstart

from chunkbench import run_comparison
from chunkbench.corpus import directory_corpus_loader

report = run_comparison(
    corpus=directory_corpus_loader("examples/quickstart/corpus", extensions=(".md",)),
    embedder=toy_embedder,               # any Embedder — see below
    golden_set="examples/quickstart/golden_qa.yaml",
    chunk_sources={
        "whole_section": whole_section_chunker,
        "paragraph": paragraph_chunker,
    },
    k=2,
)

report.to_markdown("report.md")
report.to_json("report.json")

toy_embedder, whole_section_chunker, and paragraph_chunker are tiny example functions defined in examples/quickstart/quickstart.py. That script contains this exact snippet and runs today, unmodified, with no API key and no network call:

python examples/quickstart/quickstart.py

whole_section: recall@2=1.00
paragraph: recall@2=1.00
Wrote examples/quickstart/report.md and examples/quickstart/report.json

The embedder there is a dependency-free hashing stand-in, good for proving the plumbing works and not much else. Swap it for something real before trusting the numbers.
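For a sense of what a dependency-free hashing stand-in can look like, here is a minimal sketch. It is not the toy_embedder shipped in examples/quickstart/quickstart.py; the name hashing_embedder, the dim parameter, and the treatment of Vector as a plain list of floats are all assumptions made for illustration.

```python
import hashlib
import math

# Assumption: a vector is a plain list of floats.
Vector = list[float]


def hashing_embedder(dim: int = 64):
    """Build an embed(texts) function that needs no model and no network."""

    def embed(texts: list[str]) -> list[Vector]:
        vectors: list[Vector] = []
        for text in texts:
            vec = [0.0] * dim
            # Hash each whitespace token into one of `dim` buckets and count it.
            for token in text.lower().split():
                digest = hashlib.sha256(token.encode("utf-8")).digest()
                bucket = int.from_bytes(digest[:4], "big") % dim
                vec[bucket] += 1.0
            # L2-normalise so a dot product behaves like cosine similarity.
            # An empty text keeps its all-zero vector.
            norm = math.sqrt(sum(v * v for v in vec)) or 1.0
            vectors.append([v / norm for v in vec])
        return vectors

    return embed
```

Identical texts always map to identical vectors, which is enough to exercise the pipeline. Texts that share no tokens can still collide in a bucket, one more reason not to trust numbers produced this way.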

Bring your own everything

There is exactly one base class in chunkbench you're required to inherit from: none. ChunkSource, Embedder, Generator, and Judge are all plain function shapes (Callable[...]) — wrap whatever you already use and hand it over.

Chunking, with chonkie:

from chonkie import RecursiveChunker
from chunkbench import Chunk, Document

def chonkie_chunker(document: Document) -> list[Chunk]:
    chunker = RecursiveChunker()
    chunks = []
    for slug, section_text in _sections(document.content):   # your own section splitter
        for i, piece in enumerate(chunker(section_text)):
            chunks.append(Chunk(
                id=f"{document.id}-{slug}-{i}", doc_id=document.id,
                section=slug, text=piece.text,
            ))
    return chunks

Embedding, with Gemini (a Generator wraps the same way):

from google import genai
from chunkbench import Embedder, Vector, run_comparison

def gemini_embedder(model: str = "gemini-embedding-001") -> Embedder:
    client = genai.Client()              # picks up credentials from the environment

    def embed(texts: list[str]) -> list[Vector]:
        # One API call for the whole batch; keep only the raw float values.
        response = client.models.embed_content(model=model, contents=texts)
        return [e.values for e in response.embeddings]

    return embed

report = run_comparison(
    corpus=my_corpus_loader,             # your own corpus loader
    embedder=gemini_embedder(),
    chunk_sources={"chonkie_recursive": chonkie_chunker},   # defined above
    golden_set="golden_qa.yaml",
    k=5,
)

Neither chonkie nor google-genai is a chunkbench dependency — install what you need yourself. Full runnable versions, plus the same pattern applied to a judge model, live in docs/providers.md and examples/providers/. Swap in Cohere, Voyage, sentence-transformers, or an in-house model gateway the same way — chunkbench genuinely does not care.

The composable API

For finer control — running only part of the pipeline, or scoring a custom metric:

from chunkbench import Pipeline, registry

@registry.metric("my_custom_metric")
class MyMetric:
    def score(self, retrieved, golden) -> float:
        ...

pipeline = Pipeline(embedder=my_embed_function, golden_set=my_golden_set)
chunks = pipeline.run_chunking(corpus, chunk_source=my_semantic_chunker)
results = pipeline.run_retrieval(chunks, k=5)
scores = pipeline.score(results, metrics=["recall", "precision", "my_custom_metric"])
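As an illustration of what could go inside a score method like the one above, here is a reciprocal-rank metric. The argument shapes are an assumption: retrieved is taken to be a ranked list of section identifiers and golden the set of identifiers that count as correct. The real contract is in docs/chunkbench.md, and the registry decorator is omitted so the sketch runs on its own.

```python
class ReciprocalRankMetric:
    """Score 1/rank of the first correct hit, or 0.0 if nothing correct is retrieved."""

    def score(self, retrieved: list[str], golden: set[str]) -> float:
        # Assumed shapes: `retrieved` is ordered best-first,
        # `golden` holds the section ids known to answer the question.
        for rank, section_id in enumerate(retrieved, start=1):
            if section_id in golden:
                return 1.0 / rank
        return 0.0
```

Unlike recall, this rewards a strategy for ranking the right section first rather than merely including it somewhere in the top k.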

docs/api-stability.md names exactly which extension points (chunk-source contract, metric registry, embedder/vector-store interfaces) carry a semver stability guarantee. The short version: those extension points stay stable; the internals may change whenever we find a better way.

CLI

# Config-file-driven — see docs/chunkbench.md for the full schema.
chunkbench run --config chunkbench.yaml

# Flag-driven, for one-off use. --chunkers/--embedder/--generator/--judge
# all take 'module:attribute' import strings — chunkbench doesn't ship
# chunking algorithms or provider integrations, so these point at your
# own code, importable from wherever you run the command.
chunkbench run \
  --corpus ./docs \
  --golden golden_qa.yaml \
  --chunkers whole_section=mypkg.chunkers:whole_section,semantic=mypkg.chunkers:semantic \
  --embedder mypkg.providers:gemini_embedder \
  --k 5

# Re-render a previous run's results.json in another format.
chunkbench report --from results.json --format html
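If the 'module:attribute' strings in the flags above are unfamiliar, this is the usual way such a string turns into a Python object. It is a sketch of the general technique using importlib, not chunkbench's own resolver; resolve and parse_chunkers are hypothetical names.

```python
import importlib
from typing import Any


def resolve(spec: str) -> Any:
    """Turn 'package.module:attribute' into the object it names."""
    module_name, sep, attribute = spec.partition(":")
    if not sep or not module_name or not attribute:
        raise ValueError(f"expected 'module:attribute', got {spec!r}")
    obj: Any = importlib.import_module(module_name)
    # Allow dotted attributes such as 'module:Class.method'.
    for part in attribute.split("."):
        obj = getattr(obj, part)
    return obj


def parse_chunkers(arg: str) -> dict[str, Any]:
    """Parse 'name=module:attr,other=module:attr' into a name -> object mapping."""
    pairs = (item.split("=", 1) for item in arg.split(","))
    return {name: resolve(spec) for name, spec in pairs}
```

The practical consequence is the one the comment in the command block states: the module has to be importable from wherever you run the command, so install your package or run from a directory where mypkg is on the import path.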

A regression_gate section in the config file makes chunkbench run exit non-zero when a metric drops below a threshold ("fail if recall_at_k for semantic drops below 0.8") — drop it into CI as a quality gate on chunking changes instead of finding out in production.
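The logic of such a gate is small enough to sketch. The code below is not chunkbench's implementation and does not use its config schema (see docs/chunkbench.md for that). The data shapes, a nested dict of scores and a dict of minimum thresholds keyed by (approach, metric), are assumptions chosen to mirror the example threshold quoted above.

```python
def failed_gates(
    scores: dict[str, dict[str, float]],
    thresholds: dict[tuple[str, str], float],
) -> list[str]:
    """Return one message per (approach, metric) pair that fell below its minimum."""
    failures: list[str] = []
    for (approach, metric), minimum in thresholds.items():
        value = scores[approach][metric]
        if value < minimum:
            failures.append(f"{approach}.{metric} = {value:.2f} < {minimum:.2f}")
    return failures


def exit_code(failures: list[str]) -> int:
    """Non-zero when any gate failed, which is what makes a CI job go red."""
    return 1 if failures else 0


# Hypothetical scores for one approach, checked against the quoted threshold.
failures = failed_gates(
    {"semantic": {"recall_at_k": 0.75}},
    {("semantic", "recall_at_k"): 0.8},
)
```

In a CI script, the list of failures would be printed and exit_code(failures) passed to sys.exit.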

What you get back

A Report, in three flavors: a Python object (iterable/indexable per approach and per question), Markdown (drop into a PR description), and JSON (the stable integration point — schema pinned in docs/report-schema.json).

Documentation

docs/chunkbench.md — the full design doc: core idea, what's measured and why, pipeline stages, what chunkbench deliberately doesn't do.
docs/providers.md — wiring in chonkie, Gemini, or any other chunking/embedding/LLM provider.
docs/api-stability.md — which extension points carry a semver guarantee.
docs/report-schema.json — JSON Schema for results.json.
CONTRIBUTING.md — dev setup, checks, and code style.
CHANGELOG.md — release history.

License

MIT — see LICENSE.

Project details

Release history

This version

0.1.0

Jul 3, 2026

Download files

Source Distribution

chunkbench_rag-0.1.0.tar.gz (147.3 kB)

Uploaded Jul 3, 2026 Source

Built Distribution

chunkbench_rag-0.1.0-py3-none-any.whl (49.4 kB)

Uploaded Jul 3, 2026 Python 3

File details

Details for the file chunkbench_rag-0.1.0.tar.gz.

File metadata

Download URL: chunkbench_rag-0.1.0.tar.gz
Upload date: Jul 3, 2026
Size: 147.3 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for chunkbench_rag-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`24c5d90cd91fe9176d0a005f1fc4d148132d0565aa4cb1b92abdecc634f2c3d6`
MD5	`8423663954d95df9f7e75711bec8ec68`
BLAKE2b-256	`61da8f774720c57da4be5308cd18198f221c5125cafc1c9ccd38be19308765c0`

Provenance

The following attestation bundles were made for chunkbench_rag-0.1.0.tar.gz:

Publisher: release.yml on ghassenov/chunkbench

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: chunkbench_rag-0.1.0.tar.gz
- Subject digest: 24c5d90cd91fe9176d0a005f1fc4d148132d0565aa4cb1b92abdecc634f2c3d6
- Sigstore transparency entry: 2060616455
- Sigstore integration time: Jul 3, 2026
Source repository:
- Permalink: ghassenov/chunkbench@8c75e1d7587b41c82b094e85add43ecc8626e810
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/ghassenov
- Access: private
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@8c75e1d7587b41c82b094e85add43ecc8626e810
- Trigger Event: push

File details

Details for the file chunkbench_rag-0.1.0-py3-none-any.whl.

File metadata

Download URL: chunkbench_rag-0.1.0-py3-none-any.whl
Upload date: Jul 3, 2026
Size: 49.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for chunkbench_rag-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`25826f13f51d99bafb38089bcef87a508f3685b0169a29b91839045687553bbb`
MD5	`049c4aa40deb8b7c325bd031f8411525`
BLAKE2b-256	`76102514f8d2abb9bece65eaa27a1ecb054552af273e8116a22f7e582ec64687`

Provenance

The following attestation bundles were made for chunkbench_rag-0.1.0-py3-none-any.whl:

Publisher: release.yml on ghassenov/chunkbench

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: chunkbench_rag-0.1.0-py3-none-any.whl
- Subject digest: 25826f13f51d99bafb38089bcef87a508f3685b0169a29b91839045687553bbb
- Sigstore transparency entry: 2060616855
- Sigstore integration time: Jul 3, 2026
Source repository:
- Permalink: ghassenov/chunkbench@8c75e1d7587b41c82b094e85add43ecc8626e810
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/ghassenov
- Access: private
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@8c75e1d7587b41c82b094e85add43ecc8626e810
- Trigger Event: push
