A chunk-source-agnostic evaluation harness for RAG chunking strategies
Every RAG tutorial picks a chunk size, shrugs, and moves on. chunkbench is what happens after you stop shrugging.
The problem, in one sentence
You split your documents into chunks somehow — fixed size, paragraphs, a chunking library's default recipe, vibes — and that one decision quietly determines whether your retriever can ever find the right answer. Most teams never measure it. They just ship the first thing that seemed to work on three test queries and hope.
chunkbench replaces the hoping with a number. Feed it a corpus, a handful of chunking strategies, and a set of real questions with known-correct answers, and it tells you — with recall, precision, and cost figures side by side — which strategy actually retrieves the right information, instead of which one merely feels right.
It does not chunk your documents. It does not pick your embedding model. It does not talk you into using its favorite LLM. Those are your calls, made with your tools — chunkbench just tells you, honestly, whether the call you made was any good. Think of it less as a library and more as the friend who actually reads the whole receipt before saying "yeah, that seems fair."
Full design rationale — why golden questions live at the section level, what each metric actually measures, and the exact list of things chunkbench deliberately refuses to do — lives in docs/chunkbench.md.
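To make "recall" and "precision" concrete, here is a minimal sketch of the two metrics scored at the section level. It is not chunkbench's implementation: the function names, the plain-list inputs, and the exact definitions are assumptions for illustration, and docs/chunkbench.md remains the authority on what the tool actually measures.

```python
def recall_at_k(retrieved_sections: list[str], expected_sections: set[str], k: int) -> float:
    """Fraction of the expected sections that show up among the top-k retrieved chunks."""
    if not expected_sections:
        return 0.0
    top_k = set(retrieved_sections[:k])
    return len(top_k & expected_sections) / len(expected_sections)


def precision_at_k(retrieved_sections: list[str], expected_sections: set[str], k: int) -> float:
    """Fraction of the top-k retrieved chunks that come from an expected section."""
    top_k = retrieved_sections[:k]
    if not top_k:
        return 0.0
    hits = sum(1 for section in top_k if section in expected_sections)
    return hits / len(top_k)


# Section slug of each retrieved chunk, best match first (hypothetical data).
retrieved = ["install", "faq", "install"]
expected = {"install"}

print(recall_at_k(retrieved, expected, k=2))     # 1.0
print(precision_at_k(retrieved, expected, k=2))  # 0.5
```

The example shows why the two numbers belong side by side: the right section was found (recall 1.0), but half of the retrieved context was noise (precision 0.5).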
Install
pip install chunkbench-rag
(The PyPI distribution is chunkbench-rag — chunkbench alone was too close to an existing project's name — but the import and the CLI command are both still plain chunkbench.)
Core install is three dependencies deep (pydantic, pyyaml, numpy) — no embedding SDK, no LLM SDK, no chunking library, because chunkbench isn't going to make that decision for you. The one shipped convenience extra:
pip install chunkbench-rag[openai]  # adds chunkbench.embedding.providers.openai
                                    # and chunkbench.generation.providers.openai
Using something else — chonkie, Gemini, Cohere, a model you trained in your garage — see Bring your own everything below. No extra required; it's a ~15-line function either way.
60-second quickstart
from chunkbench import run_comparison
from chunkbench.corpus import directory_corpus_loader

report = run_comparison(
    corpus=directory_corpus_loader("examples/quickstart/corpus", extensions=(".md",)),
    embedder=toy_embedder,  # any Embedder; see below
    golden_set="examples/quickstart/golden_qa.yaml",
    chunk_sources={
        "whole_section": whole_section_chunker,
        "paragraph": paragraph_chunker,
    },
    k=2,
)
report.to_markdown("report.md")
report.to_json("report.json")
toy_embedder, whole_section_chunker, and paragraph_chunker are tiny example functions in examples/quickstart/quickstart.py — this exact snippet runs today, unmodified, no API key, no network call:
python examples/quickstart/quickstart.py
whole_section: recall@2=1.00
paragraph: recall@2=1.00
Wrote examples/quickstart/report.md and examples/quickstart/report.json
The embedder there is a dependency-free hashing stand-in, good for proving the plumbing works and not much else. Swap it for something real before trusting the numbers.
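For a sense of what a dependency-free hashing stand-in can look like, here is a minimal sketch. It is not the toy_embedder shipped in examples/quickstart/quickstart.py: the name hashing_embedder, the dim parameter, and the bag-of-words scheme are assumptions. The only part taken from this README is the embedder shape, a function from a list of texts to a list of vectors.

```python
import hashlib
import math


def hashing_embedder(dim: int = 64):
    """Build an embedder that hashes whitespace tokens into a fixed number of buckets."""

    def embed(texts: list[str]) -> list[list[float]]:
        vectors = []
        for text in texts:
            vec = [0.0] * dim
            for token in text.lower().split():
                # sha256 is stable across processes, unlike the built-in hash().
                digest = hashlib.sha256(token.encode("utf-8")).digest()
                bucket = int.from_bytes(digest[:4], "big") % dim
                vec[bucket] += 1.0
            # L2-normalise so a dot product behaves like cosine similarity.
            norm = math.sqrt(sum(v * v for v in vec))
            vectors.append([v / norm for v in vec] if norm else vec)
        return vectors

    return embed


embed = hashing_embedder(dim=16)
first, second = embed(["chunk size matters", "chunk size matters"])
print(first == second)  # True
```

A vector like this only captures word overlap, which is why it is fine for checking the plumbing and useless for judging retrieval quality.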
Bring your own everything
There is exactly one base class in chunkbench you're required to inherit from: none. ChunkSource, Embedder, Generator, and Judge are all plain function shapes (Callable[...]) — wrap whatever you already use and hand it over.
Chunking, with chonkie:
from chonkie import RecursiveChunker
from chunkbench import Chunk, Document

def chonkie_chunker(document: Document) -> list[Chunk]:
    chunker = RecursiveChunker()
    chunks = []
    for slug, section_text in _sections(document.content):  # your own section splitter
        for i, piece in enumerate(chunker(section_text)):
            chunks.append(Chunk(
                id=f"{document.id}-{slug}-{i}", doc_id=document.id,
                section=slug, text=piece.text,
            ))
    return chunks
Embedding and generation, with Gemini 2.5 Flash:
from google import genai
from chunkbench import Embedder, Vector

def gemini_embedder(model: str = "gemini-embedding-001") -> Embedder:
    client = genai.Client()
    def embed(texts: list[str]) -> list[Vector]:
        return [e.values for e in client.models.embed_content(model=model, contents=texts).embeddings]
    return embed

from chunkbench import run_comparison

report = run_comparison(
    corpus=my_corpus_loader,
    embedder=gemini_embedder(),
    chunk_sources={"chonkie_recursive": chonkie_chunker},
    golden_set="golden_qa.yaml",
    k=5,
)
Neither chonkie nor google-genai is a chunkbench dependency — install what you need yourself. Full runnable versions, plus the same pattern applied to a judge model, live in docs/providers.md and examples/providers/. Swap in Cohere, Voyage, sentence-transformers, or an in-house model gateway the same way — chunkbench genuinely does not care.
The composable API
For finer control — running only part of the pipeline, or scoring a custom metric:
from chunkbench import Pipeline, registry

@registry.metric("my_custom_metric")
class MyMetric:
    def score(self, retrieved, golden) -> float:
        ...

pipeline = Pipeline(embedder=my_embed_function, golden_set=my_golden_set)
chunks = pipeline.run_chunking(corpus, chunk_source=my_semantic_chunker)
results = pipeline.run_retrieval(chunks, k=5)
scores = pipeline.score(results, metrics=["recall", "precision", "my_custom_metric"])
docs/api-stability.md names exactly which extension points (chunk-source contract, metric registry, embedder/vector-store interfaces) carry a semver stability guarantee — the short version: the things listed above, forever; the internals, whenever we find a better way.
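As an illustration of what a custom metric body might contain, here is a sketch of reciprocal rank. This README does not spell out what retrieved and golden look like at runtime, so the sketch assumes the simplest possible shapes: retrieved is an ordered list of section slugs (best match first) and golden is a set of expected slugs. Adapt the attribute access to the real objects, and add the registry decorator shown above when wiring it in.

```python
class ReciprocalRank:
    """Score 1/rank of the first relevant hit; 0.0 if nothing relevant was retrieved."""

    def score(self, retrieved, golden) -> float:
        # Assumed shapes: `retrieved` is an ordered list of section slugs,
        # `golden` is a set of expected section slugs.
        for rank, section in enumerate(retrieved, start=1):
            if section in golden:
                return 1.0 / rank
        return 0.0


metric = ReciprocalRank()
print(metric.score(["faq", "install", "cli"], {"install"}))  # 0.5
```

Unlike recall@k, this rewards strategies that put the right section first, which matters when only the top result or two reaches the generator.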
CLI
# Config-file-driven. See docs/chunkbench.md for the full schema.
chunkbench run --config chunkbench.yaml

# Flag-driven, for one-off use. --chunkers/--embedder/--generator/--judge
# all take 'module:attribute' import strings. chunkbench doesn't ship
# chunking algorithms or provider integrations, so these point at your
# own code, importable from wherever you run the command.
chunkbench run \
    --corpus ./docs \
    --golden golden_qa.yaml \
    --chunkers whole_section=mypkg.chunkers:whole_section,semantic=mypkg.chunkers:semantic \
    --embedder mypkg.providers:gemini_embedder \
    --k 5

# Re-render a previous run's results.json in another format.
chunkbench report --from results.json --format html
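If 'module:attribute' strings are new to you, this is roughly how such a string turns into a Python object. It is a sketch of the general technique using importlib, not chunkbench's own resolver; resolve and parse_named_specs are hypothetical names, and the demo uses standard-library targets so that it runs anywhere.

```python
import importlib


def resolve(spec: str):
    """Import 'package.module:attribute' and return the attribute it names."""
    module_name, sep, attribute = spec.partition(":")
    if not sep or not module_name or not attribute:
        raise ValueError(f"expected 'module:attribute', got {spec!r}")
    obj = importlib.import_module(module_name)
    # Allow dotted attributes such as 'module:Class.method'.
    for part in attribute.split("."):
        obj = getattr(obj, part)
    return obj


def parse_named_specs(value: str) -> dict:
    """Parse 'name=module:attr,other=module:attr' into a name -> object mapping."""
    resolved = {}
    for item in value.split(","):
        name, sep, spec = item.partition("=")
        if not sep:
            raise ValueError(f"expected 'name=module:attribute', got {item!r}")
        resolved[name.strip()] = resolve(spec.strip())
    return resolved


print(resolve("math:sqrt")(9.0))                             # 3.0
print(sorted(parse_named_specs("root=math:sqrt,up=math:ceil")))  # ['root', 'up']
```

The practical consequence is the one noted in the comment above: the module has to be importable from the directory where you run the command, so an ImportError usually means a path problem rather than a chunkbench problem.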
A regression_gate section in the config file makes chunkbench run exit non-zero when a metric drops below a threshold ("fail if recall_at_k for semantic drops below 0.8") — drop it into CI as a quality gate on chunking changes instead of finding out in production.
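The logic behind such a gate is small enough to sketch. The version below is an illustration only: the config schema is documented in docs/chunkbench.md and is not reproduced here, so the nested-dict shapes for scores and thresholds, and the function names, are assumptions.

```python
def failed_gates(
    scores: dict[str, dict[str, float]],
    thresholds: dict[str, dict[str, float]],
) -> list[str]:
    """List every (strategy, metric) pair whose score is missing or below its minimum."""
    failures = []
    for strategy, minimums in thresholds.items():
        for metric, minimum in minimums.items():
            actual = scores.get(strategy, {}).get(metric)
            if actual is None or actual < minimum:
                failures.append(f"{strategy}.{metric}: got {actual}, need >= {minimum}")
    return failures


def exit_code(scores, thresholds) -> int:
    """Return 1 when any gate fails, so a CI step fails along with it."""
    return 1 if failed_gates(scores, thresholds) else 0


# "fail if recall_at_k for semantic drops below 0.8"
thresholds = {"semantic": {"recall_at_k": 0.8}}
print(exit_code({"semantic": {"recall_at_k": 0.75}}, thresholds))  # 1
print(exit_code({"semantic": {"recall_at_k": 0.85}}, thresholds))  # 0
```

Treating a missing score as a failure is a deliberate choice in this sketch: a renamed strategy or metric should break the build loudly instead of silently disabling the gate.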
What you get back
A Report, in three flavors: a Python object (iterable/indexable per approach and per question), Markdown (drop into a PR description), and JSON (the stable integration point — schema pinned in docs/report-schema.json).
Documentation
- docs/chunkbench.md: the full design doc, covering the core idea, what's measured and why, pipeline stages, and what chunkbench deliberately doesn't do.
- docs/providers.md: wiring in chonkie, Gemini, or any other chunking/embedding/LLM provider.
- docs/api-stability.md: which extension points carry a semver guarantee.
- docs/report-schema.json: JSON Schema for results.json.
- CONTRIBUTING.md: dev setup, checks, and code style.
- CHANGELOG.md: release history.
License
MIT — see LICENSE.