
A lightweight retrieval benchmarking toolkit for experimenting with retrieval pipelines.

Project description

Ragaroo: A Lightweight Retrieval Benchmarking Library for Custom Datasets


Ragaroo is a lightweight Python library for benchmarking retrieval pipelines on custom datasets. It is designed for practical RAG/retrieval experiments: load a dataset, define pipelines, run an experiment, compare quality and latency, and save reproducible artifacts.

Features

  • Strict dataset loading from corpus.jsonl, queries.jsonl, and qrels.tsv
  • BM25 retrieval with bm25s
  • Dense retrieval with Sentence Transformers and FAISS
  • Sparse retrieval with Sentence Transformers sparse encoders
  • Hybrid retrieval with reciprocal-rank fusion or average-score fusion
  • Cross-encoder reranking and retriever-as-reranker workflows
  • LLM-based query augmentation: HyDE, spelling correction, intent clarification
  • Ranking metrics, latency metrics, CSV/JSON exports, plots, and manifests
  • Local index caching for repeatable experiments

Installation

For most users, installation should be simple:

pip install ragaroo

Or, to work directly from the GitHub repository:

git clone https://github.com/marcharaoui/ragaroo.git
cd ragaroo
pip install -e .

For development, install the dev extra:

uv sync --extra dev

Ragaroo requires Python 3.10+.

Dataset Format

Each dataset folder must contain:

my_dataset/
  corpus.jsonl
  queries.jsonl
  qrels.tsv

corpus.jsonl:

{"id":"d1","text":"Paris is the capital of France.","metadata":{"source":"wiki"}}
{"id":"d2","text":"Berlin is the capital of Germany."}

queries.jsonl:

{"id":"q1","text":"capital of France"}
{"id":"q2","text":"capital of Germany"}

qrels.tsv:

query-id	corpus-id	score
q1	d1	1
q2	d2	1

Both id and _id are accepted as the identifier field. Ragaroo validates datasets on load and reports missing files, malformed JSON, duplicate ids, empty text, and qrels that reference unknown queries or documents.
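
For reference, here is a minimal sketch (standard library only, not a Ragaroo API) that writes a dataset folder in this layout:

import json
from pathlib import Path

folder = Path("my_dataset")
folder.mkdir(exist_ok=True)

corpus = [
    {"id": "d1", "text": "Paris is the capital of France.", "metadata": {"source": "wiki"}},
    {"id": "d2", "text": "Berlin is the capital of Germany."},
]
queries = [
    {"id": "q1", "text": "capital of France"},
    {"id": "q2", "text": "capital of Germany"},
]
qrels = [("q1", "d1", 1), ("q2", "d2", 1)]

# one JSON object per line
with open(folder / "corpus.jsonl", "w", encoding="utf-8") as f:
    f.writelines(json.dumps(doc) + "\n" for doc in corpus)

with open(folder / "queries.jsonl", "w", encoding="utf-8") as f:
    f.writelines(json.dumps(q) + "\n" for q in queries)

# tab-separated qrels with a header row
with open(folder / "qrels.tsv", "w", encoding="utf-8") as f:
    f.write("query-id\tcorpus-id\tscore\n")
    f.writelines(f"{qid}\t{did}\t{score}\n" for qid, did, score in qrels)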

Quickstart

import ragaroo as roo

dataset = roo.Dataset.from_folder("data/nfcorpus")
embedder = roo.SentenceTransformerEmbedder("intfloat/e5-small-v2")

pipelines = [
    roo.Pipeline(
        name="bm25",
        retriever=roo.BM25Retriever(top_k=10),
    ),
    roo.Pipeline(
        name="dense_hnsw",
        retriever=roo.DenseRetriever(
            embedder=embedder,
            top_k=10,
            index_technique="hnsw",
            distance_metric="cosine",
        ),
    ),
    roo.Pipeline(
        name="hybrid_rrf",
        retriever=roo.HybridRetriever(
            retriever_1=roo.DenseRetriever(embedder=embedder, top_k=10),
            retriever_2=roo.BM25Retriever(top_k=10),
            top_k=10,
        ),
    ),
]

report = roo.Experiment(
    dataset=dataset,
    pipelines=pipelines,
    show_progress=True,
).run()

report.summary(sort_by="mrr@10")

Common Patterns

Reranking with a cross-encoder:

roo.Pipeline(
    name="dense_rerank",
    retriever=roo.DenseRetriever(embedder=embedder, top_k=50),
    reranker=roo.CrossEncoderReranker(
        model_name="cross-encoder/ms-marco-MiniLM-L-6-v2",
        top_k=10,
    ),
)

Using a retriever as a reranker:

roo.Pipeline(
    name="bm25_then_dense",
    retriever=roo.BM25Retriever(top_k=50),
    reranker=roo.DenseRetriever(embedder=embedder, top_k=10),
)

HyDE query augmentation:

import os

provider = roo.OpenRouterProvider(
    api_key=os.environ["OPENROUTER_API_KEY"],
    model=os.environ.get("OPENROUTER_MODEL", "openai/gpt-4o-mini"),
)

roo.Pipeline(
    name="dense_hyde",
    retriever=roo.DenseRetriever(embedder=embedder, top_k=10),
    query_augmentation=[
        roo.HyDE(
            user_prompt="Write a concise support-style passage for this query.",
            system_prompt="Return only the passage.",
            temperature=0.3,
            max_tokens=220,
        )
    ],
    llm_provider=provider,
)

If user_prompt, system_prompt, or temperature is omitted, the transform uses its default. Custom user_prompt values are prepended to the dataset query.

By default, experiments evaluate every query in the dataset. While prototyping, pass an integer query_limit (for example, query_limit=50) directly to Experiment to run on a subset.
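
For example, reusing the dataset and pipelines from the Quickstart:

report = roo.Experiment(
    dataset=dataset,
    pipelines=pipelines,
    query_limit=50,  # evaluate only 50 queries while prototyping
    show_progress=True,
).run()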

Metrics

Supported quality metrics:

  • recall@k
  • precision@k
  • mrr@k
  • map@k
  • hit_rate@k
  • ndcg@k

Supported latency metrics:

  • latency_ms
  • query_augmentation_latency_ms
  • retrieval_latency_ms
  • rerank_latency_ms
  • p50_latency_ms
  • p95_latency_ms
  • total_time_s

recall, precision, mrr, map, and hit_rate treat qrel scores greater than zero as relevant. ndcg uses graded qrel scores.
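
As a rough illustration of graded relevance (a standalone sketch of standard NDCG, not Ragaroo's internal implementation):

import math

def dcg_at_k(gains, k):
    # discounted cumulative gain over the top-k ranks
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains[:k]))

def ndcg_at_k(gains, k):
    ideal = dcg_at_k(sorted(gains, reverse=True), k)
    return dcg_at_k(gains, k) / ideal if ideal > 0 else 0.0

# graded qrel scores of the retrieved documents, in ranked order
print(ndcg_at_k([3, 0, 2, 1], k=4))  # about 0.93: the grade-2 doc sits below a non-relevant doc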

Experiment Outputs

Each experiment saves:

  • report.json
  • report.csv
  • manifest.json
  • config.json
  • plots under plots/

With store_query_results=True, Ragaroo also saves per-query metrics and retrieved ids.
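
A minimal sketch, assuming store_query_results is passed to Experiment alongside the other options shown above:

report = roo.Experiment(
    dataset=dataset,
    pipelines=pipelines,
    store_query_results=True,  # also persist per-query metrics and retrieved ids
).run()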

The manifest records dataset hashes, pipeline configs, pipeline hashes, dependency versions, platform metadata, git metadata when available, notes, tags, and random seed.

Examples

See examples/README.md.

Included examples:

  • examples/compare_retrievers.py
  • examples/compare_topk.py
  • examples/compare_models.py
  • examples/compare_rerank.py
  • examples/compare_hyde.py
  • examples/compare_multiple_datasets.py

Most examples default to data/nfcorpus. Change the constants at the top of each script to use another dataset, model, cache folder, or query limit.
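
As an illustration, the kind of block you would edit looks like this (the constant names here are illustrative; check each script's header for the actual ones):

# top of an example script such as examples/compare_retrievers.py
DATASET_PATH = "data/nfcorpus"
MODEL_NAME = "intfloat/e5-small-v2"
CACHE_DIR = "./models"
QUERY_LIMIT = 50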

Model Access

Use HF_TOKEN for gated Hugging Face models or higher Hub rate limits. Use OPENROUTER_API_KEY only for LLM-based query augmentation. Dataset paths, model choices, model cache folders, and query limits are regular Python arguments in scripts and examples, not environment variables.

To keep model downloads in a project-local folder:

import ragaroo as roo

roo.store_models("./models")

Limitations

  • Ragaroo evaluates retrieval, not answer generation.
  • It assumes the benchmark target is retrieving the right passages for each query.
  • It currently loads dataset files into memory.
  • LLM-based query augmentation can affect reproducibility unless the provider and model are controlled.

Development Note

Ragaroo was built with human engineering work and assistance from LLM technologies. The code is tested, but LLM-assisted projects can still contain mistakes. Please validate benchmark setup, metrics, and outputs before relying on them for important decisions.

Author

Ragaroo was created and is maintained by Marc Haraoui.

Citation

@software{haraoui_ragaroo,
  author = {Marc Haraoui},
  title = {Ragaroo: A Lightweight Retrieval Benchmarking Library for Custom Datasets},
  year = {2026},
  url = {https://github.com/marcharaoui/ragaroo}
}

