Beacon

A benchmarking toolkit that evaluates and compares RAG chunking strategies against your actual queries to find the optimal configuration.

Why Beacon?

RAG quality is mostly retrieval quality (roughly 80%, as a rule of thumb), and retrieval quality depends heavily on chunking. Yet most teams:

Use default chunk sizes (512 tokens) without evaluation
Don't know if semantic chunking would improve their specific use case
Spend weeks manually testing different configurations
Have no metrics to compare strategies objectively

Beacon automates this with reproducible benchmarks.

Features

Multi-Strategy Evaluation - Compare 5+ chunking strategies in one run
Standard IR Metrics - MRR, Recall@K, NDCG, Precision, MAP
Bayesian Auto-Tuning - Find optimal chunk size and overlap automatically
HTML Reports - Beautiful, shareable benchmark reports
Local-First - Run entirely offline with local embeddings
Framework Agnostic - Works with any RAG implementation

Installation

pip install beacon-eval

For auto-tuning support:

pip install "beacon-eval[optuna]"

Quick Start

1. Create a configuration file

beacon init

This creates a beacon.yaml configuration file.

2. Prepare your queries

Create a queries.jsonl file with your test queries:

{"query": "What is the refund policy?", "relevant_doc_ids": ["policy.pdf"]}
{"query": "How do I reset my password?", "relevant_doc_ids": ["faq.pdf"]}
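
A malformed line in queries.jsonl is easy to miss, so it can help to check the file before running a benchmark. The following is a minimal sketch using only the Python standard library; write_queries and validate_queries are hypothetical helper names, not part of Beacon, and the checks assume only the two fields shown above.

```python
import json


def write_queries(path, queries):
    """Write one JSON object per line (JSON Lines format)."""
    with open(path, "w", encoding="utf-8") as f:
        for record in queries:
            f.write(json.dumps(record) + "\n")


def validate_queries(path):
    """Parse a JSONL query file and return its records.

    Raises ValueError naming the first offending line.
    """
    records = []
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                raise ValueError(f"line {lineno}: invalid JSON") from exc
            if not isinstance(record, dict):
                raise ValueError(f"line {lineno}: expected a JSON object")
            query = record.get("query")
            if not isinstance(query, str) or not query.strip():
                raise ValueError(f"line {lineno}: 'query' must be a non-empty string")
            doc_ids = record.get("relevant_doc_ids")
            if not isinstance(doc_ids, list) or not doc_ids:
                raise ValueError(f"line {lineno}: 'relevant_doc_ids' must be a non-empty list")
            records.append(record)
    return records
```

Calling validate_queries("queries.jsonl") returns the parsed records, or raises with the line number of the first problem.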

3. Run the benchmark

beacon run beacon.yaml

Configuration

name: my_benchmark
documents:
  - ./docs/*.pdf
  - ./docs/*.txt
queries: ./queries.jsonl
embedding_model: all-MiniLM-L6-v2
top_k: 10

strategies:
  - name: fixed_256
    type: fixed_size
    chunk_size: 256
    overlap: 25

  - name: fixed_512
    type: fixed_size
    chunk_size: 512
    overlap: 50

  - name: sentence_based
    type: sentence
    chunk_size: 512

  - name: recursive
    type: recursive
    chunk_size: 512
    overlap: 50

output_dir: ./results
generate_html_report: true
export_csv: true
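
Some mistakes in the strategies list can be caught before a long run, for example an overlap that is not smaller than the chunk size. The sketch below checks a list of plain dictionaries that mirror the strategies entries above. check_strategies is a hypothetical helper, not part of Beacon; the set of types is taken from the Chunking Strategies table, and treating duplicate names as a problem is an assumption.

```python
# Strategy types listed in the Chunking Strategies table.
KNOWN_TYPES = {"fixed_size", "sentence", "paragraph", "semantic", "recursive"}


def check_strategies(strategies):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    seen_names = set()
    for strategy in strategies:
        name = strategy.get("name", "<unnamed>")
        if name in seen_names:
            # Assumption: names should be unique so results can be told apart.
            problems.append(f"{name}: duplicate strategy name")
        seen_names.add(name)

        if strategy.get("type") not in KNOWN_TYPES:
            problems.append(f"{name}: unknown type {strategy.get('type')!r}")

        # chunk_size is only checked when present; overlap defaults to 0 here.
        size = strategy.get("chunk_size")
        overlap = strategy.get("overlap", 0)
        if size is not None and size <= 0:
            problems.append(f"{name}: chunk_size must be positive")
        elif size is not None and not 0 <= overlap < size:
            problems.append(f"{name}: overlap must be >= 0 and smaller than chunk_size")
    return problems
```

For the four strategies in the configuration above, this returns an empty list.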

Chunking Strategies

Strategy	Description
`fixed_size`	Split by fixed token/character count
`sentence`	Split by sentence boundaries
`paragraph`	Split by paragraph boundaries
`semantic`	Split by semantic similarity
`recursive`	Recursively split with multiple separators
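
To make the first two rows concrete, here is a minimal sketch of fixed-size chunking with overlap and of sentence-based chunking. It is an illustration only, not Beacon's implementation: it counts whitespace-separated words rather than model tokens, and the sentence splitter is a simple regular expression. Both function names are hypothetical.

```python
import re


def fixed_size_chunks(text, chunk_size, overlap=0):
    """Split text into windows of `chunk_size` words.

    Consecutive windows share `overlap` words.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


def sentence_chunks(text, chunk_size):
    """Pack whole sentences into chunks of at most `chunk_size` words.

    A single sentence longer than `chunk_size` becomes its own chunk.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current, count = [], [], 0
    for sentence in sentences:
        n_words = len(sentence.split())
        if current and count + n_words > chunk_size:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n_words
    if current:
        chunks.append(" ".join(current))
    return chunks
```

The fixed-size version can cut a sentence in half, while the sentence version keeps sentences intact at the cost of uneven chunk lengths; that difference is the kind of trade-off a benchmark run is meant to measure.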

Python API

from pathlib import Path

from beacon import BenchmarkConfig, ChunkingStrategy
from beacon.runner import run_benchmark
from beacon.parsers import load_documents, load_queries

# Load your data
documents = load_documents([Path("./docs")])
queries = load_queries(Path("./queries.jsonl"))

# Define strategies
strategies = [
    ChunkingStrategy(name="small", strategy_type="fixed_size", chunk_size=256),
    ChunkingStrategy(name="medium", strategy_type="fixed_size", chunk_size=512),
    ChunkingStrategy(name="large", strategy_type="fixed_size", chunk_size=1024),
]

# Run benchmark
config = BenchmarkConfig(
    name="my_benchmark",
    documents=[Path("./docs")],
    queries_file=Path("./queries.jsonl"),
    strategies=strategies,
)

result = run_benchmark(config)
print(f"Best strategy: {result.best_strategy}")

Auto-Tuning

Find optimal chunking parameters automatically:

from pathlib import Path

from beacon.tuner import auto_tune
from beacon.parsers import load_documents, load_queries
from beacon.models import ChunkingStrategyType

documents = load_documents([Path("./docs")])
queries = load_queries(Path("./queries.jsonl"))

result = auto_tune(
    documents=documents,
    queries=queries,
    strategy_type=ChunkingStrategyType.FIXED_SIZE,
    metric="mrr",
    n_trials=50,
    chunk_size_range=(100, 2000),
)

print(f"Optimal chunk size: {result['chunk_size']}")
print(f"Optimal overlap: {result['chunk_overlap']}")
print(f"Best MRR: {result['best_mrr']:.4f}")
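
The idea behind tuning can be shown end to end in miniature: chunk the documents at a candidate size, retrieve for each query, score the ranking with MRR, and keep the best candidate. The sketch below is a stand-in under loud assumptions, not how Beacon works internally. It uses a plain grid search instead of Bayesian optimization, a word-overlap (Jaccard) score instead of embeddings, and counts words instead of tokens. All function names are hypothetical.

```python
def chunk_words(text, chunk_size, overlap):
    """Split text into word windows of `chunk_size` sharing `overlap` words."""
    words = text.split()
    step = chunk_size - overlap
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), step)]


def rank_documents(query, index):
    """Rank document ids by the best Jaccard score of any of their chunks.

    `index` is a list of (doc_id, chunk_text) pairs.
    """
    query_words = set(query.lower().split())
    best = {}
    for doc_id, chunk in index:
        chunk_words_set = set(chunk.lower().split())
        union = query_words | chunk_words_set
        score = len(query_words & chunk_words_set) / len(union) if union else 0.0
        best[doc_id] = max(best.get(doc_id, 0.0), score)
    # Highest score first; ties broken by document id for determinism.
    return sorted(best, key=lambda doc_id: (-best[doc_id], doc_id))


def mean_reciprocal_rank(queries, index):
    """Average of 1/rank of the first relevant document over all queries."""
    if not queries:
        return 0.0
    total = 0.0
    for q in queries:
        relevant = set(q["relevant_doc_ids"])
        ranked = rank_documents(q["query"], index)
        total += next((1.0 / r for r, d in enumerate(ranked, 1) if d in relevant), 0.0)
    return total / len(queries)


def grid_search(documents, queries, chunk_sizes, overlap_ratio=0.1):
    """Try each chunk size and return the best-scoring setting as a dict."""
    best = None
    for size in chunk_sizes:
        overlap = int(size * overlap_ratio)  # always smaller than size
        index = [
            (doc_id, chunk)
            for doc_id, text in documents.items()
            for chunk in chunk_words(text, size, overlap)
        ]
        score = mean_reciprocal_rank(queries, index)
        if best is None or score > best["mrr"]:
            best = {"chunk_size": size, "chunk_overlap": overlap, "mrr": score}
    return best
```

A Bayesian tuner replaces the exhaustive loop in grid_search with a sampler that proposes the next candidate based on earlier scores, which matters once each trial involves embedding a real corpus.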

Metrics

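Metric	Description	Good Score
MRR	Mean Reciprocal Rank: average of 1/rank of the first relevant result	> 0.7
Recall@K	Fraction of relevant docs that appear in the top K	> 0.8
NDCG@K	Normalized Discounted Cumulative Gain over the top K	> 0.75
Precision@K	Fraction of the top K results that are relevant	> 0.6
MAP	Mean Average Precision	> 0.6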
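
For readers who want to see what these numbers measure, the following sketch computes each metric for a single query using the standard textbook definitions with binary relevance. It is not Beacon's code; it assumes `ranked` is a list of unique document ids in retrieval order and `relevant` is a set of ids. MRR and MAP are the means of reciprocal_rank and average_precision over all queries.

```python
import math


def reciprocal_rank(ranked, relevant):
    """1/rank of the first relevant result, or 0.0 if none is retrieved."""
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def recall_at_k(ranked, relevant, k):
    """Fraction of relevant documents that appear in the top k."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def precision_at_k(ranked, relevant, k):
    """Fraction of the top k results that are relevant."""
    if k <= 0:
        return 0.0
    return sum(1 for doc_id in ranked[:k] if doc_id in relevant) / k


def average_precision(ranked, relevant):
    """Mean of precision values taken at each rank where a relevant doc appears."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def ndcg_at_k(ranked, relevant, k):
    """DCG of the top k divided by the DCG of an ideal ordering (binary gains)."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc_id in enumerate(ranked[:k], start=1)
        if doc_id in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0
```

With ranked = ["a", "b", "c", "d"] and relevant = {"b", "d"}, the first relevant result sits at rank 2, so the reciprocal rank is 0.5, and one of the two relevant documents is in the top 2, so Recall@2 is also 0.5.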

CLI Commands

# Initialize sample configuration
beacon init

# Run benchmark
beacon run beacon.yaml

# List available strategies
beacon strategies

# Compare multiple benchmark results
beacon compare results1.json results2.json

Output

Beacon generates:

results.json - Full results in JSON format
results.csv - Comparison table in CSV
report.html - Interactive HTML report with charts

Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.

Security

If you discover a security vulnerability, please see SECURITY.md for reporting guidelines.

License

MIT License - see LICENSE for details.

Version 0.1.0, released Dec 30, 2025

