Bayesian BM25 scoring and experimental validation (Rust core + Python bindings)

bb25 (Bayesian BM25)

bb25 is a fast, self-contained BM25 + Bayesian calibration implementation with a minimal Python API. It also includes a small reference corpus and experiment suite so you can validate the expected numerical properties.

Original author's implementation: The paper's author (Jaepil Jeong, Cognica) maintains the reference Python implementation at cognica-io/bayesian-bm25. That library focuses on production-ready score-to-probability conversion that preserves the BM25 ranking order, with automatic parameter estimation, online learning, and log-odds conjunction for hybrid fusion. If you need a drop-in probability transform for an existing search system, use the original. bb25 is a Rust-core implementation built for experimental validation; it prioritizes performance and end-to-end reproducibility of the paper's claims.

Install

pip install bb25

Quick start

Use the built-in corpus and queries

import bb25 as bb

corpus = bb.build_default_corpus()
docs = corpus.documents()
queries = bb.build_default_queries()

# 1.2 and 0.75 match the conventional BM25 defaults for k1 and b (assumed argument order)
bm25 = bb.BM25Scorer(corpus, 1.2, 0.75)
score = bm25.score(queries[0].terms, docs[0])
print("score0", score)

Build your own corpus

import bb25 as bb

corpus = bb.Corpus()
corpus.add_document("d1", "neural networks for ranking", [0.1] * 8)
corpus.add_document("d2", "bm25 is a strong baseline", [0.2] * 8)
corpus.build_index()  # must be called before creating scorers

bm25 = bb.BM25Scorer(corpus, 1.2, 0.75)
print(bm25.idf("bm25"))
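For readers who want to see what a BM25 scorer computes, here is a minimal pure-Python sketch of the textbook formula. It is not bb25's implementation: the smoothed IDF variant, the whitespace tokenizer, and the helper names `bm25_idf` and `bm25_score` are assumptions made for illustration, and bb25 may differ in any of them.

```python
import math
from collections import Counter

def bm25_idf(n_docs, doc_freq):
    # Smoothed IDF that is never negative (one common variant;
    # assumed here, bb25 may use a different one).
    return math.log(1.0 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))

def bm25_score(query_terms, doc_tokens, corpus_tokens, k1=1.2, b=0.75):
    n_docs = len(corpus_tokens)
    avg_len = sum(len(d) for d in corpus_tokens) / n_docs
    tf = Counter(doc_tokens)
    # Length normalization: documents longer than average are penalized.
    norm = 1.0 - b + b * len(doc_tokens) / avg_len
    score = 0.0
    for term in query_terms:
        freq = tf.get(term, 0)
        if freq == 0:
            continue  # terms absent from the document contribute nothing
        doc_freq = sum(1 for d in corpus_tokens if term in d)
        score += bm25_idf(n_docs, doc_freq) * freq * (k1 + 1.0) / (freq + k1 * norm)
    return score

# Same two toy documents as above, split on whitespace.
corpus_tokens = [
    "neural networks for ranking".split(),
    "bm25 is a strong baseline".split(),
]
print(bm25_score(["bm25"], corpus_tokens[1], corpus_tokens))  # positive
print(bm25_score(["bm25"], corpus_tokens[0], corpus_tokens))  # 0.0, term absent
```

The raw score is unbounded and depends on corpus statistics, which is why a calibration step is needed before it can be combined with other signals as a probability.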

Bayesian calibration + hybrid fusion

import bb25 as bb

corpus = bb.build_default_corpus()
docs = corpus.documents()
queries = bb.build_default_queries()

bm25 = bb.BM25Scorer(corpus, 1.2, 0.75)
bayes = bb.BayesianBM25Scorer(bm25, 1.0, 0.5)
vector = bb.VectorScorer()
hybrid = bb.HybridScorer(bayes, vector)

q = queries[0]
prob_or = hybrid.score_or(q.terms, q.embedding, docs[0])
prob_and = hybrid.score_and(q.terms, q.embedding, docs[0])
print("OR", prob_or, "AND", prob_and)
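To make the OR and AND combinations concrete, the sketch below shows one plausible reading: a sigmoid maps a raw BM25 score to a probability, and the two probabilities are then combined under an independence assumption. Everything here is illustrative. The function names, the interpretation of `1.0` and `0.5` as a sigmoid slope and midpoint, and the product-rule formulas are assumptions, not bb25's confirmed internals. The original library is described above as using a log-odds conjunction, which this sketch does not reproduce.

```python
import math

def sigmoid_calibrate(score, alpha=1.0, beta=0.5):
    # Hypothetical calibration: alpha is the slope, beta the midpoint.
    # Monotonic in score, so the BM25 ranking order is preserved.
    return 1.0 / (1.0 + math.exp(-alpha * (score - beta)))

def prob_or(probs):
    # P(at least one signal is relevant), assuming independence.
    miss = 1.0
    for p in probs:
        miss *= 1.0 - p
    return 1.0 - miss

def prob_and(probs):
    # P(all signals are relevant), assuming independence.
    out = 1.0
    for p in probs:
        out *= p
    return out

p_text = sigmoid_calibrate(2.0)  # calibrated lexical score (toy input)
p_vec = 0.8                      # toy vector-similarity probability
print("OR", prob_or([p_text, p_vec]), "AND", prob_and([p_text, p_vec]))
```

Under these formulas the AND value can never exceed the smaller input and the OR value can never fall below the larger one, which is a quick sanity check to run against any hybrid scorer.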

Run the experiments

import bb25 as bb

results = bb.run_experiments()
print(all(r.passed for r in results))

Sample script

See docs/sample_usage.py for an end-to-end example using BM25, Bayesian calibration, and hybrid fusion.

Benchmarks (BM25 vs Bayesian)

See benchmarks/README.md for a lightweight runner that compares BM25 and Bayesian BM25 on your own corpora.

English Benchmark (SQuAD, 100 validation queries)

This is where bb25 does best: the Bayesian hybrid (bb25 + dense) outscores the classic BM25 hybrid (BM25 + dense) on both metrics.

Method             NDCG@10   MRR@10   Notes
WS (bb25+Dense)    0.9149    0.8850   Best of the three methods
WS (BM25+Dense)    0.9051    0.8717
RRF (BM25+Dense)   0.8874    0.8483   RRF underperforms weighted sum (WS)
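For reference, the two metrics in the table can be computed per query as sketched below and then averaged over all queries. This is a generic sketch that assumes binary relevance. The function names are illustrative, and the benchmark runner may compute the metrics differently.

```python
import math

def mrr_at_k(ranked_ids, relevant_ids, k=10):
    # Reciprocal rank of the first relevant hit within the top k, else 0.
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0

def ndcg_at_k(ranked_ids, relevant_ids, k=10):
    # Binary-gain DCG, discounted by log2(rank + 1).
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc_id in enumerate(ranked_ids[:k], start=1)
        if doc_id in relevant_ids
    )
    # Ideal DCG: all relevant documents placed at the top of the list.
    ideal_hits = min(len(relevant_ids), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0

# Toy query: the single relevant document is ranked second.
print(mrr_at_k(["a", "b", "c"], {"b"}))
print(ndcg_at_k(["a", "b", "c"], {"b"}))
```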

Conclusion

"Bayesian BM25 (bb25) has demonstrated the potential to outperform classic BM25 in hybrid search."

On the English dataset (SQuAD), combining bb25 with dense retrieval (BGE-M3) scores higher than the BM25 + dense baseline: NDCG@10 rises from 0.9051 to 0.9149 (+1.0 percentage points) and MRR@10 from 0.8717 to 0.8850 (+1.3 percentage points). This suggests that the probabilistic score from bb25 blends more smoothly with vector scores, with less scale mismatch than a weighted sum over raw BM25 scores.
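The scale-mismatch point can be illustrated with a small sketch of the two fusion styles compared above: weighted sum (WS) and reciprocal rank fusion (RRF). The weights, the RRF constant `k=60`, and all scores below are toy values chosen for illustration, not benchmark data. The source does not state how the benchmark weights or normalizes its inputs.

```python
def weighted_sum(sparse, dense, w=0.5):
    # Linear blend of two score dictionaries keyed by document id.
    docs = set(sparse) | set(dense)
    return {d: w * sparse.get(d, 0.0) + (1.0 - w) * dense.get(d, 0.0) for d in docs}

def rrf(rankings, k=60):
    # Reciprocal rank fusion: uses only ranks, so score scales are ignored.
    fused = {}
    for ranking in rankings:
        for rank, d in enumerate(ranking, start=1):
            fused[d] = fused.get(d, 0.0) + 1.0 / (k + rank)
    return fused

def top(scores):
    return max(scores, key=scores.get)

# Toy values: raw BM25 is unbounded, the calibrated version lies in [0, 1].
bm25_raw = {"d1": 12.4, "d2": 3.1, "d3": 0.2}
bm25_prob = {"d1": 0.93, "d2": 0.60, "d3": 0.08}
dense_sim = {"d1": 0.30, "d2": 0.90, "d3": 0.55}

print(top(weighted_sum(bm25_raw, dense_sim)))   # d1: raw BM25 swamps the dense signal
print(top(weighted_sum(bm25_prob, dense_sim)))  # d2: both signals carry weight
print(rrf([["d1", "d2", "d3"], ["d2", "d3", "d1"]]))
```

With raw scores the dense signal barely affects the blend, while with both inputs on a probability-like scale the dense preference for d2 can change the outcome. RRF sidesteps the scale problem but discards the score magnitudes entirely.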

Original paper and implementations:

Build from source (Rust)

make build

PyPI publishing

Build a wheel with maturin:

python -m pip install maturin
maturin build --release

For Pyodide builds, see docs/pyodide.md.
