
Bayesian BM25 scoring and experimental validation (Rust core + Python bindings)

Project description

bb25 (Bayesian BM25)

bb25 is a fast, self-contained implementation of BM25 scoring with Bayesian calibration, built on a Rust core and exposed through a minimal Python API. It also includes a small reference corpus and an experiment suite so you can validate the expected numerical properties.

  • PyPI package name: bayesian_bm25_rs
  • Python import name: bb25

Install

pip install bayesian_bm25_rs

Quick start

Use the built-in corpus and queries

import bb25 as bb

corpus = bb.build_default_corpus()
docs = corpus.documents()
queries = bb.build_default_queries()

bm25 = bb.BM25Scorer(corpus, 1.2, 0.75)  # presumably k1 = 1.2, b = 0.75 (the common BM25 defaults)
score = bm25.score(queries[0].terms, docs[0])
print("score0", score)
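For orientation, `score` above is a BM25 relevance score. The sketch below is a minimal pure-Python version of the standard Okapi BM25 formula. It assumes that the two positional arguments to `BM25Scorer` are `k1 = 1.2` and `b = 0.75`, that text is split on whitespace, and that IDF follows the Lucene-style form `ln(1 + (N - n + 0.5) / (n + 0.5))`. `bm25_score` is a hypothetical helper, not part of bb25, and bb25's tokenizer or IDF variant may differ, so the numbers need not match.

```python
import math
from collections import Counter


def bm25_score(query_terms, doc_tokens, corpus_tokens, k1=1.2, b=0.75):
    """Okapi BM25 score of one tokenized document for a list of query terms.

    Hypothetical helper for illustration; not part of bb25.
    """
    n_docs = len(corpus_tokens)
    avgdl = sum(len(d) for d in corpus_tokens) / n_docs
    tf = Counter(doc_tokens)
    dl = len(doc_tokens)
    score = 0.0
    for term in query_terms:
        # Document frequency: number of documents containing the term at least once.
        df = sum(1 for d in corpus_tokens if term in d)
        # Lucene-style IDF (assumed variant); always positive.
        idf = math.log(1.0 + (n_docs - df + 0.5) / (df + 0.5))
        f = tf[term]
        # Term-frequency saturation (k1) with document-length normalization (b).
        norm = f * (k1 + 1.0) / (f + k1 * (1.0 - b + b * dl / avgdl))
        score += idf * norm
    return score


docs = [
    "neural networks for ranking".split(),
    "bm25 is a strong baseline".split(),
]
print(bm25_score(["bm25", "baseline"], docs[1], docs))
```

Here `k1` controls how quickly repeated occurrences of a term stop adding score, and `b` controls how strongly documents longer than average are penalized.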

Build your own corpus

import bb25 as bb

corpus = bb.Corpus()
corpus.add_document("d1", "neural networks for ranking", [0.1] * 8)
corpus.add_document("d2", "bm25 is a strong baseline", [0.2] * 8)
corpus.build_index()  # must be called before creating scorers

bm25 = bb.BM25Scorer(corpus, 1.2, 0.75)
print(bm25.idf("bm25"))
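`build_index()` has to run before any scorer is created because BM25 depends on corpus-wide statistics: the number of documents, the average document length, and per-term document frequencies. The sketch below is a hypothetical pure-Python analogue of that two-phase design (collect documents, then freeze statistics), not bb25's implementation. `TinyCorpus` is an illustrative name; it ignores embeddings, lowercases and splits on whitespace, and assumes the Lucene-style IDF `ln(1 + (N - n + 0.5) / (n + 0.5))`, so its value may differ from `bm25.idf("bm25")`.

```python
import math
from collections import Counter


class TinyCorpus:
    """Hypothetical two-phase corpus: collect documents, then freeze statistics."""

    def __init__(self):
        self._docs = {}          # doc_id -> list of tokens
        self._indexed = False

    def add_document(self, doc_id, text):
        self._docs[doc_id] = text.lower().split()
        self._indexed = False    # new data invalidates earlier statistics

    def build_index(self):
        if not self._docs:
            raise ValueError("no documents added")
        self.n_docs = len(self._docs)
        self.avgdl = sum(len(t) for t in self._docs.values()) / self.n_docs
        # Document frequency: count each term at most once per document.
        self.df = Counter()
        for tokens in self._docs.values():
            self.df.update(set(tokens))
        self._indexed = True

    def idf(self, term):
        if not self._indexed:
            raise RuntimeError("call build_index() before scoring")
        n = self.df.get(term, 0)
        return math.log(1.0 + (self.n_docs - n + 0.5) / (n + 0.5))


corpus = TinyCorpus()
corpus.add_document("d1", "neural networks for ranking")
corpus.add_document("d2", "bm25 is a strong baseline")
corpus.build_index()
print(corpus.idf("bm25"))  # ln(2) with this IDF variant
```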

Bayesian calibration + hybrid fusion

import bb25 as bb

corpus = bb.build_default_corpus()
docs = corpus.documents()
queries = bb.build_default_queries()

bm25 = bb.BM25Scorer(corpus, 1.2, 0.75)
bayes = bb.BayesianBM25Scorer(bm25, 1.0, 0.5)
vector = bb.VectorScorer()
hybrid = bb.HybridScorer(bayes, vector)

q = queries[0]
prob_or = hybrid.score_or(q.terms, q.embedding, docs[0])
prob_and = hybrid.score_and(q.terms, q.embedding, docs[0])
print("OR", prob_or, "AND", prob_and)
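The calibration model is not spelled out here, so the following is only a hedged sketch of the general idea: a logistic (sigmoid) curve turns an unbounded BM25 score into a probability, and text and vector probabilities are then combined as if independent, with AND as their product and OR as one minus the product of their complements. Reading `1.0` and `0.5` as the sigmoid's slope and midpoint, and mapping cosine similarity to `(1 + cos) / 2`, are assumptions; `calibrate`, `cosine_to_prob`, `fuse_and` and `fuse_or` are hypothetical names, not bb25 functions.

```python
import math


def calibrate(score, alpha=1.0, beta=0.5):
    """Map a raw BM25 score to (0, 1) with a logistic curve.

    Assumed reading of BayesianBM25Scorer(bm25, 1.0, 0.5): alpha is the slope,
    beta the score at which the probability is 0.5. Any prior term is omitted.
    """
    return 1.0 / (1.0 + math.exp(-alpha * (score - beta)))


def cosine_to_prob(cos_sim):
    """Assumed mapping of cosine similarity in [-1, 1] to [0, 1]."""
    return (1.0 + cos_sim) / 2.0


def fuse_and(probs):
    """Probabilistic AND: every signal relevant, signals treated as independent."""
    result = 1.0
    for p in probs:
        result *= p
    return result


def fuse_or(probs):
    """Probabilistic OR: at least one signal relevant, signals treated as independent."""
    miss = 1.0
    for p in probs:
        miss *= 1.0 - p
    return 1.0 - miss


p_text = calibrate(2.5)        # hypothetical raw BM25 score
p_vec = cosine_to_prob(0.6)    # hypothetical cosine similarity
print("OR", fuse_or([p_text, p_vec]), "AND", fuse_and([p_text, p_vec]))
```

Under the independence assumption the AND probability can never exceed the smaller input and the OR probability can never fall below the larger one, so the two fused scores bracket the individual signals.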

Run the experiments

import bb25 as bb

results = bb.run_experiments()
print(all(r.passed for r in results))
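`run_experiments()` returns result objects with a `passed` flag; the specific properties it checks are not listed here. As an illustration of the kind of numerical property a BM25 test can assert, here is a hypothetical self-contained check (not one of bb25's experiments) that a single term's BM25 weight strictly increases with term frequency yet never reaches `idf * (k1 + 1)`. The function names are illustrative only.

```python
def bm25_term_weight(tf, dl, avgdl, idf, k1=1.2, b=0.75):
    """Contribution of one query term to a BM25 score (standard Okapi form)."""
    return idf * tf * (k1 + 1.0) / (tf + k1 * (1.0 - b + b * dl / avgdl))


def check_tf_saturation(idf=1.0, dl=10, avgdl=10, k1=1.2, b=0.75, max_tf=1000):
    """Hypothetical property check: weight rises with tf but stays below idf * (k1 + 1)."""
    weights = [bm25_term_weight(tf, dl, avgdl, idf, k1, b) for tf in range(max_tf + 1)]
    increasing = all(w1 < w2 for w1, w2 in zip(weights, weights[1:]))
    bounded = all(w < idf * (k1 + 1.0) for w in weights)
    return increasing and bounded


print(check_tf_saturation())  # True
```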

Build from source (Rust)

make build

PyPI publishing

Build a wheel with maturin:

python -m pip install maturin
maturin build --release

For Pyodide builds, see docs/pyodide.md.

Download files

Source Distribution

bayesian_bm25_rs-0.1.1.tar.gz (17.0 kB)

Built Distribution

bayesian_bm25_rs-0.1.1-cp312-cp312-macosx_11_0_arm64.whl (390.7 kB; CPython 3.12, macOS 11.0+ ARM64)

File details

Details for the file bayesian_bm25_rs-0.1.1.tar.gz.

File metadata

  • Download URL: bayesian_bm25_rs-0.1.1.tar.gz
  • Size: 17.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.8.14

File hashes

Hashes for bayesian_bm25_rs-0.1.1.tar.gz:

  • SHA256: fa8f583458db3433968ca805c35062c643f1998a7f457408fb4b433b12959377
  • MD5: 7073f504f9b5135ff17e8c054e3e6544
  • BLAKE2b-256: f23133a503223c4c723f3f6234eddd2520b8a80f189b7ad9176ff0b77929dafb
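To check a downloaded file against its published SHA256 digest, you can hash it locally with Python's standard `hashlib`. This is a small sketch: it reads the file in chunks so large archives need not fit in memory, `sha256_of` is an illustrative helper name, and the file path in the usage comment is only an example.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 16):
    """Return the hex SHA256 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


EXPECTED = "fa8f583458db3433968ca805c35062c643f1998a7f457408fb4b433b12959377"

# Usage (example path, assuming the sdist was downloaded to the current directory):
# print(sha256_of("bayesian_bm25_rs-0.1.1.tar.gz") == EXPECTED)
```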

File details

Details for the file bayesian_bm25_rs-0.1.1-cp312-cp312-macosx_11_0_arm64.whl.

File hashes

Hashes for bayesian_bm25_rs-0.1.1-cp312-cp312-macosx_11_0_arm64.whl:

  • SHA256: d4cc3ae490e496106759063f4851f7b950f1b75927e255426ec3c6daa950fdff
  • MD5: 91dc75386fbf71e2918fe58b0fefe8d3
  • BLAKE2b-256: 094d3c8aa6ee85de62cbe9c18cf671d0ea1e7e7eae9feee07ff9097c93dac703
