Bayesian probability transforms for BM25 retrieval scores

Bayesian BM25

A probabilistic framework that converts raw BM25 retrieval scores into calibrated relevance probabilities using Bayesian inference.

Overview

Standard BM25 produces unbounded scores whose scale varies from query to query, which makes threshold-based filtering and multi-signal fusion unreliable. Bayesian BM25 addresses this by combining a sigmoid likelihood model with a composite prior (term frequency + document length normalization) and computing a Bayesian posterior, which yields calibrated probabilities in [0, 1]. A corpus-level base rate prior further reduces expected calibration error by 68--77% without requiring relevance labels.
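The update described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the package's implementation: it assumes the likelihood has the form sigmoid(alpha * (score - beta)), and it takes the prior as a plain number instead of deriving it from term frequency and document length. The helper names `sigmoid` and `posterior` are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def posterior(score, prior, alpha=1.5, beta=1.0):
    """Two-hypothesis Bayes update.

    Assumption: the sigmoid output is the likelihood of the score under
    'relevant', and its complement is the likelihood under 'not relevant'.
    """
    likelihood = sigmoid(alpha * (score - beta))
    numerator = likelihood * prior
    return numerator / (numerator + (1.0 - likelihood) * (1.0 - prior))

scores = np.array([0.5, 1.0, 3.0])
print(posterior(scores, prior=0.5))  # a neutral prior leaves the sigmoid unchanged
print(posterior(scores, prior=0.2))  # a sceptical prior lowers every value
```

Whatever the raw score scale, the result stays inside [0, 1], which is the property that makes a fixed threshold meaningful across queries.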

Key capabilities:

Score-to-probability transform -- convert raw BM25 scores into calibrated relevance probabilities via sigmoid likelihood + composite prior + Bayesian posterior
Base rate calibration -- a corpus-level base rate prior, estimated from the score distribution, lets the posterior decompose into three additive log-odds terms and reduces expected calibration error by 68--77% without relevance labels
Parameter learning -- batch gradient descent or online SGD with EMA-smoothed gradients and Polyak averaging
Probabilistic fusion -- combine multiple probability signals using log-odds conjunction, which resolves the shrinkage problem of naive probabilistic AND
Search integration -- drop-in scorer wrapping bm25s that returns probabilities instead of raw scores
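To make the "three additive log-odds terms" concrete, here is a hedged sketch. It assumes the three terms are the log-odds of the likelihood, of the composite prior, and of the base rate; the exact decomposition used by the package may differ, and `posterior_with_base_rate` is a hypothetical helper, not part of the API.

```python
import numpy as np

def logit(p):
    return np.log(p / (1.0 - p))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def posterior_with_base_rate(likelihood, prior, base_rate):
    # Assumed form: each source of evidence contributes one log-odds term.
    return sigmoid(logit(likelihood) + logit(prior) + logit(base_rate))

likelihood = np.array([0.6, 0.9, 0.99])

# prior = base_rate = 0.5 contributes zero log-odds, so nothing changes.
p_neutral = posterior_with_base_rate(likelihood, prior=0.5, base_rate=0.5)

# A base rate of 1% (most documents are not relevant) shifts everything down.
p_rare = posterior_with_base_rate(likelihood, prior=0.5, base_rate=0.01)

print(p_neutral)
print(p_rare)
```

Working in log-odds keeps the terms additive, so each one can be inspected or re-estimated on its own.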

Installation

pip install bayesian-bm25

To use the integrated search scorer (requires bm25s):

pip install bayesian-bm25[scorer]

Quick Start

Converting BM25 Scores to Probabilities

import numpy as np
from bayesian_bm25 import BayesianProbabilityTransform

transform = BayesianProbabilityTransform(alpha=1.5, beta=1.0, base_rate=0.01)

scores = np.array([0.5, 1.0, 1.5, 2.0, 3.0])
tfs = np.array([1, 2, 3, 5, 8])
doc_len_ratios = np.array([0.3, 0.5, 0.8, 1.0, 1.5])

probabilities = transform.score_to_probability(scores, tfs, doc_len_ratios)

End-to-End Search with Probabilities

from bayesian_bm25 import BayesianBM25Scorer

corpus_tokens = [
    ["python", "machine", "learning"],
    ["deep", "learning", "neural", "networks"],
    ["data", "visualization", "tools"],
]

scorer = BayesianBM25Scorer(k1=1.2, b=0.75, method="lucene", base_rate="auto")
scorer.index(corpus_tokens, show_progress=False)

doc_ids, probabilities = scorer.retrieve([["machine", "learning"]], k=3)
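Because the scorer returns probabilities, a fixed cut-off can be applied across queries. The sketch below uses stand-in arrays in place of real scorer output; it assumes `doc_ids` and `probabilities` are arrays of shape (n_queries, k), which is not confirmed by this page. The function `filter_by_probability` and the threshold value are hypothetical.

```python
import numpy as np

# Stand-in results for one query; the (n_queries, k) shape is an assumption.
doc_ids = np.array([[0, 1, 2]])
probabilities = np.array([[0.82, 0.64, 0.07]])

threshold = 0.5  # hypothetical cut-off; tune it for your precision/recall needs

def filter_by_probability(doc_ids, probabilities, threshold):
    """Return, per query, the (doc_id, probability) pairs at or above threshold."""
    kept = []
    for ids, probs in zip(doc_ids, probabilities):
        mask = probs >= threshold
        kept.append(list(zip(ids[mask].tolist(), probs[mask].tolist())))
    return kept

results = filter_by_probability(doc_ids, probabilities, threshold)
print(results)
```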

Combining Multiple Signals

import numpy as np
from bayesian_bm25 import log_odds_conjunction, prob_and, prob_or

signals = np.array([0.85, 0.70, 0.60])

prob_and(signals)                # 0.357 (shrinkage problem)
log_odds_conjunction(signals)    # 0.773 (agreement-aware)
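The shrinkage problem can be reproduced without the package. The sketch below multiplies the signals (naive AND) and, for contrast, averages them in log-odds space. The averaging rule is only an illustrative stand-in and is not the formula behind `log_odds_conjunction`, which returns a different value for the same inputs.

```python
import numpy as np

def logit(p):
    return np.log(p / (1.0 - p))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

signals = np.array([0.85, 0.70, 0.60])

# Naive AND: the product falls below every individual signal, and keeps
# falling as more signals are added, even when all of them agree.
naive_and = np.prod(signals)

# Illustrative alternative (an assumption, not the library's rule):
# average the log-odds, then map back to a probability.
mean_log_odds = sigmoid(np.mean(logit(signals)))

print(round(float(naive_and), 3))
print(round(float(mean_log_odds), 3))
```

The product lands below the weakest signal, while the log-odds average stays between the weakest and strongest one.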

Online Learning from User Feedback

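import numpy as np
from bayesian_bm25 import BayesianProbabilityTransform

transform = BayesianProbabilityTransform(alpha=1.0, beta=0.0)

# Placeholder inputs for illustration: BM25 scores with 0/1 relevance labels
historical_scores = np.array([0.4, 1.1, 2.3, 3.0])
historical_labels = np.array([0, 0, 1, 1])
feedback_stream = [(1.8, 1), (0.6, 0)]

# Batch warmup on historical data
transform.fit(historical_scores, historical_labels)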

# Online refinement from live feedback
for score, label in feedback_stream:
    transform.update(score, label, learning_rate=0.01, momentum=0.95)

# Use Polyak-averaged parameters for stable inference
alpha = transform.averaged_alpha
beta = transform.averaged_beta
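For readers unfamiliar with the two smoothing techniques named here, the following is a self-contained sketch of online SGD on a sigmoid model with an EMA-smoothed gradient and Polyak (iterate) averaging. It is written from the general definitions of those techniques and assumes a log-loss objective on sigmoid(alpha * (score - beta)); the class `OnlineSigmoidFit` is hypothetical and is not the package's implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class OnlineSigmoidFit:
    """Hypothetical online learner for p = sigmoid(alpha * (score - beta))."""

    def __init__(self, alpha=1.0, beta=0.0, learning_rate=0.01, momentum=0.95):
        self.alpha, self.beta = alpha, beta
        self.lr, self.momentum = learning_rate, momentum
        self.grad_ema = np.zeros(2)                        # smoothed gradient
        self.avg = np.array([alpha, beta], dtype=float)    # Polyak average
        self.steps = 0

    def update(self, score, label):
        p = sigmoid(self.alpha * (score - self.beta))
        err = p - label  # derivative of log loss w.r.t. the sigmoid's input
        grad = np.array([err * (score - self.beta), -err * self.alpha])
        # The EMA damps the noise of single-example gradients.
        self.grad_ema = self.momentum * self.grad_ema + (1.0 - self.momentum) * grad
        self.alpha -= self.lr * self.grad_ema[0]
        self.beta -= self.lr * self.grad_ema[1]
        # Polyak averaging: running mean of all iterates, including the start.
        self.steps += 1
        current = np.array([self.alpha, self.beta])
        self.avg += (current - self.avg) / (self.steps + 1)

model = OnlineSigmoidFit()
for score, label in [(2.0, 1), (0.3, 0), (1.7, 1)]:  # made-up feedback
    model.update(score, label)
print(model.alpha, model.beta, model.avg)
```

The averaged parameters move more slowly than the raw iterates, which is why they are the steadier choice for inference while learning continues.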

Citation

If you use this work, please cite the following papers:

@preprint{Jeong2026BayesianBM25,
  author    = {Jeong, Jaepil},
  title     = {Bayesian {BM25}: {A} Probabilistic Framework for Hybrid Text
               and Vector Search},
  year      = {2026},
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.18414940},
  url       = {https://doi.org/10.5281/zenodo.18414940}
}

@preprint{Jeong2026BayesianNeural,
  author    = {Jeong, Jaepil},
  title     = {From {Bayesian} Inference to Neural Computation: The Analytical
               Emergence of Neural Network Structure from Probabilistic
               Relevance Estimation},
  year      = {2026},
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.18512411},
  url       = {https://doi.org/10.5281/zenodo.18512411}
}

License

This project is licensed under the Apache License 2.0.

Release history

0.11.0 -- Mar 17, 2026
0.9.0 -- Mar 14, 2026
0.8.1 -- Mar 14, 2026
0.8.0 -- Mar 5, 2026
0.7.0 -- Mar 3, 2026
0.6.0 -- Feb 28, 2026
0.5.0 -- Feb 26, 2026
0.4.1 -- Feb 25, 2026
0.4.0 -- Feb 24, 2026
0.3.2 -- Feb 22, 2026
0.3.1 -- Feb 21, 2026
0.3.0 -- Feb 20, 2026
0.2.0 -- Feb 18, 2026 (this version)
0.1.1 -- Feb 16, 2026
0.1.0 -- Feb 14, 2026

Download files

Source Distribution

bayesian_bm25-0.2.0.tar.gz (25.8 kB)

Uploaded Feb 18, 2026 Source

Built Distribution

bayesian_bm25-0.2.0-py3-none-any.whl (15.8 kB)

Uploaded Feb 18, 2026 Python 3

File details

Details for the file bayesian_bm25-0.2.0.tar.gz.

File metadata

Download URL: bayesian_bm25-0.2.0.tar.gz
Upload date: Feb 18, 2026
Size: 25.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.7

File hashes

Hashes for bayesian_bm25-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`0dd2adb9a9b132e88a849ac6ecb1209d730d84e65a843d353e6a894689828aec`
MD5	`a09e6274b2c3abce88f0a4a0b3667d7f`
BLAKE2b-256	`4a16dff84f31473047e37ce9d03ddbda68e2ca796cb23fbe8178e249689eb46d`

File details

Details for the file bayesian_bm25-0.2.0-py3-none-any.whl.

File metadata

Download URL: bayesian_bm25-0.2.0-py3-none-any.whl
Upload date: Feb 18, 2026
Size: 15.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.7

File hashes

Hashes for bayesian_bm25-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`6f823c3055bb57e0a492fbc88f325635b3bb19d3cae3d73f8076dc0406a004af`
MD5	`ee2c0314d0feed4cde8097e27062e9ec`
BLAKE2b-256	`e3fd70ddb7a4a7d56d8f6d1ebf67991a53b7bc5e3786e747b79435d21ec99fa3`
