Bayesian probability transforms for BM25 retrieval scores

Bayesian BM25

Reference implementation of the papers "Bayesian BM25" and "From Bayesian Inference to Neural Computation", written by the papers' original author. It converts raw BM25 retrieval scores into calibrated relevance probabilities using Bayesian inference.

Overview

Standard BM25 produces unbounded scores whose meaning varies from query to query, which makes threshold-based filtering and multi-signal fusion unreliable. Bayesian BM25 addresses this by applying a sigmoid likelihood model with a composite prior (term frequency + document length normalization) and computing Bayesian posteriors, yielding calibrated probabilities in [0, 1]. A corpus-level base rate prior further reduces expected calibration error by 68–77% on the NFCorpus and SciFact benchmarks reported under Probability Calibration, without requiring relevance labels.

Key capabilities:

Score-to-probability transform — convert raw BM25 scores into calibrated relevance probabilities via sigmoid likelihood + composite prior + Bayesian posterior
Base rate calibration — corpus-level base rate prior estimated from score distribution (95th percentile, mixture model, or elbow detection) decomposes the posterior into three additive log-odds terms, reducing expected calibration error by 68–77% without relevance labels
Parameter learning — batch gradient descent or online SGD with EMA-smoothed gradients and Polyak averaging, with three training modes: balanced (C1), prior-aware (C2), and prior-free (C3)
Probabilistic fusion — combine multiple probability signals using AND, OR, NOT, and log-odds conjunction with multiplicative confidence scaling, optional per-signal reliability weights (Log-OP), and sparse signal gating (ReLU/Swish activations from Paper 2, Theorems 6.5.3/6.7.4)
Learnable fusion weights — LearnableLogOddsWeights learns per-signal reliability from labeled data via a Hebbian gradient that is backprop-free, starting from Naive Bayes uniform initialization (Remark 5.3.2)
Attention-based fusion — AttentionLogOddsWeights learns query-dependent signal weights via attention mechanism (Paper 2, Section 8), replacing static weights with query-adaptive weighting
Hybrid search — cosine_to_probability() converts vector similarity scores to probabilities for fusion with BM25 signals via weighted log-odds conjunction
WAND pruning — wand_upper_bound() computes safe Bayesian probability upper bounds for document pruning in top-k retrieval
Calibration metrics — expected_calibration_error(), brier_score(), reliability_diagram(), and calibration_report() for evaluating probability quality, with CalibrationReport bundling all metrics into a single diagnostic
Fusion debugger — FusionDebugger records every intermediate value through the full pipeline (likelihood, prior, posterior, fusion) for transparent inspection, document comparison, and crossover detection; supports hierarchical fusion tracing with AND/OR/NOT composition
Multi-field search — MultiFieldScorer maintains separate BM25 indexes per field and fuses field-level probabilities via log-odds conjunction with configurable per-field weights
Search integration — drop-in scorer wrapping bm25s that returns probabilities instead of raw scores

Adoption

MTEB — included as a baseline retrieval model (bb25) for the Massive Text Embedding Benchmark
txtai — used for BM25 score normalization in hybrid search (normalize="bayesian-bm25")
UQA — used as the scoring operator for probabilistic text retrieval and multi-signal fusion in the unified query algebra

Installation

pip install bayesian-bm25

To use the integrated search scorer (requires bm25s):

pip install bayesian-bm25[scorer]

Quick Start

Converting BM25 Scores to Probabilities

import numpy as np
from bayesian_bm25 import BayesianProbabilityTransform

transform = BayesianProbabilityTransform(alpha=1.5, beta=1.0, base_rate=0.01)

scores = np.array([0.5, 1.0, 1.5, 2.0, 3.0])
tfs = np.array([1, 2, 3, 5, 8])
doc_len_ratios = np.array([0.3, 0.5, 0.8, 1.0, 1.5])

probabilities = transform.score_to_probability(scores, tfs, doc_len_ratios)
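
The transform can be pictured as a sigmoid likelihood followed by a Bayes update, which becomes a sum in log-odds space. The sketch below is a minimal NumPy illustration under stated assumptions, not the library's implementation: `sketch_posterior` is a hypothetical helper, the likelihood is assumed to be `sigmoid(alpha * (score - beta))`, and the prior is passed in directly because the composite prior formula (term frequency + document length) is not spelled out in this README.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def logit(p):
    return np.log(p / (1.0 - p))

def sketch_posterior(scores, prior, alpha=1.5, beta=1.0, base_rate=None):
    """Hypothetical helper: sigmoid likelihood + Bayes update in log-odds space."""
    # Assumed likelihood: sigmoid(alpha * (score - beta)), so its logit is linear.
    likelihood_logit = alpha * (scores - beta)
    # Bayes' rule for a binary event adds the prior's log-odds to the likelihood's.
    total = likelihood_logit + logit(prior)
    # An optional corpus-level base rate contributes a third additive term.
    if base_rate is not None:
        total = total + logit(base_rate)
    return sigmoid(total)

scores = np.array([0.5, 1.0, 1.5, 2.0, 3.0])
prior = np.full_like(scores, 0.5)          # neutral prior, for illustration only
p_no_base = sketch_posterior(scores, prior)
p_base = sketch_posterior(scores, prior, base_rate=0.01)
```

With a neutral prior the posterior equals the likelihood, and a base rate below 0.5 shifts every probability downward by the same log-odds offset.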

End-to-End Search with Probabilities

from bayesian_bm25 import BayesianBM25Scorer

corpus_tokens = [
    ["python", "machine", "learning"],
    ["deep", "learning", "neural", "networks"],
    ["data", "visualization", "tools"],
]

scorer = BayesianBM25Scorer(k1=1.2, b=0.75, method="lucene", base_rate="auto")
scorer.index(corpus_tokens, show_progress=False)

doc_ids, probabilities = scorer.retrieve([["machine", "learning"]], k=3)

Multi-Field Search

from bayesian_bm25 import MultiFieldScorer

documents = [
    {"title": ["bayesian", "bm25"], "body": ["probabilistic", "framework", "search"]},
    {"title": ["neural", "networks"], "body": ["deep", "learning", "models"]},
    {"title": ["information", "retrieval"], "body": ["search", "ranking", "relevance"]},
]

scorer = MultiFieldScorer(
    fields=["title", "body"],
    field_weights={"title": 0.4, "body": 0.6},
    k1=1.2, b=0.75, method="lucene",
)
scorer.index(documents, show_progress=False)
doc_ids, probabilities = scorer.retrieve(["bayesian", "search"], k=3)

Combining Multiple Signals

import numpy as np
from bayesian_bm25 import log_odds_conjunction, prob_and, prob_not, prob_or

signals = np.array([0.85, 0.70, 0.60])

prob_and(signals)                # 0.357 (shrinkage problem)
log_odds_conjunction(signals)    # 0.773 (agreement-aware)

# Exclusion query: "python AND NOT java"
p_python, p_java = 0.90, 0.75
prob_and(np.array([p_python, prob_not(p_java)]))  # 0.225
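
The AND and NOT values quoted in the comments follow from elementary probability under an independence assumption. The sketch below re-derives them with plain NumPy; the `sketch_*` names are hypothetical stand-ins, and the library's own functions may treat edge cases and array shapes differently.

```python
import numpy as np

def sketch_and(p):
    # Independent conjunction: product of the probabilities.
    return np.prod(p, axis=-1)

def sketch_or(p):
    # Independent disjunction: complement of "none of the signals fire".
    return 1.0 - np.prod(1.0 - np.asarray(p), axis=-1)

def sketch_not(p):
    return 1.0 - p

signals = np.array([0.85, 0.70, 0.60])
and_all = sketch_and(signals)   # 0.85 * 0.70 * 0.60 = 0.357
or_all = sketch_or(signals)     # 1 - 0.15 * 0.30 * 0.40 = 0.982
exclusion = sketch_and(np.array([0.90, sketch_not(0.75)]))  # 0.90 * 0.25 = 0.225
```

The product shrinks as more signals are added even when every signal agrees, which is the shrinkage problem that the log-odds conjunction is meant to avoid.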

Hybrid Text + Vector Search

import numpy as np
from bayesian_bm25 import cosine_to_probability, log_odds_conjunction

# BM25 probabilities (from Bayesian BM25)
bm25_probs = np.array([0.85, 0.60, 0.40])

# Vector search cosine similarities -> probabilities
cosine_scores = np.array([0.92, 0.35, 0.70])
vector_probs = cosine_to_probability(cosine_scores)  # [0.96, 0.675, 0.85]

# Fuse with reliability weights (BM25 weight=0.6, vector weight=0.4)
stacked = np.stack([bm25_probs, vector_probs], axis=-1)
fused = log_odds_conjunction(stacked, weights=np.array([0.6, 0.4]))

# Fuse with weights and confidence scaling (alpha + weights compose)
fused = log_odds_conjunction(stacked, alpha=0.5, weights=np.array([0.6, 0.4]))

# Gated fusion: ReLU/Swish activation in logit space (Paper 2, Theorems 6.5.3/6.7.4)
fused_relu = log_odds_conjunction(stacked, gating="relu")    # MAP estimation
fused_swish = log_odds_conjunction(stacked, gating="swish")  # Bayes estimation
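
The commented values for `cosine_to_probability` are consistent with a linear map from [-1, 1] to [0, 1]. The sketch below reproduces that map and pairs it with a plain weighted log-odds pooling. Both functions are hypothetical illustrations: the pooling omits the confidence scaling (`alpha`) and gating options, so its outputs are not expected to match `log_odds_conjunction`.

```python
import numpy as np

def sketch_cosine_to_probability(cos):
    # Assumed linear map (1 + cos) / 2, consistent with the commented values.
    return (1.0 + np.asarray(cos, dtype=float)) / 2.0

def sketch_weighted_log_odds(probs, weights, eps=1e-12):
    # Weighted sum of per-signal logits, mapped back through a sigmoid.
    probs = np.clip(probs, eps, 1.0 - eps)
    logits = np.log(probs / (1.0 - probs))
    pooled = np.sum(weights * logits, axis=-1)
    return 1.0 / (1.0 + np.exp(-pooled))

bm25_probs = np.array([0.85, 0.60, 0.40])
vector_probs = sketch_cosine_to_probability([0.92, 0.35, 0.70])
stacked = np.stack([bm25_probs, vector_probs], axis=-1)   # shape (3, 2)
fused = sketch_weighted_log_odds(stacked, np.array([0.6, 0.4]))
```

Because the weights sum to 1, each fused value in this sketch lies between the two input probabilities for that document.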

Learning Fusion Weights from Data

import numpy as np
from bayesian_bm25 import LearnableLogOddsWeights

# 3 retrieval signals: BM25, vector search, metadata match
learner = LearnableLogOddsWeights(n_signals=3, alpha=0.0)
# Initial weights are uniform: [0.333, 0.333, 0.333]

# Batch fit from labeled data (probs: m x 3, labels: m)
learner.fit(training_probs, training_labels, learning_rate=0.1)
# Learned weights reflect signal reliability: [0.70, 0.19, 0.11]

# Online refinement from streaming feedback
for probs, label in feedback_stream:
    learner.update(probs, label, learning_rate=0.05, momentum=0.9)

# Inference with Polyak-averaged weights for stability
fused = learner(test_probs, use_averaged=True)
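
One way to picture how per-signal weights can be learned is gradient descent on cross-entropy, with the weights parameterized by a softmax so they start uniform and always sum to 1. The sketch below is an assumption-laden illustration on synthetic data, not the library's update rule: `fit_weights` is a hypothetical function, and whether its gradient coincides with the Hebbian rule used by `LearnableLogOddsWeights` is not established by this README.

```python
import numpy as np

def softmax(theta):
    e = np.exp(theta - theta.max())
    return e / e.sum()

def fit_weights(probs, labels, lr=0.5, steps=500, eps=1e-6):
    """Hypothetical batch fit: softmax-parameterized weights over signal logits."""
    probs = np.clip(probs, eps, 1 - eps)
    x = np.log(probs / (1 - probs))           # (m, n) per-signal logits
    theta = np.zeros(x.shape[1])              # softmax(0) gives uniform weights
    for _ in range(steps):
        w = softmax(theta)
        z = x @ w                             # fused logit per example
        p = 1 / (1 + np.exp(-z))
        err = p - labels                      # d(cross-entropy)/dz
        # Chain rule through the softmax: dz/dtheta_j = w_j * (x_j - z).
        grad = (err[:, None] * w * (x - z[:, None])).mean(axis=0)
        theta -= lr * grad
    return softmax(theta)

# Synthetic data: signal 0 tracks the label, signals 1 and 2 are pure noise.
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=400).astype(float)
informative = np.where(labels == 1, 0.9, 0.1)
noise = rng.uniform(0.05, 0.95, size=(400, 2))
training_probs = np.column_stack([informative, noise])
weights = fit_weights(training_probs, labels)
```

Each gradient term is the product of an output error and a per-signal input, so the update for one weight only needs quantities local to that signal and the fused output.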

Attention-Based Fusion

import numpy as np
from bayesian_bm25 import AttentionLogOddsWeights

# 2 retrieval signals, 3 query features, per-signal logit normalization
attn = AttentionLogOddsWeights(
    n_signals=2, n_query_features=3, alpha=0.5, normalize=True,
)

# Train on labeled data with query features
# training_probs: (m, 2), training_labels: (m,), query_features: (m, 3)
attn.fit(training_probs, training_labels, query_features,
         learning_rate=0.01, max_iterations=500)

# Query-dependent fusion: weights adapt per query
fused = attn(test_probs, test_features, use_averaged=True)
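
Query-dependent weighting can be sketched as a softmax over a linear projection of the query features, applied to the per-signal logits. The forward pass below is a hypothetical illustration: the projection `W`, bias `b`, and function name are assumptions, and the library's architecture, normalization, and training procedure may differ.

```python
import numpy as np

def sketch_attention_fusion(probs, query_features, W, b, eps=1e-6):
    """Hypothetical forward pass: per-query softmax weights over signal logits."""
    probs = np.clip(probs, eps, 1 - eps)
    logits = np.log(probs / (1 - probs))                 # (m, n_signals)
    scores = query_features @ W + b                      # (m, n_signals)
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    fused = 1 / (1 + np.exp(-(weights * logits).sum(axis=1)))
    return fused, weights

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 2))          # 3 query features -> 2 signals (untrained)
b = np.zeros(2)
probs_in = np.array([[0.9, 0.4], [0.2, 0.7]])
features_in = rng.normal(size=(2, 3))
fused_out, attn_weights = sketch_attention_fusion(probs_in, features_in, W, b)
```

Two queries with identical signal probabilities can receive different fused scores here, because the weights depend on the query features rather than being fixed.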

WAND Pruning with Bayesian Upper Bounds

from bayesian_bm25 import BayesianProbabilityTransform

transform = BayesianProbabilityTransform(alpha=1.5, beta=2.0, base_rate=0.01)

# Standard BM25 upper bound per query term
bm25_upper_bound = 5.0

# Bayesian upper bound for safe pruning — any document's actual
# probability is guaranteed to be at most this value
bayesian_bound = transform.wand_upper_bound(bm25_upper_bound)
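
The reason such a bound can be safe is monotonicity: if the posterior increases with both the BM25 score and the prior, then evaluating it at the score upper bound and the largest admissible prior dominates every real document. The sketch below illustrates this with hypothetical helpers. It assumes `alpha > 0`, a prior capped at an assumed `prior_max` of 0.9, and no base rate term; the library's actual bound may be computed differently.

```python
import numpy as np

def sketch_posterior(score, prior, alpha, beta):
    # Assumed sigmoid likelihood followed by a two-hypothesis Bayes update.
    lik = 1 / (1 + np.exp(-alpha * (score - beta)))
    return lik * prior / (lik * prior + (1 - lik) * (1 - prior))

def sketch_upper_bound(bm25_upper_bound, alpha, beta, prior_max=0.9):
    # Largest score and largest prior give the largest posterior when alpha > 0.
    return sketch_posterior(bm25_upper_bound, prior_max, alpha, beta)

bound = sketch_upper_bound(5.0, alpha=1.5, beta=2.0)

# Spot-check: no sampled document below the score bound exceeds the bound.
rng = np.random.default_rng(0)
sample_scores = rng.uniform(0.0, 5.0, size=1000)
sample_priors = rng.uniform(0.1, 0.9, size=1000)
actual = sketch_posterior(sample_scores, sample_priors, alpha=1.5, beta=2.0)
```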

Debugging the Fusion Pipeline

from bayesian_bm25 import BayesianProbabilityTransform
from bayesian_bm25.debug import FusionDebugger

transform = BayesianProbabilityTransform(alpha=0.45, beta=6.10, base_rate=0.02)
debugger = FusionDebugger(transform)

# Trace a single document through the full pipeline
trace = debugger.trace_document(
    bm25_score=8.42, tf=5, doc_len_ratio=0.60,
    cosine_score=0.74, doc_id="doc-42",
)
print(debugger.format_trace(trace))

# Compare two documents to see which signal drove the rank difference
trace_a = debugger.trace_document(bm25_score=8.42, tf=5, doc_len_ratio=0.60, cosine_score=0.74)
trace_b = debugger.trace_document(bm25_score=5.10, tf=2, doc_len_ratio=1.20, cosine_score=0.88)
comparison = debugger.compare(trace_a, trace_b)
print(debugger.format_comparison(comparison))

# Hierarchical fusion: AND(OR(title, body), vector, NOT(spam))
step1 = debugger.trace_fusion([0.85, 0.70], names=["title", "body"], method="prob_or")
step2 = debugger.trace_not(0.90, name="spam")
step3 = debugger.trace_fusion(
    [step1.fused_probability, 0.80, step2.complement],
    names=["OR(title,body)", "vector", "NOT(spam)"],
    method="prob_and",
)

Evaluating Calibration Quality

import numpy as np
from bayesian_bm25 import (
    expected_calibration_error, brier_score, reliability_diagram, calibration_report,
)

probabilities = np.array([0.9, 0.8, 0.3, 0.1, 0.7, 0.2])
labels = np.array([1.0, 1.0, 0.0, 0.0, 1.0, 0.0])

ece = expected_calibration_error(probabilities, labels)   # lower is better
bs = brier_score(probabilities, labels)                   # lower is better
bins = reliability_diagram(probabilities, labels, n_bins=5)  # (avg_pred, avg_actual, count)

# One-call diagnostic report
report = calibration_report(probabilities, labels)
print(report.summary())   # formatted text with ECE, Brier, and reliability table
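
For readers who want to see what these metrics measure, the sketch below computes a Brier score and an equal-width-bin ECE by hand on the same six predictions. The `sketch_*` functions are hypothetical, and the binning scheme (10 equal-width bins) is an assumption; the library's defaults may produce different ECE values.

```python
import numpy as np

def sketch_brier(probs, labels):
    # Mean squared difference between predicted probability and outcome.
    return float(np.mean((probs - labels) ** 2))

def sketch_ece(probs, labels, n_bins=10):
    # Equal-width bins; each bin contributes |mean confidence - mean accuracy|
    # weighted by the fraction of samples that fall into it.
    bins = np.minimum((probs * n_bins).astype(int), n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            ece += mask.mean() * abs(probs[mask].mean() - labels[mask].mean())
    return float(ece)

probabilities = np.array([0.9, 0.8, 0.3, 0.1, 0.7, 0.2])
labels = np.array([1.0, 1.0, 0.0, 0.0, 1.0, 0.0])

brier = sketch_brier(probabilities, labels)   # 0.28 / 6, about 0.0467
ece = sketch_ece(probabilities, labels)       # 0.2 with this binning
```

With 10 bins each prediction sits alone in its bin, so the ECE here is simply the mean absolute gap between prediction and label.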

Online Learning from User Feedback

from bayesian_bm25 import BayesianProbabilityTransform

transform = BayesianProbabilityTransform(alpha=1.0, beta=0.0)

# Batch warmup on historical data
transform.fit(historical_scores, historical_labels)

# Online refinement from live feedback
for score, label in feedback_stream:
    transform.update(score, label, learning_rate=0.01, momentum=0.95)

# Use Polyak-averaged parameters for stable inference
alpha = transform.averaged_alpha
beta = transform.averaged_beta
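
The online loop combines three ingredients: a per-example gradient, an exponential moving average of that gradient, and a running (Polyak) average of the parameters. The class below is a hypothetical sketch of that combination for a sigmoid likelihood `sigmoid(alpha * (score - beta))`; the library's `update()` may differ in its exact smoothing and averaging rules.

```python
import numpy as np

class SketchOnlineSigmoid:
    """Hypothetical online learner: EMA-smoothed SGD with Polyak averaging."""

    def __init__(self, alpha=1.0, beta=0.0):
        self.alpha, self.beta = alpha, beta
        self.g_alpha = self.g_beta = 0.0                  # EMA of gradients
        self.averaged_alpha, self.averaged_beta = alpha, beta
        self.n = 0

    def update(self, score, label, learning_rate=0.01, momentum=0.95):
        p = 1 / (1 + np.exp(-self.alpha * (score - self.beta)))
        err = p - label                                   # d(cross-entropy)/dz
        grad_alpha = err * (score - self.beta)            # dz/dalpha = score - beta
        grad_beta = -err * self.alpha                     # dz/dbeta = -alpha
        self.g_alpha = momentum * self.g_alpha + (1 - momentum) * grad_alpha
        self.g_beta = momentum * self.g_beta + (1 - momentum) * grad_beta
        self.alpha -= learning_rate * self.g_alpha
        self.beta -= learning_rate * self.g_beta
        # Polyak averaging: running mean of the parameter iterates.
        self.n += 1
        self.averaged_alpha += (self.alpha - self.averaged_alpha) / self.n
        self.averaged_beta += (self.beta - self.averaged_beta) / self.n

# Synthetic feedback: documents scoring above 3.0 are relevant.
rng = np.random.default_rng(0)
model = SketchOnlineSigmoid(alpha=1.0, beta=0.0)
for s in rng.uniform(0.0, 6.0, size=3000):
    model.update(s, float(s > 3.0))
```

The averaged parameters lag behind the raw ones but fluctuate less, which is why they are the safer choice at inference time.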

Training Modes

from bayesian_bm25 import BayesianProbabilityTransform

transform = BayesianProbabilityTransform(alpha=1.0, beta=0.0)

# C1 (balanced, default): train on sigmoid likelihood
transform.fit(scores, labels, mode="balanced")

# C2 (prior-aware): train on full Bayesian posterior
transform.fit(scores, labels, mode="prior_aware", tfs=tfs, doc_len_ratios=doc_len_ratios)

# C3 (prior-free): train on likelihood, inference uses prior=0.5
transform.fit(scores, labels, mode="prior_free")

Benchmarks

BEIR Hybrid Search

Evaluated on 5 BEIR datasets using the retrieve-then-evaluate protocol (top-1000 per signal, union candidates, pytrec_eval). Dense encoder: all-MiniLM-L6-v2. BM25: k1=1.2, b=0.75, Lucene variant with Snowball English stemmer.

NDCG@10

Method	ArguAna	FiQA	NFCorpus	SciDocs	SciFact	Average
BM25	36.13	25.31	31.82	15.63	68.02	35.38
Dense	36.98	36.87	31.59	21.64	64.51	38.32
Convex	40.01	37.10	35.60	19.67	73.37	41.15
RRF	39.61	36.85	34.43	20.11	71.43	40.49
Bayesian-OR	0.06	25.52	33.46	15.89	66.95	28.38
Bayesian-LogOdds	37.16	32.93	35.31	18.57	72.80	39.35
LO-Local	39.66	37.19	34.10	19.51	73.80	40.85
Bayesian-LO-BR	37.16	32.92	30.99	18.52	72.27	38.37
Bayesian-Balanced	37.27	40.58	35.73	21.42	72.47	41.50
Balanced-Mix	37.29	40.66	35.70	21.53	72.33	41.50
Balanced-Elbow	37.29	40.56	35.76	21.42	72.46	41.50
Gated-ReLU	35.16	27.54	32.45	17.08	69.01	36.25
Gated-Swish	36.20	27.39	28.66	16.82	68.61	35.54
Attention	37.05	38.86	34.39	21.05	70.51	40.37
Attn-NR	37.22	40.53	35.42	21.91	73.24	41.67
Attn-NR-CV	37.23	40.51	35.37	21.97	72.57	41.53
MultiField	7.41	--	31.16	15.68	60.06	28.58*
MF-Balanced	38.40	--	34.51	20.93	66.83	40.17*

MAP@10

Method	ArguAna	FiQA	NFCorpus	SciDocs	SciFact	Average
BM25	23.84	19.10	11.76	9.15	63.38	25.45
Dense	24.46	29.14	11.05	12.94	59.59	27.44
Convex	26.76	29.21	13.46	11.79	69.12	30.07
RRF	26.30	28.85	12.84	11.98	66.58	29.31
Bayesian-OR	0.03	19.09	12.41	9.19	61.70	20.49
Bayesian-LogOdds	24.54	25.58	13.40	11.02	68.31	28.57
LO-Local	26.43	29.32	12.31	11.70	69.29	29.81
Bayesian-LO-BR	24.54	25.58	11.50	10.99	67.83	28.09
Bayesian-Balanced	24.61	32.73	13.80	12.85	68.03	30.40
Balanced-Mix	24.62	32.77	13.79	12.93	67.84	30.39
Balanced-Elbow	24.62	32.72	13.80	12.85	68.02	30.40
Gated-ReLU	22.95	21.00	11.67	10.02	64.10	25.95
Gated-Swish	23.86	20.88	10.23	9.85	63.80	25.73
Attention	24.49	30.96	12.68	12.60	65.92	29.33
Attn-NR	24.57	32.62	13.40	13.22	68.91	30.54
Attn-NR-CV	24.58	32.58	13.39	13.24	68.05	30.37
MultiField	4.76	--	11.45	9.04	55.34	20.15*
MF-Balanced	25.45	--	13.04	12.57	63.21	28.57*

Recall@10

Method	ArguAna	FiQA	NFCorpus	SciDocs	SciFact	Average
BM25	75.04	31.98	14.46	16.34	80.78	43.72
Dense	76.53	44.13	15.50	23.09	78.33	47.52
Convex	81.65	45.04	17.06	20.62	84.89	49.85
RRF	81.65	45.03	16.87	21.15	84.76	49.89
Bayesian-OR	0.14	32.71	15.98	16.76	81.37	29.39
Bayesian-LogOdds	77.03	40.67	17.24	19.40	84.96	47.86
LO-Local	81.37	45.22	16.29	20.42	86.22	49.90
Bayesian-LO-BR	77.03	40.67	15.01	19.32	84.29	47.27
Bayesian-Balanced	77.31	47.61	17.23	22.61	84.83	49.92
Balanced-Mix	77.38	47.61	17.26	22.73	84.83	49.96
Balanced-Elbow	77.38	47.56	17.24	22.63	84.83	49.93
Gated-ReLU	74.04	34.39	16.03	17.79	82.58	44.97
Gated-Swish	75.39	34.21	13.88	17.43	81.91	44.56
Attention	76.74	46.60	17.10	22.23	83.04	49.14
Attn-NR	77.24	47.43	17.05	23.24	84.69	49.93
Attn-NR-CV	77.24	47.50	17.04	23.39	84.71	49.98
MultiField	16.43	--	14.64	16.68	72.87	30.16*
MF-Balanced	79.30	--	16.85	22.03	76.63	48.70*

*MultiField/MF-Balanced average over 4 datasets (FiQA corpus lacks title field).

All methods above are zero-shot (no relevance labels required). With --tune, additional supervised methods are evaluated:

Method	ArguAna	FiQA	NFCorpus	SciDocs	SciFact	NDCG@10 Avg
Balanced-Tuned	37.29	40.49	35.65	22.03	72.70	41.63
Hybrid-AND-Tuned	37.13	28.37	34.44	16.82	69.34	37.22
Bayesian-Tuned	0.79	24.76	32.11	15.68	67.67	28.20

Delta vs BM25 (NDCG@10)

Method	Type	Delta
Attn-NR	zero-shot	+6.28
Balanced-Tuned	trained	+6.26
Attn-NR-CV	zero-shot	+6.14
Balanced-Elbow	zero-shot	+6.12
Balanced-Mix	zero-shot	+6.12
Bayesian-Balanced	zero-shot	+6.11
Convex	zero-shot	+5.76
LO-Local	zero-shot	+5.47
RRF	zero-shot	+5.11
Attention	zero-shot	+4.99
Bayesian-LogOdds	zero-shot	+3.97
Bayesian-LO-BR	zero-shot	+2.99
Dense	zero-shot	+2.94
MF-Balanced	zero-shot	+2.27*
Hybrid-AND-Tuned	trained	+1.84
Gated-ReLU	zero-shot	+0.87
Gated-Swish	zero-shot	+0.16

*MF-Balanced delta computed over 4 datasets (FiQA corpus lacks title field).
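
The footnoted delta can be re-derived from the NDCG@10 table by averaging both methods over the four datasets they share. The sketch below does this from the rounded table values. Recomputing other rows from rounded values can differ from the published delta in the last digit (for example 41.67 - 35.38 = 6.29 against the listed +6.28), presumably because the table was computed from unrounded scores.

```python
# NDCG@10 values copied from the table above.
bm25 = {"ArguAna": 36.13, "FiQA": 25.31, "NFCorpus": 31.82,
        "SciDocs": 15.63, "SciFact": 68.02}
mf_balanced = {"ArguAna": 38.40, "NFCorpus": 34.51,
               "SciDocs": 20.93, "SciFact": 66.83}   # no FiQA result

# Compare only on datasets where both methods have a score.
shared = [name for name in bm25 if name in mf_balanced]
bm25_avg = sum(bm25[name] for name in shared) / len(shared)
mf_avg = sum(mf_balanced[name] for name in shared) / len(shared)
delta = mf_avg - bm25_avg

print(f"BM25 avg over {len(shared)} datasets: {bm25_avg:.2f}")
print(f"MF-Balanced avg: {mf_avg:.2f}, delta: {delta:+.2f}")
```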

Method descriptions:

Method	Description
BM25	Sparse retrieval via bm25s (Lucene variant)
Dense	Cosine similarity via sentence-transformers
Convex	`w * dense_norm + (1-w) * bm25_norm`, w=0.5
RRF	Reciprocal Rank Fusion, `sum(1/(k + rank))`, k=60
Bayesian-OR	Bayesian BM25 probs + cosine probs via `prob_or`
Bayesian-LogOdds	Bayesian BM25 probs to logit, dense calibrated via `logit = alpha * (sim - median)`, combined
LO-Local	Both raw BM25 and dense calibrated symmetrically via `logit = alpha * (score - median)`, combined
Bayesian-LO-BR	Bayesian-LogOdds with base rate prior
Bayesian-Balanced	`balanced_log_odds_fusion`: Bayesian BM25 probs and dense sims to logit space, min-max normalize each, combine with equal weights
Balanced-Mix	Bayesian-Balanced with mixture-model base rate estimation
Balanced-Elbow	Bayesian-Balanced with elbow-detection base rate estimation
Gated-ReLU	`log_odds_conjunction` with ReLU gating in logit space (Paper 2, Theorem 6.5.3)
Gated-Swish	`log_odds_conjunction` with Swish gating in logit space (Paper 2, Theorem 6.7.4)
Attention	Query-dependent signal weighting via `AttentionLogOddsWeights` (Paper 2, Section 8)
Attn-NR	Attention with per-signal logit normalization (`normalize=True`) and 7 features (sparse + dense + cross-signal)
Attn-NR-CV	Attn-NR with 5-fold cross-validation (train/test split per query)
MultiField	`MultiFieldScorer` (title + body) with `log_odds_conjunction`, sparse-only
MF-Balanced	MultiField probs + dense via `balanced_log_odds_fusion`
Balanced-Tuned	Bayesian-Balanced + supervised `BayesianProbabilityTransform.fit()` + grid search over base_rate and fusion_weight
Hybrid-AND-Tuned	`log_odds_conjunction` of Bayesian BM25 and dense probs with tuned alpha
Bayesian-Tuned	Sparse-only Bayesian BM25 with tuned alpha, beta, and base_rate (no dense signal)
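
The two classical baselines in the table are simple enough to sketch directly. The code below illustrates the Convex and RRF rows under assumptions: min-max normalization is assumed for `dense_norm` and `bm25_norm` (the table does not define them), and the function names are hypothetical rather than taken from the benchmark script.

```python
import numpy as np

def min_max(x):
    # Rescale scores to [0, 1]; a constant vector maps to all zeros.
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    return np.zeros_like(x) if span == 0 else (x - x.min()) / span

def convex_fusion(bm25_scores, dense_scores, w=0.5):
    # w * dense_norm + (1 - w) * bm25_norm
    return w * min_max(dense_scores) + (1 - w) * min_max(bm25_scores)

def rrf(rankings, k=60):
    # Reciprocal Rank Fusion: sum of 1 / (k + rank) over all rankings (rank from 1).
    fused = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return fused

convex = convex_fusion([8.0, 5.0, 2.0], [0.2, 0.9, 0.5])
rrf_scores = rrf([["d1", "d2", "d3"], ["d3", "d1", "d4"]])
rrf_order = sorted(rrf_scores, key=rrf_scores.get, reverse=True)
```

RRF uses only ranks, so it is insensitive to score scale, while convex fusion depends on how each score list is normalized.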

Why include underperforming methods? The tables above deliberately include methods that underperform BM25 or barely improve on it. Each failure mode is informative:

Bayesian-OR (NDCG@10 avg 28.38): Probabilistic OR assumes signal independence and fails catastrophically on ArguAna (NDCG@10 of 0.06 on the 0–100 scale used in the tables). This demonstrates why the log-odds conjunction framework (Paper 2, Section 4) is needed: naive probability combination without logit-space calibration collapses when signal distributions differ.
Gated-ReLU / Gated-Swish (36.25 / 35.54): Sparse gating (Paper 2, Theorems 6.5.3 / 6.7.4) is too aggressive for the BEIR hybrid fusion task. ReLU zeros out negative logits entirely, discarding useful weak signals; Swish softens the gate but still suppresses too much. These gates are designed for high-dimensional signal spaces where most inputs are noise; in a two-signal (sparse + dense) setting, there is no noise to suppress.
MultiField (28.58 over 4 datasets): Sparse-only multi-field search loses to concatenated BM25 (37.90 over the same 4 datasets) because field separation fragments term statistics (smaller per-field document frequency, shorter effective document lengths). However, MF-Balanced (40.17) more than closes the gap by fusing with dense embeddings (+2.27 over BM25), confirming that field-level BM25 signals are complementary to dense vectors even when they are individually weaker.
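
The gating behaviour described for ReLU and Swish is easy to see on a pair of logits. The sketch below applies both activations in logit space using the standard definitions, ReLU as `max(x, 0)` and Swish as `x * sigmoid(x)`. How the library combines the gated logits afterwards is not shown and is not assumed here.

```python
import numpy as np

def logit(p):
    return np.log(p / (1.0 - p))

def relu_gate(x):
    return np.maximum(x, 0.0)

def swish_gate(x):
    # x * sigmoid(x), written as x / (1 + exp(-x))
    return x / (1.0 + np.exp(-x))

probs = np.array([0.9, 0.3])       # one confident signal, one weak negative
logits = logit(probs)              # positive for 0.9, negative for 0.3
gated_relu = relu_gate(logits)     # negative logit is zeroed out entirely
gated_swish = swish_gate(logits)   # negative logit is shrunk toward zero
```

Since `sigmoid(logit(p)) = p`, the Swish gate in logit space reduces to `p * logit(p)`, so a signal at 0.3 keeps only 30% of its negative evidence and ReLU keeps none of it.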

Reproduce:

# Zero-shot (18 methods)
python benchmarks/hybrid_beir.py -d <beir-data-dir>

# With tuning (auto-estimation + supervised learning + grid search)
python benchmarks/hybrid_beir.py -d <beir-data-dir> --tune

# Download BEIR datasets automatically
python benchmarks/hybrid_beir.py -d <beir-data-dir> --download

Requires pip install bayesian-bm25[scorer] sentence-transformers pytrec-eval-0.5 PyStemmer.

Sparse Retrieval

Evaluated on BEIR datasets (NFCorpus, SciFact) with k1=1.2, b=0.75, Lucene BM25. Queries are split 50/50 for training and evaluation. "Batch fit" uses gradient descent on training labels; all other Bayesian methods are unsupervised.

Ranking Quality

Base rate prior is a monotonic transform — it does not change document ordering.

Method	NFCorpus NDCG@10	NFCorpus MAP	SciFact NDCG@10	SciFact MAP
Raw BM25	0.5023	0.4395	0.5900	0.5426
Bayesian (auto)	0.5050	0.4403	0.5791	0.5283
Bayesian (auto) + base rate	0.5050	0.4403	0.5791	0.5283
Bayesian (batch fit)	0.5041	0.4400	0.5826	0.5305
Bayesian (batch fit) + base rate	0.5041	0.4400	0.5826	0.5305
Platt scaling	0.0229	0.0165	0.0000	0.0000
Min-max normalization	0.5023	0.4395	0.5900	0.5426
Batch fit (prior-aware, C2)	0.5066	0.4424	0.5776	0.5236
Batch fit (prior-free, C3)	0.5023	0.4395	0.5880	0.5389
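
The identical rows with and without base rate follow from monotonicity: adding a constant in log-odds space shifts every probability but cannot reorder them. The sketch below checks this numerically on random probabilities. It models the base rate as an additive `logit(base_rate)` term, which is an assumption about the form of the adjustment.

```python
import numpy as np

def logit(p):
    return np.log(p / (1.0 - p))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
probs = rng.uniform(0.05, 0.95, size=20)

# Assumed form of the base rate adjustment: a constant log-odds offset.
base_rate = 0.01
shifted = sigmoid(logit(probs) + logit(base_rate))

order_before = np.argsort(-probs, kind="stable")
order_after = np.argsort(-shifted, kind="stable")
same_ranking = bool(np.array_equal(order_before, order_after))
```

Ranking metrics such as NDCG and MAP depend only on the ordering, so they are unchanged, while calibration metrics do respond to the shift.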

Probability Calibration

Expected Calibration Error (ECE) and Brier score. Lower is better.

Method	NFCorpus ECE	NFCorpus Brier	SciFact ECE	SciFact Brier
Bayesian (no base rate)	0.6519	0.4667	0.7989	0.6635
Bayesian (base_rate=auto)	0.1461 (-77.6%)	0.0619	0.2577 (-67.7%)	0.1308
Bayesian (base_rate=0.001)	0.0081 (-98.8%)	0.0114	0.0354 (-95.6%)	0.0157
Batch fit (no base rate)	0.0093 (-98.6%)	0.0114	0.0103 (-98.7%)	0.0051
Batch fit + base_rate=auto	0.0085 (-98.7%)	0.0096	0.0021 (-99.7%)	0.0013
Platt scaling	0.0186 (-97.1%)	0.0101	0.0188 (-97.7%)	0.0007
Min-max normalization	0.0189 (-97.1%)	0.0105	0.0156 (-98.0%)	0.0009
Batch fit (prior-aware, C2)	0.0892 (-86.3%)	0.0439	0.1427 (-82.1%)	0.0802
Batch fit (prior-free, C3)	0.0029 (-99.6%)	0.0099	0.0058 (-99.3%)	0.0030

Threshold Transfer

F1 scores using the best threshold found on training queries, applied to evaluation queries. Smaller gap indicates better generalization.

Method	NFCorpus Train F1	NFCorpus Test F1	SciFact Train F1	SciFact Test F1
Bayesian (no base rate)	0.1607	0.1511	0.3374	0.2800
Batch fit (no base rate)	0.1577	0.1405	0.2358	0.2294
Batch fit + base_rate=auto	0.1559	0.1403	0.3316	0.3341
Platt scaling	0.0219	0.0193	0.0005	0.0005
Min-max normalization	0.1796	0.1751	0.3526	0.3486
Batch fit (prior-aware, C2)	0.1657	0.1539	0.3370	0.3275
Batch fit (prior-free, C3)	0.1808	0.1758	0.2836	0.2852
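
The threshold transfer protocol can be sketched as: pick the F1-maximizing threshold on the training split, then apply it unchanged to the evaluation split. The code below illustrates this on synthetic data. The data generator, threshold grid, and function names are hypothetical and unrelated to the benchmark script.

```python
import numpy as np

def f1_at(probs, labels, threshold):
    pred = probs >= threshold
    pos = labels == 1
    tp = np.sum(pred & pos)
    fp = np.sum(pred & ~pos)
    fn = np.sum(~pred & pos)
    denom = 2 * tp + fp + fn
    return 0.0 if denom == 0 else float(2 * tp / denom)

def best_threshold(probs, labels, grid):
    # Threshold on the grid that maximizes F1 for the given split.
    return max(grid, key=lambda t: f1_at(probs, labels, t))

rng = np.random.default_rng(0)

def synthetic_split(n):
    # 20% relevant; relevant items receive higher probabilities on average.
    labels = (rng.uniform(size=n) < 0.2).astype(int)
    probs = np.clip(0.3 + 0.4 * labels + rng.normal(0, 0.15, n), 0, 1)
    return probs, labels

train_p, train_y = synthetic_split(500)
test_p, test_y = synthetic_split(500)
grid = np.linspace(0.05, 0.95, 19)

t_star = best_threshold(train_p, train_y, grid)
train_f1 = f1_at(train_p, train_y, t_star)
test_f1 = f1_at(test_p, test_y, t_star)   # same threshold, unseen queries
gap = train_f1 - test_f1
```

A small gap means the threshold carries the same meaning on unseen queries, which is the property that calibrated probabilities are meant to provide.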

Reproduce with python benchmarks/base_rate.py (requires pip install bayesian-bm25[bench]). The base rate benchmark also includes Platt scaling, min-max normalization, and prior-aware/prior-free training mode comparisons.

Additional benchmarks (no external datasets required):

python benchmarks/learnable_weights.py — learnable weight recovery, fusion quality, online convergence, and timing
python benchmarks/weighted_fusion.py — weighted vs uniform log-odds fusion across noise scenarios
python benchmarks/wand_upper_bound.py — WAND upper bound tightness and skip rate analysis

Citation

If you use this work, please cite the following papers:

@preprint{Jeong2026BayesianBM25,
  author    = {Jeong, Jaepil},
  title     = {Bayesian {BM25}: {A} Probabilistic Framework for Hybrid Text
               and Vector Search},
  year      = {2026},
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.18414940},
  url       = {https://doi.org/10.5281/zenodo.18414940}
}

@preprint{Jeong2026BayesianNeural,
  author    = {Jeong, Jaepil},
  title     = {From {Bayesian} Inference to Neural Computation: The Analytical
               Emergence of Neural Network Structure from Probabilistic
               Relevance Estimation},
  year      = {2026},
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.18512411},
  url       = {https://doi.org/10.5281/zenodo.18512411}
}

License

This project is licensed under the Apache License 2.0.

Release history

0.11.0 (Mar 17, 2026)
0.9.0 (Mar 14, 2026)
0.8.1 (Mar 14, 2026; this version)
0.8.0 (Mar 5, 2026)
0.7.0 (Mar 3, 2026)
0.6.0 (Feb 28, 2026)
0.5.0 (Feb 26, 2026)
0.4.1 (Feb 25, 2026)
0.4.0 (Feb 24, 2026)
0.3.2 (Feb 22, 2026)
0.3.1 (Feb 21, 2026)
0.3.0 (Feb 20, 2026)
0.2.0 (Feb 18, 2026)
0.1.1 (Feb 16, 2026)
0.1.0 (Feb 14, 2026)
