How much retrieval quality do you keep per byte? A reproducible benchmark for embedding compression.
BitBudget
How much retrieval quality do you keep per byte?
BitBudget is a small, reproducible benchmark for embedding compression. Give it an embedder and a corpus and it reports the retrieval quality (nDCG@10, recall@10) that each compression method retains against the bytes it stores per vector — the recall‑per‑byte frontier that every RAG and vector‑database deployment actually lives on.
It is the companion benchmark to the survey “Projection and Quantisation: A Unifying View of Learning to Hash, from Random Projections to the RAG Era” and exists to answer one question that today is mostly answered by vendor blog posts: when you binarise / int8 / RaBitQ / product‑quantise / Matryoshka‑truncate your embeddings, what do you actually lose?
The headline finding
Bits beat dimensions. Spending a fixed byte budget on more coarsely quantised coordinates beats spending it on fewer full‑precision coordinates, at every budget and for every embedder we have tried. One‑bit codes with a cheap re‑ranking pass are 32× smaller than float at no measurable loss.
mxbai‑embed‑large (1024‑d), mean over 4 BEIR corpora

| Method | Bytes per vector | nDCG | % of float | Note |
|---|---|---|---|---|
| binary+rerank | 128 B | 0.509 | 100% | 32× smaller, lossless |
| pq | 128 B | 0.488 | 96% | |
| rabitq | 128 B | 0.487 | 96% | |
| matryoshka | 1024 B | 0.439 | 86% | 4× smaller, projection axis |
| float32 | 4096 B | 0.508 | 100% | |
See LEADERBOARD.md for the full table.
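To make the binary+rerank row concrete, here is a minimal numpy sketch of one common way to do it: shortlist by Hamming distance on packed sign bits, then re-score the shortlist with the float query against a ±1 reconstruction of the stored codes. This is an assumption about the general technique, not BitBudget's implementation; the function names, the shortlist size and the synthetic data are all hypothetical.

```python
import numpy as np

def binarise(x):
    """Pack the sign bit of each coordinate: (n, D) floats -> (n, ceil(D/8)) uint8."""
    return np.packbits(x > 0, axis=1)

def hamming_shortlist(qcodes, dcodes, n_candidates):
    """Per query, the indices of the n_candidates documents closest in Hamming distance."""
    # XOR marks differing bits; unpack and sum to count them. The (Q, N, D/8)
    # intermediate is fine for a demo but too large for a real corpus.
    xor = qcodes[:, None, :] ^ dcodes[None, :, :]
    dist = np.unpackbits(xor, axis=2).sum(axis=2)
    return np.argsort(dist, axis=1, kind="stable")[:, :n_candidates]

def rerank(qemb, dcodes, shortlist, k):
    """Re-score each shortlist with the float query against +/-1 decoded documents."""
    D = qemb.shape[1]
    top = np.empty((qemb.shape[0], k), dtype=np.int64)
    for i, cand in enumerate(shortlist):
        bits = np.unpackbits(dcodes[cand], axis=1)[:, :D]
        signs = bits.astype(np.float32) * 2.0 - 1.0   # 0/1 bits -> -1/+1
        scores = signs @ qemb[i]
        top[i] = cand[np.argsort(-scores, kind="stable")[:k]]
    return top

# Synthetic stand-in for real embeddings: each query is a noisy copy of one document.
rng = np.random.default_rng(0)
docs = rng.standard_normal((200, 64)).astype(np.float32)
queries = docs[:5] + 0.05 * rng.standard_normal((5, 64)).astype(np.float32)

dcodes = binarise(docs)            # only these codes are stored: 64 / 8 = 8 bytes each
shortlist = hamming_shortlist(binarise(queries), dcodes, n_candidates=20)
top = rerank(queries, dcodes, shortlist, k=10)
```

Only the packed codes are kept for documents, so the stored size stays at D/8 bytes per vector; the query stays in float because it is never stored.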
Install
pip install bitbudget # evaluation only (numpy)
pip install "bitbudget[all]" # + sentence-transformers (embedding) + faiss
Quickstart
bitbudget methods # list compression methods
bitbudget run --embedder mxbai --corpus scifact # embed + evaluate, print a results card
bitbudget leaderboard results/card_*.json # render a markdown leaderboard
bitbudget indexes # list indexes (organisation axis)
bitbudget bench-index --synthetic 100000 128 # recall vs QPS vs bytes: flat/hnsw/ivfpq/bittrie
The run command embeds (torch) and evaluates (numpy) in one process. The corpora download automatically.
The organisation axis (bench-index)
The compression leaderboard answers quality per byte; bench-index answers the orthogonal
question, recall per query-second. It builds an index over the document vectors and reports recall@k,
throughput (QPS) and bytes per vector, so HNSW and IVF‑PQ (which buy throughput and add bytes)
can be compared against compact‑code indexes on one frontier. Run it on synthetic data, on a
cached embedding (--embedder mxbai --corpus scifact), or on your own vectors (--npz). The
faiss‑backed indexes need pip install bitbudget[faiss]; the numpy bittrie runs without it.
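The three quantities bench-index reports can be measured with very little code. The sketch below is not the bench-index implementation: it times an exact float32 scan and a hypothetical cheaper scan over the first 32 dimensions, both in plain numpy, and reports recall@k against the exact neighbours, queries per second and bytes per vector. All names and sizes here are assumptions chosen for illustration.

```python
import time
import numpy as np

def top_k(queries, docs, k):
    """Exact inner-product search: indices of the k best documents per query."""
    scores = queries @ docs.T
    return np.argsort(-scores, axis=1, kind="stable")[:, :k]

def recall_at_k(found, truth):
    """Fraction of the true top-k neighbours that the index returned."""
    hits = [len(set(f) & set(t)) for f, t in zip(found.tolist(), truth.tolist())]
    return sum(hits) / truth.size

def bench(name, search, queries, truth, bytes_per_vector):
    """Time one batch of queries and report recall, throughput and footprint."""
    start = time.perf_counter()
    found = search(queries)
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero reading
    return {
        "index": name,
        "recall": recall_at_k(found, truth),
        "qps": len(queries) / elapsed,
        "bytes": bytes_per_vector,
    }

rng = np.random.default_rng(0)
docs = rng.standard_normal((5000, 128)).astype(np.float32)
queries = rng.standard_normal((50, 128)).astype(np.float32)
k = 10
truth = top_k(queries, docs, k)

rows = [
    # Exact scan over all 128 float32 coordinates: 4 * 128 bytes per vector.
    bench("flat", lambda q: top_k(q, docs, k), queries, truth, 4 * 128),
    # Hypothetical cheaper scan over the first 32 coordinates only: 4 * 32 bytes.
    bench("prefix32", lambda q: top_k(q[:, :32], docs[:, :32], k), queries, truth, 4 * 32),
]
for row in rows:
    print(row)
```

Timings from a single batch are noisy; a real comparison would repeat each run and fix the thread count.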
The bittrie index ships a small C kernel (_bittrie.c) for the query hot‑path, compiled on
first use and cached (no compiler needed to install — the wheel stays pure‑Python, and it falls
back to numpy if no compiler is present). It builds multithreaded when OpenMP is available
(GCC/clang on Linux, Homebrew libomp on macOS) and single‑threaded otherwise; results are
bit‑identical to the numpy path, and recall/footprint are algorithmic and unchanged either way.
Because faiss carries its own OpenMP runtime, it cannot share a process with the bit‑trie's
libomp on macOS. bench-index therefore runs the faiss indexes and the bit‑trie in separate
subprocesses and merges the results, so a single bitbudget bench-index ... works everywhere
(pass --no-split to force one process, e.g. on Linux where both share one OpenMP runtime).
macOS note. torch and faiss each bundle their own OpenMP runtime and crash if imported in the same process. The core methods are numpy‑only, so run is safe; if you add a faiss‑backed method, run bitbudget embed (torch) and bitbudget eval (numpy/faiss) as separate processes.
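The process-separation pattern itself is simple: each stage runs in its own interpreter and the stages exchange data through a file, so libraries with conflicting runtimes are never loaded together. The sketch below shows the pattern only. It does not call the bitbudget CLI, and the two inline stage scripts are hypothetical stand-ins for a torch embedding stage and a faiss evaluation stage.

```python
import json
import os
import subprocess
import sys
import tempfile

# Stand-in for the embedding stage: writes a small array to the path it is given.
STAGE_EMBED = (
    "import sys, numpy as np; "
    "np.save(sys.argv[1], np.arange(6, dtype=np.float32).reshape(2, 3))"
)

# Stand-in for the evaluation stage: reads the array and prints a JSON summary.
STAGE_EVAL = (
    "import sys, json, numpy as np; "
    "x = np.load(sys.argv[1]); "
    "print(json.dumps({'rows': int(x.shape[0]), 'sum': float(x.sum())}))"
)

def run_stage(code, *args):
    """Run one stage in a fresh Python process and return what it printed."""
    done = subprocess.run(
        [sys.executable, "-c", code, *args],
        check=True, capture_output=True, text=True,
    )
    return done.stdout

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "embeddings.npy")
    run_stage(STAGE_EMBED, path)                       # process 1 writes the file
    result = json.loads(run_stage(STAGE_EVAL, path))   # process 2 reads it

print(result)
```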
The protocol (frozen, so results are comparable)
- Corpora: the BEIR subsets scifact, nfcorpus, arguana, fiqa (small enough to run on a laptop, diverse enough to be honest). Numbers are the mean over corpora; ± is the standard deviation across them.
- Metrics: nDCG@10 against the graded BEIR judgements, and recall@10 against the exact floating‑point neighbours. % of float is nDCG relative to the uncompressed embedding.
- Memory: bytes stored per document vector (4D float, D int8, D/8 binary, M for an M‑byte product code, 4·dim for a truncated/PCA‑reduced vector).
- Embedders: minilm (384‑d) and mxbai (1024‑d, Matryoshka) ship built in.
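The protocol's quantities are small enough to write down directly. The sketch below is an illustration, not BitBudget's evaluation code: the function names are hypothetical, and the nDCG uses linear gain with a log2 rank discount, which is an assumption about the exact variant. The memory formulas are the ones listed above.

```python
import math

def bytes_per_vector(method, D, M=None, dim=None):
    """Bytes stored per document vector for a D-dimensional embedding."""
    if method == "float32":
        return 4 * D
    if method == "int8":
        return D
    if method == "binary":
        return D // 8
    if method == "pq":          # M-byte product code
        return M
    if method == "truncated":   # first `dim` coordinates kept as float32
        return 4 * dim
    raise ValueError(f"unknown method: {method}")

def ndcg_at_k(ranked_ids, qrels, k=10):
    """nDCG@k for one query. qrels maps doc id -> graded relevance judgement."""
    gains = [qrels.get(doc, 0) for doc in ranked_ids[:k]]
    dcg = sum(g / math.log2(rank + 2) for rank, g in enumerate(gains))
    ideal = sorted(qrels.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(rank + 2) for rank, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0

def recall_at_k(approx_ids, exact_ids, k=10):
    """Share of the exact float top-k that the compressed ranking also returns."""
    return len(set(approx_ids[:k]) & set(exact_ids[:k])) / k

def pct_of_float(ndcg_method, ndcg_float):
    """nDCG relative to the uncompressed embedding, as a rounded percentage."""
    return round(100 * ndcg_method / ndcg_float)

# The 1024-d footprints quoted in the headline results.
sizes = {
    "float32": bytes_per_vector("float32", 1024),
    "binary": bytes_per_vector("binary", 1024),
    "pq": bytes_per_vector("pq", 1024, M=128),
    "matryoshka": bytes_per_vector("truncated", 1024, dim=256),
}
print(sizes)
```

Under these formulas a 1024 B Matryoshka vector corresponds to 256 float32 coordinates, while 128 B of binary code covers all 1024 coordinates at one bit each.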
Add your method in five lines
This is the point of the benchmark: drop in your compressor and it is scored against every built‑in on the same protocol.
from bitbudget import method
import numpy as np

@method("my-2bit", bits=2)
def my_2bit(demb, qemb):
    codes = my_quantise(demb)                  # your compression
    scores = qemb @ my_reconstruct(codes).T    # (queries x docs) similarity
    return scores, demb.shape[1] * 2 / 8       # scores, bytes per stored vector
bitbudget run --embedder mxbai --corpus scifact --methods my-2bit binary+rerank float32
Then open a pull request adding your row to LEADERBOARD.md. See CONTRIBUTING.md.
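For a concrete starting point, the placeholders my_quantise and my_reconstruct could be filled with a uniform 2‑bit scalar quantiser. The sketch below is one hypothetical choice, not a built‑in method: it follows the (demb, qemb) to (scores, bytes) contract shown above but leaves out the @method decorator so that it runs with numpy alone.

```python
import numpy as np

def quantise_2bit(x):
    """Uniform 4-level quantiser per coordinate; returns codes in {0, 1, 2, 3}."""
    lo, hi = x.min(axis=0), x.max(axis=0)
    step = np.where(hi > lo, (hi - lo) / 4, 1.0)   # avoid a zero step on constant columns
    codes = np.clip(np.floor((x - lo) / step), 0, 3).astype(np.uint8)
    return codes, lo, step

def reconstruct_2bit(codes, lo, step):
    """Decode each code to the centre of its quantisation cell."""
    return lo + (codes.astype(np.float32) + 0.5) * step

def my_2bit(demb, qemb):
    codes, lo, step = quantise_2bit(demb)
    scores = qemb @ reconstruct_2bit(codes, lo, step).T   # (queries x docs) similarity
    # Reported size assumes four 2-bit codes packed per byte. This demo keeps one
    # code per uint8 for clarity and ignores the shared per-dimension lo/step tables.
    return scores, demb.shape[1] * 2 / 8

rng = np.random.default_rng(0)
demb = rng.standard_normal((100, 64)).astype(np.float32)
qemb = rng.standard_normal((7, 64)).astype(np.float32)

scores, nbytes = my_2bit(demb, qemb)
codes, lo, step = quantise_2bit(demb)
max_err = np.abs(reconstruct_2bit(codes, lo, step) - demb).max()
```

With the decorator applied, the same function body would be scored by the run command like any built‑in method.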
Cite
If BitBudget helps your work, please cite the survey:
@article{moran2025projection,
  title   = {Projection and Quantisation: A Unifying View of Learning to Hash,
             from Random Projections to the RAG Era},
  author  = {Moran, Sean},
  journal = {arXiv preprint arXiv:2510.04127},
  year    = {2025}
}
MIT licensed.