How much retrieval quality do you keep per byte? A reproducible benchmark for embedding compression.

BitBudget

BitBudget is a small, reproducible benchmark for embedding compression. Give it an embedder and a corpus and it reports the retrieval quality (nDCG@10, recall@10) that each compression method retains against the bytes it stores per vector — the recall‑per‑byte frontier that every RAG and vector‑database deployment actually lives on.

It is the companion benchmark to the survey “Projection and Quantisation: A Unifying View of Learning to Hash, from Random Projections to the RAG Era”. It exists to answer one question that today is mostly answered by vendor blog posts: when you binarise, int8‑quantise, RaBitQ‑encode, product‑quantise or Matryoshka‑truncate your embeddings, what do you actually lose?

The headline finding

Bits beat dimensions. Spending a fixed byte budget on more, coarsely quantised coordinates beats spending it on fewer full‑precision coordinates, at every budget and for every embedder we have tried. One‑bit codes with a cheap re‑ranking pass are 32× smaller than float32 with no measurable loss in nDCG@10.

mxbai‑embed‑large (1024‑d), mean over 4 BEIR corpora
  binary+rerank      128 B   nDCG 0.509   100% of float   ← 32× smaller, lossless
  pq                 128 B   nDCG 0.488    96%
  rabitq             128 B   nDCG 0.487    96%
  matryoshka        1024 B   nDCG 0.439    86%             ← 4× smaller, projection axis
  float32           4096 B   nDCG 0.508   100%

See LEADERBOARD.md for the full table.
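
As a rough illustration of what a binary+rerank row can mean, here is a minimal numpy sketch. It is an assumption about the general technique, not BitBudget's own implementation: documents are stored as packed sign bits (D/8 bytes each), a Hamming‑distance pass picks a shortlist, and the shortlist is re‑scored with the float query against the ±1 document codes, so no float document vectors are kept. The function names and the shortlist size are illustrative.

```python
import numpy as np

def binarise(emb):
    """Pack the sign of each coordinate into bits: (n, D) float -> (n, D/8) uint8."""
    return np.packbits(emb > 0, axis=1)

def hamming(qcodes, dcodes):
    """Pairwise Hamming distance between packed codes: (nq, B), (nd, B) -> (nq, nd)."""
    xor = qcodes[:, None, :] ^ dcodes[None, :, :]
    return np.unpackbits(xor, axis=2).sum(axis=2)

def binary_rerank_search(qemb, dcodes, k=10, shortlist=100):
    """Top-k document ids per query: Hamming shortlist, then float-query rescoring."""
    shortlist = min(shortlist, dcodes.shape[0])
    dist = hamming(binarise(qemb), dcodes)
    cand = np.argpartition(dist, shortlist - 1, axis=1)[:, :shortlist]
    # Unpack only the shortlisted codes to +/-1 and score them with the float query.
    signs = np.unpackbits(dcodes[cand], axis=2).astype(np.float32) * 2 - 1
    scores = np.einsum("qd,qsd->qs", qemb, signs)
    order = np.argsort(-scores, axis=1)[:, :k]
    return np.take_along_axis(cand, order, axis=1)

# Toy data: five queries that are noisy copies of the first five documents.
rng = np.random.default_rng(0)
docs = rng.standard_normal((500, 64)).astype(np.float32)
queries = docs[:5] + 0.05 * rng.standard_normal((5, 64)).astype(np.float32)

codes = binarise(docs)
top = binary_rerank_search(queries, codes, k=10)
print(codes.shape[1], "bytes per vector")   # 64 dimensions / 8 = 8 bytes
```

The pairwise XOR materialises an (nq, nd, D/8) array, which is fine for a sketch but would be replaced by a popcount kernel at scale.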

Install

pip install bitbudget            # evaluation only (numpy)
pip install "bitbudget[all]"     # + sentence-transformers (embedding) + faiss

Quickstart

bitbudget methods                                   # list compression methods
bitbudget run --embedder mxbai --corpus scifact     # embed + evaluate, print a results card
bitbudget leaderboard results/card_*.json           # render a markdown leaderboard

bitbudget indexes                                   # list indexes (organisation axis)
bitbudget bench-index --synthetic 100000 128        # recall vs QPS vs bytes: flat/hnsw/ivfpq/bittrie

run embeds (torch) and evaluates (numpy) in one process. The corpora auto‑download.

The organisation axis (bench-index)

The compression leaderboard answers the question of quality per byte; bench-index answers the orthogonal question of recall per query-second. It builds an index over the document vectors and reports recall@k, throughput (QPS) and bytes per vector, so HNSW and IVF‑PQ (which buy throughput and add bytes) can be compared against compact‑code indexes on one frontier. Run it on synthetic data, on a cached embedding (--embedder mxbai --corpus scifact), or on your own vectors (--npz). The faiss‑backed indexes need pip install "bitbudget[faiss]"; the numpy bittrie runs without it.
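
To make the three reported quantities concrete, here is a small numpy sketch of how recall@k, QPS and bytes per vector could be measured for one index. It is not the bench-index code: `ip_search`, `bench` and the two toy "indexes" (exact float32 and ±1 sign codes) are hypothetical stand-ins for flat/hnsw/ivfpq/bittrie.

```python
import time
import numpy as np

def recall_at_k(found, truth):
    """Mean fraction of the true top-k ids that appear in the returned top-k."""
    hits = [len(set(f) & set(t)) for f, t in zip(found, truth)]
    return sum(hits) / (len(truth) * truth.shape[1])

def ip_search(queries, index, k):
    """Brute-force inner-product search; `index` is an (n, D) matrix."""
    return np.argsort(-(queries @ index.T), axis=1)[:, :k]

def bench(name, search, queries, index, truth, k, bytes_per_vector):
    """Time one batch of queries and report recall, throughput and footprint."""
    start = time.perf_counter()
    found = search(queries, index, k)
    elapsed = max(time.perf_counter() - start, 1e-9)
    return {"index": name,
            "recall": recall_at_k(found, truth),
            "qps": len(queries) / elapsed,
            "bytes": bytes_per_vector}

rng = np.random.default_rng(0)
D, k = 128, 10
docs = rng.standard_normal((2000, D)).astype(np.float32)
queries = rng.standard_normal((50, D)).astype(np.float32)
truth = ip_search(queries, docs, k)          # exact float neighbours

rows = [
    bench("flat", ip_search, queries, docs, truth, k, 4 * D),
    # Sign codes are held as +/-1 floats here for simplicity; the reported
    # footprint is what they would occupy when packed to one bit per coordinate.
    bench("sign", ip_search, queries, np.sign(docs), truth, k, D // 8),
]
for row in rows:
    print(row)
```

The QPS figures from a toy like this depend on the machine and are only meaningful relative to each other within one run.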

The bittrie index ships a small C kernel (_bittrie.c) for the query hot‑path, compiled on first use and cached. No compiler is needed to install: the wheel stays pure‑Python, and the index falls back to numpy if no compiler is present. The kernel builds multithreaded when OpenMP is available (GCC/clang on Linux, Homebrew libomp on macOS) and single‑threaded otherwise. Results are bit‑identical to the numpy path, and recall and footprint are algorithmic, so they are unchanged either way.

Because faiss carries its own OpenMP runtime, it cannot share a process with the bittrie's libomp on macOS. bench-index therefore runs the faiss indexes and the bittrie in separate subprocesses and merges the results, so a single bitbudget bench-index ... works everywhere (pass --no-split to force one process, e.g. on Linux where both share one OpenMP runtime).
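
The split-and-merge pattern described above can be sketched with the standard library alone. This is an illustration of the idea, not BitBudget's code: `run_isolated`, `bench_split` and the two job snippets are hypothetical, and the workers only report their process id where a real worker would import faiss or the bittrie kernel and return benchmark rows.

```python
import json
import os
import subprocess
import sys

def run_isolated(snippet):
    """Run a Python snippet in a fresh interpreter and parse the JSON it prints."""
    done = subprocess.run([sys.executable, "-c", snippet],
                          capture_output=True, text=True, check=True)
    return json.loads(done.stdout)

# Hypothetical workers. Each runs in its own process, so two libraries that
# bundle conflicting OpenMP runtimes would never be loaded side by side.
FAISS_JOB = 'import json, os; print(json.dumps([{"index": "hnsw", "pid": os.getpid()}]))'
BITTRIE_JOB = 'import json, os; print(json.dumps([{"index": "bittrie", "pid": os.getpid()}]))'

def bench_split(jobs):
    """Run each job in its own subprocess and merge the resulting rows."""
    rows = []
    for job in jobs:
        rows.extend(run_isolated(job))
    return rows

merged = bench_split([FAISS_JOB, BITTRIE_JOB])
parent_pid = os.getpid()
for row in merged:
    print(row["index"], "ran outside the parent process:", row["pid"] != parent_pid)
```

Passing results back as JSON on stdout keeps the parent free of either library's imports, which is the property that matters here.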

macOS note. torch and faiss each bundle their own OpenMP runtime and crash if imported in the same process. The core methods are numpy‑only, so run is safe; if you add a faiss‑backed method, run bitbudget embed (torch) and bitbudget eval (numpy/faiss) as separate processes.

The protocol (frozen, so results are comparable)

  • Corpora: the BEIR subsets scifact, nfcorpus, arguana, fiqa (small enough to run on a laptop, diverse enough to be honest). Numbers are the mean over corpora; ± is the standard deviation across them.
  • Metrics: nDCG@10 against the graded BEIR judgements, and recall@10 against the exact floating‑point neighbours. % of float is nDCG relative to the uncompressed embedding.
  • Memory: bytes stored per document vector. For a D‑dimensional embedding this is 4D for float32, D for int8, D/8 for binary, M for an M‑byte product code, and 4·dim for a vector truncated or PCA‑reduced to dim float32 coordinates.
  • Embedders: minilm (384‑d) and mxbai (1024‑d, Matryoshka) ship built in.
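
The quantities in the protocol can be written down in a few lines of numpy. This is a sketch under stated assumptions rather than the package's scoring code: the function names are illustrative, and the nDCG here uses linear gain with a log2 discount, which is one common convention and may differ from what BitBudget uses internally.

```python
import numpy as np

def bytes_per_vector(method, D, M=None, dim=None):
    """Storage per document vector, following the Memory rule above."""
    if method == "float32":
        return 4 * D
    if method == "int8":
        return D
    if method == "binary":
        return D // 8
    if method == "pq":
        return M                      # M-byte product code
    if method == "truncated":
        return 4 * dim                # dim float32 coordinates kept
    raise ValueError(f"unknown method: {method}")

def recall_at_k(approx_scores, exact_scores, k=10):
    """Overlap between the compressed top-k and the exact float top-k."""
    approx = np.argsort(-approx_scores, axis=1)[:, :k]
    exact = np.argsort(-exact_scores, axis=1)[:, :k]
    overlap = [len(set(a) & set(e)) for a, e in zip(approx, exact)]
    return float(np.mean(overlap)) / k

def ndcg_at_k(ranked_ids, grades, k=10):
    """nDCG@k for one query; grades maps doc id -> graded relevance (linear gain assumed)."""
    discounts = 1.0 / np.log2(np.arange(2, k + 2))
    gains = np.array([grades.get(d, 0) for d in ranked_ids[:k]], dtype=float)
    dcg = float((gains * discounts[:len(gains)]).sum())
    ideal = np.sort(np.array(list(grades.values()), dtype=float))[::-1][:k]
    idcg = float((ideal * discounts[:len(ideal)]).sum())
    return dcg / idcg if idcg > 0 else 0.0

print(bytes_per_vector("float32", 1024))             # 4096
print(bytes_per_vector("binary", 1024))              # 128
print(bytes_per_vector("truncated", 1024, dim=256))  # 1024
```

For a 1024‑d embedder these reproduce the byte column of the headline table; the 1024 B Matryoshka row corresponds to dim = 256 under the 4·dim rule.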

Add your method in five lines

This is the point of the benchmark: drop in your compressor and it is scored against every built‑in on the same protocol.

from bitbudget import method
import numpy as np

@method("my-2bit", bits=2)
def my_2bit(demb, qemb):
    codes = my_quantise(demb)                       # your compression
    scores = qemb @ my_reconstruct(codes).T         # (queries x docs) similarity
    return scores, demb.shape[1] * 2 / 8            # scores, bytes per stored vector

bitbudget run --embedder mxbai --corpus scifact --methods my-2bit binary+rerank float32
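
The `my_quantise` and `my_reconstruct` helpers in the snippet above are placeholders for your own code. As one hypothetical way to fill them in, the sketch below implements a uniform 2‑bit scalar quantiser (four levels per dimension, four codes packed per byte) so that the stored size really is D · 2 / 8 bytes. It is standalone numpy and does not import bitbudget; it assumes D is a multiple of 4.

```python
import numpy as np

def my_quantise(demb):
    """Hypothetical 2-bit quantiser: 4 uniform levels per dimension, 4 codes per byte."""
    lo = demb.min(axis=0)
    step = (demb.max(axis=0) - lo) / 4 + 1e-12
    levels = np.clip(np.floor((demb - lo) / step), 0, 3).astype(np.uint8)  # (n, D), values 0..3
    packed = (levels[:, 0::4]
              | (levels[:, 1::4] << 2)
              | (levels[:, 2::4] << 4)
              | (levels[:, 3::4] << 6))
    return packed, lo, step

def my_reconstruct(codes):
    """Invert my_quantise: unpack the 2-bit levels and map each to its bin centre."""
    packed, lo, step = codes
    levels = np.stack([(packed >> s) & 3 for s in (0, 2, 4, 6)], axis=2)   # (n, D/4, 4)
    levels = levels.reshape(packed.shape[0], -1)                           # (n, D)
    return lo + (levels + 0.5) * step

rng = np.random.default_rng(0)
demb = rng.standard_normal((100, 64)).astype(np.float32)
codes = my_quantise(demb)
recon = my_reconstruct(codes)
print(codes[0].shape[1], "bytes per vector")   # 64 * 2 / 8 = 16
```

The per-dimension `lo` and `step` arrays are shared across the corpus rather than stored per vector, which is why they are left out of the bytes-per-vector figure in this sketch.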

Then open a pull request adding your row to LEADERBOARD.md. See CONTRIBUTING.md.

Cite

If BitBudget helps your work, please cite the survey:

@article{moran2025projection,
  title   = {Projection and Quantisation: A Unifying View of Learning to Hash,
             from Random Projections to the RAG Era},
  author  = {Moran, Sean},
  journal = {arXiv preprint arXiv:2510.04127},
  year    = {2025}
}

MIT licensed.
