
Fast vector quantization with 2-4 bit compression and SIMD search

Project description

turbovec

A vector index built on TurboQuant, written in Rust with Python bindings.

PyPI crates.io MIT License TurboQuant paper


Compresses vectors to 2-4 bits per dimension using TurboQuant (Google Research, ICLR 2026) with near-optimal distortion.

Unlike trained methods like FAISS PQ, TurboQuant is data-oblivious — no training step, no codebook retraining when data changes, and new vectors can be added at any time. This means faster index creation, simpler infrastructure, and comparable or higher recall.

Python

pip install turbovec

from turbovec import TurboQuantIndex
import numpy as np

# Example data: float32 numpy arrays of shape (n, dim).
vectors = np.random.rand(10_000, 1536).astype(np.float32)
more_vectors = np.random.rand(5_000, 1536).astype(np.float32)
query = np.random.rand(1536).astype(np.float32)

index = TurboQuantIndex(dim=1536, bit_width=4)
index.add(vectors)
index.add(more_vectors)

scores, indices = index.search(query, k=10)

index.write("my_index.tq")
loaded = TurboQuantIndex.load("my_index.tq")

Rust

cargo add turbovec

use turbovec::TurboQuantIndex;

let mut index = TurboQuantIndex::new(1536, 4);
index.add(&vectors);
let results = index.search(&queries, 10);
index.write("index.tv").unwrap();
let loaded = TurboQuantIndex::load("index.tv").unwrap();

Recall

TurboQuant vs FAISS IndexPQFastScan (100K vectors, k=64). FAISS PQ configurations sized to match TurboQuant compression ratios.

Recall d=1536

Recall d=3072

Both converge to 1.0 by k=4-8. At d=3072 2-bit, TurboQuant recall exceeds FAISS's (0.912 vs 0.903). At d=1536 2-bit, FAISS is slightly ahead (0.882 vs 0.870). The recall gap between TurboQuant and FAISS varies with dimension and bit width and requires further investigation. Full results: d=1536 2-bit, d=1536 4-bit, d=3072 2-bit, d=3072 4-bit, GloVe 2-bit, GloVe 4-bit.

No FAISS FastScan comparison for GloVe d=200 (dimension not compatible with FastScan's m%32 requirement).

Compression


Search Speed

All benchmarks: 100K vectors, 1K queries, k=64, median of 5 runs.

ARM (Apple M3 Max)

ARM Speed — Single-threaded

ARM Speed — Multi-threaded

On ARM, TurboQuant beats FAISS FastScan by 12–20% across every config.

x86 (Intel Xeon Platinum 8481C / Sapphire Rapids, 8 vCPUs)

x86 Speed — Single-threaded

x86 Speed — Multi-threaded

On x86, TurboQuant wins every 4-bit config by 1-6% and runs within ~1% of FAISS on 2-bit single-threaded. The only configs still slightly behind FAISS (by 2-4%) are the 2-bit multi-threaded rows at d=1536 and d=3072, where the inner accumulate loop is too short for loop unrolling to amortize against FAISS's AVX-512 VBMI path. The Performance build section below gives an opt-in PGO recipe that flips every config to a win.

How it works

Each vector is a direction on a high-dimensional hypersphere. TurboQuant compresses these directions using a simple insight: after applying a random rotation, every coordinate follows a known distribution -- regardless of the input data.

1. Normalize. Strip the length (norm) from each vector and store it as a single float. Now every vector is a unit direction on the hypersphere.

2. Random rotation. Multiply all vectors by the same random orthogonal matrix. After rotation, each coordinate independently follows a Beta distribution that converges to Gaussian N(0, 1/d) in high dimensions. This holds for any input data -- the rotation makes the coordinate distribution predictable.

3. Lloyd-Max scalar quantization. Since the distribution is known, we can precompute the optimal way to bucket each coordinate. For 2-bit, that's 4 buckets; for 4-bit, 16 buckets. The Lloyd-Max algorithm finds bucket boundaries and centroids that minimize mean squared error. These are computed once from the math, not from the data.

4. Bit-pack. Each coordinate is now a small integer (0-3 for 2-bit, 0-15 for 4-bit). Pack these tightly into bytes. A 1536-dim vector goes from 6,144 bytes (FP32) to 384 bytes (2-bit). That's 16x compression.
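Steps 2-4 can be sketched end to end in numpy. This is illustrative only: the dimension, seed, Monte Carlo Lloyd-Max iteration, and packing layout are arbitrary choices for the demo, not the crate's internals (which use closed-form quantizer tables).

```python
import numpy as np

rng = np.random.default_rng(0)
d = 1536

# Steps 1-2: normalize, then apply a random orthogonal rotation (via QR).
x = rng.standard_normal((100, d))
norms = np.linalg.norm(x, axis=1, keepdims=True)   # stored separately as floats
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))   # random orthogonal matrix
rotated = (x / norms) @ Q
# Coordinates now look like N(0, 1/d) regardless of the input data:
# rotated.std() is close to 1/sqrt(d).

# Step 3: Lloyd-Max for the known coordinate distribution (Monte Carlo sketch).
samples = rng.standard_normal(1_000_000) / np.sqrt(d)
levels = 4                                          # 2-bit: 4 buckets
centroids = np.quantile(samples, (np.arange(levels) + 0.5) / levels)
for _ in range(50):
    # Boundaries: midpoints between adjacent centroids (nearest-centroid cells).
    boundaries = (centroids[:-1] + centroids[1:]) / 2
    cells = np.searchsorted(boundaries, samples)
    # Centroids: conditional mean of each cell, which minimizes MSE.
    centroids = np.array([samples[cells == i].mean() for i in range(levels)])

codes = np.searchsorted(boundaries, rotated.ravel()).astype(np.uint8).reshape(100, d)

# Step 4: pack four 2-bit codes per byte.
c = codes.reshape(100, d // 4, 4)
packed = c[..., 0] | (c[..., 1] << 2) | (c[..., 2] << 4) | (c[..., 3] << 6)
print(packed.nbytes // 100)   # 384 bytes per vector, vs 6144 for FP32: 16x
```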

Search. Instead of decompressing every database vector, we rotate the query once into the same domain and score directly against the codebook values. The scoring kernel uses SIMD intrinsics (NEON on ARM, AVX2 on x86) with nibble-split lookup tables for maximum throughput.
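The scoring idea can be sketched in numpy. This is a scalar illustration of the lookup, not the SIMD kernel, and the codebook values here are stand-ins: with a shared codebook, the inner product against a quantized vector is the dot of the rotated query with the per-coordinate centroid values.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 1536, 1000

centroids = np.sort(rng.standard_normal(16))   # stand-in 4-bit codebook values
codes = rng.integers(0, 16, size=(n, d))       # quantized database vectors
q = rng.standard_normal(d)                     # query, already rotated

# Vectorized form of the per-coordinate lookup sum_j q[j] * centroid[code[j]];
# the SIMD kernel keeps the 16 codebook values in registers as a nibble LUT.
scores = centroids[codes] @ q

top10 = np.argsort(-scores)[:10]               # highest inner products first
print(scores.shape, top10[:3])
```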

The paper proves this achieves distortion within a factor of 2.7x of the information-theoretic lower bound (Shannon's distortion-rate limit). You cannot do much better for a given number of bits.
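For background on that floor (standard rate-distortion theory, not the paper's exact theorem statement): Shannon's distortion-rate function for a Gaussian source gives the bound the 2.7x factor is measured against.

```latex
% Shannon distortion-rate function for a Gaussian source with variance
% \sigma^2, at a rate of b bits per coordinate:
D(b) = \sigma^2 \, 2^{-2b}
% No b-bit quantizer can achieve MSE below this, so a 2.7x factor reads as
% \mathrm{MSE} \le 2.7 \cdot \sigma^2 2^{-2b}
% (e.g. at b = 2, within 2.7x of \sigma^2 / 16).
```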

Building

Python (via maturin)

pip install maturin
cd turbovec-python
maturin build --release
pip install target/wheels/*.whl

Rust

cargo build --release

All x86_64 builds target x86-64-v3 (AVX2 baseline, Haswell 2013+) via .cargo/config.toml. Any CPU that can run the AVX2 fallback kernel can run the whole crate — the AVX-512 kernel is gated at runtime via is_x86_feature_detected! and only kicks in on hardware that supports it.

Performance build (x86)

The shipped Cargo config already gives most of the x86 win out of the box. For an additional ~5–10% on modern servers (Ice Lake / Sapphire Rapids / Zen 4+), you can opt into a profile-guided build with host-specific codegen. This flips every x86 config from parity-or-slight-win to a clear win across the board.

cd turbovec-python

# 1. Instrumented build
RUSTFLAGS="-C profile-generate=/tmp/pgo -C target-cpu=native" maturin build --release
pip install --force-reinstall target/wheels/*.whl

# 2. Collect a profile by running representative benchmarks
python3 ../benchmarks/suite/speed_d3072_4bit_x86_st.py
python3 ../benchmarks/suite/speed_d1536_4bit_x86_st.py

# 3. Merge and rebuild with the profile
llvm-profdata merge -o /tmp/merged.profdata /tmp/pgo
RUSTFLAGS="-C profile-use=/tmp/merged.profdata -C target-cpu=native" maturin build --release
pip install --force-reinstall target/wheels/*.whl

target-cpu=native is machine-specific — only use it when building on the same host (family) you'll run on. llvm-profdata is installed via rustup component add llvm-tools-preview.

Running benchmarks

Download datasets:

python3 benchmarks/download_data.py all            # all datasets
python3 benchmarks/download_data.py glove          # GloVe d=200
python3 benchmarks/download_data.py openai-1536    # OpenAI DBpedia d=1536
python3 benchmarks/download_data.py openai-3072    # OpenAI DBpedia d=3072

Each benchmark is a self-contained script in benchmarks/suite/. Run any one individually:

python3 benchmarks/suite/speed_d1536_2bit_arm_mt.py
python3 benchmarks/suite/recall_d1536_2bit.py
python3 benchmarks/suite/compression.py

Run all benchmarks for a category:

for f in benchmarks/suite/speed_*arm*.py; do python3 "$f"; done    # all ARM speed
for f in benchmarks/suite/speed_*x86*.py; do python3 "$f"; done    # all x86 speed
for f in benchmarks/suite/recall_*.py; do python3 "$f"; done       # all recall
python3 benchmarks/suite/compression.py                            # compression

Results are saved as JSON to benchmarks/results/. Regenerate charts:

python3 benchmarks/create_diagrams.py



Download files

Download the file for your platform.

Source Distribution

turbovec-0.1.3.tar.gz (44.3 kB view details)

Uploaded Source

Built Distributions


turbovec-0.1.3-cp39-abi3-manylinux_2_28_x86_64.whl (755.7 kB view details)

Uploaded CPython 3.9+manylinux: glibc 2.28+ x86-64

turbovec-0.1.3-cp39-abi3-manylinux_2_28_aarch64.whl (900.8 kB view details)

Uploaded CPython 3.9+manylinux: glibc 2.28+ ARM64

turbovec-0.1.3-cp39-abi3-macosx_11_0_arm64.whl (761.4 kB view details)

Uploaded CPython 3.9+macOS 11.0+ ARM64

File details

Details for the file turbovec-0.1.3.tar.gz.

File metadata

  • Download URL: turbovec-0.1.3.tar.gz
  • Upload date:
  • Size: 44.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for turbovec-0.1.3.tar.gz
Algorithm Hash digest
SHA256 140a433438f102e17947875a231f8cce50fd79b5dc381672f6510cd346cfb0d1
MD5 808e7066c3796b04bc822940462c19a6
BLAKE2b-256 b505f9c8de1eea79a69c43b0760a026e34cbeeda063038bbd9b99425d897c902

See more details on using hashes here.

Provenance

The following attestation bundles were made for turbovec-0.1.3.tar.gz:

Publisher: release.yml on RyanCodrai/turbovec

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file turbovec-0.1.3-cp39-abi3-manylinux_2_28_x86_64.whl.

File metadata

File hashes

Hashes for turbovec-0.1.3-cp39-abi3-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 c14e19bdbb6e833505fb1bc8eee2791bee99355d045c82e6f350f6b39f7663c5
MD5 66776a211d460647f8b6aeffeb43412f
BLAKE2b-256 6ace5406ccb0efa79bf01ba9aa965262003a6c772cf09ceb940023491df0105f


Provenance

The following attestation bundles were made for turbovec-0.1.3-cp39-abi3-manylinux_2_28_x86_64.whl:

Publisher: release.yml on RyanCodrai/turbovec


File details

Details for the file turbovec-0.1.3-cp39-abi3-manylinux_2_28_aarch64.whl.

File metadata

File hashes

Hashes for turbovec-0.1.3-cp39-abi3-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 21b1b77113cfabb9e023b6b922ecd97e3098ec3665395234a1c581a93fe76347
MD5 f67f290e06fdd9eaafce26a4f6a84285
BLAKE2b-256 ea3e2240c4401bdce3463d517ba652a7aa31f794bc30b10985694572ef2c30d9


Provenance

The following attestation bundles were made for turbovec-0.1.3-cp39-abi3-manylinux_2_28_aarch64.whl:

Publisher: release.yml on RyanCodrai/turbovec


File details

Details for the file turbovec-0.1.3-cp39-abi3-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for turbovec-0.1.3-cp39-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 34e50b9448759933b67136ccc7ab2cedfedcd4b4ae41441b94c93d16274fd686
MD5 126841e6e2bef67fc2766976fd397f28
BLAKE2b-256 b7ec3bc9e28852b6c2a5d2a9e88048c5bca0363e641e3956f4d1f9dc6eecc7a2


Provenance

The following attestation bundles were made for turbovec-0.1.3-cp39-abi3-macosx_11_0_arm64.whl:

Publisher: release.yml on RyanCodrai/turbovec

