Fast vector quantization with 2-4 bit compression and SIMD search

turbovec — Google's TurboQuant for vector search


A 10 million document corpus takes 31 GB of RAM as float32. turbovec fits it in 4 GB - and searches it faster than FAISS.

turbovec is a Rust vector index with Python bindings, built on Google Research's TurboQuant algorithm - a data-oblivious quantizer that comes within a small constant factor (about 2.7x) of the Shannon lower bound on distortion, with zero training and zero data passes.

  • No codebook training. Add vectors, they're indexed. No data-dependent calibration, no rebuilds as the corpus grows.
  • Faster than FAISS. Hand-written NEON (ARM) and AVX-512BW (x86) kernels beat FAISS IndexPQFastScan by 12–20% on ARM. On x86 they win every 4-bit config by 1–6% and stay within a few percent of FAISS on 2-bit.
  • Filter at search time. Pass an id allowlist (or a slot bitmask) to search() and the kernel honours it directly. You always get up to k results from the allowed set — no over-fetching, no recall hit on selective filters.
  • Pure local. No managed service, no data leaving your machine or VPC. Pair with any open-source embedding model for a fully air-gapped RAG stack.

Building RAG where privacy, memory, or latency matters? You're in the right place.

Python

pip install turbovec

from turbovec import TurboQuantIndex

index = TurboQuantIndex(dim=1536, bit_width=4)
index.add(vectors)
index.add(more_vectors)

scores, indices = index.search(query, k=10)

index.write("my_index.tq")
loaded = TurboQuantIndex.load("my_index.tq")

Need stable ids that survive deletes? Use IdMapIndex:

import numpy as np
from turbovec import IdMapIndex

index = IdMapIndex(dim=1536, bit_width=4)
index.add_with_ids(vectors, np.array([1001, 1002, 1003], dtype=np.uint64))

scores, ids = index.search(query, k=10)   # ids are your uint64 external ids
index.remove(1002)                         # O(1) by id

index.write("my_index.tvim")
loaded = IdMapIndex.load("my_index.tvim")
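
One common way to get stable external ids with O(1) removal is a pair of maps plus swap-remove: the last slot is moved into the hole left by the deleted vector. The sketch below is a toy illustration of that idea only. The class name SwapRemoveIdMap and its fields are hypothetical, and nothing here is taken from turbovec's actual implementation.

```python
import numpy as np


class SwapRemoveIdMap:
    """Toy external-id <-> slot map (hypothetical, not turbovec's code)."""

    def __init__(self, dim):
        self.vectors = np.empty((0, dim), dtype=np.float32)
        self.slot_to_id = []   # slot index -> external id
        self.id_to_slot = {}   # external id -> slot index

    def add_with_ids(self, vectors, ids):
        vectors = np.asarray(vectors, dtype=np.float32)
        ids = [int(i) for i in ids]
        if len(ids) != len(vectors):
            raise ValueError("one id per vector is required")
        if len(set(ids)) != len(ids) or any(i in self.id_to_slot for i in ids):
            raise ValueError("ids must be unique")
        for ext_id in ids:
            self.id_to_slot[ext_id] = len(self.slot_to_id)
            self.slot_to_id.append(ext_id)
        self.vectors = np.vstack([self.vectors, vectors])

    def remove(self, ext_id):
        slot = self.id_to_slot.pop(int(ext_id))
        last = len(self.slot_to_id) - 1
        if slot != last:
            # Move the last vector into the freed slot so storage stays dense.
            moved_id = self.slot_to_id[last]
            self.vectors[slot] = self.vectors[last]
            self.slot_to_id[slot] = moved_id
            self.id_to_slot[moved_id] = slot
        self.slot_to_id.pop()
        self.vectors = self.vectors[:last]


m = SwapRemoveIdMap(dim=2)
m.add_with_ids([[1, 0], [0, 1], [1, 1]], [1001, 1002, 1003])
m.remove(1002)
print(m.slot_to_id)  # [1001, 1003]
```

Callers only ever see external ids, so the slot reshuffle on delete is invisible to them.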

Hybrid retrieval (filtered search)

Restrict results to a candidate set produced by another system (SQL, BM25, ACL, time window, …):

import numpy as np
from turbovec import IdMapIndex

idx = IdMapIndex(dim=1536, bit_width=4)
idx.add_with_ids(vectors, ids)

# Stage 1: external system narrows to candidate ids.
allowed = np.array(
    [row[0] for row in db.execute("SELECT id FROM docs WHERE tenant=?", (t,))],
    dtype=np.uint64,
)  # rows come back as 1-tuples; take column 0 so the allowlist is a flat 1-D array

# Stage 2: dense rerank within the candidate set.
scores, ids = idx.search(query, k=10, allowlist=allowed)

The kernel only inserts allowed vectors into the per-query heap, so every returned result comes from the allowlist. If len(allowed) < k, the output shrinks to len(allowed) rather than being padded with disallowed vectors.
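
The heap behaviour described above can be sketched in plain Python. This is a minimal illustration that assumes scores have already been computed for every stored vector. The function name filtered_top_k is hypothetical, and the real kernel does this inside its SIMD scoring loop rather than in a Python loop.

```python
import heapq

import numpy as np


def filtered_top_k(scores, ids, allowlist, k):
    """Return up to k (score, id) pairs, best first, drawn only from allowlist."""
    allowed = {int(i) for i in allowlist}
    heap = []  # min-heap holding the best k allowed candidates seen so far
    for score, ext_id in zip(scores, ids):
        ext_id = int(ext_id)
        if ext_id not in allowed:
            continue  # disallowed vectors never enter the heap
        item = (float(score), ext_id)
        if len(heap) < k:
            heapq.heappush(heap, item)
        elif item > heap[0]:
            heapq.heapreplace(heap, item)
    return sorted(heap, reverse=True)


scores = np.array([0.9, 0.8, 0.7, 0.6, 0.5])
ids = np.array([10, 11, 12, 13, 14], dtype=np.uint64)

print(filtered_top_k(scores, ids, [11, 13, 14], k=2))  # [(0.8, 11), (0.6, 13)]
print(filtered_top_k(scores, ids, [12], k=10))         # [(0.7, 12)]
```

Because filtering happens before heap insertion, a selective filter cannot push valid candidates out of the top k, which is why no over-fetching is needed.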

See docs/api.md for the full reference.

Framework integrations

Drop-in replacements for the in-tree reference vector / document stores in each framework. Same public surface, same persistence semantics, same retriever and pipeline wiring — swap the import and keep your pipeline.

  • LangChain: pip install turbovec[langchain] · replaces langchain_core.vectorstores.InMemoryVectorStore
  • LlamaIndex: pip install turbovec[llama-index] · replaces llama_index.core.vector_stores.SimpleVectorStore
  • Haystack: pip install turbovec[haystack] · replaces haystack.document_stores.in_memory.InMemoryDocumentStore
  • Agno: pip install turbovec[agno] · replaces agno.vectordb.lancedb.LanceDb

Rust

cargo add turbovec

use turbovec::TurboQuantIndex;

let mut index = TurboQuantIndex::new(1536, 4);
index.add(&vectors);
let results = index.search(&queries, 10);
index.write("index.tv").unwrap();
let loaded = TurboQuantIndex::load("index.tv").unwrap();

For stable external ids that survive deletes:

use turbovec::IdMapIndex;

let mut index = IdMapIndex::new(1536, 4);
index.add_with_ids(&vectors, &[1001, 1002, 1003]);
let (scores, ids) = index.search(&queries, 10);
index.remove(1002);
index.write("index.tvim").unwrap();
let loaded = IdMapIndex::load("index.tvim").unwrap();

Recall

TurboQuant vs FAISS IndexPQ (LUT256, nbits=8) — the paper's Section 4.4 baseline. 100K vectors, k=64. FAISS PQ sub-quantizer counts sized to match TurboQuant's bit rate (m=d/4 at 2-bit, m=d/2 at 4-bit).

[Figures: Recall, GloVe d=200; Recall, d=1536; Recall, d=3072]

Across OpenAI d=1536 and d=3072, TurboQuant and FAISS are within 0–1 point at R@1 and both converge to 1.0 by k=4–8. GloVe d=200 is a harder regime — at low dim the asymptotic Beta assumption is looser, and our TurboQuant trails FAISS by 3–6 points at R@1 there, closing by k≈16–32.
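
For readers reproducing these numbers, here is a small sketch of the metric, assuming the curves report how often the exact nearest neighbour appears within the first k approximate results. The function name recall_1_at_k is hypothetical and the benchmark scripts may compute it differently.

```python
import numpy as np


def recall_1_at_k(true_top1, approx_ids, k):
    """Fraction of queries whose exact nearest neighbour is among the first k results."""
    true_top1 = np.asarray(true_top1)
    approx_ids = np.asarray(approx_ids)[:, :k]
    hits = (approx_ids == true_top1[:, None]).any(axis=1)
    return float(hits.mean())


# Four queries: exact nearest neighbour per query, and three ranked approximate results each.
true_top1 = [3, 7, 1, 5]
approx = [[3, 9, 8],
          [2, 7, 6],
          [4, 0, 1],
          [9, 8, 6]]

for k in (1, 2, 3):
    print(k, recall_1_at_k(true_top1, approx, k))  # 0.25, then 0.5, then 0.75
```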

A note on baselines. We compare against FAISS IndexPQ (LUT256, nbits=8, float32 LUT) because it's the default production-grade PQ most users would reach for. This is a stronger baseline than the custom u8-LUT PQ in the TurboQuant paper — FAISS uses a higher-precision LUT at scoring time and k-means++ for codebook training. We reproduce the paper's TurboQuant numbers on OpenAI d=1536 / d=3072 and hit similar numbers to other community reference implementations on low-dim embeddings (see turboquant-py at d=384). The visible gap on GloVe reflects FAISS being a strong baseline, not a TurboQuant implementation issue.

Full results: d=1536 2-bit, d=1536 4-bit, d=3072 2-bit, d=3072 4-bit, GloVe 2-bit, GloVe 4-bit.

Compression

[Figure: Compression]
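
The headline memory figures (31 GB as float32, about 4 GB quantized) can be sanity-checked with simple arithmetic. The sketch below assumes d=768 and 4-bit codes, which is the combination that reproduces those numbers; the README does not state the dimension. It also assumes one float32 norm is stored per vector and ignores any other index overhead. Both helper names are hypothetical.

```python
def float32_bytes(n_vectors, dim):
    """Raw float32 storage: 4 bytes per coordinate."""
    return n_vectors * dim * 4


def quantized_bytes(n_vectors, dim, bit_width, norm_bytes=4):
    """Packed codes plus one stored norm per vector (assumed layout)."""
    code_bytes = dim * bit_width // 8  # assumes dim * bit_width is a multiple of 8
    return n_vectors * (code_bytes + norm_bytes)


n, dim = 10_000_000, 768  # dim=768 is an assumption, not stated in the README

print(float32_bytes(n, dim) / 1e9)       # 30.72
print(quantized_bytes(n, dim, 4) / 1e9)  # 3.88
```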

Search Speed

All benchmarks: 100K vectors, 1K queries, k=64, median of 5 runs.

ARM (Apple M3 Max)

[Figures: ARM speed, single-threaded; ARM speed, multi-threaded]

On ARM, TurboQuant beats FAISS FastScan by 12–20% across every config.

x86 (Intel Xeon Platinum 8481C / Sapphire Rapids, 8 vCPUs)

[Figures: x86 speed, single-threaded; x86 speed, multi-threaded]

On x86, TurboQuant wins every 4-bit config by 1–6% and runs within ~1% of FAISS on 2-bit ST. The 2-bit MT rows (d=1536 and d=3072) are the only configs sitting slightly behind FAISS (2–4%), where the inner accumulate loop is too short for unrolling amortization to match FAISS's AVX-512 VBMI path.

How it works

Each vector is a direction on a high-dimensional hypersphere. TurboQuant compresses these directions using a simple insight: after applying a random rotation, every coordinate follows a known distribution -- regardless of the input data.

1. Normalize. Strip the length (norm) from each vector and store it as a single float. Now every vector is a unit direction on the hypersphere.

2. Random rotation. Multiply all vectors by the same random orthogonal matrix. After rotation, each coordinate follows a Beta distribution that converges to Gaussian N(0, 1/d) in high dimensions, and distinct coordinates become nearly independent. This holds for any input data -- the rotation makes the coordinate distribution predictable.

3. Lloyd-Max scalar quantization. Since the distribution is known, we can precompute the optimal way to bucket each coordinate. For 2-bit, that's 4 buckets; for 4-bit, 16 buckets. The Lloyd-Max algorithm finds bucket boundaries and centroids that minimize mean squared error. These are computed once from the math, not from the data.

4. Bit-pack. Each coordinate is now a small integer (0-3 for 2-bit, 0-15 for 4-bit). Pack these tightly into bytes. A 1536-dim vector goes from 6,144 bytes (FP32) to 384 bytes (2-bit). That's 16x compression.
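
The four steps above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not turbovec's code: it uses the Gaussian N(0, 1/d) limit in place of the exact Beta distribution, solves Lloyd-Max numerically on a grid, and packs codes low bits first. All function names and the byte layout are hypothetical.

```python
import numpy as np


def random_rotation(dim, seed=0):
    """Random orthogonal matrix via QR of a Gaussian matrix."""
    rng = np.random.default_rng(seed)
    q, r = np.linalg.qr(rng.standard_normal((dim, dim)))
    return q * np.sign(np.diag(r))  # sign fix so the rotation is uniformly distributed


def lloyd_max_gaussian(bit_width, sigma, iters=200):
    """Lloyd-Max centroids for N(0, sigma^2), computed on a grid, never from data."""
    x = np.linspace(-6 * sigma, 6 * sigma, 20_001)
    w = np.exp(-0.5 * (x / sigma) ** 2)  # unnormalised Gaussian density
    levels = 2 ** bit_width
    centroids = np.linspace(-2 * sigma, 2 * sigma, levels)
    for _ in range(iters):
        bounds = (centroids[:-1] + centroids[1:]) / 2  # nearest-centroid boundaries
        bucket = np.searchsorted(bounds, x)
        centroids = (np.bincount(bucket, weights=w * x, minlength=levels)
                     / np.bincount(bucket, weights=w, minlength=levels))
    return centroids


def quantize(vectors, rotation, centroids):
    """Steps 1-3: normalize, rotate, assign each coordinate to its nearest centroid."""
    norms = np.linalg.norm(vectors, axis=1)
    rotated = (vectors / norms[:, None]) @ rotation.T
    bounds = (centroids[:-1] + centroids[1:]) / 2
    codes = np.searchsorted(bounds, rotated).astype(np.uint8)
    return norms.astype(np.float32), codes


def pack_codes(codes, bit_width):
    """Step 4: pack 8 // bit_width codes into each byte, low bits first."""
    per_byte = 8 // bit_width
    n, dim = codes.shape  # dim must be a multiple of per_byte
    grouped = codes.reshape(n, dim // per_byte, per_byte)
    shifts = (np.arange(per_byte) * bit_width).astype(np.uint8)
    return np.bitwise_or.reduce(grouped << shifts, axis=2).astype(np.uint8)


dim, bits = 1536, 2
vectors = np.random.default_rng(1).standard_normal((8, dim)).astype(np.float32)
rotation = random_rotation(dim)
centroids = lloyd_max_gaussian(bits, sigma=dim ** -0.5)
norms, codes = quantize(vectors, rotation, centroids)
packed = pack_codes(codes, bits)
print(packed.shape)  # (8, 384)
```

The centroids depend only on the dimension and bit width, so they can be computed once and reused for every vector added later.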

Search. Instead of decompressing every database vector, we rotate the query once into the same domain and score directly against the codebook values. The scoring kernel uses SIMD intrinsics (NEON on ARM, AVX-512BW on modern x86 with an AVX2 fallback) with nibble-split lookup tables for maximum throughput.
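
The arithmetic behind this scoring step can be shown without any SIMD. Because a rotation preserves inner products, the query is rotated once and each stored coordinate contributes query_rot[j] * centroids[code], which can be read from a small per-query lookup table. The sketch below assumes unpacked integer codes and a float table. The function name score_codes is hypothetical, and the real kernels work on packed nibbles with vectorised table lookups.

```python
import numpy as np


def score_codes(query_rot, codes, norms, centroids):
    """Approximate inner products between a rotated query and all stored vectors.

    query_rot : (dim,) query after applying the index's rotation
    codes     : (n, dim) integer code per coordinate
    norms     : (n,) stored vector lengths
    centroids : (levels,) codebook values
    """
    # lut[j, c] is the contribution of coordinate j when its code is c.
    lut = query_rot[:, None] * centroids[None, :]
    per_coord = lut[np.arange(codes.shape[1]), codes]  # (n, dim) table lookups
    return norms * per_coord.sum(axis=1)


centroids = np.array([-1.5, -0.5, 0.5, 1.5])
codes = np.array([[0, 3, 1],
                  [2, 2, 2]])
norms = np.array([2.0, 1.0])
query_rot = np.array([1.0, 0.0, -1.0])

scores = score_codes(query_rot, codes, norms, centroids)
print(scores)                  # [-2.  0.]
print(np.argsort(-scores)[:1])  # [1]
```

The result matches dequantizing each vector and taking a dot product, but no dequantized vector is ever materialised.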

The paper proves this achieves distortion within a factor of 2.7x of the information-theoretic lower bound (Shannon's distortion-rate limit). You cannot do much better for a given number of bits.
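
For orientation, the bound can be written out as follows. This is a restatement from memory of the paper's main results for unit-norm vectors quantized at b bits per coordinate, so check the paper for the exact conditions and constants before relying on it.

```latex
\begin{align*}
D_{\mathrm{mse}} &= \mathbb{E}\,\lVert x - \hat{x} \rVert_2^2, \qquad \lVert x \rVert_2 = 1 \\
\text{any quantizer:}\quad D_{\mathrm{mse}} &\ge \frac{1}{4^{b}} \\
\text{TurboQuant:}\quad D_{\mathrm{mse}} &\le \frac{\sqrt{3}\,\pi}{2} \cdot \frac{1}{4^{b}},
\qquad \frac{\sqrt{3}\,\pi}{2} \approx 2.72
\end{align*}
```

Each extra bit per coordinate cuts both bounds by a factor of 4, so the 2.7x gap stays constant across bit widths.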

Building

Python (via maturin)

pip install maturin
cd turbovec-python
maturin build --release
pip install target/wheels/*.whl

Rust

cargo build --release

All x86_64 builds target x86-64-v3 (AVX2 baseline, Haswell 2013+) via .cargo/config.toml. Any CPU that can run the AVX2 fallback kernel can run the whole crate — the AVX-512 kernel is gated at runtime via is_x86_feature_detected! and only kicks in on hardware that supports it.

Running benchmarks

Download datasets:

python3 benchmarks/download_data.py all            # all datasets
python3 benchmarks/download_data.py glove          # GloVe d=200
python3 benchmarks/download_data.py openai-1536    # OpenAI DBpedia d=1536
python3 benchmarks/download_data.py openai-3072    # OpenAI DBpedia d=3072

Each benchmark is a self-contained script in benchmarks/suite/. Run any one individually:

python3 benchmarks/suite/speed_d1536_2bit_arm_mt.py
python3 benchmarks/suite/recall_d1536_2bit.py
python3 benchmarks/suite/compression.py

Run all benchmarks for a category:

for f in benchmarks/suite/speed_*arm*.py; do python3 "$f"; done    # all ARM speed
for f in benchmarks/suite/speed_*x86*.py; do python3 "$f"; done    # all x86 speed
for f in benchmarks/suite/recall_*.py; do python3 "$f"; done       # all recall
python3 benchmarks/suite/compression.py                            # compression

Results are saved as JSON to benchmarks/results/. Regenerate charts:

python3 benchmarks/create_diagrams.py

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

turbovec-0.4.3.tar.gz (118.5 kB view details)

Uploaded Source

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

turbovec-0.4.3-cp39-abi3-win_amd64.whl (621.9 kB view details)

Uploaded CPython 3.9+Windows x86-64

turbovec-0.4.3-cp39-abi3-manylinux_2_28_x86_64.whl (806.6 kB view details)

Uploaded CPython 3.9+manylinux: glibc 2.28+ x86-64

turbovec-0.4.3-cp39-abi3-manylinux_2_28_aarch64.whl (950.6 kB view details)

Uploaded CPython 3.9+manylinux: glibc 2.28+ ARM64

turbovec-0.4.3-cp39-abi3-macosx_11_0_arm64.whl (812.6 kB view details)

Uploaded CPython 3.9+macOS 11.0+ ARM64

File details

Details for the file turbovec-0.4.3.tar.gz.

File metadata

  • Download URL: turbovec-0.4.3.tar.gz
  • Upload date:
  • Size: 118.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for turbovec-0.4.3.tar.gz
Algorithm Hash digest
SHA256 096f138a766e805927f67b4c9218cbf018e515cb7a0cc1d301dd9c7ae6b48350
MD5 6d107a4d5348d5782d95f77688daa372
BLAKE2b-256 97ede74bc468cacff80a10bd43bd3df287b8b135c6ce49b878ae66e2be474d15

See more details on using hashes here.

Provenance

The following attestation bundles were made for turbovec-0.4.3.tar.gz:

Publisher: release-pypi.yml on RyanCodrai/turbovec

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file turbovec-0.4.3-cp39-abi3-win_amd64.whl.

File metadata

  • Download URL: turbovec-0.4.3-cp39-abi3-win_amd64.whl
  • Upload date:
  • Size: 621.9 kB
  • Tags: CPython 3.9+, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for turbovec-0.4.3-cp39-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 9478cd701b38b4d9af1d0bbe1c041bf8c1f9eefb6d8e1affb47ed23a26f1a5bb
MD5 be357a04b6aa103663fc2c6ed4bab8a4
BLAKE2b-256 ee8070d27426dfbaf8b34d06600ef06ad7fe21d8b061c04d5f76bc7cdd85d7f7

Provenance

The following attestation bundles were made for turbovec-0.4.3-cp39-abi3-win_amd64.whl:

Publisher: release-pypi.yml on RyanCodrai/turbovec

File details

Details for the file turbovec-0.4.3-cp39-abi3-manylinux_2_28_x86_64.whl.

File hashes

Hashes for turbovec-0.4.3-cp39-abi3-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 02f328f2bb5c00b0662885a370439f25bbf18e9a4f87b330cfa8489716f6bf64
MD5 69975af4daeee7158d811d3be6055775
BLAKE2b-256 fd1fee00778eccf820f00f3803d52fe96e9c1412b707e539a1edc3c90dc69935

Provenance

The following attestation bundles were made for turbovec-0.4.3-cp39-abi3-manylinux_2_28_x86_64.whl:

Publisher: release-pypi.yml on RyanCodrai/turbovec

File details

Details for the file turbovec-0.4.3-cp39-abi3-manylinux_2_28_aarch64.whl.

File hashes

Hashes for turbovec-0.4.3-cp39-abi3-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 c470d858048e87adedcdbba7b77f5032948239aa325db9834186bc6a7298330f
MD5 9e000bbad29753abbfb86a2f8eea8ca5
BLAKE2b-256 8f2722e3320b19e9ef09f2d1b7d3eb3712306f9aaca26c351f5360b2df90775f

Provenance

The following attestation bundles were made for turbovec-0.4.3-cp39-abi3-manylinux_2_28_aarch64.whl:

Publisher: release-pypi.yml on RyanCodrai/turbovec

File details

Details for the file turbovec-0.4.3-cp39-abi3-macosx_11_0_arm64.whl.

File hashes

Hashes for turbovec-0.4.3-cp39-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 89d1f79bdc7dea631b2fd357b82e7eba65777d1ad2de07e5dacfaa59c087b901
MD5 f053420e2f9e15955c233240a4146fa4
BLAKE2b-256 12aadb05ec5a40fa680ee16d8d33e8410a0666b816e33a9d747d24a0b4335bde

Provenance

The following attestation bundles were made for turbovec-0.4.3-cp39-abi3-macosx_11_0_arm64.whl:

Publisher: release-pypi.yml on RyanCodrai/turbovec
