Fast vector quantization with 2-4 bit compression and SIMD search

turbovec — Google's TurboQuant for vector search

A 10 million document corpus takes 31 GB of RAM as float32. turbovec fits it in 4 GB - and searches it faster than FAISS.
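The headline figures can be reproduced with simple arithmetic. The sketch below assumes 768-dimensional embeddings, which is not stated above but is the dimension that makes both numbers come out; `index_size_gb` is a hypothetical helper, and it counts only the coordinate payload (no per-vector norm or metadata).

```python
def index_size_gb(n_vectors: int, dim: int, bits_per_coord: float) -> float:
    """Coordinate payload in GB (10**9 bytes), ignoring norms and metadata."""
    return n_vectors * dim * bits_per_coord / 8 / 1e9

# dim=768 is an assumption, chosen because it matches the quoted figures.
n, dim = 10_000_000, 768
fp32_gb = index_size_gb(n, dim, 32)
q4_gb = index_size_gb(n, dim, 4)
print(f"float32: {fp32_gb:.1f} GB, 4-bit: {q4_gb:.1f} GB")
# float32: 30.7 GB, 4-bit: 3.8 GB
```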

turbovec is a Rust vector index with Python bindings, built on Google Research's TurboQuant algorithm - a data-oblivious quantizer whose distortion is provably within a small constant factor (about 2.7x) of the Shannon lower bound, with zero training and zero data passes.

  • No codebook training. Add vectors, they're indexed. No data-dependent calibration, no rebuilds as the corpus grows.
  • Faster than FAISS. Hand-written NEON (ARM) and AVX-512BW (x86) kernels beat FAISS IndexPQFastScan by 12–20% on ARM and match or beat it on x86 in all but the 2-bit multi-threaded configs, which trail by 2–4%.
  • Pure local. No managed service, no data leaving your machine or VPC. Pair with any open-source embedding model for a fully air-gapped RAG stack.

Building RAG where privacy, memory, or latency matters? You're in the right place.

Python

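pip install turbovec

from turbovec import TurboQuantIndex

# vectors, more_vectors, and query are embeddings you supply (dim must match the index)
index = TurboQuantIndex(dim=1536, bit_width=4)  # 4 bits per coordinate
index.add(vectors)
index.add(more_vectors)  # later adds need no retraining or rebuild

scores, indices = index.search(query, k=10)  # top-10 results

index.write("my_index.tq")  # save to disk
loaded = TurboQuantIndex.load("my_index.tq")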

LangChain

pip install "turbovec[langchain]"

from langchain_huggingface import HuggingFaceEmbeddings
from turbovec.langchain import TurboQuantVectorStore

embeddings = HuggingFaceEmbeddings(model_name="BAAI/bge-base-en-v1.5")

store = TurboQuantVectorStore.from_texts(
    texts=["Document 1...", "Document 2...", "Document 3..."],
    embedding=embeddings,
    bit_width=4,
)

retriever = store.as_retriever(search_kwargs={"k": 5})

LlamaIndex

pip install "turbovec[llama-index]"

from llama_index.core import VectorStoreIndex, StorageContext
from turbovec.llama_index import TurboQuantVectorStore

vector_store = TurboQuantVectorStore.from_params(dim=768, bit_width=4)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
retriever = index.as_retriever(similarity_top_k=5)

Rust

cargo add turbovec

use turbovec::TurboQuantIndex;

let mut index = TurboQuantIndex::new(1536, 4);
index.add(&vectors);
let results = index.search(&queries, 10);
index.write("index.tv").unwrap();
let loaded = TurboQuantIndex::load("index.tv").unwrap();

Recall

TurboQuant vs FAISS IndexPQFastScan (100K vectors, k=64). FAISS PQ configurations sized to match TurboQuant compression ratios.

[Figures: recall at d=1536 and at d=3072]

Both libraries converge to 1.0 by k=4–8. At 4-bit, TurboQuant and FAISS both score 0.955+ at top-1 across every dataset and are within 0.001 of each other. The small differences at 2-bit top-1 (TurboQuant 0.870 vs FAISS 0.882 at d=1536; TurboQuant 0.912 vs FAISS 0.903 at d=3072) reflect how each method behaves at the most aggressive end of the compression curve — they disappear once k ≥ 2. Full results: d=1536 2-bit, d=1536 4-bit, d=3072 2-bit, d=3072 4-bit, GloVe 2-bit, GloVe 4-bit.
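For readers reproducing these numbers, here is a minimal sketch of the metric, assuming "recall" here means the fraction of queries whose exact nearest neighbor appears among the first k approximate results. That reading fits the "top-1" and "converge to 1.0" wording above, but the benchmark scripts are the authority. The function name and toy data are hypothetical.

```python
def recall_1_at_k(true_top1, approx_topk, k):
    """Fraction of queries whose exact nearest neighbor is in the first k approximate results."""
    hits = sum(1 for truth, row in zip(true_top1, approx_topk) if truth in row[:k])
    return hits / len(true_top1)

# Toy data: 4 queries, exact nearest neighbor ids, and ranked approximate results.
true_top1 = [7, 2, 9, 4]
approx_topk = [[7, 1, 3], [5, 2, 8], [9, 0, 6], [1, 3, 4]]
print([recall_1_at_k(true_top1, approx_topk, k) for k in (1, 2, 3)])
# [0.5, 0.75, 1.0]
```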

No FAISS FastScan comparison for GloVe d=200 (dimension not compatible with FastScan's m%32 requirement).

Compression

[Figure: compression]

Search Speed

All benchmarks: 100K vectors, 1K queries, k=64, median of 5 runs.
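A "median of 5 runs" measurement can be sketched as follows. `median_seconds` is a hypothetical helper, not the harness used in benchmarks/suite/, and the workload is a stand-in for a batched search call.

```python
import statistics
import time

def median_seconds(fn, runs=5):
    """Call fn() `runs` times and return the median wall-clock duration in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return statistics.median(durations)

# Stand-in workload; a real benchmark would time the search over all queries.
elapsed = median_seconds(lambda: sum(i * i for i in range(100_000)))
print(f"median of 5 runs: {elapsed * 1e3:.2f} ms")
```

The median is less sensitive than the mean to a single slow run caused by cache warm-up or scheduler noise.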

ARM (Apple M3 Max)

[Figures: ARM speed, single-threaded and multi-threaded]

On ARM, TurboQuant beats FAISS FastScan by 12–20% across every config.

x86 (Intel Xeon Platinum 8481C / Sapphire Rapids, 8 vCPUs)

[Figures: x86 speed, single-threaded and multi-threaded]

On x86, TurboQuant wins every 4-bit config by 1–6% and runs within ~1% of FAISS on 2-bit single-threaded (ST). The 2-bit multi-threaded (MT) rows (d=1536 and d=3072) are the only configs sitting slightly behind FAISS (2–4%), where the inner accumulate loop is too short for unrolling amortization to match FAISS's AVX-512 VBMI path.

How it works

Each vector is a direction on a high-dimensional hypersphere. TurboQuant compresses these directions using a simple insight: after applying a random rotation, every coordinate follows a known distribution -- regardless of the input data.

1. Normalize. Strip the length (norm) from each vector and store it as a single float. Now every vector is a unit direction on the hypersphere.

2. Random rotation. Multiply all vectors by the same random orthogonal matrix. After rotation, each coordinate follows a Beta distribution that converges to Gaussian N(0, 1/d) in high dimensions, and distinct coordinates are nearly independent (not exactly, since the squared coordinates of a unit vector must sum to 1). This holds for any input data: the rotation makes the coordinate distribution predictable.

3. Lloyd-Max scalar quantization. Since the distribution is known, we can precompute the optimal way to bucket each coordinate. For 2-bit, that's 4 buckets; for 4-bit, 16 buckets. The Lloyd-Max algorithm finds bucket boundaries and centroids that minimize mean squared error. These are computed once from the math, not from the data.

4. Bit-pack. Each coordinate is now a small integer (0-3 for 2-bit, 0-15 for 4-bit). Pack these tightly into bytes. A 1536-dim vector goes from 6,144 bytes (FP32) to 384 bytes (2-bit). That's 16x compression.
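The four steps above can be sketched end to end in plain Python. This is an illustration under stated assumptions, not turbovec's implementation: the rotation is built with Gram-Schmidt (the library may generate it differently), the Lloyd-Max quantizer is fitted to the Gaussian limit N(0, 1/d) rather than the exact Beta distribution, the low-bits-first byte layout is a guess, and all function names are hypothetical.

```python
import math
import random

def normalize(v):
    """Step 1: split a non-zero vector into a unit direction and its norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v], norm

def random_orthogonal(d, seed=0):
    """Step 2: a d x d orthogonal matrix via Gram-Schmidt on Gaussian rows."""
    rng = random.Random(seed)
    rows = []
    while len(rows) < d:
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for q in rows:  # subtract the components along rows already chosen
            dot = sum(a * b for a, b in zip(v, q))
            v = [a - dot * b for a, b in zip(v, q)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-9:  # skip a degenerate draw
            rows.append([a / norm for a in v])
    return rows

def rotate(matrix, v):
    return [sum(m * x for m, x in zip(row, v)) for row in matrix]

def pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def lloyd_max_gaussian(bits, iterations=200):
    """Step 3: Lloyd-Max centroids and boundaries for N(0, 1)."""
    levels = 2 ** bits
    centroids = [-2.0 + 4.0 * (i + 0.5) / levels for i in range(levels)]
    for _ in range(iterations):
        # Boundaries sit halfway between neighbouring centroids.
        inner = [(a + b) / 2.0 for a, b in zip(centroids, centroids[1:])]
        edges = [-math.inf] + inner + [math.inf]
        # Each centroid moves to the Gaussian mean of its bucket.
        centroids = [(pdf(lo) - pdf(hi)) / (cdf(hi) - cdf(lo))
                     for lo, hi in zip(edges, edges[1:])]
    boundaries = [(a + b) / 2.0 for a, b in zip(centroids, centroids[1:])]
    return centroids, boundaries

def quantize(rotated, boundaries, d):
    """Bucket index per coordinate; boundaries are rescaled from N(0, 1) to N(0, 1/d)."""
    scale = 1.0 / math.sqrt(d)
    return [sum(x > b * scale for b in boundaries) for x in rotated]

def pack_codes(codes, bits):
    """Step 4: pack codes into bytes, low bits first (layout is an assumption)."""
    per_byte = 8 // bits
    out = bytearray((len(codes) + per_byte - 1) // per_byte)
    for i, code in enumerate(codes):
        out[i // per_byte] |= code << (bits * (i % per_byte))
    return bytes(out)

d, bits = 64, 2
rng = random.Random(1)
vector = [rng.uniform(-1.0, 1.0) for _ in range(d)]
unit, norm = normalize(vector)
Q = random_orthogonal(d)
centroids, boundaries = lloyd_max_gaussian(bits)  # computed once, independent of the data
codes = quantize(rotate(Q, unit), boundaries, d)
packed = pack_codes(codes, bits)
print(len(packed), "bytes for", d, "coordinates, plus one float for the norm")
# 16 bytes for 64 coordinates, plus one float for the norm
```

For 2 bits the Lloyd-Max iteration settles near centroids ±0.45 and ±1.51 (in units of the standard deviation). The same packing arithmetic gives the 384 bytes quoted for d=1536: 1536 coordinates × 2 bits / 8.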

Search. Instead of decompressing every database vector, we rotate the query once into the same domain and score directly against the codebook values. The scoring kernel uses SIMD intrinsics (NEON on ARM, AVX-512BW on modern x86 with an AVX2 fallback) with nibble-split lookup tables for maximum throughput.
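The idea of scoring against codebook values without decompressing can be sketched with a per-query lookup table. This is a plain-Python stand-in with hypothetical names and a made-up codebook. It does not reproduce the SIMD kernels or the nibble-split table layout, and it assumes the query has already been rotated.

```python
def build_lut(rotated_query, centroids):
    """lut[j][code] = contribution of coordinate j if it was quantized to `code`."""
    return [[q * c for c in centroids] for q in rotated_query]

def score(lut, codes, norm):
    """Approximate inner product between the query and one quantized vector."""
    return norm * sum(row[code] for row, code in zip(lut, codes))

centroids = [-1.5, -0.5, 0.5, 1.5]   # illustrative 2-bit codebook, not turbovec's
query = [0.2, -0.4, 0.1]             # assumed to be already rotated
lut = build_lut(query, centroids)    # built once per query

# Each database entry: (codes, stored norm).
db = {"a": ([3, 0, 2], 2.0), "b": ([0, 3, 1], 1.0)}
scores = {name: score(lut, codes, norm) for name, (codes, norm) in db.items()}
best = max(scores, key=scores.get)
print(best)
# a
```

The table costs d × 2^b multiplications per query, after which each database vector needs only d lookups and additions plus one multiplication by its stored norm.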

The paper proves this achieves distortion within a factor of 2.7x of the information-theoretic lower bound (Shannon's distortion-rate limit). You cannot do much better for a given number of bits.
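In symbols, and assuming the bound takes the usual distortion-rate form for unit-norm vectors quantized at b bits per coordinate (the exact statement, conditions, and constant are in the paper), the claim reads roughly as follows, where D(b) is the expected squared reconstruction error:

```latex
4^{-b} \;\le\; D(b) \;\le\; c \cdot 4^{-b}, \qquad c \approx 2.7
```

The left inequality is the lower bound that applies to any quantizer, and the right one is the guarantee for TurboQuant. Under this assumed reading, b = 2 puts D(b) between 0.0625 and about 0.17, and b = 4 between about 0.0039 and 0.011.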

Building

Python (via maturin)

pip install maturin
cd turbovec-python
maturin build --release
pip install target/wheels/*.whl

Rust

cargo build --release

All x86_64 builds target x86-64-v3 (AVX2 baseline, Haswell 2013+) via .cargo/config.toml. Any CPU that can run the AVX2 fallback kernel can run the whole crate — the AVX-512 kernel is gated at runtime via is_x86_feature_detected! and only kicks in on hardware that supports it.

Running benchmarks

Download datasets:

python3 benchmarks/download_data.py all            # all datasets
python3 benchmarks/download_data.py glove          # GloVe d=200
python3 benchmarks/download_data.py openai-1536    # OpenAI DBpedia d=1536
python3 benchmarks/download_data.py openai-3072    # OpenAI DBpedia d=3072

Each benchmark is a self-contained script in benchmarks/suite/. Run any one individually:

python3 benchmarks/suite/speed_d1536_2bit_arm_mt.py
python3 benchmarks/suite/recall_d1536_2bit.py
python3 benchmarks/suite/compression.py

Run all benchmarks for a category:

for f in benchmarks/suite/speed_*arm*.py; do python3 "$f"; done    # all ARM speed
for f in benchmarks/suite/speed_*x86*.py; do python3 "$f"; done    # all x86 speed
for f in benchmarks/suite/recall_*.py; do python3 "$f"; done       # all recall
python3 benchmarks/suite/compression.py                            # compression

Results are saved as JSON to benchmarks/results/. Regenerate charts:

python3 benchmarks/create_diagrams.py

Download files

Source Distribution

turbovec-0.2.0.tar.gz (50.0 kB)

Uploaded Source

Built Distributions

turbovec-0.2.0-cp39-abi3-manylinux_2_28_x86_64.whl (761.2 kB)

Uploaded: CPython 3.9+, manylinux (glibc 2.28+), x86-64

turbovec-0.2.0-cp39-abi3-manylinux_2_28_aarch64.whl (904.6 kB)

Uploaded: CPython 3.9+, manylinux (glibc 2.28+), ARM64

turbovec-0.2.0-cp39-abi3-macosx_11_0_arm64.whl (766.1 kB)

Uploaded: CPython 3.9+, macOS 11.0+, ARM64

File details

Details for the file turbovec-0.2.0.tar.gz.

File metadata

  • Download URL: turbovec-0.2.0.tar.gz
  • Upload date:
  • Size: 50.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for turbovec-0.2.0.tar.gz
Algorithm Hash digest
SHA256 e55a7d740aa97f675b50d5b96d06495dc7a66c65eb1b864ff8beb612717c3b07
MD5 df351e2e5438e79b43f06ddec80c014b
BLAKE2b-256 9ba0b0ccb56c087babf63eee8d3b0a001b4645afea67f2bb13ce4ec61b44eb2a

Provenance

The following attestation bundles were made for turbovec-0.2.0.tar.gz:

Publisher: release-pypi.yml on RyanCodrai/turbovec

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file turbovec-0.2.0-cp39-abi3-manylinux_2_28_x86_64.whl.

File hashes

Hashes for turbovec-0.2.0-cp39-abi3-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 17fc1feead52bbf1465c107d3e138b96a0906940266105938e9a505d6df5b7c3
MD5 d0f23965d5e419db19c6159be66d4cf9
BLAKE2b-256 2c42d5a9118e6d1b07787a6b259d2eeb69cc92d9e409e3458b65e32ff919faee

Provenance

The following attestation bundles were made for turbovec-0.2.0-cp39-abi3-manylinux_2_28_x86_64.whl:

Publisher: release-pypi.yml on RyanCodrai/turbovec

File details

Details for the file turbovec-0.2.0-cp39-abi3-manylinux_2_28_aarch64.whl.

File hashes

Hashes for turbovec-0.2.0-cp39-abi3-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 d895a64ccf88a6be01a4a5ae3b4c8447c3e9dbbd4ab54ff34afcb6032539ef5d
MD5 b840b9eed0475e3633ce31bc8909e387
BLAKE2b-256 d7621478aad8f2cb515972549e580a6f59cca7ff6814e3cf0a61407b4b2c2078

Provenance

The following attestation bundles were made for turbovec-0.2.0-cp39-abi3-manylinux_2_28_aarch64.whl:

Publisher: release-pypi.yml on RyanCodrai/turbovec

File details

Details for the file turbovec-0.2.0-cp39-abi3-macosx_11_0_arm64.whl.

File hashes

Hashes for turbovec-0.2.0-cp39-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 a3ff9314ae1d81d7f8eba43b1729486be50e8f3229507ccba367b5d056c83c76
MD5 98ba8830d59bf97a11402cfcbed8c81d
BLAKE2b-256 8052bf780598c09633ce117c1f52bea080c50c221a4f35e1c5833d5aa5497a6a

Provenance

The following attestation bundles were made for turbovec-0.2.0-cp39-abi3-macosx_11_0_arm64.whl:

Publisher: release-pypi.yml on RyanCodrai/turbovec
