turbochroma

High-performance vector compression for ChromaDB: 4× less RAM, <1% recall loss, zero ingest-code changes.

turbochroma solves the high RAM consumption problem that appears as ChromaDB collections grow. Instead of migrating to a more complex vector database (such as Qdrant or Milvus), it lets you scale your RAG (Retrieval-Augmented Generation) applications in place:

  • 4× lower RAM usage: compressed (SQ8, 8-bit) vectors are stored directly in metadata.
  • Faster search with ADC: Asymmetric Distance Computation (ADC) re-orders candidates without fully decompressing vectors.
  • VRAM savings for LLMs/rerankers: high-quality re-ranking on the CPU with ADC means you can send fewer but more relevant candidates to your GPU-based cross-encoders or LLMs, significantly reducing VRAM pressure and costs.
  • Preserved precision: a "Sparse Rotation" step before quantization minimizes information loss (typically <1% recall loss).

Status: 0.1.0 — first PyPI (beta) line. The public surface (QuantizedCollection, SQ8Codec, metadata keys) is expected to stay compatible within 0.1.x; see STABILITY.md and CHANGELOG.md. See also CODE_OF_CONDUCT.md.

Install from PyPI: pip install turbochroma


Why turbochroma for RAG?

ChromaDB does not ship native vector quantization. If your RAG collection grows past what your RAM can comfortably hold, your options today are:

| Option | Cost |
|---|---|
| Migrate to Qdrant / Milvus / Weaviate | Infra rewrite, new ops surface |
| Reduce embedding dimension (e.g. PCA) | Model retraining, recall loss across the board |
| pip install turbochroma | Small code change, no vector DB swap |

RAG & LLM Integration Patterns

| Pattern | How turbochroma helps | LLM / User benefit |
|---|---|---|
| High-Precision RAG | Use ADC to over-fetch (e.g. top-40 instead of top-10) and re-rank accurately on CPU. | Better context quality for the LLM without increasing vector DB memory. |
| VRAM-Optimized Pipeline | Filter thousands of candidates on CPU via ADC before hitting GPU models. | Lower VRAM usage on GPUs; allows running larger LLMs on the same hardware. |
| Multi-Tenant LLM Apps | 4× less RAM per collection allows hosting hundreds of tenant-specific indexes on a single small instance. | Lower infrastructure costs for SaaS applications. |
| Cheap Pre-Ranking | Act as a middle layer: Chroma (approx) → turbochroma (ADC re-rank) → cross-encoder (heavy); see the sketch below. | Reduces the number of hits passed to expensive cross-encoders, saving tokens and GPU latency. |
| Legal/Medical Search | Sparse Rotation preserves outliers and specific terminology better than naive SQ8. | Maintains high recall for specialized domains where every chunk matters. |
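
To give the Cheap Pre-Ranking pattern a concrete shape, here is a hedged sketch, not a canonical recipe: the corpus is random, the heavy re-ranker is mocked with a placeholder scoring function, and only the public surface shown in the examples further down (QuantizedCollection, SQ8Codec, refine_factor) is assumed.

import numpy as np
import chromadb
from turbochroma import QuantizedCollection, SQ8Codec

DIM = 384
rng = np.random.default_rng(0)

# Stage 0: a small in-memory corpus (replace with your real chunks)
client = chromadb.EphemeralClient()
coll = client.get_or_create_collection("prerank_demo", metadata={"hnsw:space": "cosine"})
qc = QuantizedCollection(coll, SQ8Codec(dimension=DIM, seed=0), refine_factor=4)
qc.add(
    ids=[f"chunk_{i}" for i in range(100)],
    embeddings=rng.standard_normal((100, DIM)).astype(np.float32).tolist(),
)

# Stages 1-2: Chroma over-fetches (n_results × refine_factor candidates),
# ADC re-ranks them on CPU and keeps the best 8.
q = rng.standard_normal(DIM).astype(np.float32).tolist()
candidate_ids = qc.query(query_embeddings=[q], n_results=8)["ids"][0]

# Stage 3: only these 8 ids reach the expensive GPU cross-encoder.
def heavy_rerank_score(chunk_id: str) -> float:
    return rng.random()  # placeholder; substitute a real cross-encoder score

final_context = sorted(candidate_ids, key=heavy_rerank_score, reverse=True)[:3]
print("chunks sent to the LLM:", final_context)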

What it is not (primarily): a replacement for billion-scale FAISS-IVF-PQ clusters. It is a pragmatic scaling layer for production RAG teams already using Chroma.

Limitations (read before you ship)

  • Re-rank cannot rescue misses: If the correct chunk is not in Chroma’s top (n_results × refine_factor) hits, ADC cannot invent it. Tune n_results and refine_factor to your recall needs.
  • ADC refinement with refine_factor > 1 applies only to query_embeddings=.... If you only pass query_texts (and let Chroma embed), the wrapper falls back to native Chroma order and may emit a UserWarning.
  • Chroma’s query(..., include=...) does not allow "ids"; IDs are always returned. The wrapper strips "ids" from include before calling Chroma.
  • The default SQ8 path stores one byte per dimension in the blob, with a hard cap of MAX_COMPRESSED_BLOB_BYTES (1 MiB) on both codec dimension and decoded payload size. If you need larger vectors, open an issue (you would need a different storage layout or a raised limit).
  • Blobs are stored as base64 in metadata (one of Chroma’s accepted types), so you pay some storage overhead on top of raw int8; later releases may add sidecar storage for tighter layouts. Values are size-checked before decoding.
  • A second field (DefaultBlobspecKey / tc_blobspec_v1 by default) stores a codec fingerprint (BaseCodec.blobspec_fingerprint) so ADC can detect a blob written with a different codec, dimension, rotation, or seed. If the field is missing (older rows), only the base64 is validated. Set blobspec_key=None on QuantizedCollection to disable writing and checking that field. For integrity-sensitive re-ranking, use strict=True on QuantizedCollection or on query(...) so a bad blob or mismatched fingerprint fails with ValueError instead of falling back to Chroma’s distance; see the sketch after this list.
  • Format upgrades: a future incompatible change to the stored metadata layout (outside 0.1.x or as documented) may require re-backfill or re-index; see STABILITY.md.
  • Non-cryptographic storage: this library does not encrypt collections. Anyone with read access to the Chroma collection can read stored blobs and vectors per your Chroma config; for very sensitive use cases, enforce access at the DB / product level — see SECURITY.md.
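
To make the over-fetch and integrity points concrete, a hedged sketch of a strict query; qc and q_vec are assumed to be set up as in the end-to-end example below.

# With n_results=10 and refine_factor=4, Chroma's first stage returns up to
# 40 candidates; ADC re-ranks those and returns the best 10. If the right
# chunk is not among the 40, re-ranking cannot recover it.
# strict=True (per the notes above) turns a bad blob or a mismatched
# blobspec fingerprint into a ValueError instead of a silent fallback to
# Chroma's native distance.
results = qc.query(
    query_embeddings=[q_vec],  # q_vec: list[float] of codec dimension
    n_results=10,
    refine_factor=4,
    strict=True,
)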

Context and trade-offs: docs/design/001-why-turbochroma.md.


Installation

pip install turbochroma

Develop from a git clone (editable):

cd turbochroma
python -m venv .venv
# Windows: .\.venv\Scripts\activate
# Unix:     source .venv/bin/activate
pip install -e ".[dev]"

Contributors: CONTRIBUTING.md (pre-commit, ruff, mypy, pip-audit). Quality bar: QUALITY.md, docs/quality-gates.md, TESTING.md. Releases: RELEASING.md (tag v* → PyPI via Trusted Publishing). API & design site: pip install -e ".[docs]" && mkdocs build (sources under docs/).

Optional extras:

  • turbochroma[fast] — numba kernels for faster ADC
  • turbochroma[parquet] — sidecar parquet storage backend (planned wiring)
  • turbochroma[bench] — datasets + matplotlib for reproducing benchmarks
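
For example, to speed up ADC or reproduce the benchmarks:

pip install "turbochroma[fast]"
pip install "turbochroma[bench]"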

End-to-end example

Match SQ8Codec(dimension=...) to your embedder (e.g. 1024 for BGE-M3, 384 for many small models). Blobs are written under the default metadata key DefaultBlobKey ("tc_sq8_v1").

import numpy as np
import chromadb
from chromadb.config import Settings
from turbochroma import QuantizedCollection, SQ8Codec, DefaultBlobKey

# Same dimension as your embedding model
DIM = 1024
SEED = 42

# 1) Chroma as usual
client = chromadb.PersistentClient(path="./chroma_data")
collection = client.get_or_create_collection(
    "my_docs",
    metadata={"hnsw:space": "cosine"},
)

# 2) Codec + wrapper
codec = SQ8Codec(dimension=DIM, seed=SEED)
qc = QuantizedCollection(
    collection,
    codec,
    refine_factor=4,
)

def norm_rows(x: np.ndarray) -> np.ndarray:
    x = x.astype(np.float32)
    n = np.linalg.norm(x, axis=1, keepdims=True)
    n = np.where(n == 0, 1.0, n)
    return x / n

# 3) Ingest: replace with outputs from your embedder
embeddings = norm_rows(np.random.randn(50, DIM))
qc.add(
    ids=[f"chunk_{i}" for i in range(50)],
    embeddings=embeddings.tolist(),
    metadatas=[{"source": f"doc_{i // 10}"} for i in range(50)],
)

# 4) Optional: confirm the blob in metadata
row = collection.get(ids=["chunk_0"], include=["metadatas"])
assert DefaultBlobKey in (row["metadatas"][0] or {})

# 5) Query with optional ADC re-rank (use your real query embedding)
q = norm_rows(np.random.randn(1, DIM))[0].tolist()
results = qc.query(
    query_embeddings=[q],
    n_results=8,
    include=["metadatas", "distances", "documents"],
    refine_factor=4,
)
print("Top ids:", results["ids"][0][:3])

# 6) Vectors already in Chroma but added without turbochroma? Backfill:
# n = qc.fit_existing()
# print("metadata rows updated:", n)

Experimenting / seeing the effect

  • get(..., include=["metadatas"]): check for the key tc_sq8_v1 and the base64 value (one logical int8 per dimension, base64 in JSON).
  • Compare refine_factor=1 vs 4 on the same query_embeddings and watch whether ids[0] order changes (larger effect when more than two documents compete and Chroma’s first stage is imperfect for your metric).
  • Codec-only sanity check (no Chroma): from the repo root, run python benchmarks/synthetic_mae.py for MAE, compression ratio, and timing.
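
Continuing from the end-to-end example above (collection, qc, and q already defined), a short sketch of the first two bullets; the blob layout beyond "roughly one byte per dimension, base64-encoded" is internal to the codec, so only presence and size are inspected here:

import base64

# First bullet: the blob really is sitting in metadata under tc_sq8_v1.
row = collection.get(ids=["chunk_0"], include=["metadatas"])
blob_b64 = row["metadatas"][0]["tc_sq8_v1"]
print("decoded bytes:", len(base64.b64decode(blob_b64)))

# Second bullet: native order vs ADC re-ranked order on the same query.
native = qc.query(query_embeddings=[q], n_results=8, refine_factor=1)
refined = qc.query(query_embeddings=[q], n_results=8, refine_factor=4)
print("native :", native["ids"][0])
print("refined:", refined["ids"][0])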

30-second quickstart (minimal)

import chromadb
from turbochroma import QuantizedCollection, SQ8Codec

DIM = 1024
client = chromadb.PersistentClient(path="./chroma")
coll = client.get_or_create_collection("docs")
qc = QuantizedCollection(coll, SQ8Codec(dimension=DIM, seed=42), refine_factor=4)

# qc.add(... embeddings from your model ...)
# q_vec = your_query_embedding  # list[float] length DIM
# qc.query(query_embeddings=[q_vec], n_results=10, include=["metadatas", "distances"])

If you only have existing float vectors: QuantizedCollection(...).fit_existing().


How it works

  1. Sparse rotation — every embedding is multiplied by a fixed ±1 sign pattern and permuted. This spreads distribution outliers across dimensions so scalar quantization loses less information.
  2. SQ8 quantization — each rotated float32 dimension is scaled and clipped to int8 (4× compression).
  3. Asymmetric distance computation (ADC) — at query time the query stays in float32, the document is decompressed on the fly, and the dot product is computed directly. You pay float32 precision only for the query, which is already cheap.
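
A minimal numpy sketch of the three steps. This is illustrative only: the library's actual kernels, scale handling, and blob layout are internal, and the per-vector max-abs scaling below is a simplifying assumption.

import numpy as np

rng = np.random.default_rng(42)
DIM = 1024

# 1) Sparse rotation: a fixed, seeded ±1 sign pattern plus a permutation.
#    Both are orthogonal transforms, so dot products are preserved exactly.
signs = rng.choice(np.array([-1.0, 1.0], dtype=np.float32), size=DIM)
perm = rng.permutation(DIM)

def rotate(v: np.ndarray) -> np.ndarray:
    return (v * signs)[perm]

# 2) SQ8: scale the rotated vector into int8 range, then round and clip.
def encode_sq8(v: np.ndarray) -> tuple[np.ndarray, float]:
    scale = max(float(np.abs(v).max()) / 127.0, 1e-12)
    return np.clip(np.round(v / scale), -127, 127).astype(np.int8), scale

# 3) ADC: the query stays float32; the document codes are decoded on the fly.
def adc_dot(query: np.ndarray, codes: np.ndarray, scale: float) -> float:
    return float(query @ (codes.astype(np.float32) * scale))

doc = rng.standard_normal(DIM).astype(np.float32)
query = rng.standard_normal(DIM).astype(np.float32)

codes, scale = encode_sq8(rotate(doc))
print("exact dot :", float(query @ doc))
print("approx dot:", adc_dot(rotate(query), codes, scale))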

More detail: docs/design/001-why-turbochroma.md.


Benchmarks

Synthetic MAE and compression: python benchmarks/synthetic_mae.py from a clone.

BEIR-style tables: planned for v0.1.0; see the roadmap.

| Metric | Chroma vanilla (float32) | turbochroma SQ8 | FAISS SQ8 (baseline) |
|---|---|---|---|
| Recall@10 | TBD | TBD | TBD |
| MRR@10 | TBD | TBD | TBD |
| RAM peak | TBD | TBD | TBD |
| p50 query latency | TBD | TBD | TBD |

Roadmap

  • v0.1.0 — SQ8 codec + sparse rotation + QuantizedCollection + two storage backends (metadata-blob, sidecar parquet) + BEIR benchmarks.
  • v0.2.0 — Product Quantization (PQ) codec.
  • v0.3.0 — 1-bit / RaBitQ-style codec (32× compression).
  • v0.4.0 — Learned rotation (OPQ-style) trained on your corpus.

Credits

Originally incubated inside Minervia, a Spanish-language legal-RAG system. See CREDITS.md for full lineage.

Created and maintained by Angel Israel Moreno Castellanos.


License

MIT — see LICENSE.
