An embedded vector database: turbovec 4-bit ANN + a durable SQLite store, with metadata filters, persistence, exact-cosine re-rank, and multi-process safety.
turbovecdb
An embedded vector database. turbovec's 4-bit TurboQuant ANN for fast approximate search, paired with a durable SQLite sidecar that holds the documents, metadata, id map, and the exact float32 vectors. Metadata filters, persistence, exact-cosine re-rank, and multi-process safety are built in.
It's the kind of thing you reach for when you want a local, CPU-resident vector store with a small footprint and no server to run — and you're happy to bring your own embedding model (or hand it one).
Why
- SQLite is the source of truth. Every vector is stored exactly (float32) and durably. The turbovec .tvim index is a rebuildable cache: if it's missing or stale, it's rebuilt from SQLite on open, so a crash never loses data.
- Exact answers from an approximate index. turbovec finds a candidate pool fast; turbovecdb re-ranks it with true cosine, so callers get a correct distance ∈ [0, 2], not turbovec's raw quantized score.
- Multi-process safe. Writes take a cross-process file lock; readers detect another process's writes and reload. Run a writer and readers against the same database directory.
- Small and quick. In a local benchmark (15.8k docs, 384-d), turbovecdb built ~12× faster, queried ~3× faster (p50/p95), and used ~2.3× less disk than an HNSW-based store — at indistinguishable retrieval quality.
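The re-rank step described above can be pictured with a small standard-library sketch. It is not turbovecdb's code: the function names and the id-to-vector dictionary are hypothetical, and a real implementation would presumably use numpy over the float32 vectors held in SQLite. It only shows why the returned distance falls in [0, 2]: it is 1 minus the cosine similarity.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cos(a, b), which lies in [0, 2] for non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    # Clamp to absorb floating-point rounding just outside [0, 2].
    return max(0.0, min(2.0, 1.0 - dot / (norm_a * norm_b)))


def rerank(query, candidates, k):
    """Re-score an approximate candidate pool exactly and keep the k nearest.

    `candidates` maps id -> exact stored vector (a hypothetical shape).
    """
    scored = [(cosine_distance(query, vec), cid) for cid, vec in candidates.items()]
    scored.sort()  # ascending distance, ties broken by id
    return [(cid, dist) for dist, cid in scored[:k]]


# Pretend the approximate index returned these three candidates.
pool = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [-1.0, 0.0]}
top = rerank([1.0, 0.0], pool, k=2)
```

Because the final ordering comes from the exact vectors, quantization error in the approximate stage can only affect which candidates reach the pool, not the distances reported for them.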
Install
pip install turbovecdb # pulls turbovec, numpy, filelock
Requires Python ≥ 3.9. The vector dimension must be a positive multiple of 8 (e.g. 384, 768) — a turbovec requirement.
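If an embedding model produces a dimension that is not a multiple of 8, one possible workaround is to zero-pad each vector up to the next multiple. The helpers below are hypothetical and not part of turbovecdb. Appended zeros change neither dot products nor norms, so exact cosine distances are unaffected; how padding interacts with turbovec's quantization is not stated in this README, so treat that part as an assumption to verify.

```python
def padded_dim(dim, multiple=8):
    """Round a positive dimension up to the next multiple of `multiple`."""
    if dim <= 0:
        raise ValueError("dimension must be positive")
    return -(-dim // multiple) * multiple  # ceiling division, then scale back


def pad_vector(vec, multiple=8):
    """Return a copy of `vec` with trailing zeros up to the padded dimension."""
    target = padded_dim(len(vec), multiple)
    return list(vec) + [0.0] * (target - len(vec))


padded = pad_vector([0.5] * 300)  # 300 is not a multiple of 8
```

The same padding would have to be applied to query vectors as well as stored vectors, so that both sides share one dimension.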
Usage
import turbovecdb
db = turbovecdb.connect("/path/to/db")
# Bring your own vectors:
col = db.collection("docs", create=True)
col.add(
    ids=["a", "b"],
    documents=["the quick brown fox", "lorem ipsum"],
    metadatas=[{"lang": "en"}, {"lang": "la"}],
    vectors=[[...384 floats...], [...384 floats...]],
)
hits = col.query(vector=[...384 floats...], k=5, where={"lang": "en"})
print(hits.ids, hits.distances, hits.documents)
# ...or hand it an embedder (list[str] -> list[list[float]]):
col = db.collection("docs2", embedder=my_embed_fn, create=True)
col.add(ids=["a"], documents=["hello world"])
hits = col.query(text="a greeting", k=5)
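To make the embedder contract concrete (a callable from list[str] to list[list[float]]), here is a toy, dependency-free embedder. It is a hashed bag-of-words stand-in with no semantic quality, intended only for wiring things up before a real model is available. The name toy_embed and the choice of 384 dimensions are illustrative assumptions.

```python
import hashlib
import math

DIM = 384  # a positive multiple of 8, matching the examples above


def toy_embed(texts):
    """Map list[str] -> list[list[float]] with a hashed bag-of-words (toy only)."""
    vectors = []
    for text in texts:
        vec = [0.0] * DIM
        for token in text.lower().split():
            digest = hashlib.sha256(token.encode("utf-8")).digest()
            index = int.from_bytes(digest[:4], "big") % DIM  # bucket for this token
            sign = 1.0 if digest[4] % 2 == 0 else -1.0       # signed hashing
            vec[index] += sign
        norm = math.sqrt(sum(x * x for x in vec))
        if norm > 0.0:
            vec = [x / norm for x in vec]  # unit length, so cosine behaves predictably
        vectors.append(vec)
    return vectors


vectors = toy_embed(["hello world", "a greeting"])
```

A function with this shape could be passed as the embedder argument shown in the usage above; in practice it would wrap a real embedding model, and an all-zero output (possible here for empty text) should be avoided because cosine is undefined for it.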
Filters
where supports $eq (bare scalar too), $ne, $in, $nin, $gt, $gte,
$lt, $lte, and $and / $or (recursive). where_document supports
$contains. Unsupported operators raise UnsupportedFilterError — filters
never silently fail.
col.query(vector=v, k=10, where={"$and": [{"lang": "en"}, {"year": {"$gte": 2020}}]})
col.get(where={"lang": {"$in": ["en", "fr"]}}, where_document={"$contains": "fox"})
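The filter grammar above can be read as a small recursive predicate over a metadata dict. The evaluator below is a sketch of those semantics, not turbovecdb's implementation (which may well translate filters to SQL instead). The UnsupportedFilterError class is a local stand-in for the error named above, and treating a missing key as None is an assumption.

```python
class UnsupportedFilterError(ValueError):
    """Local stand-in for the error named above; not imported from turbovecdb."""


def _ordered(compare):
    # Ordering comparisons never match a missing (None) value in this sketch.
    return lambda value, operand: value is not None and compare(value, operand)


_COMPARATORS = {
    "$eq": lambda value, operand: value == operand,
    "$ne": lambda value, operand: value != operand,
    "$in": lambda value, operand: value in operand,
    "$nin": lambda value, operand: value not in operand,
    "$gt": _ordered(lambda a, b: a > b),
    "$gte": _ordered(lambda a, b: a >= b),
    "$lt": _ordered(lambda a, b: a < b),
    "$lte": _ordered(lambda a, b: a <= b),
}


def matches(metadata, where):
    """Return True if `metadata` satisfies every clause in `where`."""
    for key, cond in where.items():
        if key == "$and":
            if not all(matches(metadata, sub) for sub in cond):
                return False
        elif key == "$or":
            if not any(matches(metadata, sub) for sub in cond):
                return False
        elif key.startswith("$"):
            raise UnsupportedFilterError(key)
        else:
            if not isinstance(cond, dict):
                cond = {"$eq": cond}  # a bare scalar is shorthand for $eq
            value = metadata.get(key)
            for op, operand in cond.items():
                if op not in _COMPARATORS:
                    raise UnsupportedFilterError(op)  # fail loudly, never silently
                if not _COMPARATORS[op](value, operand):
                    return False
    return True


row = {"lang": "en", "year": 2021}
ok = matches(row, {"$and": [{"lang": "en"}, {"year": {"$gte": 2020}}]})
```

Raising on an unknown operator, rather than treating it as a non-match, is what keeps a typo such as `$gtee` from silently returning an empty result.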
API
connect(path) -> Database · Database.collection(name, *, dim=None, bit_width=4, metric="cosine", embedder=None, create=True) -> Collection
Collection: add / upsert (ids, documents/vectors, metadatas),
query (text|vector, k, where, where_document, include), get,
delete, count, flush, close. Results are QueryResult / GetResult
dataclasses with flat lists.
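"Flat lists" means each result field is one list, with position i in every list describing the same hit. The dataclass below is an illustrative stand-in, not the library's QueryResult: only the ids, distances, and documents attributes appear in the usage example, and the class name and defaults here are assumptions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class QueryResultSketch:
    """Hypothetical stand-in showing parallel, flat result lists."""

    ids: List[str] = field(default_factory=list)
    distances: List[float] = field(default_factory=list)
    documents: List[str] = field(default_factory=list)


hits = QueryResultSketch(
    ids=["a", "b"],
    distances=[0.12, 0.57],
    documents=["the quick brown fox", "lorem ipsum"],
)

# Zip the parallel lists to walk the hits row by row.
rows = list(zip(hits.ids, hits.distances, hits.documents))
```

With this shape, per-hit iteration is a single zip, and column-wise access (for example, all distances) needs no unpacking.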
Documentation
- docs/core/architecture.md: how it's built (two-tier store, read/write paths, exact re-rank)
- docs/core/data-model.md: on-disk layout, SQLite schema, the .tvim cache, generation counters
- docs/core/concurrency.md: the multi-process model
- docs/performance/: benchmark harness + measured results
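For readers curious how a generation counter lets a reader notice another process's writes, here is a minimal standard-library sketch. The table name, key, and Reader class are hypothetical, and the actual schema lives in docs/core/data-model.md. It shows only the detection half of the model; the cross-process write lock mentioned earlier is left out.

```python
import os
import sqlite3
import tempfile


def open_db(path):
    """Open the database and make sure a generation row exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS meta (key TEXT PRIMARY KEY, value INTEGER NOT NULL)"
    )
    conn.execute("INSERT OR IGNORE INTO meta (key, value) VALUES ('generation', 0)")
    conn.commit()
    return conn


def read_generation(conn):
    return conn.execute("SELECT value FROM meta WHERE key = 'generation'").fetchone()[0]


def bump_generation(conn):
    """Writer side: increment the counter in the same transaction as the write."""
    with conn:  # commits on success, rolls back on error
        conn.execute("UPDATE meta SET value = value + 1 WHERE key = 'generation'")


class Reader:
    """Reader side: remember the last generation seen and reload when it moves."""

    def __init__(self, path):
        self.conn = open_db(path)
        self.seen = read_generation(self.conn)
        self.reloads = 0

    def refresh_if_stale(self):
        current = read_generation(self.conn)
        if current == self.seen:
            return False
        self.seen = current
        self.reloads += 1  # a real reader would reload its in-memory index here
        return True


path = os.path.join(tempfile.mkdtemp(), "sketch.sqlite3")
writer = open_db(path)
reader = Reader(path)
before = reader.refresh_if_stale()  # nothing written yet
bump_generation(writer)
after = reader.refresh_if_stale()   # sees the writer's committed bump
writer.close()
reader.conn.close()
```

Two connections in one process stand in for two processes here; the check is the same either way, since it relies only on what has been committed to the shared SQLite file.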
License
MIT.
File details
Details for the file turbovecdb-0.1.0.tar.gz.
File metadata
- Download URL: turbovecdb-0.1.0.tar.gz
- Upload date:
- Size: 24.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 07ea5f839fea17ed96742da10922287b047713265f6c936ae928ad3081823841 |
| MD5 | 0da20a9aff78a60d8cad3645a084df3a |
| BLAKE2b-256 | cb096f7b70c65bff467cdd9dffa1560d7e435ba9c02aca4ae04f915142dce209 |
File details
Details for the file turbovecdb-0.1.0-py3-none-any.whl.
File metadata
- Download URL: turbovecdb-0.1.0-py3-none-any.whl
- Upload date:
- Size: 13.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 013a831c47a829192b6d4a6e55a3a056bcc7225fba5bc4855a85c8e46be52c50 |
| MD5 | c6bcb6139c917e13c0a62b5db255f6c3 |
| BLAKE2b-256 | b27703c0ef770bc79c05a1c58fffe9adcf50428fb1dafb643353e2c107e19977 |