An embedded vector database: turbovec 4-bit ANN + a durable SQLite store, with metadata filters, persistence, exact-cosine re-rank, and multi-process safety.

turbovecdb

turbovecdb is an embedded vector database. It pairs turbovec's 4-bit TurboQuant ANN index, which handles fast approximate search, with a durable SQLite sidecar that holds the documents, metadata, id map, and the exact float32 vectors. Metadata filters, persistence, exact-cosine re-rank, and multi-process safety are built in.

It's the kind of thing you reach for when you want a local, CPU-resident vector store with a small footprint and no server to run — and you're happy to bring your own embedding model (or hand it one).

Why

  • SQLite is the source of truth. Every vector is stored exactly (float32) and durably. The turbovec .tvim index is a rebuildable cache — if it's missing or stale it's rebuilt from SQLite on open, so a crash never loses data.
  • Exact answers from an approximate index. turbovec finds a candidate pool fast; turbovecdb re-ranks it with true cosine, so callers get a correct distance ∈ [0, 2], not turbovec's raw quantized score.
  • Multi-process safe. Writes take a cross-process file lock; readers detect another process's writes and reload. Run a writer and readers against the same database directory.
  • Small and quick. In a local benchmark (15.8k docs, 384-d), turbovecdb built ~12× faster, queried ~3× faster (p50/p95), and used ~2.3× less disk than an HNSW-based store — at indistinguishable retrieval quality.
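The re-rank step in the second bullet can be sketched in plain Python. This is a minimal illustration of the idea, not turbovecdb's implementation: the names cosine_distance and rerank are hypothetical, and the candidate pool is assumed to arrive as (id, exact vector) pairs, as if an approximate index had picked the ids and the exact vectors had been read back from SQLite.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cos(a, b), which lies in [0, 2] for non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    # Clamp to absorb floating-point drift just outside [-1, 1].
    cos = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return 1.0 - cos


def rerank(query, candidates, k):
    """Order (id, exact_vector) candidates by exact cosine distance, keep k.

    `candidates` is a stand-in (assumption) for the pool an approximate
    index returns, joined with the exact vectors from the durable store.
    """
    scored = [(cosine_distance(query, vec), doc_id) for doc_id, vec in candidates]
    scored.sort()
    return [(doc_id, dist) for dist, doc_id in scored[:k]]


pool = [("a", [1.0, 0.0]), ("b", [0.0, 1.0]), ("c", [-1.0, 0.0])]
top = rerank([1.0, 0.0], pool, k=2)
print(top)  # [('a', 0.0), ('b', 1.0)]
```

Because the final ordering uses the exact vectors, the quantized index only has to put the true neighbours somewhere in the candidate pool; it does not have to rank them correctly.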

Install

pip install turbovecdb        # pulls turbovec, numpy, filelock

Requires Python ≥ 3.9. The vector dimension must be a positive multiple of 8 (e.g. 384, 768) — a turbovec requirement.
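If your embedding model emits a dimension that is not a multiple of 8, one possible workaround is to zero-pad each vector up to the next multiple. The helpers below are hypothetical and not part of turbovecdb; they rely only on the fact that appended zeros change neither dot products nor norms, so cosine distances are unaffected.

```python
def is_valid_dim(dim):
    """True if dim is a positive multiple of 8 (the rule stated above)."""
    return isinstance(dim, int) and dim > 0 and dim % 8 == 0


def pad_to_multiple_of_8(vector):
    """Return a copy of `vector`, zero-padded to the next multiple of 8."""
    if not vector:
        raise ValueError("cannot pad an empty vector")
    remainder = len(vector) % 8
    if remainder == 0:
        return list(vector)
    # Trailing zeros leave dot products and norms unchanged.
    return list(vector) + [0.0] * (8 - remainder)


print(is_valid_dim(384), is_valid_dim(100))   # True False
print(len(pad_to_multiple_of_8([0.5] * 100)))  # 104
```

Queries need the same padding as stored vectors, so the padding belongs in one place (for example inside the embedder) rather than at each call site.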

Usage

import numpy as np
import turbovecdb

db = turbovecdb.connect("/path/to/db")

# Bring your own vectors. Random 384-d placeholders stand in for real embeddings here:
rng = np.random.default_rng(0)
vectors = rng.standard_normal((3, 384)).astype(np.float32).tolist()
col = db.collection("docs", create=True)
col.add(
    ids=["a", "b"],
    documents=["the quick brown fox", "lorem ipsum"],
    metadatas=[{"lang": "en"}, {"lang": "la"}],
    vectors=vectors[:2],
)
hits = col.query(vector=vectors[2], k=5, where={"lang": "en"})
print(hits.ids, hits.distances, hits.documents)

# ...or hand it an embedder (list[str] -> list[list[float]]):
col = db.collection("docs2", embedder=my_embed_fn, create=True)
col.add(ids=["a"], documents=["hello world"])
hits = col.query(text="a greeting", k=5)
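For reference, here is a toy callable with the list[str] -> list[list[float]] shape the embedder argument expects. It is a hashed bag-of-words, not a semantic model, and the name toy_embed and the constant DIM are assumptions made for this sketch; a real deployment would wrap an actual embedding model behind the same signature.

```python
import hashlib
import math

DIM = 384  # a positive multiple of 8, matching the requirement above


def toy_embed(texts):
    """Map list[str] -> list[list[float]] using a hashed bag-of-words."""
    vectors = []
    for text in texts:
        vec = [0.0] * DIM
        for token in text.lower().split():
            # A stable hash keeps the output identical across processes.
            digest = hashlib.sha256(token.encode("utf-8")).digest()
            bucket = int.from_bytes(digest[:4], "big") % DIM
            vec[bucket] += 1.0
        norm = math.sqrt(sum(x * x for x in vec))
        if norm > 0.0:
            vec = [x / norm for x in vec]  # unit length; empty text stays all-zero
        vectors.append(vec)
    return vectors


out = toy_embed(["hello world", "a greeting"])
print(len(out), len(out[0]))  # 2 384
```

Whatever embedder you pass should be deterministic and used for both add and query, since stored and query vectors are only comparable when they come from the same model.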

Filters

where supports $eq (bare scalar too), $ne, $in, $nin, $gt, $gte, $lt, $lte, and $and / $or (recursive). where_document supports $contains. Unsupported operators raise UnsupportedFilterError — filters never silently fail.

col.query(vector=v, k=10, where={"$and": [{"lang": "en"}, {"year": {"$gte": 2020}}]})
col.get(where={"lang": {"$in": ["en", "fr"]}}, where_document={"$contains": "fox"})
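To make the operator semantics concrete, here is a small in-memory evaluator for the same filter grammar. It is a sketch only: turbovecdb presumably evaluates filters against SQLite, the functions matches and document_matches are hypothetical, UnsupportedFilterError below is a local stand-in rather than the library's class, and the handling of missing keys (treated as None) is an assumption.

```python
class UnsupportedFilterError(ValueError):
    """Local stand-in for illustration; not imported from turbovecdb."""


_COMPARATORS = {
    "$eq": lambda value, arg: value == arg,
    "$ne": lambda value, arg: value != arg,
    "$in": lambda value, arg: value in arg,
    "$nin": lambda value, arg: value not in arg,
    "$gt": lambda value, arg: value is not None and value > arg,
    "$gte": lambda value, arg: value is not None and value >= arg,
    "$lt": lambda value, arg: value is not None and value < arg,
    "$lte": lambda value, arg: value is not None and value <= arg,
}


def matches(metadata, where):
    """Return True if `metadata` satisfies the `where` filter."""
    for key, condition in where.items():
        if key == "$and":
            if not all(matches(metadata, sub) for sub in condition):
                return False
        elif key == "$or":
            if not any(matches(metadata, sub) for sub in condition):
                return False
        elif key.startswith("$"):
            raise UnsupportedFilterError(key)
        else:
            value = metadata.get(key)  # assumption: a missing key reads as None
            if not isinstance(condition, dict):
                condition = {"$eq": condition}  # a bare scalar means $eq
            for op, arg in condition.items():
                if op not in _COMPARATORS:
                    raise UnsupportedFilterError(op)
                if not _COMPARATORS[op](value, arg):
                    return False
    return True


def document_matches(document, where_document):
    """Evaluate a where_document filter; only $contains is recognised."""
    for op, arg in where_document.items():
        if op != "$contains":
            raise UnsupportedFilterError(op)
        if arg not in document:
            return False
    return True


meta = {"lang": "en", "year": 2021}
print(matches(meta, {"$and": [{"lang": "en"}, {"year": {"$gte": 2020}}]}))  # True
print(document_matches("the quick brown fox", {"$contains": "fox"}))       # True
```

Raising on an unknown operator, rather than ignoring it, is what keeps a typo such as "$gtee" from quietly returning every row.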

API

connect(path) -> Database

Database.collection(name, *, dim=None, bit_width=4, metric="cosine", embedder=None, create=True) -> Collection

Collection: add / upsert (ids, documents/vectors, metadatas), query (text|vector, k, where, where_document, include), get, delete, count, flush, close. Results are QueryResult / GetResult dataclasses with flat lists.
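"Flat lists" means each field is one list with an entry per hit, so rows are recovered by zipping the fields together. The dataclass below is a hypothetical stand-in, not the library's QueryResult: it carries only the three fields that appear in the usage example (ids, distances, documents), and the values are made-up placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class QueryResultSketch:
    """Illustrative stand-in for a flat-list result; not turbovecdb's class."""
    ids: list = field(default_factory=list)
    distances: list = field(default_factory=list)
    documents: list = field(default_factory=list)


def rows(result):
    """Turn parallel flat lists into one (id, distance, document) tuple per hit."""
    return list(zip(result.ids, result.distances, result.documents))


# Placeholder values, not output from a real query.
hits = QueryResultSketch(
    ids=["a", "b"],
    distances=[0.12, 0.40],
    documents=["the quick brown fox", "lorem ipsum"],
)
for doc_id, distance, document in rows(hits):
    print(f"{doc_id}\t{distance:.2f}\t{document}")
```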

License

MIT.
