vectlite
Embedded vector store for local-first AI applications.
vectlite is a single-file, zero-dependency vector database written in Rust with Python bindings. It gives you dense + sparse hybrid search, HNSW indexing, metadata filtering, transactions, and crash-safe persistence in a single .vdb file -- no server, no Docker, no network calls.
Installation
pip install vectlite
Requires Python 3.9+. Pre-built wheels are available for macOS (x86_64, arm64), Linux (x86_64, aarch64), and Windows (x86_64).
Quick Start
import vectlite
# Create or open a database
db = vectlite.open("knowledge.vdb", dimension=384)
# Insert records with vectors and metadata
db.upsert("doc1", embedding, {"source": "blog", "title": "Auth Guide"})
db.upsert("doc2", embedding2, {"source": "notes", "title": "Billing"})
# Search with filters
results = db.search(embedding_query, k=5, filter={"source": "blog"})
# Checkpoint: fold the WAL into the .vdb file
db.compact()
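The Quick Start snippet assumes `embedding`, `embedding2`, and `embedding_query` already exist as 384-dimensional vectors. For trying it out without an embedding model, a deterministic placeholder can stand in. `toy_embedding` below is a hypothetical helper, not part of vectlite, and its vectors carry no semantic meaning; a real embedding model should replace it before search quality is evaluated.

```python
import hashlib
import math


def toy_embedding(text: str, dimension: int = 384) -> list[float]:
    """Deterministic placeholder vector; NOT a semantic embedding."""
    values = []
    counter = 0
    while len(values) < dimension:
        digest = hashlib.sha256(f"{text}:{counter}".encode("utf-8")).digest()
        # Map each byte (0..255) to the range [-1.0, 1.0].
        values.extend(b / 127.5 - 1.0 for b in digest)
        counter += 1
    values = values[:dimension]
    # Normalize to unit length so cosine similarity behaves predictably.
    norm = math.sqrt(sum(v * v for v in values)) or 1.0
    return [v / norm for v in values]


embedding = toy_embedding("Auth Guide")
embedding2 = toy_embedding("Billing")
embedding_query = toy_embedding("how do I log in?")
```

The same text always maps to the same vector, which makes the snippet reproducible across runs.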
Features
Core
- Single-file storage -- one `.vdb` file per database, portable and easy to back up
- Dense vectors -- cosine similarity with automatic HNSW indexing for large collections
- Sparse vectors -- BM25-scored inverted index for keyword retrieval
- Hybrid search -- dense + sparse fusion with linear or RRF strategies
- Rich metadata -- `str`, `int`, `float`, `bool`, `None`, `list`, `dict` values
- Crash-safe WAL -- writes land in a write-ahead log first, then checkpoint with `compact()`
- Transactions -- atomic batched writes with `db.transaction()`
- File locking -- advisory locks prevent corruption from concurrent access
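To make the "BM25-scored inverted index" item above concrete, here is a minimal pure-Python sketch of Okapi BM25 over a tiny inverted index. It illustrates the general technique only; the function name, the parameter defaults (`k1=1.2`, `b=0.75`), and the whitespace tokenization are assumptions, not vectlite's implementation.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized document against the query with Okapi BM25."""
    n_docs = len(docs)
    avg_len = sum(len(tokens) for tokens in docs.values()) / n_docs
    # Inverted index: term -> {doc_id: term frequency}
    index = {}
    for doc_id, tokens in docs.items():
        for term, tf in Counter(tokens).items():
            index.setdefault(term, {})[doc_id] = tf
    scores = {doc_id: 0.0 for doc_id in docs}
    for term in query_terms:
        postings = index.get(term, {})
        if not postings:
            continue
        # Rare terms get a higher inverse document frequency.
        idf = math.log(1 + (n_docs - len(postings) + 0.5) / (len(postings) + 0.5))
        for doc_id, tf in postings.items():
            # Length normalization: long documents are penalized slightly.
            norm = 1 - b + b * len(docs[doc_id]) / avg_len
            scores[doc_id] += idf * tf * (k1 + 1) / (tf + k1 * norm)
    return scores


docs = {
    "doc1": "how to configure sso authentication".split(),
    "doc2": "billing and invoices overview".split(),
    "doc3": "sso login troubleshooting".split(),
}
scores = bm25_scores(["sso", "authentication"], docs)
```

In this toy corpus `doc1` matches both query terms, `doc3` matches one, and `doc2` matches none, so the scores rank them in that order.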
Search & Retrieval
- Metadata filters -- MongoDB-style operators: `$eq`, `$ne`, `$gt`, `$gte`, `$lt`, `$lte`, `$in`, `$nin`, `$contains`, `$exists`, `$and`, `$or`, `$not`
- Nested filters -- dot-path traversal (`author.name`), `$elemMatch`, `$size` on lists and dicts
- Named vectors -- multiple vector spaces per record (`vectors={"title": [...], "body": [...]}`)
- Multi-vector queries -- weighted search across vector spaces in a single call
- MMR diversification -- `mmr_lambda` controls relevance vs. diversity trade-off
- Namespaces -- logical isolation with per-namespace or cross-namespace search
- Rerankers -- built-in `text_match()`, `metadata_boost()`, `cross_encoder()`, `bi_encoder()`, composable with `compose()`
- Observability -- `search_with_stats()` returns timings, BM25 term scores, ANN stats, and per-result `explain` payloads
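The relevance vs. diversity trade-off behind MMR (maximal marginal relevance) can be sketched in a few lines of plain Python. This is an illustration of the general greedy algorithm, not vectlite's code; in particular, it assumes the common convention that `mmr_lambda=1.0` means pure relevance and lower values favor diversity, which this README does not state explicitly.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def mmr(query, candidates, k, mmr_lambda=0.5):
    """Greedy MMR. `candidates` maps id -> vector; returns ids in pick order."""
    selected = []
    remaining = dict(candidates)
    while remaining and len(selected) < k:
        def mmr_score(item):
            _, vec = item
            relevance = cosine(query, vec)
            # Similarity to the closest already-selected result.
            redundancy = max(
                (cosine(vec, candidates[s]) for s in selected), default=0.0
            )
            return mmr_lambda * relevance - (1 - mmr_lambda) * redundancy

        best_id, _ = max(remaining.items(), key=mmr_score)
        selected.append(best_id)
        del remaining[best_id]
    return selected


query = [1.0, 0.0]
candidates = {
    "a": [1.0, 0.0],
    "b": [0.99, 0.1],  # near-duplicate of "a"
    "c": [0.6, 0.8],   # less relevant, but different
}
relevance_only = mmr(query, candidates, k=2, mmr_lambda=1.0)
diverse = mmr(query, candidates, k=2, mmr_lambda=0.3)
```

With `mmr_lambda=1.0` the near-duplicate `b` is picked second; with `mmr_lambda=0.3` the redundancy penalty pushes `c` ahead of it.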
Data Management
- Physical collections -- `vectlite.open_store()` manages a directory of independent databases
- Bulk ingestion -- `bulk_ingest()` with deferred index rebuilds for fast imports
- Snapshots -- `db.snapshot(path)` creates a self-contained copy
- Backup / Restore -- `db.backup(dir)` and `vectlite.restore(dir, path)` for full roundtrips
- Read-only mode -- `vectlite.open(path, read_only=True)` for safe concurrent readers
- Text analyzers -- configurable tokenizer pipeline with stopwords, stemming, and n-grams
Usage
Hybrid Search with Reranking
import vectlite
db = vectlite.open("knowledge.vdb", dimension=384)
# Upsert with dense + sparse vectors
db.upsert(
    "doc1",
    dense_embedding,
    {"source": "docs", "title": "Auth Setup", "text": "How to configure SSO..."},
    sparse=vectlite.sparse_terms("How to configure SSO authentication"),
)
# Hybrid search with reranking
results = db.search(
    query_embedding,
    k=10,
    sparse=vectlite.sparse_terms("SSO authentication"),
    fusion="rrf",
    filter={"source": "docs"},
    explain=True,
    rerank=vectlite.rerankers.compose(
        vectlite.rerankers.text_match(),
        vectlite.rerankers.metadata_boost("source", {"docs": 0.5}),
    ),
)
for result in results:
    print(result["id"], result["score"])
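The `fusion="rrf"` option used above refers to reciprocal rank fusion. As a rough illustration of the idea, the sketch below fuses a dense ranking and a sparse ranking using only their rank positions. The function name `rrf_fuse` and the constant `k=60` (a commonly used default) are assumptions for this example; the README does not say which constant vectlite uses.

```python
def rrf_fuse(rankings, k=60):
    """Fuse ranked id lists: score(d) = sum over lists of 1 / (k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


dense_ranking = ["doc1", "doc3", "doc2"]
sparse_ranking = ["doc3", "doc4", "doc1"]
fused = rrf_fuse([dense_ranking, sparse_ranking])
```

Because RRF looks only at ranks, it needs no score normalization between cosine similarities and BM25 scores, which is the usual reason to prefer it over a linear combination.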
Collections
store = vectlite.open_store("./my_collections")
products = store.create_collection("products", dimension=384)
products.upsert("p1", embedding, {"name": "Widget", "price": 9.99})
logs = store.open_or_create_collection("logs", dimension=128)
print(store.collections()) # ["logs", "products"]
Transactions
with db.transaction() as tx:
    tx.upsert("doc1", emb1, {"source": "a"})
    tx.upsert("doc2", emb2, {"source": "b"})
    tx.delete("old_doc")
# All operations commit atomically or roll back on exception
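The commit-or-roll-back behavior described above follows a familiar context-manager pattern: buffer the operations, then apply them only if the block exits without an exception. The toy class below shows that pattern against a plain dict. `StagedTransaction` is a hypothetical name for illustration and says nothing about how vectlite implements transactions internally.

```python
class StagedTransaction:
    """Buffers operations and applies them only if the block exits cleanly."""

    def __init__(self, store):
        self._store = store
        self._ops = []

    def upsert(self, key, value):
        self._ops.append(("upsert", key, value))

    def delete(self, key):
        self._ops.append(("delete", key, None))

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:
            # Commit: replay the buffered operations in order.
            for op, key, value in self._ops:
                if op == "upsert":
                    self._store[key] = value
                else:
                    self._store.pop(key, None)
        # On an exception the buffer is simply dropped (rollback).
        self._ops.clear()
        return False  # never swallow the exception


store = {"old_doc": 0}
with StagedTransaction(store) as tx:
    tx.upsert("doc1", 1)
    tx.delete("old_doc")

try:
    with StagedTransaction(store) as tx:
        tx.upsert("doc2", 2)
        raise RuntimeError("simulated failure")
except RuntimeError:
    pass
```

After both blocks run, `store` contains only `doc1`: the first batch was committed as a whole and the second was discarded as a whole.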
Text Helpers
# Handles embedding + sparse term generation for you
vectlite.upsert_text(db, "doc1", "Auth setup guide", embed_fn, {"source": "docs"})
results = vectlite.search_text(db, "how to authenticate", embed_fn, k=5)
Analyzers
analyzer = vectlite.analyzers.Analyzer().lowercase().stopwords("en").stemmer("english")
terms = analyzer.sparse_terms("How to authenticate users with SSO")
# Use with upsert: db.upsert("doc1", emb, meta, sparse=terms)
Snapshots & Backup
db.snapshot("/backups/knowledge_2024.vdb") # Self-contained copy
db.backup("/backups/full/") # Full backup with ANN sidecars
restored = vectlite.restore("/backups/full/", "restored.vdb")
Read-Only Mode
ro = vectlite.open("knowledge.vdb", read_only=True)
results = ro.search(query, k=5) # Reads work
ro.upsert(...) # Raises VectLiteError
Search Diagnostics
outcome = db.search_with_stats(query, k=5, sparse=terms, explain=True)
print(outcome["stats"]["timings"]) # {"dense_us": 120, "sparse_us": 45, ...}
print(outcome["stats"]["used_ann"]) # True
print(outcome["results"][0]["explain"]) # Detailed scoring breakdown
Filter Operators
| Operator | Example | Description |
|---|---|---|
| `$eq` | `{"field": {"$eq": "value"}}` | Equal (also `{"field": "value"}`) |
| `$ne` | `{"field": {"$ne": "value"}}` | Not equal |
| `$gt` / `$gte` | `{"field": {"$gt": 5}}` | Greater than (or equal) |
| `$lt` / `$lte` | `{"field": {"$lt": 20}}` | Less than (or equal) |
| `$in` / `$nin` | `{"field": {"$in": ["a", "b"]}}` | In / not in set |
| `$contains` | `{"field": {"$contains": "auth"}}` | Substring match |
| `$exists` | `{"field": {"$exists": True}}` | Field presence |
| `$and` / `$or` | `{"$and": [{...}, {...}]}` | Logical combinators |
| `$not` | `{"$not": {...}}` | Logical negation |
| `$elemMatch` | `{"tags": {"$elemMatch": {"$eq": "rust"}}}` | Match list elements |
| `$size` | `{"tags": {"$size": 3}}` | List length |
| dot-path | `{"author.name": "Alice"}` | Nested field access |
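To show how filters like these compose, the following is a small pure-Python evaluator for the operators in the table. It is a sketch of the usual semantics, not vectlite's implementation. Two behaviors are assumptions made here: a missing field fails every operator except `{"$exists": False}`, and comparisons between incompatible types count as non-matching.

```python
import operator

COMPARE = {
    "$eq": operator.eq,
    "$ne": operator.ne,
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
    "$in": lambda value, arg: value in arg,
    "$nin": lambda value, arg: value not in arg,
    "$contains": lambda value, arg: arg in value,
    "$size": lambda value, arg: len(value) == arg,
}


def get_path(record, path):
    """Follow a dot-path such as 'author.name'; returns (found, value)."""
    current = record
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return False, None
        current = current[part]
    return True, current


def match_field(found, value, cond):
    """Check one field against a bare value or a dict of operators."""
    if not isinstance(cond, dict):
        cond = {"$eq": cond}
    for op, arg in cond.items():
        if op == "$exists":
            ok = found == bool(arg)
        elif not found:
            ok = False  # assumption: missing fields never match
        elif op == "$elemMatch":
            ok = isinstance(value, list) and any(
                match_field(True, item, arg) for item in value
            )
        else:
            try:
                ok = COMPARE[op](value, arg)
            except TypeError:
                ok = False  # assumption: incompatible types do not match
        if not ok:
            return False
    return True


def matches(record, flt):
    """Return True if the metadata record satisfies every clause of flt."""
    for key, cond in flt.items():
        if key == "$and":
            ok = all(matches(record, sub) for sub in cond)
        elif key == "$or":
            ok = any(matches(record, sub) for sub in cond)
        elif key == "$not":
            ok = not matches(record, cond)
        else:
            ok = match_field(*get_path(record, key), cond)
        if not ok:
            return False
    return True


record = {
    "source": "docs",
    "price": 9.99,
    "tags": ["rust", "db"],
    "author": {"name": "Alice"},
}
in_range = matches(record, {"price": {"$gt": 5, "$lt": 20}})
nested = matches(record, {"author.name": "Alice"})
```

Multiple operators on one field, as in the price range above, are combined with an implicit AND in this sketch.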
How It Works
- Records are stored in a compact binary `.vdb` snapshot file
- Writes go through a crash-safe WAL (`.wal`) before being applied in memory
- `compact()` folds the WAL into the snapshot and persists HNSW sidecar files
- Dense search uses HNSW indexes (auto-built for collections above ~128 records)
- Sparse search uses an inverted index with BM25 scoring
- Hybrid fusion combines dense + sparse via linear combination or reciprocal rank fusion
- Advisory file locks (`flock`) prevent concurrent write corruption
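The snapshot-plus-WAL flow in the list above can be sketched with a toy key-value store. Everything here is illustrative: `TinyStore` is a hypothetical class, and it uses JSON files where vectlite uses a compact binary format. Only the ordering matters: append to the log durably, then update memory, replay the log on open, and fold it into the snapshot on `compact()`.

```python
import json
import os
import tempfile


class TinyStore:
    """Toy key-value store: JSON snapshot plus an append-only JSON-lines WAL."""

    def __init__(self, path):
        self.snapshot_path = path
        self.wal_path = path + ".wal"
        self.data = {}
        if os.path.exists(self.snapshot_path):
            with open(self.snapshot_path, "r", encoding="utf-8") as f:
                self.data = json.load(f)
        if os.path.exists(self.wal_path):
            with open(self.wal_path, "r", encoding="utf-8") as f:
                for line in f:
                    try:
                        self._apply(json.loads(line))
                    except json.JSONDecodeError:
                        break  # torn final record from a crash: stop replay

    def _apply(self, entry):
        if entry["op"] == "upsert":
            self.data[entry["id"]] = entry["value"]
        else:
            self.data.pop(entry["id"], None)

    def _log(self, entry):
        # Durably append to the WAL before touching in-memory state.
        with open(self.wal_path, "a", encoding="utf-8") as f:
            f.write(json.dumps(entry) + "\n")
            f.flush()
            os.fsync(f.fileno())
        self._apply(entry)

    def upsert(self, key, value):
        self._log({"op": "upsert", "id": key, "value": value})

    def delete(self, key):
        self._log({"op": "delete", "id": key})

    def compact(self):
        # Write the new snapshot to a temp file, then atomically swap it in.
        tmp = self.snapshot_path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            json.dump(self.data, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.snapshot_path)
        if os.path.exists(self.wal_path):
            os.remove(self.wal_path)


workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "demo.vdb")
store = TinyStore(path)
store.upsert("doc1", {"source": "blog"})
store.upsert("doc2", {"source": "notes"})
store.delete("doc2")

# Simulate a restart before compact(): state is rebuilt from the WAL alone.
reopened = TinyStore(path)
recovered = dict(reopened.data)

reopened.compact()
wal_gone = not os.path.exists(path + ".wal")
after_compact = TinyStore(path).data
```

Writing the snapshot to a temporary file and swapping it in with `os.replace` means a crash during `compact()` leaves either the old snapshot or the new one, never a half-written file.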
License
MIT