Embedded vector store for local-first AI applications.
Python Binding
The first language target now has a concrete package layout:
- Rust extension crate in bindings/python/src/lib.rs
- Python package in bindings/python/vectlite
- packaging via maturin
Local Development
cd bindings/python
maturin develop
pytest
TestPyPI Release
From the repo root:
./scripts/publish_testpypi.sh
Then upload with a TestPyPI token:
export TEST_PYPI_API_TOKEN="pypi-..."
UPLOAD=1 ./scripts/publish_testpypi.sh
The full flow is documented in docs/testpypi-release.md in the repository.
PyPI Release
From the repo root:
./scripts/publish_pypi.sh
Then upload with a PyPI token:
export PYPI_API_TOKEN="pypi-..."
UPLOAD=1 ./scripts/publish_pypi.sh
The full flow is documented in docs/pypi-release.md in the repository.
API
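import vectlite

# Placeholder vectors so the example runs end to end; replace them with
# real 384-dimensional embeddings from your model.
embedding = [0.1] * 384
title_embedding = [0.2] * 384
body_embedding = [0.3] * 384
other_embedding = [0.4] * 384
query = [0.15] * 384

db = vectlite.open("knowledge.vdb", dimension=384)

# Writes inside the transaction are applied atomically as one batch.
with db.transaction() as tx:
    tx.upsert(
        "doc1",
        embedding,
        {"source": "notes", "priority": 10, "title": "Auth setup"},
        namespace="notes",
        sparse={"auth": 1.0, "sso": 0.5},
        vectors={"title": title_embedding, "body": body_embedding},
    )
    tx.upsert_many(
        [
            {
                "id": "doc2",
                "vector": other_embedding,
                "sparse": {"auth": 0.7},
                "metadata": {"source": "notes", "text": "billing and auth notes"},
            }
        ],
        namespace="notes",
    )

# doc1 was written to the "notes" namespace, so read it from there.
record = db.get("doc1", namespace="notes")

# Hybrid search: the dense "title" vector plus sparse terms, fused with RRF,
# diversified with MMR, then reranked; explain=True attaches debug payloads.
results = db.search(
    query,
    k=5,
    filter={"source": {"$ne": "blog"}, "priority": {"$gte": 5, "$lte": 20}},
    namespace="notes",
    sparse={"auth": 1.0},
    vector_name="title",
    dense_weight=1.0,
    sparse_weight=1.0,
    fusion="rrf",
    rrf_k=30,
    fetch_k=20,
    mmr_lambda=0.3,
    explain=True,
    rerank=vectlite.rerankers.compose(
        vectlite.rerankers.text_match(),
        vectlite.rerankers.metadata_boost("source", {"notes": 0.2}),
    ),
)

# Same kind of query, returning both results and search diagnostics.
debug = db.search_with_stats(
    query,
    k=5,
    namespace="notes",
    sparse={"auth": 1.0},
    vector_name="title",
    fusion="rrf",
    fetch_k=20,
    mmr_lambda=0.3,
)

# Checkpoint the WAL back into the .vdb snapshot.
db.compact()
print(db.wal_path)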
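The rerank= argument in the search call above chains a text-overlap reranker with a metadata boost. vectlite's reranker internals are not described here, so the following is a hypothetical pure-Python sketch of the idea only: each reranker is assumed to be a callable taking (query_text, results) and returning reordered results, and chaining applies them in sequence. The result dict shape and the helper names boost_by_title_overlap, boost_by_metadata, and chain_rerankers are illustrative assumptions, not vectlite APIs.

```python
from typing import Any, Callable, Dict, List

# Hypothetical result shape: {"id": str, "score": float, "metadata": dict}
Result = Dict[str, Any]
Reranker = Callable[[str, List[Result]], List[Result]]


def boost_by_title_overlap(field: str = "title", weight: float = 0.1) -> Reranker:
    """Add `weight` to a result's score for each query token found in metadata[field]."""
    def rerank(query: str, results: List[Result]) -> List[Result]:
        q_tokens = set(query.lower().split())
        out = []
        for r in results:
            doc_tokens = set(str(r["metadata"].get(field, "")).lower().split())
            out.append({**r, "score": r["score"] + weight * len(q_tokens & doc_tokens)})
        return sorted(out, key=lambda r: r["score"], reverse=True)
    return rerank


def boost_by_metadata(field: str, boosts: Dict[str, float]) -> Reranker:
    """Add a fixed bonus when metadata[field] equals one of the keys in `boosts`."""
    def rerank(query: str, results: List[Result]) -> List[Result]:
        out = [
            {**r, "score": r["score"] + boosts.get(r["metadata"].get(field), 0.0)}
            for r in results
        ]
        return sorted(out, key=lambda r: r["score"], reverse=True)
    return rerank


def chain_rerankers(*steps: Reranker) -> Reranker:
    """Run rerankers one after another (sequential mode only; no RRF mode here)."""
    def rerank(query: str, results: List[Result]) -> List[Result]:
        for step in steps:
            results = step(query, results)
        return results
    return rerank


results = [
    {"id": "a", "score": 0.50, "metadata": {"source": "blog", "title": "Billing FAQ"}},
    {"id": "b", "score": 0.45, "metadata": {"source": "notes", "title": "Auth setup"}},
]
chain = chain_rerankers(boost_by_title_overlap(), boost_by_metadata("source", {"notes": 0.2}))
print([r["id"] for r in chain("auth setup", results)])  # ['b', 'a']
```

Here "b" starts slightly behind "a" but overtakes it after gaining 0.2 from two title-token matches and another 0.2 from the "notes" source boost.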
Supported metadata/filter value types are:
- str
- int
- float
- bool
Supported filter operators in the MVP are:
- equality with {"field": "value"} or {"field": {"$eq": "value"}}
- containment: {"field": {"$contains": "auth"}}
- comparison: {"field": {"$gt": 5}}, {"field": {"$gte": 5}}, {"field": {"$lt": 20}}, {"field": {"$lte": 20}}
- inequality: {"field": {"$ne": "value"}}
- set membership: {"field": {"$in": ["a", "b"]}}, {"field": {"$nin": ["a", "b"]}}
- existence: {"field": {"$exists": True}}
- logical combinators: {"$and": [...]}, {"$or": [...]}, {"$not": {...}}
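These operators follow MongoDB-style query syntax. As a rough mental model only (not vectlite's implementation), a pure-Python evaluator over a metadata dict could look like the sketch below. The names matches and _check are hypothetical, and three behaviours are assumptions: $contains is treated as a substring test on strings, a missing field satisfies $ne/$nin but fails every other operator except $exists, and comparing incompatible types simply fails to match.

```python
import operator
from typing import Any, Dict

_COMPARISONS = {"$gt": operator.gt, "$gte": operator.ge, "$lt": operator.lt, "$lte": operator.le}


def matches(meta: Dict[str, Any], flt: Dict[str, Any]) -> bool:
    """Return True if the metadata dict satisfies every clause of the filter."""
    for key, cond in flt.items():
        if key == "$and":
            ok = all(matches(meta, sub) for sub in cond)
        elif key == "$or":
            ok = any(matches(meta, sub) for sub in cond)
        elif key == "$not":
            ok = not matches(meta, cond)
        elif isinstance(cond, dict):
            ok = all(_check(meta, key, op, arg) for op, arg in cond.items())
        else:
            ok = key in meta and meta[key] == cond  # bare value means equality
        if not ok:
            return False
    return True


def _check(meta: Dict[str, Any], key: str, op: str, arg: Any) -> bool:
    present = key in meta
    value = meta.get(key)
    if op == "$exists":
        return present == bool(arg)
    if op == "$ne":
        return not present or value != arg      # assumption: missing counts as "not equal"
    if op == "$nin":
        return not present or value not in arg
    if not present:
        return False                            # assumption: missing fails everything else
    if op == "$eq":
        return value == arg
    if op == "$in":
        return value in arg
    if op == "$contains":
        return isinstance(value, str) and str(arg) in value  # assumption: substring test
    if op in _COMPARISONS:
        try:
            return _COMPARISONS[op](value, arg)
        except TypeError:                       # e.g. comparing a str with an int
            return False
    raise ValueError(f"unsupported operator: {op}")
```

With this model, the filter from the API example, {"source": {"$ne": "blog"}, "priority": {"$gte": 5, "$lte": 20}}, accepts doc1's metadata (source "notes", priority 10).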
Batch helpers available on Database:
- insert_many(records)
- upsert_many(records)
- delete_many(ids)
Durability helpers available on Database:
- transaction() for atomic batched writes
- wal_path to inspect the write-ahead log path
- compact() / flush() to checkpoint the snapshot and clear the WAL
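The all-or-nothing behaviour of transaction() can be pictured as buffering writes and applying them only when the with-block exits cleanly. The sketch below is a hypothetical in-memory illustration (ToyStore and _Tx are invented names); it shows atomic visibility only and, unlike vectlite, writes nothing to a WAL, so it provides no crash durability.

```python
from contextlib import contextmanager
from typing import Any, Dict, Iterator, List, Tuple


class _Tx:
    """Collects operations without touching the store."""

    def __init__(self) -> None:
        self.ops: List[Tuple[str, str, Any]] = []

    def upsert(self, key: str, value: Any) -> None:
        self.ops.append(("upsert", key, value))

    def delete(self, key: str) -> None:
        self.ops.append(("delete", key, None))


class ToyStore:
    """Hypothetical in-memory store illustrating buffered, all-or-nothing writes."""

    def __init__(self) -> None:
        self.records: Dict[str, Any] = {}

    @contextmanager
    def transaction(self) -> Iterator[_Tx]:
        tx = _Tx()
        yield tx  # if the with-block raises, the exception propagates from here
        # Reached only after a clean exit: apply every buffered op together.
        for op, key, value in tx.ops:
            if op == "upsert":
                self.records[key] = value
            else:
                self.records.pop(key, None)
```

If the with-block raises, the exception propagates out of the generator at the yield, so none of the buffered operations reach self.records.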
Dense vector helpers available on Database:
- upsert(..., vectors={"title": [...], "body": [...]})
- search(..., vector_name="title")
- get(id)["vectors"]
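Named vectors let one record carry several embeddings, and vector_name picks which space a query runs against. The sketch below shows an exact (brute-force) top-k over a single named space, the kind of search the ANN notes say small collections stay on. Cosine similarity, the nested-dict record layout, and the helper name search_named are assumptions for illustration, not vectlite internals.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search_named(
    records: Dict[str, Dict[str, Vector]], query: Vector, vector_name: str, k: int
) -> List[Tuple[str, float]]:
    """Exact top-k within one named vector space; records lacking that name are skipped."""
    scored = [
        (rid, cosine(vectors[vector_name], query))
        for rid, vectors in records.items()
        if vector_name in vectors
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


records = {
    "doc1": {"title": [1.0, 0.0], "body": [0.0, 1.0]},
    "doc2": {"title": [0.0, 1.0], "body": [1.0, 0.0]},
    "doc3": {"body": [1.0, 1.0]},
}
print(search_named(records, [1.0, 0.0], "title", k=2))  # [('doc1', 1.0), ('doc2', 0.0)]
```

The same query vector ranks doc1 first in the "title" space but doc2 first in the "body" space, which is why each named space gets its own index.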
Namespace helpers available on Database:
- every CRUD/search method accepts namespace=...
- search(..., all_namespaces=True)
- namespaces()
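Namespaces partition records so the same id can exist independently in several of them. The hypothetical NamespacedIndex below illustrates that scoping, plus listing namespaces and an all_namespaces lookup; the empty-string default namespace and every name in it are assumptions, not vectlite behaviour.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class NamespacedIndex:
    """Hypothetical illustration: each namespace is an isolated id -> metadata map."""

    def __init__(self) -> None:
        self._spaces: Dict[str, Dict[str, dict]] = defaultdict(dict)

    def upsert(self, rid: str, metadata: dict, namespace: str = "") -> None:
        self._spaces[namespace][rid] = metadata

    def namespaces(self) -> List[str]:
        # Only report namespaces that actually hold records.
        return sorted(ns for ns, recs in self._spaces.items() if recs)

    def ids(self, namespace: str = "", all_namespaces: bool = False) -> List[Tuple[str, str]]:
        spaces = list(self._spaces) if all_namespaces else [namespace]
        # .get() avoids creating empty namespaces as a side effect of lookups.
        return sorted((ns, rid) for ns in spaces for rid in self._spaces.get(ns, {}))


idx = NamespacedIndex()
idx.upsert("doc1", {"source": "notes"}, namespace="notes")
idx.upsert("doc1", {"source": "blog"}, namespace="blog")
idx.upsert("doc2", {})
print(idx.ids(all_namespaces=True))  # [('', 'doc2'), ('blog', 'doc1'), ('notes', 'doc1')]
```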
Text helpers available at package level:
- vectlite.upsert_text(db, id, text, embed, ...)
- vectlite.search_text(db, query, embed, ...)
- vectlite.search_text_with_stats(db, query, embed, ...)
- vectlite.sparse_terms(text)
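vectlite.sparse_terms(text) turns text into sparse term weights, and the Retrieval Quality notes say sparse retrieval uses an inverted index with BM25-style scoring. The sketch below pairs a hypothetical tokenizer (toy_sparse_terms, raw term counts) with a textbook Okapi BM25 index. The tokenization rule, the k1/b defaults, and the class name are assumptions; vectlite's exact scoring variant is not stated.

```python
import math
import re
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def toy_sparse_terms(text: str) -> Dict[str, float]:
    """Hypothetical tokenizer: lowercase alphanumeric tokens mapped to raw counts."""
    return {t: float(c) for t, c in Counter(re.findall(r"[a-z0-9]+", text.lower())).items()}


class BM25Index:
    """Inverted index (term -> {doc_id: tf}) scored with Okapi BM25.
    No update/delete handling: re-adding a doc would leave stale postings."""

    def __init__(self, k1: float = 1.2, b: float = 0.75) -> None:
        self.k1, self.b = k1, b
        self.postings: Dict[str, Dict[str, float]] = defaultdict(dict)
        self.doc_len: Dict[str, float] = {}

    def add(self, doc_id: str, sparse: Dict[str, float]) -> None:
        self.doc_len[doc_id] = sum(sparse.values())
        for term, tf in sparse.items():
            self.postings[term][doc_id] = tf

    def search(self, query: Dict[str, float], k: int) -> List[Tuple[str, float]]:
        n = len(self.doc_len)
        avgdl = sum(self.doc_len.values()) / n if n else 0.0
        scores: Dict[str, float] = defaultdict(float)
        for term, q_weight in query.items():
            posting = self.postings.get(term, {})
            if not posting:
                continue
            df = len(posting)
            idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
            # Only documents containing the term are visited: no full scan.
            for doc_id, tf in posting.items():
                norm = self.k1 * (1 - self.b + self.b * self.doc_len[doc_id] / avgdl)
                scores[doc_id] += q_weight * idf * tf * (self.k1 + 1) / (tf + norm)
        return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]


idx = BM25Index()
idx.add("doc1", toy_sparse_terms("Auth setup with SSO"))
idx.add("doc2", toy_sparse_terms("billing and auth notes"))
idx.add("doc3", toy_sparse_terms("release checklist"))
print([d for d, _ in idx.search(toy_sparse_terms("sso auth"), k=3)])  # ['doc1', 'doc2']
```

doc3 never appears because it shares no query terms, so its postings are never touched.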
Retrieval Quality
- sparse retrieval uses a real inverted index with BM25-style scoring
- fetch_k controls how many candidates are gathered before truncation
- mmr_lambda enables MMR diversification for dense, sparse, or hybrid search
- fusion="linear" and fusion="rrf" control dense+sparse score fusion
- rerank(query, results) can reorder the top candidates from Python
- rerank_k limits how many initial candidates are sent into the rerank hook
- vectlite.rerankers.text_match() boosts metadata text/title overlap
- vectlite.rerankers.metadata_boost(field, boosts) boosts metadata values
- vectlite.rerankers.compose(...) chains rerankers sequentially or with RRF
- db.search_with_stats(...) returns both results and search diagnostics
- explain=True adds per-result debug payloads with ranks, matched terms, and rerank traces
- lower mmr_lambda favors diversity more aggressively; higher values stay closer to raw relevance
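The fusion and diversification knobs above correspond to standard techniques, sketched here in pure Python: reciprocal rank fusion (each ranked list adds 1 / (rrf_k + rank)), a weighted linear sum of scores, and greedy maximal marginal relevance. These are generic textbook formulations, not vectlite's code. Whether vectlite normalizes scores before linear fusion, and which similarity MMR uses, are not stated; the sketch assumes raw scores and cosine similarity.

```python
import math
from typing import Dict, List, Sequence


def rrf_fuse(rankings: Sequence[List[str]], rrf_k: int) -> Dict[str, float]:
    """Reciprocal rank fusion: each list adds 1 / (rrf_k + rank), ranks starting at 1."""
    fused: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (rrf_k + rank)
    return fused


def linear_fuse(dense: Dict[str, float], sparse: Dict[str, float],
                dense_weight: float = 1.0, sparse_weight: float = 1.0) -> Dict[str, float]:
    """Weighted sum of raw scores; a doc missing from one side contributes 0 there."""
    return {
        d: dense_weight * dense.get(d, 0.0) + sparse_weight * sparse.get(d, 0.0)
        for d in set(dense) | set(sparse)
    }


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr(candidates: List[str], relevance: Dict[str, float],
        vectors: Dict[str, List[float]], k: int, mmr_lambda: float) -> List[str]:
    """Greedy MMR: repeatedly pick the candidate maximizing
    mmr_lambda * relevance - (1 - mmr_lambda) * (max similarity to picked docs)."""
    selected: List[str] = []
    pool = list(candidates)
    while pool and len(selected) < k:
        def score(d: str) -> float:
            redundancy = max((cosine(vectors[d], vectors[s]) for s in selected), default=0.0)
            return mmr_lambda * relevance[d] - (1 - mmr_lambda) * redundancy
        best = max(pool, key=score)
        selected.append(best)
        pool.remove(best)
    return selected


fused = rrf_fuse([["a", "b", "c"], ["b", "c", "a"]], rrf_k=30)
print(sorted(fused, key=fused.get, reverse=True))  # ['b', 'a', 'c']
```

In a pipeline shaped like the README's, fetch_k candidates would be gathered and fused first, and MMR would then pick the final k. With a near-duplicate pair, a low mmr_lambda such as 0.3 skips the duplicate in favour of a less relevant but different document, while mmr_lambda=1.0 reduces to plain relevance order.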
ANN Behavior
- dense and hybrid search use HNSW indexes when enough points are present
- named vector spaces get their own dense ANN indexes
- ANN sidecars are persisted on disk and reloaded on open when the manifest still matches the record set
- sparse-only search remains exact but uses the inverted index instead of a full scan
- small collections stay on exact dense search to avoid ANN overhead and low-cardinality edge cases
- writes land in a crash-safe WAL first, then compact() checkpoints them back into the .vdb snapshot
- the .vdb snapshot plus .wal are the source of truth; ANN sidecars are acceleration artifacts
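The WAL-then-checkpoint flow described above is a common durability pattern. The sketch below illustrates it with JSON files and is not vectlite's on-disk format: ToyWalStore, the JSON-lines encoding, and the "<snapshot>.wal" file name are all assumptions. Each write is fsynced to the log before it becomes visible, reopening replays the log over the snapshot, and compaction atomically replaces the snapshot and then deletes the log.

```python
import json
import os
from typing import Any, Dict


class ToyWalStore:
    """Hypothetical store: append ops to a WAL, replay on open, compact into a snapshot."""

    def __init__(self, snapshot_path: str) -> None:
        self.snapshot_path = snapshot_path
        self.wal_path = snapshot_path + ".wal"
        self.records: Dict[str, Any] = {}
        if os.path.exists(snapshot_path):
            with open(snapshot_path, encoding="utf-8") as f:
                self.records = json.load(f)
        if os.path.exists(self.wal_path):  # replay writes not yet compacted
            with open(self.wal_path, encoding="utf-8") as f:
                for line in f:
                    try:
                        op = json.loads(line)
                    except json.JSONDecodeError:
                        # Torn final line from a crash; a real store would also truncate it.
                        break
                    self._apply(op)

    def _apply(self, op: Dict[str, Any]) -> None:
        if op["op"] == "upsert":
            self.records[op["id"]] = op["value"]
        else:
            self.records.pop(op["id"], None)

    def _log(self, op: Dict[str, Any]) -> None:
        with open(self.wal_path, "a", encoding="utf-8") as f:
            f.write(json.dumps(op) + "\n")
            f.flush()
            os.fsync(f.fileno())  # durable on disk before it becomes visible
        self._apply(op)

    def upsert(self, rid: str, value: Any) -> None:
        self._log({"op": "upsert", "id": rid, "value": value})

    def delete(self, rid: str) -> None:
        self._log({"op": "delete", "id": rid})

    def compact(self) -> None:
        tmp = self.snapshot_path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            json.dump(self.records, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.snapshot_path)  # atomic swap of the snapshot file
        if os.path.exists(self.wal_path):
            os.remove(self.wal_path)  # its contents now live in the snapshot
```

Because the snapshot and log together define the state, derived structures such as ANN sidecars can be discarded and rebuilt without losing data.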
Download files
File details
Details for the file vectlite-0.1.1.tar.gz.
File metadata
- Download URL: vectlite-0.1.1.tar.gz
- Upload date:
- Size: 50.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9acccf1d75b517a91ae5844ffb7afefda3113eee3aa3971a6c3ac97fcfb9249d |
| MD5 | 1d683bdaa6677b2e1508f3a08169b962 |
| BLAKE2b-256 | 8d19947bb774d14b60cc1f8e87b430be95298c1f6d2bea504d73c96179c93a37 |
File details
Details for the file vectlite-0.1.1-cp39-abi3-macosx_10_12_x86_64.whl.
File metadata
- Download URL: vectlite-0.1.1-cp39-abi3-macosx_10_12_x86_64.whl
- Upload date:
- Size: 1.7 MB
- Tags: CPython 3.9+, macOS 10.12+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 35ed514622db76aa9ac0deab41326a00efbd65dc31123182c087f97266f75324 |
| MD5 | bbf0d0daf451bb43d597ed396f701424 |
| BLAKE2b-256 | 9cb3dd046d8e0b694d1caa6ed700dd4ca303a5f1ed0c8ba590430d3cddad3b11 |