Embedded vector store for local-first AI applications.

vectlite

vectlite is a single-file, zero-dependency vector database written in Rust with Python bindings. It gives you dense + sparse hybrid search, HNSW indexing, metadata filtering, transactions, and crash-safe persistence in a single .vdb file -- no server, no Docker, no network calls.

Installation

pip install vectlite

Requires Python 3.9+. Pre-built wheels are available for macOS (x86_64, arm64), Linux (x86_64, aarch64), and Windows (x86_64).

Quick Start

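import vectlite

# Create or open a database
db = vectlite.open("knowledge.vdb", dimension=384)

# Placeholder vectors (assumed here to be plain lists of floats);
# in practice these come from your embedding model
embedding = [0.1] * 384
embedding2 = [0.2] * 384
embedding_query = [0.1] * 384

# Insert records with vectors and metadata
db.upsert("doc1", embedding, {"source": "blog", "title": "Auth Guide"})
db.upsert("doc2", embedding2, {"source": "notes", "title": "Billing"})

# Search with filters
results = db.search(embedding_query, k=5, filter={"source": "blog"})

# Checkpoint: fold the WAL into the .vdb file
db.compact()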

Features

Core

  • Single-file storage -- one .vdb file per database, portable and easy to back up
  • Dense vectors -- cosine similarity with automatic HNSW indexing once a collection grows past roughly 128 records
  • Sparse vectors -- BM25-scored inverted index for keyword retrieval
  • Hybrid search -- dense + sparse fusion with linear or RRF strategies
  • Rich metadata -- str, int, float, bool, None, list, dict values
  • Crash-safe WAL -- writes land in a write-ahead log first, then checkpoint with compact()
  • Transactions -- atomic batched writes with db.transaction()
  • File locking -- advisory locks prevent corruption from concurrent access
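
The write-ahead-log pattern named in the list above can be illustrated with a toy store. This is a sketch of the general technique only, not vectlite's file format or implementation: the `TinyStore` class, the JSON encoding, and the `.wal` / `.tmp` file names are all hypothetical choices made for the example.

```python
import json
import os


class TinyStore:
    """Toy key-value store: a JSON snapshot plus an append-only JSON-lines WAL."""

    def __init__(self, path):
        self.snapshot_path = path
        self.wal_path = path + ".wal"
        self.data = {}
        # Recovery: load the last snapshot, then replay WAL entries on top of it.
        if os.path.exists(self.snapshot_path):
            with open(self.snapshot_path) as f:
                self.data = json.load(f)
        if os.path.exists(self.wal_path):
            with open(self.wal_path) as f:
                for line in f:
                    if line.strip():
                        self._apply(json.loads(line))

    def _apply(self, entry):
        if entry["op"] == "upsert":
            self.data[entry["id"]] = entry["value"]
        elif entry["op"] == "delete":
            self.data.pop(entry["id"], None)

    def _log(self, entry):
        # Append to the WAL and fsync before changing in-memory state.
        with open(self.wal_path, "a") as f:
            f.write(json.dumps(entry) + "\n")
            f.flush()
            os.fsync(f.fileno())
        self._apply(entry)

    def upsert(self, key, value):
        self._log({"op": "upsert", "id": key, "value": value})

    def delete(self, key):
        self._log({"op": "delete", "id": key})

    def compact(self):
        # Write the full state to a temp file, swap it in atomically, drop the WAL.
        tmp = self.snapshot_path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(self.data, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.snapshot_path)
        if os.path.exists(self.wal_path):
            os.remove(self.wal_path)
```

Reopening the same path before calling `compact()` replays the log, which is what makes writes survive a crash in this toy. A production design would also need to handle a torn final WAL line and concurrent writers, which the sketch ignores.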

Search & Retrieval

  • Metadata filters -- MongoDB-style operators: $eq, $ne, $gt, $gte, $lt, $lte, $in, $nin, $contains, $exists, $and, $or, $not
  • Nested filters -- dot-path traversal (author.name), $elemMatch, $size on lists and dicts
  • Named vectors -- multiple vector spaces per record (vectors={"title": [...], "body": [...]})
  • Multi-vector queries -- weighted search across vector spaces in a single call
  • MMR diversification -- mmr_lambda controls the trade-off between relevance and diversity
  • Namespaces -- logical isolation with per-namespace or cross-namespace search
  • Rerankers -- built-in text_match(), metadata_boost(), cross_encoder(), bi_encoder(), composable with compose()
  • Observability -- search_with_stats() returns timings, BM25 term scores, ANN stats, and per-result explain payloads
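
To make the MMR item above concrete, here is a minimal sketch of maximal marginal relevance in plain Python. It assumes the conventional formulation, where a lambda of 1.0 means pure relevance and lower values penalize similarity to already-selected results; whether vectlite's `mmr_lambda` uses exactly this convention is an assumption. The `cosine` and `mmr_select` helpers are hypothetical names, not part of the library.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def mmr_select(query, candidates, k, mmr_lambda=0.5):
    """Pick up to k ids from {id: vector}, balancing relevance against redundancy."""
    remaining = dict(candidates)
    selected = []
    while remaining and len(selected) < k:
        def mmr_score(item_id):
            relevance = cosine(query, remaining[item_id])
            # Redundancy: highest similarity to anything already selected.
            redundancy = max(
                (cosine(remaining[item_id], candidates[s]) for s in selected),
                default=0.0,
            )
            return mmr_lambda * relevance - (1.0 - mmr_lambda) * redundancy

        best = max(remaining, key=mmr_score)
        selected.append(best)
        del remaining[best]
    return selected
```

With a query of `[1.0, 0.0]` and two near-duplicate candidates plus one distinct candidate, a lambda of 1.0 returns the two near-duplicates, while a low lambda such as 0.3 swaps the second duplicate for the distinct candidate.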

Data Management

  • Physical collections -- vectlite.open_store() manages a directory of independent databases
  • Bulk ingestion -- bulk_ingest() with deferred index rebuilds for fast imports
  • Snapshots -- db.snapshot(path) creates a self-contained copy
  • Backup / Restore -- db.backup(dir) and vectlite.restore(dir, path) for full roundtrips
  • Read-only mode -- vectlite.open(path, read_only=True) for safe concurrent readers
  • Text analyzers -- configurable tokenizer pipeline with stopwords, stemming, and n-grams

Usage

Hybrid Search with Reranking

import vectlite

db = vectlite.open("knowledge.vdb", dimension=384)

# Upsert with dense + sparse vectors
db.upsert(
    "doc1",
    dense_embedding,
    {"source": "docs", "title": "Auth Setup", "text": "How to configure SSO..."},
    sparse=vectlite.sparse_terms("How to configure SSO authentication"),
)

# Hybrid search with reranking
results = db.search(
    query_embedding,
    k=10,
    sparse=vectlite.sparse_terms("SSO authentication"),
    fusion="rrf",
    filter={"source": "docs"},
    explain=True,
    rerank=vectlite.rerankers.compose(
        vectlite.rerankers.text_match(),
        vectlite.rerankers.metadata_boost("source", {"docs": 0.5}),
    ),
)

for result in results:
    print(result["id"], result["score"])
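
For readers unfamiliar with the "rrf" fusion strategy used above, the following sketch shows reciprocal rank fusion on two ranked lists of ids. It illustrates the standard formula only; the `rrf_fuse` helper and the constant `k=60` (a commonly used default) are assumptions, and vectlite's internal constants may differ.

```python
def rrf_fuse(rankings, k=60):
    """Fuse ranked id lists: score(d) = sum over lists of 1 / (k + rank of d)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


dense_ranking = ["doc1", "doc3", "doc2"]   # hypothetical dense (vector) results
sparse_ranking = ["doc2", "doc1", "doc4"]  # hypothetical sparse (keyword) results
fused = rrf_fuse([dense_ranking, sparse_ranking])
print([doc_id for doc_id, _ in fused])  # ['doc1', 'doc2', 'doc3', 'doc4']
```

Because RRF uses only ranks, it needs no score normalization between the dense and sparse retrievers, which is the usual reason to prefer it over a linear combination when the two score scales are not comparable.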

Collections

store = vectlite.open_store("./my_collections")
products = store.create_collection("products", dimension=384)
products.upsert("p1", embedding, {"name": "Widget", "price": 9.99})

logs = store.open_or_create_collection("logs", dimension=128)
print(store.collections())  # ["logs", "products"]

Transactions

with db.transaction() as tx:
    tx.upsert("doc1", emb1, {"source": "a"})
    tx.upsert("doc2", emb2, {"source": "b"})
    tx.delete("old_doc")
# All operations commit atomically or roll back on exception
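
The commit-or-roll-back behavior described above can be pictured with a small context manager that stages writes and applies them only if the block exits cleanly. This is a sketch of the pattern, not vectlite's implementation: `BufferedTransaction` is a hypothetical class, and a plain dict stands in for the database.

```python
class BufferedTransaction:
    """Toy transaction: stage writes, apply all on success, none on error."""

    def __init__(self, store):
        self.store = store  # a plain dict standing in for the database
        self.staged = []

    def upsert(self, key, value):
        self.staged.append(("upsert", key, value))

    def delete(self, key):
        self.staged.append(("delete", key, None))

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:
            # Commit: apply every staged operation in order.
            for op, key, value in self.staged:
                if op == "upsert":
                    self.store[key] = value
                else:
                    self.store.pop(key, None)
        # On an exception the staged operations are simply discarded.
        self.staged = []
        return False  # never swallow the exception
```

If the body of the `with` block raises, the dict is left untouched and the exception propagates to the caller. A durable store would additionally write the whole batch to its log as one unit so that a crash mid-commit cannot leave a partial batch.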

Text Helpers

# Handles embedding + sparse term generation for you
vectlite.upsert_text(db, "doc1", "Auth setup guide", embed_fn, {"source": "docs"})
results = vectlite.search_text(db, "how to authenticate", embed_fn, k=5)

Analyzers

analyzer = vectlite.analyzers.Analyzer().lowercase().stopwords("en").stemmer("english")
terms = analyzer.sparse_terms("How to authenticate users with SSO")
# Use with upsert: db.upsert("doc1", emb, meta, sparse=terms)
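
As a rough picture of what an analyzer pipeline does, the sketch below lowercases text, tokenizes it, drops stopwords, and counts the remaining terms. It is a simplification under stated assumptions: the `toy_sparse_terms` helper and its tiny stopword list are hypothetical, stemming is omitted, and the term-to-count dict is not claimed to match the format vectlite's `sparse_terms` returns.

```python
import re
from collections import Counter

# Tiny illustrative stopword list, not the library's "en" list.
STOPWORDS = {"a", "an", "and", "how", "of", "the", "to", "with"}


def toy_sparse_terms(text, stopwords=STOPWORDS):
    """Lowercase, split on non-alphanumerics, drop stopwords, count terms."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    kept = [t for t in tokens if t not in stopwords]
    return dict(Counter(kept))


print(toy_sparse_terms("How to authenticate users with SSO"))
# {'authenticate': 1, 'users': 1, 'sso': 1}
```

A stemming step would go between the stopword filter and the count, mapping variants such as "authenticate" and "authenticating" to one term.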

Snapshots & Backup

db.snapshot("/backups/knowledge_2024.vdb")  # Self-contained copy
db.backup("/backups/full/")                 # Full backup with ANN sidecars

restored = vectlite.restore("/backups/full/", "restored.vdb")

Read-Only Mode

ro = vectlite.open("knowledge.vdb", read_only=True)
results = ro.search(query, k=5)  # Reads work
ro.upsert(...)                    # Raises VectLiteError

Search Diagnostics

outcome = db.search_with_stats(query, k=5, sparse=terms, explain=True)

print(outcome["stats"]["timings"])       # {"dense_us": 120, "sparse_us": 45, ...}
print(outcome["stats"]["used_ann"])      # True
print(outcome["results"][0]["explain"])  # Detailed scoring breakdown

Filter Operators

| Operator | Example | Description |
|---|---|---|
| $eq | {"field": {"$eq": "value"}} | Equal (also {"field": "value"}) |
| $ne | {"field": {"$ne": "value"}} | Not equal |
| $gt / $gte | {"field": {"$gt": 5}} | Greater than (or equal) |
| $lt / $lte | {"field": {"$lt": 20}} | Less than (or equal) |
| $in / $nin | {"field": {"$in": ["a", "b"]}} | In / not in set |
| $contains | {"field": {"$contains": "auth"}} | Substring match |
| $exists | {"field": {"$exists": True}} | Field presence |
| $and / $or | {"$and": [{...}, {...}]} | Logical combinators |
| $not | {"$not": {...}} | Logical negation |
| $elemMatch | {"tags": {"$elemMatch": {"$eq": "rust"}}} | Match list elements |
| $size | {"tags": {"$size": 3}} | List length |
| dot-path | {"author.name": "Alice"} | Nested field access |
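
To show how filters of this shape can be evaluated against a metadata dict, here is a small pure-Python matcher covering most of the operators listed above. It is an illustrative sketch, not vectlite's filter engine: `get_path`, `match_field`, and `matches` are hypothetical helpers, `$elemMatch` is left out, and the choice that a missing field fails every operator except `$exists` is an assumption.

```python
import operator

COMPARATORS = {
    "$eq": operator.eq,
    "$ne": operator.ne,
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
    "$in": lambda value, options: value in options,
    "$nin": lambda value, options: value not in options,
    "$contains": lambda value, needle: needle in value,
    "$size": lambda value, n: len(value) == n,
}


def get_path(record, path):
    """Resolve a dot-path such as "author.name"; return (found, value)."""
    current = record
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return False, None
        current = current[part]
    return True, current


def match_field(found, value, cond):
    """Check one field against a literal (shorthand for $eq) or an operator dict."""
    if not isinstance(cond, dict):
        cond = {"$eq": cond}
    for op, arg in cond.items():
        if op == "$exists":
            ok = found == bool(arg)
        elif not found:
            ok = False  # assumption: missing fields fail all other operators
        else:
            ok = COMPARATORS[op](value, arg)
        if not ok:
            return False
    return True


def matches(record, flt):
    """Return True if the record satisfies every clause of the filter."""
    for key, cond in flt.items():
        if key == "$and":
            ok = all(matches(record, sub) for sub in cond)
        elif key == "$or":
            ok = any(matches(record, sub) for sub in cond)
        elif key == "$not":
            ok = not matches(record, cond)
        else:
            found, value = get_path(record, key)
            ok = match_field(found, value, cond)
        if not ok:
            return False
    return True
```

For example, `matches({"views": 12}, {"views": {"$gt": 5, "$lt": 20}})` is true because multiple operators on one field are combined with AND in this sketch.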

How It Works

  • Records are stored in a compact binary .vdb snapshot file
  • Writes go through a crash-safe WAL (.wal) before being applied in memory
  • compact() folds the WAL into the snapshot and persists HNSW sidecar files
  • Dense search uses HNSW indexes (auto-built for collections above ~128 records)
  • Sparse search uses an inverted index with BM25 scoring
  • Hybrid fusion combines dense + sparse via linear combination or reciprocal rank fusion
  • Advisory file locks (flock) prevent concurrent write corruption
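
The BM25 scoring mentioned in the list above can be sketched as follows. This uses the common Okapi BM25 formula with typical defaults (`k1=1.2`, `b=0.75`) and scans every document for clarity instead of using an inverted index; the `bm25_scores` helper and these parameter values are assumptions, not vectlite's actual settings.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score tokenized docs ({id: [tokens]}) against query terms with Okapi BM25."""
    if not docs:
        return {}
    n_docs = len(docs)
    avg_len = sum(len(tokens) for tokens in docs.values()) / n_docs
    # Document frequency: the number of docs each term appears in.
    df = Counter(term for tokens in docs.values() for term in set(tokens))
    scores = {}
    for doc_id, tokens in docs.items():
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            idf = math.log(1.0 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization: long documents get less credit per occurrence.
            norm = tf[term] + k1 * (1.0 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1.0) / norm
        scores[doc_id] = score
    return scores
```

Rare terms contribute more through the IDF factor, so a document matching one rare query term can outrank a document that repeats a common one. An inverted index produces the same scores while visiting only the documents that contain at least one query term.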

License

MIT
