A facade over vector databases — one interface, ~15 backends

vd

A facade over vector databases — one Pythonic interface, ~15 backends.

vd lets you operate on any vector database and switch between them with a one-word change, while keeping each backend's particular power one escape hatch away. It also helps you choose the right backend and set it up.

import vd

client = vd.connect("memory")          # switch DB = change this one word
col = client.create_collection("docs")
col["a"] = vd.Document(id="a", text="cats", vector=[0.1, 0.9, 0.0])
col["b"] = vd.Document(id="b", text="pizza", vector=[0.9, 0.0, 0.1])

for hit in col.search([0.1, 0.8, 0.0], limit=2):
    print(hit["id"], hit["score"])
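To make the ranking in the example above concrete, here is a minimal standalone sketch of a higher-is-better similarity search over the same two vectors. It assumes cosine similarity as the metric; the metric that vd's memory backend actually uses is not stated here, so treat `cosine` and the scores it produces as illustrative only.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Same vectors as the quickstart example.
docs = {"a": [0.1, 0.9, 0.0], "b": [0.9, 0.0, 0.1]}
query = [0.1, 0.8, 0.0]

# Score every document, then sort so the best match comes first.
ranked = sorted(
    ((doc_id, cosine(query, vec)) for doc_id, vec in docs.items()),
    key=lambda pair: pair[1],
    reverse=True,
)
for doc_id, score in ranked:
    print(doc_id, round(score, 3))
```

Under this assumption, "a" ranks first because its vector points in nearly the same direction as the query.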

Install

pip install vd                 # core (zero heavy deps) + the memory backend
pip install vd[chroma]         # + a specific backend's client
pip install vd[embedded]       # + all embedded backends (chroma, qdrant, faiss, …)
pip install vd[all-backends]   # + every backend client

The core is near-zero-dependency. Each backend's client library is an optional extra named after the backend.

The mental model

vd stores and searches vectors. Turning text into vectors — embedding — is deliberately external: vd never embeds on its own. This keeps the facade honest (most vector DBs do not embed for you) and lightweight.

  • Vector-first. You hold the embedding model. Hand vd Documents that already carry a vector; search with a pre-computed query vector.
  • Text convenience. Pass an embedder (text -> vector) to connect, and then raw text works: col["k"] = "some text", col.search("a query").

With no embedder, passing text raises EmbeddingRequiredError — loud, never a silent wrong-model embedding.

client = vd.connect("chroma", persist_directory="./db", embedder=my_embed_fn)
col = client.create_collection("docs")
col["a"] = "cats and kittens"                  # embedded for you
hits = list(col.search("pets", limit=5))       # query embedded for you
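The embedder is described above only as "text -> vector". As a sketch under that assumption (a plain callable taking one string and returning a list of floats), the toy function below could stand in for `my_embed_fn` while experimenting. `toy_embed` is a hypothetical name, and a hashed bag-of-words carries no real semantics; a production embedder would wrap an actual embedding model.

```python
import hashlib
import math


def toy_embed(text: str, dim: int = 16) -> list[float]:
    """Hash each lowercase token into one of `dim` buckets, then L2-normalize."""
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:4], "big") % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    # An empty string has no tokens, so return the zero vector unchanged.
    return [x / norm for x in vec] if norm else vec


v = toy_embed("cats and kittens")
```

The function is deterministic, so the same text always maps to the same vector, which matters because stored documents and queries must be embedded the same way.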

Choosing a backend

vd ships a recommender and a provider registry distilled from a practitioner report (misc/docs/11 -- VectorDB Selection & Setup Guide ...md):

vd.print_recommendation(
    corpus_size="medium", persistence=True, can_run_docker=True,
    cloud_ok=True, budget="free", needs_hybrid=False,
)
vd.print_backends_table()                       # the whole landscape
vd.compare_backends(["chroma", "qdrant", "pgvector"])

Setting a backend up

vd.check_requirements("qdrant")    # diagnoses readiness, prints the next step
vd.setup_guide("qdrant")           # full pip / docker / env-var playbook
vd.install_backend("qdrant")       # the pip command (run=True to install)

check_requirements is deployment-aware: it checks the pip package for embedded backends, whether a server answers for self-hosted ones, and the required environment variables for managed ones — always ending with one concrete next action.
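The three kinds of check described above can be sketched with the standard library alone. This is not vd's implementation; the helper names (`package_installed`, `server_answers`, `missing_env_vars`, `next_action`) and the archetype strings are hypothetical, chosen only to illustrate the idea of one readiness check per deployment style.

```python
import importlib.util
import os
import socket


def package_installed(module_name: str) -> bool:
    """True if a top-level module with this name can be imported."""
    return importlib.util.find_spec(module_name) is not None


def server_answers(host: str, port: int, timeout: float = 1.0) -> bool:
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def missing_env_vars(names: list[str]) -> list[str]:
    """Names from `names` that are unset or empty in the environment."""
    return [name for name in names if not os.environ.get(name)]


def next_action(archetype: str, *, module: str = "", host: str = "localhost",
                port: int = 0, env_vars: tuple[str, ...] = ()) -> str:
    """Return one concrete next step, or 'ready' if nothing is missing."""
    if archetype == "embedded" and not package_installed(module):
        return f"pip install the client package that provides '{module}'"
    if archetype == "server" and not server_answers(host, port):
        return f"start the server so that {host}:{port} accepts connections"
    if archetype == "managed":
        missing = missing_env_vars(list(env_vars))
        if missing:
            return "set environment variable(s): " + ", ".join(missing)
    return "ready"
```

Returning a single string keeps the "one concrete next action" property: the caller never has to pick between several findings.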

The API

Object | Is a | Plus
Client (from connect) | Mapping[str, Collection] | create_collection, get_collection, delete_collection, get_or_create_collection
Collection | MutableMapping[str, Document] | search(...)
Document | dataclass | id, text, vector, metadata

col["k"] = vd.Document(id="k", text="…", vector=[...], metadata={"y": 2024})
doc      = col["k"]            # get
del col["k"]                  # delete
"k" in col, len(col), list(col)

col.search(query, *, limit=10, filter=None, egress=None, **backend_kwargs)

search yields dicts {"id", "text", "score", "metadata"} (score is higher-is-better). Transform results with an egress: vd.id_only, vd.id_and_score, vd.text_only, vd.id_text_score, or your own.
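Writing "your own" egress might look like the sketch below. It assumes an egress is a callable applied to each result dict in turn; the exact contract is not spelled out above, so `apply_egress` is a hypothetical stand-in for whatever search does internally, and `id_and_year` is an illustrative custom egress.

```python
from typing import Any, Callable, Iterable, Iterator

Hit = dict[str, Any]


def id_and_year(hit: Hit) -> tuple[str, Any]:
    """Custom egress: keep only the id and the 'year' metadata field."""
    return hit["id"], hit["metadata"].get("year")


def apply_egress(hits: Iterable[Hit], egress: Callable[[Hit], Any]) -> Iterator[Any]:
    """Lazily transform each hit, mirroring a search that yields results."""
    for hit in hits:
        yield egress(hit)


# Hand-written hits in the documented result shape.
sample_hits = [
    {"id": "a", "text": "cats", "score": 0.98, "metadata": {"year": 2024}},
    {"id": "b", "text": "pizza", "score": 0.12, "metadata": {}},
]
results = list(apply_egress(sample_hits, id_and_year))
```

If the assumption holds, passing `egress=id_and_year` to `col.search` would yield tuples in place of the default dicts.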

Metadata filtering

One backend-agnostic, MongoDB-style filter language — $eq $ne $gt $gte $lt $lte $in $nin $exists $and $or $not:

col.search(qvec, filter={"year": {"$gte": 2020}, "kind": {"$in": ["news", "blog"]}})

Each backend declares which operators it honors natively; an unsupported one raises UnsupportedFilterError rather than silently mis-filtering. Backends with rich native filtering (Qdrant, Pinecone, MongoDB) translate the filter; the rest apply it client-side with the same semantics.
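To illustrate what client-side filtering with these operators can mean, here is a small standalone evaluator. It is a sketch, not vd's code: `matches` is a hypothetical name, a bare value is treated as an implicit $eq, a missing field follows MongoDB-like conventions ($ne and $nin match, range operators do not), and accepting $not both at the top level and per field is an assumption.

```python
from collections.abc import Mapping
from typing import Any

_MISSING = object()  # sentinel for "field not present in metadata"

_COMPARATORS = {
    "$eq": lambda v, arg: v == arg,
    "$ne": lambda v, arg: v != arg,
    "$gt": lambda v, arg: v is not _MISSING and v > arg,
    "$gte": lambda v, arg: v is not _MISSING and v >= arg,
    "$lt": lambda v, arg: v is not _MISSING and v < arg,
    "$lte": lambda v, arg: v is not _MISSING and v <= arg,
    "$in": lambda v, arg: v in arg,
    "$nin": lambda v, arg: v not in arg,
    "$exists": lambda v, arg: (v is not _MISSING) == bool(arg),
}


def _field_matches(value: Any, cond: Any) -> bool:
    """Check one field value against a bare value or an operator dict."""
    if not isinstance(cond, Mapping):
        return value == cond  # bare value means implicit $eq
    for op, arg in cond.items():
        if op == "$not":
            if _field_matches(value, arg):
                return False
        elif op not in _COMPARATORS:
            raise ValueError(f"unsupported operator: {op}")
        elif not _COMPARATORS[op](value, arg):
            return False
    return True


def matches(metadata: Mapping[str, Any], flt: Mapping[str, Any]) -> bool:
    """True if metadata satisfies every top-level clause of the filter."""
    for key, cond in flt.items():
        if key == "$and":
            ok = all(matches(metadata, sub) for sub in cond)
        elif key == "$or":
            ok = any(matches(metadata, sub) for sub in cond)
        elif key == "$not":
            ok = not matches(metadata, cond)
        else:
            ok = _field_matches(metadata.get(key, _MISSING), cond)
        if not ok:
            return False
    return True


flt = {"year": {"$gte": 2020}, "kind": {"$in": ["news", "blog"]}}
keep = matches({"year": 2021, "kind": "news"}, flt)
drop = matches({"year": 2019, "kind": "news"}, flt)
```

Raising on an unknown operator, instead of ignoring it, follows the same principle as UnsupportedFilterError: a filter that cannot be honored should fail loudly.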

Escape hatches

The facade never traps you. client.client is the raw backend client and collection.native is the raw backend collection; both are supported, documented APIs for reaching backend-specific features.

Backends

Archetype | Backends
Embedded (pip-only) | memory, chroma, lancedb, sqlite_vec, duckdb, faiss
Server (also embedded) | qdrant, weaviate, milvus
Server | redis, elasticsearch, pgvector
Managed | pinecone, mongodb (Atlas), turbopuffer

vd.list_backends() shows what is installed and ready now.

The toolkit

Beyond the facade, vd bundles the composite operations people actually do:

  • vd.search: multi_query_search, reciprocal_rank_fusion, search_similar_to_document, deduplicate_results.
  • vd.io: export_collection / import_collection (JSONL, JSON, directory).
  • vd.migration: migrate_collection, migrate_client, copy_collection, which move data between any two backends.
  • vd.analytics: collection_stats, find_duplicates, find_outliers, validate_collection.
  • vd.health: health_check_backend, benchmark_search.
  • vd.text: convenience text cleaning / chunking.
  • vd.TimeIndexedCollection: a time-windowed wrapper over any collection.
  • CLI: vd backends, vd install, vd export/import, vd migrate, …
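
Of the operations listed above, reciprocal rank fusion is a well-known technique for merging several ranked lists, for instance the results of multiple query phrasings. The sketch below shows the general algorithm, in which each document earns 1 / (k + rank) per list it appears in. It is not vd's implementation, and the name `rrf_sketch`, its signature, and the default k=60 (a commonly used constant) are assumptions.

```python
from collections import defaultdict


def rrf_sketch(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked id lists: score(d) = sum over lists of 1 / (k + rank of d)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


fused = rrf_sketch([["a", "b", "c"], ["b", "c", "a"]])
fused_ids = [doc_id for doc_id, _ in fused]
```

Because only ranks are used, the lists can come from backends whose raw scores are not comparable with each other.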

AI-agent skills

vd ships skills (vd/data/skills/) so coding agents can drive it well: vd-quickstart, vd-backend-choose (choosing and setup), vd-ingest, vd-search, vd-ops.

Design

  • Embedding is external. The core operates on vectors; an embedder is an injected, optional convenience — never a hard dependency.
  • Two mappings. A Client is a Mapping of collections; a Collection is a MutableMapping of documents plus search. Idiomatic, minimal, familiar.
  • Thin adapters. AbstractClient / AbstractCollection implement everything users see; a backend supplies a handful of raw primitives. Adding a backend is ~150 lines — see the vd-add-backend skill.
  • Capabilities, not a fat base. Optional features (SupportsBatch, SupportsHybrid) are @runtime_checkable protocols you feature-discover.
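
Feature discovery with @runtime_checkable protocols can be sketched as follows. The protocol name SupportsBatch comes from the list above, but the class defined here is a local stand-in: its `add_batch` method, the two collection classes, and `ingest` are hypothetical and only show the isinstance-based pattern.

```python
from typing import Iterable, Protocol, runtime_checkable


@runtime_checkable
class SupportsBatch(Protocol):
    """Stand-in capability protocol; the method name is an assumption."""

    def add_batch(self, docs: Iterable[dict]) -> int: ...


class PlainCollection:
    """Supports only one-at-a-time writes."""

    def __init__(self) -> None:
        self.docs: dict[str, dict] = {}

    def add(self, doc: dict) -> None:
        self.docs[doc["id"]] = doc


class BatchCollection(PlainCollection):
    """Adds an optional bulk-write capability."""

    def add_batch(self, docs: Iterable[dict]) -> int:
        count = 0
        for doc in docs:
            self.docs[doc["id"]] = doc
            count += 1
        return count


def ingest(col: PlainCollection, docs: Iterable[dict]) -> int:
    """Use the batch path when the collection offers it, else fall back."""
    docs = list(docs)
    if isinstance(col, SupportsBatch):
        return col.add_batch(docs)
    for doc in docs:
        col.add(doc)
    return len(docs)


batch_count = ingest(BatchCollection(), [{"id": "a"}, {"id": "b"}])
plain_count = ingest(PlainCollection(), [{"id": "a"}, {"id": "b"}])
```

Note that a runtime isinstance check against a protocol only verifies that the method exists, not its signature, so the capability still has to be implemented faithfully.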

License

MIT
