A facade over vector databases — one interface, ~15 backends

vd

A facade over vector databases — one Pythonic interface, ~15 backends.

vd lets you operate on any vector database and switch between them with a one-word change, while keeping each backend's particular power one escape hatch away. It also helps you choose the right backend and set it up.

import vd

client = vd.connect("memory")          # switch DB = change this one word
col = client.create_collection("docs")
col["a"] = vd.Document(id="a", text="cats", vector=[0.1, 0.9, 0.0])
col["b"] = vd.Document(id="b", text="pizza", vector=[0.9, 0.0, 0.1])

for hit in col.search([0.1, 0.8, 0.0], limit=2):
    print(hit["id"], hit["score"])

Install

pip install vd                   # core (zero heavy deps) + the memory backend
pip install "vd[chroma]"         # + a specific backend's client (quoted so shells like zsh don't glob the brackets)
pip install "vd[embedded]"       # + all embedded backends (chroma, qdrant, faiss, …)
pip install "vd[all-backends]"   # + every backend client

The core is near-zero-dependency. Each backend's client library is an optional extra named after the backend.

The mental model

vd stores and searches vectors. Turning text into vectors — embedding — is deliberately external: vd never embeds on its own. This keeps the facade honest (most vector DBs do not embed for you) and lightweight.

  • Vector-first. You hold the embedding model. Hand vd Documents that already carry a vector; search with a pre-computed query vector.
  • Text convenience. Pass an embedder (text -> vector) to connect, and then raw text works: col["k"] = "some text", col.search("a query").

With no embedder, passing text raises EmbeddingRequiredError: the failure is loud, so text is never silently embedded with the wrong model.

client = vd.connect("chroma", persist_directory="./db", embedder=my_embed_fn)
col = client.create_collection("docs")
col["a"] = "cats and kittens"                  # embedded for you
hits = list(col.search("pets", limit=5))       # query embedded for you
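
An embedder is just a callable from text to a vector. As a minimal sketch, here is a deterministic stand-in for my_embed_fn that is handy for wiring tests without a real model. The name toy_embed and the hashing scheme are illustrative assumptions, not part of vd; a real setup would wrap an actual embedding model.

```python
import hashlib
import math


def toy_embed(text: str, dim: int = 8) -> list[float]:
    """Hypothetical stand-in embedder: hashed bag-of-words, L2-normalized.

    Not part of vd. It carries no semantics; it only produces stable vectors.
    """
    vec = [0.0] * dim
    for token in text.lower().split():
        # A stable hash (unlike built-in hash()) keeps vectors reproducible across runs.
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        vec[digest[0] % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    # Empty text yields the all-zero vector rather than dividing by zero.
    return [x / norm for x in vec] if norm else vec


print(len(toy_embed("cats and kittens")))
```

A function of this shape can be passed as embedder=toy_embed to vd.connect to exercise the text path; search quality will be poor, because hashed word counts say nothing about meaning.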

Choosing a backend

vd ships a recommender and a provider registry, the latter distilled from a practitioner report (misc/docs/11 -- VectorDB Selection & Setup Guide ...md):

vd.print_recommendation(
    corpus_size="medium", persistence=True, can_run_docker=True,
    cloud_ok=True, budget="free", needs_hybrid=False,
)
vd.print_backends_table()                       # the whole landscape
vd.compare_backends(["chroma", "qdrant", "pgvector"])

Setting a backend up

vd.check_requirements("qdrant")    # diagnoses readiness, prints the next step
vd.setup_guide("qdrant")           # full pip / docker / env-var playbook
vd.install_backend("qdrant")       # the pip command (run=True to install)

check_requirements is deployment-aware: it checks the pip package for embedded backends, whether a server answers for self-hosted ones, and the required environment variables for managed ones — always ending with one concrete next action.
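
The three kinds of check described above can be sketched with the standard library alone. This illustrates the idea and is not vd's implementation; the helper names are assumptions made for the example.

```python
import importlib.util
import os
import socket


def package_installed(module_name: str) -> bool:
    """Embedded backends: is the client library importable (top-level name)?"""
    return importlib.util.find_spec(module_name) is not None


def server_answers(host: str, port: int, timeout: float = 0.5) -> bool:
    """Self-hosted backends: does anything accept TCP connections at host:port?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def missing_env_vars(names: list[str]) -> list[str]:
    """Managed backends: which required environment variables are unset or empty?"""
    return [name for name in names if not os.environ.get(name)]
```

A diagnosis then reduces to picking the check that matches the backend's deployment style and reporting the first thing that fails as the next action.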

The API

| Object | Is a | Plus |
| --- | --- | --- |
| Client (from connect) | Mapping[str, Collection] | create_collection, get_collection, delete_collection, get_or_create_collection |
| Collection | MutableMapping[str, Document] | search(...) |
| Document | dataclass | id, text, vector, metadata |

col["k"] = vd.Document(id="k", text="…", vector=[...], metadata={"y": 2024})  # set
doc = col["k"]                     # get
del col["k"]                       # delete
"k" in col, len(col), list(col)    # membership, size, keys

col.search(query, *, limit=10, filter=None, egress=None, **backend_kwargs)

search yields dicts {"id", "text", "score", "metadata"} (score is higher-is-better). Transform results with an egress: vd.id_only, vd.id_and_score, vd.text_only, vd.id_text_score, or your own.
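
A custom egress can be sketched as a plain function over one result dict. This assumes an egress is called once per hit and that its return value is what search yields; the source does not spell that out, and the function name below is hypothetical.

```python
def id_and_rounded_score(hit: dict) -> tuple[str, float]:
    """Hypothetical custom egress: keep the id and a score rounded to 3 places."""
    return hit["id"], round(hit["score"], 3)


# Stand-in results in the documented shape; real ones would come from col.search(...).
hits = [
    {"id": "a", "text": "cats", "score": 0.98765, "metadata": {}},
    {"id": "b", "text": "pizza", "score": 0.12345, "metadata": {}},
]

print([id_and_rounded_score(h) for h in hits])
```

Under that assumption, the function would be passed as col.search(qvec, egress=id_and_rounded_score).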

Metadata filtering

One backend-agnostic, MongoDB-style filter language — $eq $ne $gt $gte $lt $lte $in $nin $exists $and $or $not:

col.search(qvec, filter={"year": {"$gte": 2020}, "kind": {"$in": ["news", "blog"]}})

Each backend declares which operators it honors natively; an unsupported one raises UnsupportedFilterError rather than silently mis-filtering. Backends with rich native filtering (Qdrant, Pinecone, MongoDB) translate the filter; the rest apply it client-side with the same semantics.
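
To make the client-side path concrete, here is a minimal sketch of evaluating such a filter against one metadata dict. It is not vd's implementation. The function names, the treatment of a bare value as shorthand for $eq, and the acceptance of $not both at the top level and per field are assumptions of this example.

```python
from typing import Any

_MISSING = object()  # sentinel for "field not present in the metadata"

_COMPARATORS = {
    "$eq": lambda v, arg: v == arg,
    "$ne": lambda v, arg: v != arg,
    # Ordering operators never match a missing field.
    "$gt": lambda v, arg: v is not _MISSING and v > arg,
    "$gte": lambda v, arg: v is not _MISSING and v >= arg,
    "$lt": lambda v, arg: v is not _MISSING and v < arg,
    "$lte": lambda v, arg: v is not _MISSING and v <= arg,
    "$in": lambda v, arg: v in arg,
    "$nin": lambda v, arg: v not in arg,
}


def matches(metadata: dict[str, Any], flt: dict[str, Any]) -> bool:
    """Return True if metadata satisfies the filter; top-level keys are AND-ed."""
    for key, cond in flt.items():
        if key == "$and":
            ok = all(matches(metadata, sub) for sub in cond)
        elif key == "$or":
            ok = any(matches(metadata, sub) for sub in cond)
        elif key == "$not":
            ok = not matches(metadata, cond)
        else:
            ok = _field_matches(metadata.get(key, _MISSING), cond)
        if not ok:
            return False
    return True


def _field_matches(value: Any, cond: Any) -> bool:
    """Check one field value against one condition (operator dict or bare value)."""
    if not isinstance(cond, dict):
        cond = {"$eq": cond}  # bare value is shorthand for equality
    for op, arg in cond.items():
        if op == "$exists":
            ok = (value is not _MISSING) == bool(arg)
        elif op == "$not":
            ok = not _field_matches(value, arg)
        elif op in _COMPARATORS:
            ok = _COMPARATORS[op](value, arg)
        else:
            raise ValueError(f"unsupported operator: {op}")
        if not ok:
            return False
    return True


doc_meta = {"year": 2021, "kind": "news"}
print(matches(doc_meta, {"year": {"$gte": 2020}, "kind": {"$in": ["news", "blog"]}}))
```

Raising on an unknown operator mirrors the fail-loudly behavior described above, although this sketch uses a plain ValueError rather than vd's UnsupportedFilterError.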

Escape hatches

The facade never traps you. client.client is the raw backend client and collection.native is the raw backend collection; both are supported, documented API for reaching backend-specific features.

Backends

| Archetype | Backends |
| --- | --- |
| Embedded (pip-only) | memory, chroma, lancedb, sqlite_vec, duckdb, faiss |
| Server (also embedded) | qdrant, weaviate, milvus |
| Server | redis, elasticsearch, pgvector |
| Managed | pinecone, mongodb (Atlas), turbopuffer |

vd.list_backends() shows what is installed and ready now.

The toolkit

Beyond the facade, vd bundles the composite operations people actually do:

  • vd.search: multi_query_search, reciprocal_rank_fusion, search_similar_to_document, deduplicate_results.
  • vd.io: export_collection / import_collection (JSONL, JSON, directory).
  • vd.migration: migrate_collection, migrate_client, copy_collection, for moving data between any two backends.
  • vd.analytics: collection_stats, find_duplicates, find_outliers, validate_collection.
  • vd.health: health_check_backend, benchmark_search.
  • vd.text: convenience text cleaning / chunking.
  • vd.TimeIndexedCollection: a time-windowed wrapper over any collection.
  • CLI: vd backends, vd install, vd export/import, vd migrate, …
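
As background for the fusion helper in the list above, reciprocal rank fusion combines several ranked lists by summing 1 / (k + rank) for each item across lists. The sketch below is a standalone illustration of that standard formula; the function name rrf_fuse and its signature are assumptions, since the source does not show vd's own signature.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked id lists; each list contributes 1 / (k + rank), rank from 1."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id for a deterministic order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


fused = rrf_fuse([["a", "b", "c"], ["b", "c", "d"]])
print([doc_id for doc_id, _ in fused])
```

Because only ranks enter the formula, the lists can come from searches whose raw scores are not comparable, such as results for different query phrasings.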

AI-agent skills

vd ships skills (vd/data/skills/) so coding agents can drive it well: vd-quickstart, vd-backend-choose (choosing and setup), vd-ingest, vd-search, vd-ops.

Design

  • Embedding is external. The core operates on vectors; an embedder is an injected, optional convenience — never a hard dependency.
  • Two mappings. A Client is a Mapping of collections; a Collection is a MutableMapping of documents plus search. Idiomatic, minimal, familiar.
  • Thin adapters. AbstractClient / AbstractCollection implement everything users see; a backend supplies a handful of raw primitives. Adding a backend is ~150 lines — see the vd-add-backend skill.
  • Capabilities, not a fat base. Optional features (SupportsBatch, SupportsHybrid) are @runtime_checkable protocols you feature-discover.
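
The capability idea in the last bullet can be sketched with typing.Protocol. Only the protocol name SupportsBatch comes from the text above; the add_batch method, the two toy collection classes, and the add_all helper are hypothetical and exist only for this illustration.

```python
from typing import Iterable, Protocol, runtime_checkable


@runtime_checkable
class SupportsBatch(Protocol):
    """Capability protocol; the method name add_batch is a hypothetical choice."""

    def add_batch(self, docs: Iterable[dict]) -> None: ...


class PlainCollection:
    """Toy collection with single-item writes only."""

    def __init__(self) -> None:
        self.docs: dict[str, dict] = {}

    def add(self, doc: dict) -> None:
        self.docs[doc["id"]] = doc


class BatchCollection(PlainCollection):
    """Toy collection that also offers a batch write."""

    def add_batch(self, docs: Iterable[dict]) -> None:
        for doc in docs:
            self.add(doc)


def add_all(col, docs: list[dict]) -> str:
    """Use the batch path when the capability is present, else fall back."""
    if isinstance(col, SupportsBatch):
        col.add_batch(docs)
        return "batch"
    for doc in docs:
        col.add(doc)
    return "one-by-one"


docs = [{"id": "a"}, {"id": "b"}]
print(add_all(PlainCollection(), docs), add_all(BatchCollection(), docs))
```

Note that isinstance against a runtime-checkable protocol only checks that the named methods exist, not their signatures, so it is a discovery mechanism rather than a type guarantee.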

License

MIT
