
A facade over vector databases — one interface, ~15 backends

Project description

vd

A facade over vector databases — one Pythonic interface, ~15 backends.

vd lets you work with any supported vector database and switch between them with a one-word change, while keeping each backend's specific features one escape hatch away. It also helps you choose the right backend and set it up.

import vd

client = vd.connect("memory")          # switch DB = change this one word
col = client.create_collection("docs")
col["a"] = vd.Document(id="a", text="cats", vector=[0.1, 0.9, 0.0])
col["b"] = vd.Document(id="b", text="pizza", vector=[0.9, 0.0, 0.1])

for hit in col.search([0.1, 0.8, 0.0], limit=2):
    print(hit["id"], hit["score"])

Install

pip install vd                   # core (zero heavy deps) + the memory backend
pip install "vd[chroma]"         # + a specific backend's client
pip install "vd[embedded]"       # + all embedded backends (chroma, qdrant, faiss, …)
pip install "vd[all-backends]"   # + every backend client

The core is near-zero-dependency. Each backend's client library is an optional extra named after the backend.

The mental model

vd stores and searches vectors. Turning text into vectors — embedding — is deliberately external: vd never embeds on its own. This keeps the facade honest (most vector DBs do not embed for you) and lightweight.

  • Vector-first. You hold the embedding model. Hand vd Documents that already carry a vector; search with a pre-computed query vector.
  • Text convenience. Pass an embedder (text -> vector) to connect, and then raw text works: col["k"] = "some text", col.search("a query").

With no embedder, passing text raises EmbeddingRequiredError — loud, never a silent wrong-model embedding.

client = vd.connect("chroma", persist_directory="./db", embedder=my_embed_fn)
col = client.create_collection("docs")
col["a"] = "cats and kittens"                  # embedded for you
hits = list(col.search("pets", limit=5))       # query embedded for you
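The embedder is described only as a text -> vector callable. Assuming exactly that contract, here is a minimal, dependency-free sketch of a toy hashing embedder you could use while prototyping, plus a hypothetical resolve_vector helper that mirrors the documented rule (vectors pass through; text requires an embedder, otherwise it fails loudly). hash_embed, resolve_vector, DIM and the local EmbeddingRequiredError class are illustrations, not vd's code.

```python
import hashlib
import math

DIM = 64  # toy dimensionality, chosen arbitrarily for illustration


class EmbeddingRequiredError(Exception):
    """Local stand-in for vd's error of the same name (not imported from vd)."""


def hash_embed(text: str) -> list:
    """Toy bag-of-words embedder: hash each token into one of DIM buckets.

    Deterministic and dependency-free, but NOT semantic; swap in a real model.
    """
    vec = [0.0] * DIM
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:4], "big") % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    # L2-normalise so cosine and dot-product scores agree.
    return [x / norm for x in vec] if norm else vec


def resolve_vector(value, embedder=None) -> list:
    """Return a vector for `value`: vectors pass through, text needs an embedder."""
    if isinstance(value, str):
        if embedder is None:
            # Fail loudly instead of silently guessing a model.
            raise EmbeddingRequiredError("text given but no embedder configured")
        return embedder(value)
    return list(value)
```

With vd you would pass the callable itself, e.g. vd.connect("memory", embedder=hash_embed). The hashing trick gives stable vectors but no notion of meaning, so use a real embedding model for actual retrieval.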

Choosing a backend

vd ships a provider registry distilled from a practitioner report (misc/docs/11 -- VectorDB Selection & Setup Guide ...md) and a recommender:

vd.print_recommendation(
    corpus_size="medium", persistence=True, can_run_docker=True,
    cloud_ok=True, budget="free", needs_hybrid=False,
)
vd.print_backends_table()                       # the whole landscape
vd.compare_backends(["chroma", "qdrant", "pgvector"])

Setting a backend up

vd.check_requirements("qdrant")    # diagnoses readiness, prints the next step
vd.setup_guide("qdrant")           # full pip / docker / env-var playbook
vd.install_backend("qdrant")       # the pip command (run=True to install)

check_requirements is deployment-aware: it checks the pip package for embedded backends, whether a server answers for self-hosted ones, and the required environment variables for managed ones — always ending with one concrete next action.
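The deployment-aware logic can be sketched with the standard library alone. This is a minimal sketch, assuming three deployment kinds with one check each (importable package, listening server, set environment variables); next_step and all of its parameters are hypothetical and not vd's check_requirements signature.

```python
import importlib.util
import os
import socket


def next_step(kind: str, *, module: str = "", pip_name: str = "",
              host: str = "localhost", port: int = 0,
              env_vars: tuple = ()) -> str:
    """Hypothetical readiness check that always ends with one concrete action."""
    if kind == "embedded":
        # Embedded: is the client library importable in this environment?
        if importlib.util.find_spec(module) is None:
            # pip and import names can differ (e.g. qdrant-client vs qdrant_client).
            return f"pip install {pip_name or module}"
        return "ready"
    if kind == "self-hosted":
        # Self-hosted: does anything accept TCP connections on host:port?
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return "ready"
        except OSError:
            return f"start the server so it listens on {host}:{port}"
    if kind == "managed":
        # Managed: are the required credentials set and non-empty?
        missing = [name for name in env_vars if not os.environ.get(name)]
        return f"set {', '.join(missing)}" if missing else "ready"
    raise ValueError(f"unknown deployment kind: {kind!r}")
```

A TCP connect only proves that something is listening, not that it is the right server; a real check would also hit the backend's health endpoint.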

The API

Object                  Is a                           Plus
Client (from connect)   Mapping[str, Collection]       create_collection, get_collection, delete_collection, get_or_create_collection
Collection              MutableMapping[str, Document]  search(...)
Document                dataclass                      id, text, vector, metadata
col["k"] = vd.Document(id="k", text="…", vector=[...], metadata={"y": 2024})
doc      = col["k"]            # get
del col["k"]                  # delete
"k" in col, len(col), list(col)
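Because a Collection is a MutableMapping[str, Document] plus search, its whole surface can be sketched in plain Python. The class below is a hypothetical in-memory stand-in (ToyCollection, cosine and this local Document copy are illustrative names, not vd internals): collections.abc.MutableMapping derives in, get, pop, update and friends from five primitives, and search yields the documented {"id", "text", "score", "metadata"} dicts, best first. Cosine similarity is our assumption; real backends use their own metric.

```python
import math
from collections.abc import Iterator, MutableMapping
from dataclasses import dataclass, field


@dataclass
class Document:
    """Local copy of the documented fields: id, text, vector, metadata."""
    id: str
    text: str = ""
    vector: list = field(default_factory=list)
    metadata: dict = field(default_factory=dict)


def cosine(a, b) -> float:
    """Cosine similarity; higher means more similar (0.0 for zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class ToyCollection(MutableMapping):
    """Hypothetical collection: a MutableMapping[str, Document] plus search."""

    def __init__(self):
        self._docs = {}

    # The five primitives MutableMapping needs; everything else is derived.
    def __getitem__(self, key: str) -> Document:
        return self._docs[key]

    def __setitem__(self, key: str, doc: Document) -> None:
        self._docs[key] = doc

    def __delitem__(self, key: str) -> None:
        del self._docs[key]

    def __iter__(self) -> Iterator:
        return iter(self._docs)

    def __len__(self) -> int:
        return len(self._docs)

    def search(self, query, *, limit=10):
        """Yield result dicts, highest cosine score first."""
        scored = sorted(
            ((cosine(query, d.vector), d) for d in self._docs.values()),
            key=lambda pair: pair[0],  # sort on score only, never on Documents
            reverse=True,
        )
        for score, d in scored[:limit]:
            yield {"id": d.id, "text": d.text, "score": score, "metadata": d.metadata}
```

Usage mirrors the quickstart: col["a"] = Document(id="a", text="cats", vector=[0.1, 0.9, 0.0]), then list(col.search([0.1, 0.8, 0.0], limit=2)).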

col.search(query, *, limit=10, filter=None, egress=None, **backend_kwargs)

search yields dicts {"id", "text", "score", "metadata"} (score is higher-is-better). Transform results with an egress: vd.id_only, vd.id_and_score, vd.text_only, vd.id_text_score, or your own.
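An egress is a transformation of search results. Assuming it is a callable applied to each result dict (the per-hit assumption is ours, as are the names id_and_year and apply_egress), a custom egress that keeps the id plus one metadata field could look like this; apply_egress only stands in for vd's own plumbing.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, Optional

Hit = Dict[str, Any]  # one result: {"id", "text", "score", "metadata"}


def id_and_year(hit: Hit) -> tuple:
    """Hypothetical custom egress: keep the id plus one metadata field."""
    return hit["id"], (hit.get("metadata") or {}).get("year")


def apply_egress(hits: Iterable[Hit],
                 egress: Optional[Callable[[Hit], Any]] = None) -> Iterator[Any]:
    """Map the egress over each hit; with no egress, pass hits through unchanged."""
    for hit in hits:
        yield hit if egress is None else egress(hit)
```

Under the same assumption, with vd you would write col.search(qvec, egress=id_and_year).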

Metadata filtering

One backend-agnostic, MongoDB-style filter language — $eq $ne $gt $gte $lt $lte $in $nin $exists $and $or $not:

col.search(qvec, filter={"year": {"$gte": 2020}, "kind": {"$in": ["news", "blog"]}})

Each backend declares which operators it honors natively; an unsupported one raises UnsupportedFilterError rather than silently mis-filtering. Backends with rich native filtering (Qdrant, Pinecone, MongoDB) translate the filter; the rest apply it client-side with the same semantics.
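The semantics of the filter language can be pinned down with a small client-side evaluator. This is a minimal sketch following rules close to MongoDB's (a missing field fails $eq, $in and comparisons, but satisfies $ne, $nin and $exists: false); it is not vd's implementation, matches and _match_ops are hypothetical names, and accepting $not both per field and at top level (taking a whole sub-filter) is our assumption.

```python
import operator
from typing import Any

_MISSING = object()  # sentinel for "field absent from metadata"

_CMP = {"$gt": operator.gt, "$gte": operator.ge,
        "$lt": operator.lt, "$lte": operator.le}


def _match_ops(value: Any, ops: dict) -> bool:
    """Check one field's value (or _MISSING) against an operator dict; all must hold."""
    for op, arg in ops.items():
        if op == "$exists":
            ok = (value is not _MISSING) == bool(arg)
        elif op == "$eq":
            ok = value is not _MISSING and value == arg
        elif op == "$ne":
            ok = value is _MISSING or value != arg
        elif op == "$in":
            ok = value is not _MISSING and value in arg
        elif op == "$nin":
            ok = value is _MISSING or value not in arg
        elif op in _CMP:
            try:
                ok = value is not _MISSING and _CMP[op](value, arg)
            except TypeError:  # e.g. str vs int: treat as "no match"
                ok = False
        elif op == "$not":
            ok = not _match_ops(value, arg)
        else:
            raise ValueError(f"unknown operator: {op}")
        if not ok:
            return False
    return True


def matches(metadata: dict, flt: dict) -> bool:
    """True if `metadata` satisfies the MongoDB-style filter `flt`."""
    for key, cond in flt.items():
        if key == "$and":
            ok = all(matches(metadata, sub) for sub in cond)
        elif key == "$or":
            ok = any(matches(metadata, sub) for sub in cond)
        elif key == "$not":
            ok = not matches(metadata, cond)
        else:
            value = metadata.get(key, _MISSING)
            # A bare value means implicit equality, e.g. {"kind": "news"}.
            ok = _match_ops(value, cond if isinstance(cond, dict) else {"$eq": cond})
        if not ok:
            return False
    return True
```

Applied client-side, this is just a post-filter over candidate hits: keep the hit when matches(hit["metadata"], flt) is true.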

Escape hatches

The facade never traps you. client.client is the raw backend client; collection.native is the raw backend collection — both supported, documented API for reaching backend-specific features.

Backends

Archetype                Backends
Embedded (pip-only)      memory, chroma, lancedb, sqlite_vec, duckdb, faiss
Server (also embedded)   qdrant, weaviate, milvus
Server                   redis, elasticsearch, pgvector
Managed                  pinecone, mongodb (Atlas), turbopuffer

vd.list_backends() shows what is installed and ready now.

The toolkit

Beyond the facade, vd bundles the composite operations people actually do:

  • vd.search: multi_query_search, reciprocal_rank_fusion, search_similar_to_document, deduplicate_results.
  • vd.io: export_collection / import_collection (JSONL, JSON, directory).
  • vd.migration: migrate_collection, migrate_client, copy_collection (move data between any two backends).
  • vd.analytics: collection_stats, find_duplicates, find_outliers, validate_collection.
  • vd.health: health_check_backend, benchmark_search.
  • vd.text: convenience text cleaning / chunking.
  • vd.TimeIndexedCollection: a time-windowed wrapper over any collection.
  • CLI: vd backends, vd install, vd export/import, vd migrate, …
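Reciprocal rank fusion, one of the vd.search helpers, merges several ranked result lists without requiring their scores to be comparable: each document earns 1 / (k + rank) from every list it appears in. The sketch below is the standard formulation written independently of vd (k = 60 is a common default); vd's own reciprocal_rank_fusion may take different arguments, such as result dicts rather than plain id lists.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k: int = 60):
    """Fuse ranked id lists: score(d) = sum over lists of 1 / (k + rank), rank from 1."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

For example, fusing ["a", "b", "c"] with ["b", "c", "a"] ranks b first: it is near the top of both lists, while a is first in one list but last in the other.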

AI-agent skills

vd ships skills (vd/data/skills/) so coding agents can drive it well: vd-quickstart, vd-backend-choose (choosing and setup), vd-ingest, vd-search, vd-ops.

Design

  • Embedding is external. The core operates on vectors; an embedder is an injected, optional convenience — never a hard dependency.
  • Two mappings. A Client is a Mapping of collections; a Collection is a MutableMapping of documents plus search. Idiomatic, minimal, familiar.
  • Thin adapters. AbstractClient / AbstractCollection implement everything users see; a backend supplies a handful of raw primitives. Adding a backend is ~150 lines — see the vd-add-backend skill.
  • Capabilities, not a fat base. Optional features (SupportsBatch, SupportsHybrid) are @runtime_checkable protocols you feature-discover.
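Capability discovery with @runtime_checkable protocols works like this in plain Python. This is a minimal sketch, assuming a batch-insert method named add_batch; vd's real SupportsBatch may declare different methods, and FastBackendCollection, PlainCollection and ingest are hypothetical.

```python
from typing import Iterable, Protocol, runtime_checkable


@runtime_checkable
class SupportsBatch(Protocol):
    """Local stand-in capability: bulk insert (method name is an assumption)."""

    def add_batch(self, docs: Iterable[dict]) -> None: ...


class FastBackendCollection:
    """A backend that happens to offer bulk inserts."""

    def __init__(self):
        self.rows = []

    def add_batch(self, docs):
        self.rows.extend(docs)


class PlainCollection:
    """A backend with only single-document inserts."""

    def __init__(self):
        self.rows = []

    def add(self, doc):
        self.rows.append(doc)


def ingest(col, docs) -> str:
    """Use the batch path when the capability is present, else fall back."""
    docs = list(docs)
    if isinstance(col, SupportsBatch):
        col.add_batch(docs)
        return "batch"
    for doc in docs:
        col.add(doc)
    return "one-by-one"
```

isinstance against a runtime-checkable protocol only checks that the methods exist, not their signatures, which keeps feature discovery cheap and means no backend has to inherit from a fat base class.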

License

MIT


Download files

Source Distribution

vd-0.2.8.tar.gz (215.7 kB)

Uploaded Source

Built Distribution

vd-0.2.8-py3-none-any.whl (154.9 kB)

Uploaded Python 3

File details

Details for the file vd-0.2.8.tar.gz.

File metadata

  • Download URL: vd-0.2.8.tar.gz
  • Size: 215.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.11.16

File hashes

Hashes for vd-0.2.8.tar.gz
Algorithm Hash digest
SHA256 f9895c253e9825de10e831c773222e051eaa186d4ef12ef5934a3bb9ef39bfa6
MD5 da4b0c344f56aa0aa17079daf6484f04
BLAKE2b-256 c31aa654cce27b101aba7004efd7c46467060504b9c439dd4e8ce092e6938bbd

File details

Details for the file vd-0.2.8-py3-none-any.whl.

File metadata

  • Download URL: vd-0.2.8-py3-none-any.whl
  • Size: 154.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.11.16

File hashes

Hashes for vd-0.2.8-py3-none-any.whl
Algorithm Hash digest
SHA256 091a0c2d6b489b6c03ac8218f2d5d8f9160073898a2fba7165a4b8844726ab60
MD5 b726a55580d8edab9d7d44fd5c8eb829
BLAKE2b-256 5522360fdc9c6be43889948e359a90eeb81f22af6ad742eb3e35c2a42471215f