vd

A facade over vector databases — one Pythonic interface, ~15 backends.

vd lets you operate on any vector database and switch between them with a one-word change, while keeping each backend's particular power one escape hatch away. It also helps you choose the right backend and set it up.

import vd

client = vd.connect("memory")          # switch DB = change this one word
col = client.create_collection("docs")
col["a"] = vd.Document(id="a", text="cats", vector=[0.1, 0.9, 0.0])
col["b"] = vd.Document(id="b", text="pizza", vector=[0.9, 0.0, 0.1])

for hit in col.search([0.1, 0.8, 0.0], limit=2):
    print(hit["id"], hit["score"])

Install

pip install vd                   # core (zero heavy deps) + the memory backend
pip install "vd[chroma]"         # + a specific backend's client
pip install "vd[embedded]"       # + all embedded backends (chroma, qdrant, faiss, …)
pip install "vd[all-backends]"   # + every backend client

The core is near-zero-dependency. Each backend's client library is an optional extra named after the backend.

The mental model

vd stores and searches vectors. Turning text into vectors — embedding — is deliberately external: vd never embeds on its own. This keeps the facade honest (most vector DBs do not embed for you) and lightweight.

  • Vector-first. You hold the embedding model. Hand vd Documents that already carry a vector; search with a pre-computed query vector.
  • Text convenience. Pass an embedder (text -> vector) to connect, and then raw text works: col["k"] = "some text", col.search("a query").

With no embedder, passing text raises EmbeddingRequiredError — loud, never a silent wrong-model embedding.

client = vd.connect("chroma", persist_directory="./db", embedder=my_embed_fn)
col = client.create_collection("docs")
col["a"] = "cats and kittens"                  # embedded for you
hits = list(col.search("pets", limit=5))       # query embedded for you
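
An embedder is just a callable from text to a vector. The sketch below is a toy, deterministic stand-in (a hashed bag of words), suitable for tests only. The name toy_embed and its dim parameter are hypothetical, and whether vd expects a single-string or a batch signature is an assumption to verify against the vd documentation.

```python
import hashlib
import math


def toy_embed(text: str, dim: int = 8) -> list[float]:
    """Map text to a unit-length vector by hashing each word into a bucket."""
    vec = [0.0] * dim
    for word in text.lower().split():
        # Use a stable hash; the built-in hash() is salted per process.
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        vec[digest[0] % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    # Empty text has no words, so return the zero vector unchanged.
    return [x / norm for x in vec] if norm else vec
```

Under that assumption it would be passed as embedder=toy_embed to vd.connect. A real deployment would wrap an actual embedding model instead, since hashed buckets carry no semantic similarity.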

Choosing a backend

vd ships a provider registry distilled from a practitioner report (misc/docs/11 -- VectorDB Selection & Setup Guide ...md) and a recommender:

vd.print_recommendation(
    corpus_size="medium", persistence=True, can_run_docker=True,
    cloud_ok=True, budget="free", needs_hybrid=False,
)
vd.print_backends_table()                       # the whole landscape
vd.compare_backends(["chroma", "qdrant", "pgvector"])

Setting a backend up

vd.check_requirements("qdrant")    # diagnoses readiness, prints the next step
vd.setup_guide("qdrant")           # full pip / docker / env-var playbook
vd.install_backend("qdrant")       # the pip command (run=True to install)

check_requirements is deployment-aware: it checks the pip package for embedded backends, whether a server answers for self-hosted ones, and the required environment variables for managed ones — always ending with one concrete next action.
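
The kinds of checks described above can be approximated with the standard library. This is a rough sketch, not vd's implementation: the helper names (package_installed, missing_env_vars, next_step) are hypothetical, and the server-reachability check for self-hosted backends is left out.

```python
import importlib.util
import os


def package_installed(module_name: str) -> bool:
    """True if a top-level module can be imported in this environment."""
    return importlib.util.find_spec(module_name) is not None


def missing_env_vars(names: list[str]) -> list[str]:
    """Return the required environment variables that are unset or empty."""
    return [name for name in names if not os.environ.get(name)]


def next_step(module_name: str, env_vars: list[str]) -> str:
    """Reduce the checks to one concrete next action."""
    if not package_installed(module_name):
        return f"install the '{module_name}' package"
    missing = missing_env_vars(env_vars)
    if missing:
        return f"set environment variable {missing[0]}"
    return "ready"
```

Reporting only the first failing check keeps the output to a single actionable step, which matches the "one concrete next action" behavior described above.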

The API

Object                | Is a                          | Plus
Client (from connect) | Mapping[str, Collection]      | create_collection, get_collection, delete_collection, get_or_create_collection
Collection            | MutableMapping[str, Document] | search(...)
Document              | dataclass                     | id, text, vector, metadata

col["k"] = vd.Document(id="k", text="…", vector=[...], metadata={"y": 2024})
doc      = col["k"]            # get
del col["k"]                  # delete
"k" in col, len(col), list(col)

col.search(query, *, limit=10, filter=None, egress=None, **backend_kwargs)

search yields dicts {"id", "text", "score", "metadata"} (score is higher-is-better). Transform results with an egress: vd.id_only, vd.id_and_score, vd.text_only, vd.id_text_score, or your own.
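
A custom egress can be pictured as a function applied to each hit. The sketch below assumes an egress receives one hit dict in the documented shape and returns whatever the caller wants; that calling convention is an assumption, and id_and_rounded_score is a hypothetical name. It is applied here to hand-written sample hits rather than to a live collection.

```python
def id_and_rounded_score(hit: dict) -> tuple[str, float]:
    """Hypothetical custom egress: keep the id and a score rounded to 3 places."""
    return hit["id"], round(hit["score"], 3)


# Sample hits in the documented shape {"id", "text", "score", "metadata"}.
hits = [
    {"id": "a", "text": "cats", "score": 0.98765, "metadata": {}},
    {"id": "b", "text": "pizza", "score": 0.12345, "metadata": {}},
]
shaped = [id_and_rounded_score(hit) for hit in hits]
```

If the assumption holds, the same function would be passed as egress=id_and_rounded_score to col.search.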

Metadata filtering

One backend-agnostic, MongoDB-style filter language — $eq $ne $gt $gte $lt $lte $in $nin $exists $and $or $not:

col.search(qvec, filter={"year": {"$gte": 2020}, "kind": {"$in": ["news", "blog"]}})

Each backend declares which operators it honors natively; an unsupported one raises UnsupportedFilterError rather than silently mis-filtering. Backends with rich native filtering (Qdrant, Pinecone, MongoDB) translate the filter; the rest apply it client-side with the same semantics.
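
To make the client-side path concrete, here is a minimal sketch of evaluating such a filter against one metadata dict. It is not vd's implementation. It assumes a bare value is shorthand for $eq, that $not wraps a sub-filter at the top level, and that a missing field fails ordered comparisons; the names matches and _MISSING are hypothetical.

```python
import operator
from typing import Any

_MISSING = object()  # sentinel for "field not present in the metadata"


def _ordered(op):
    """Wrap an ordering operator so missing or incomparable values fail."""
    def compare(value, arg):
        if value is _MISSING:
            return False
        try:
            return op(value, arg)
        except TypeError:
            return False
    return compare


_COMPARATORS = {
    "$eq": lambda v, arg: v == arg,
    "$ne": lambda v, arg: v != arg,
    "$gt": _ordered(operator.gt),
    "$gte": _ordered(operator.ge),
    "$lt": _ordered(operator.lt),
    "$lte": _ordered(operator.le),
    "$in": lambda v, arg: v in arg,
    "$nin": lambda v, arg: v not in arg,
    "$exists": lambda v, arg: (v is not _MISSING) == bool(arg),
}


def _is_operator_dict(cond: Any) -> bool:
    return isinstance(cond, dict) and bool(cond) and all(
        isinstance(k, str) and k.startswith("$") for k in cond
    )


def matches(metadata: dict[str, Any], flt: dict[str, Any]) -> bool:
    """Return True if metadata satisfies every clause of the filter."""
    for key, cond in flt.items():
        if key == "$and":
            ok = all(matches(metadata, sub) for sub in cond)
        elif key == "$or":
            ok = any(matches(metadata, sub) for sub in cond)
        elif key == "$not":
            ok = not matches(metadata, cond)
        else:
            value = metadata.get(key, _MISSING)
            # A bare value is treated as shorthand for {"$eq": value}.
            ops = cond if _is_operator_dict(cond) else {"$eq": cond}
            # An unknown operator raises KeyError instead of being ignored.
            ok = all(_COMPARATORS[op](value, arg) for op, arg in ops.items())
        if not ok:
            return False
    return True
```

Failing loudly on an unknown operator mirrors the stated design goal: an unsupported operator should raise an error, never silently mis-filter.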

Escape hatches

The facade never traps you. client.client is the raw backend client and collection.native is the raw backend collection; both are supported, documented APIs for reaching backend-specific features.

Backends

Archetype              | Backends
Embedded (pip-only)    | memory, chroma, lancedb, sqlite_vec, duckdb, faiss
Server (also embedded) | qdrant, weaviate, milvus
Server                 | redis, elasticsearch, pgvector
Managed                | pinecone, mongodb (Atlas), turbopuffer

vd.list_backends() shows what is installed and ready now.

The toolkit

Beyond the facade, vd bundles the composite operations people actually do:

  • vd.search: multi_query_search, reciprocal_rank_fusion, search_similar_to_document, deduplicate_results.
  • vd.io: export_collection / import_collection (JSONL, JSON, directory).
  • vd.migration: migrate_collection, migrate_client, copy_collection, to move data between any two backends.
  • vd.analytics: collection_stats, find_duplicates, find_outliers, validate_collection.
  • vd.health: health_check_backend, benchmark_search.
  • vd.text: convenience text cleaning / chunking.
  • vd.TimeIndexedCollection: a time-windowed wrapper over any collection.
  • CLI: vd backends, vd install, vd export/import, vd migrate, …
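
For readers unfamiliar with reciprocal rank fusion, the general technique merges several ranked result lists by summing 1 / (k + rank) for each id. The sketch below shows that standard formula only; fuse_rankings is a hypothetical name, k=60 is the commonly used constant, and the signature of vd's own reciprocal_rank_fusion may differ.

```python
from collections import defaultdict


def fuse_rankings(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked id lists: each id scores sum(1 / (k + rank)), rank from 1."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

Because only ranks are used, the lists being fused can come from searches whose raw scores are not comparable, such as results for several differently phrased queries.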

AI-agent skills

vd ships skills (vd/data/skills/) so coding agents can drive it well: vd-quickstart, vd-backend-choose (choosing and setup), vd-ingest, vd-search, vd-ops.

Design

  • Embedding is external. The core operates on vectors; an embedder is an injected, optional convenience — never a hard dependency.
  • Two mappings. A Client is a Mapping of collections; a Collection is a MutableMapping of documents plus search. Idiomatic, minimal, familiar.
  • Thin adapters. AbstractClient / AbstractCollection implement everything users see; a backend supplies a handful of raw primitives. Adding a backend is ~150 lines — see the vd-add-backend skill.
  • Capabilities, not a fat base. Optional features (SupportsBatch, SupportsHybrid) are @runtime_checkable protocols you feature-discover.
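
Feature discovery with @runtime_checkable protocols can be sketched as follows. The protocol and classes here are stand-ins written for illustration: SupportsBatchSketch, its add_batch method, PlainStore, BatchStore, and add_many are all hypothetical, and the method names on vd's real SupportsBatch are not stated in this document.

```python
from typing import Iterable, Protocol, runtime_checkable


@runtime_checkable
class SupportsBatchSketch(Protocol):
    """Hypothetical capability; the method name add_batch is an assumption."""

    def add_batch(self, items: Iterable[tuple[str, list[float]]]) -> None: ...


class PlainStore:
    """A store that only supports one-at-a-time adds."""

    def __init__(self) -> None:
        self.data: dict[str, list[float]] = {}

    def add(self, key: str, vector: list[float]) -> None:
        self.data[key] = vector


class BatchStore(PlainStore):
    """A store that additionally offers a batch path."""

    def add_batch(self, items: Iterable[tuple[str, list[float]]]) -> None:
        self.data.update(items)


def add_many(store, items) -> str:
    """Use the batch path when offered, else fall back to one-by-one adds."""
    items = list(items)
    if isinstance(store, SupportsBatchSketch):
        store.add_batch(items)
        return "batch"
    for key, vector in items:
        store.add(key, vector)
    return "loop"
```

The isinstance check only verifies that a method with the right name exists, not its signature, so a protocol like this is a discovery aid and not a full contract.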

License

MIT
