gvdb
Python client for GVDB distributed vector database.
Install
pip install gvdb
# With bulk import extras (Parquet, NumPy, Pandas, progress bar)
# (quoted so that shells such as zsh do not expand the brackets)
pip install "gvdb[import]"
# All optional dependencies
pip install "gvdb[import-all]"
Quick Start
from gvdb import GVDBClient
client = GVDBClient("localhost:50051", api_key="your-key") # api_key is optional
# Create a collection
client.create_collection("my_vectors", dimension=768)
# Insert vectors
vectors = [[0.1, 0.2, ...], [0.3, 0.4, ...]] # list of float lists
ids = [1, 2]
client.insert("my_vectors", ids, vectors)
# Search
results = client.search("my_vectors", query_vector=[0.1, 0.2, ...], top_k=10)
for r in results:
    print(f"ID: {r.id}, distance: {r.distance}")
# Hybrid search (BM25 + vector)
results = client.hybrid_search(
    "my_vectors",
    query_vector=[0.1, 0.2, ...],
    text_query="running shoes",
    top_k=10,
    text_field="description",  # metadata field to search
    return_metadata=True,
)
# Clean up
client.drop_collection("my_vectors")
client.close()
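Since a collection is created with a fixed `dimension` and `insert` takes parallel lists of ids and vectors, a small client-side check can catch shape mistakes before any request is sent. The sketch below is a hypothetical helper, not part of gvdb; the name `check_batch` and its error messages are assumptions made for illustration.

```python
from typing import Sequence


def check_batch(
    ids: Sequence[int],
    vectors: Sequence[Sequence[float]],
    dimension: int,
) -> None:
    """Raise ValueError if ids and vectors cannot form a consistent batch.

    Hypothetical helper for illustration; not part of the gvdb package.
    """
    if len(ids) != len(vectors):
        raise ValueError(f"{len(ids)} ids but {len(vectors)} vectors")
    for row_id, vec in zip(ids, vectors):
        if len(vec) != dimension:
            raise ValueError(
                f"vector for id {row_id} has {len(vec)} values, expected {dimension}"
            )


# Two 4-dimensional vectors with matching ids pass silently.
check_batch([1, 2], [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8]], dimension=4)
```

Calling it right before `client.insert(...)` with the same `dimension` used in `create_collection` keeps the check next to the data it guards.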
Bulk Import
Import vectors from common ML formats. The importers create the target collection automatically, can resume an interrupted run because upserts are idempotent, and show progress bars when tqdm is installed.
import numpy as np
# From NumPy array
vectors = np.random.rand(100_000, 768).astype(np.float32)
result = client.import_numpy(vectors, "embeddings")
print(result) # ImportResult(total=100000, batches=10, elapsed=12.3s, ...)
# From Parquet (GVDB schema: id + vector + metadata columns)
result = client.import_parquet("vectors.parquet", "embeddings")
# From Pandas DataFrame
result = client.import_dataframe(df, "embeddings", vector_column="embedding")
# From CSV (JSON-encoded or dimension-prefixed vector columns)
result = client.import_csv("data.csv", "embeddings")
# From AnnData h5ad (scRNA-seq embeddings)
result = client.import_h5ad("adata.h5ad", "cells", embedding_key="X_pca")
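For the CSV importer, one way to produce a file with a JSON-encoded vector column is to serialize each vector with `json.dumps` and let the `csv` module handle quoting. This is a minimal sketch using only the standard library; the column names `id`, `vector`, and `label` are assumptions for illustration, not a documented gvdb schema, so check which names the importer expects.

```python
import csv
import json
import os
import tempfile


def write_vectors_csv(path, rows):
    """Write (id, vector, label) rows, storing each vector as a JSON array string.

    Column names are assumptions for illustration, not a documented gvdb schema.
    """
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "vector", "label"])
        for row_id, vector, label in rows:
            # json.dumps output contains commas; csv.writer quotes the field.
            writer.writerow([row_id, json.dumps(vector), label])


def read_vectors_csv(path):
    """Read the file back, decoding the JSON vector column into lists of floats."""
    with open(path, newline="", encoding="utf-8") as f:
        return [
            (int(r["id"]), json.loads(r["vector"]), r["label"])
            for r in csv.DictReader(f)
        ]


rows = [(1, [0.1, 0.2, 0.3], "shoe"), (2, [0.4, 0.5, 0.6], "boot")]
csv_path = os.path.join(tempfile.mkdtemp(), "data.csv")
write_vectors_csv(csv_path, rows)
round_trip = read_vectors_csv(csv_path)
```

Reading the file back, as `read_vectors_csv` does, is a cheap way to confirm that the vector column survives quoting before handing the path to an importer.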
All importers accept mode="upsert" (default, idempotent) or mode="stream_insert" (faster, no resume). See ImportResult for batch counts, timing, and failure tracking.
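Why idempotent upserts make resuming safe can be seen with a toy in-memory model. In the sketch below a plain `dict` stands in for a collection and `upsert_batches` is a hypothetical function; none of this is gvdb's implementation, and whether gvdb skips already-written batches on resume is not something this sketch claims.

```python
def upsert_batches(store, ids, vectors, batch_size, fail_at_batch=None):
    """Upsert into a dict in batches; optionally raise before a batch to simulate a crash."""
    batches_done = 0
    for start in range(0, len(ids), batch_size):
        if fail_at_batch is not None and batches_done == fail_at_batch:
            raise ConnectionError("simulated interruption")
        batch_ids = ids[start:start + batch_size]
        batch_vectors = vectors[start:start + batch_size]
        for row_id, vec in zip(batch_ids, batch_vectors):
            store[row_id] = vec  # insert or overwrite, so repeating a write is harmless
        batches_done += 1
    return batches_done


ids = list(range(10))
vectors = [[float(i), float(i) + 0.5] for i in ids]
store = {}

try:
    # Batches cover ids 0-3, 4-7, 8-9; the failure hits before the third batch.
    upsert_batches(store, ids, vectors, batch_size=4, fail_at_batch=2)
except ConnectionError:
    pass

# Re-running from the start rewrites ids 0-7 with identical values and adds
# ids 8-9, so the final state matches an uninterrupted run.
upsert_batches(store, ids, vectors, batch_size=4)
```

With a plain append-style insert, the same re-run would write ids 0-7 twice, which is the trade-off behind `mode="stream_insert"` being described as faster but without resume.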
Optional dependency extras
| Extra | Dependencies | For |
|---|---|---|
| gvdb[parquet] | pyarrow | import_parquet |
| gvdb[numpy] | numpy | import_numpy |
| gvdb[pandas] | pandas, pyarrow | import_dataframe, import_csv |
| gvdb[h5ad] | anndata, numpy | import_h5ad |
| gvdb[progress] | tqdm | Progress bars |
| gvdb[import] | All above except anndata | Common ML workflows |
| gvdb[import-all] | Everything + polars | All formats |
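To see which extras are usable in the current environment, the modules listed in the table can be probed with the standard library's `importlib.util.find_spec`, which returns `None` for a top-level module that is not installed. This is a sketch, not a gvdb feature: the `EXTRA_MODULES` mapping is copied by hand from the table above and assumes each dependency's import name matches its package name.

```python
from importlib.util import find_spec

# Extra name -> top-level modules it is listed as needing in the table above.
# Hand-copied for illustration; gvdb does not expose this mapping.
EXTRA_MODULES = {
    "parquet": ["pyarrow"],
    "numpy": ["numpy"],
    "pandas": ["pandas", "pyarrow"],
    "h5ad": ["anndata", "numpy"],
    "progress": ["tqdm"],
}


def missing_modules(extra):
    """Return the modules for `extra` that cannot be found in this environment."""
    return [name for name in EXTRA_MODULES[extra] if find_spec(name) is None]


for extra in EXTRA_MODULES:
    missing = missing_modules(extra)
    status = "ok" if not missing else "missing " + ", ".join(missing)
    print(f"gvdb[{extra}]: {status}")
```

An extra reported as missing can then be installed with the matching `pip install "gvdb[...]"` command.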
File details
Details for the file gvdb-0.15.0.tar.gz.
File metadata
- Download URL: gvdb-0.15.0.tar.gz
- Upload date:
- Size: 49.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7c177a2f9d9be4793a2bb1806024b21c912ad3e558bd4c5aec32347a873c445f |
| MD5 | bd811d880f5a9c344e47c64272260e68 |
| BLAKE2b-256 | 661f28ac6ceefab289c1393109058304b1d41c13fff663c66fbe3bbc1ee4b541 |
Provenance
The following attestation bundles were made for gvdb-0.15.0.tar.gz:
Publisher: release-please.yml on JonathanBerhe/gvdb
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: gvdb-0.15.0.tar.gz
  - Subject digest: 7c177a2f9d9be4793a2bb1806024b21c912ad3e558bd4c5aec32347a873c445f
  - Sigstore transparency entry: 1317486454
  - Sigstore integration time:
- Permalink: JonathanBerhe/gvdb@6478f6c204f1db8ed809b7fdce4f8b0ab1f4387e
- Branch / Tag: refs/heads/main
- Owner: https://github.com/JonathanBerhe
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release-please.yml@6478f6c204f1db8ed809b7fdce4f8b0ab1f4387e
- Trigger Event: push
File details
Details for the file gvdb-0.15.0-py3-none-any.whl.
File metadata
- Download URL: gvdb-0.15.0-py3-none-any.whl
- Upload date:
- Size: 18.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f2ec28c794686274415779fdeed09b97f68ca2239b6faec1f3309c6ab9ec32f0 |
| MD5 | 753fc95183b3a9264b0c1492f62c5cc9 |
| BLAKE2b-256 | 8235f0e45ccff452ae63b2e82cb8813edeee7a06881a04042fc03d01a35afe12 |
Provenance
The following attestation bundles were made for gvdb-0.15.0-py3-none-any.whl:
Publisher: release-please.yml on JonathanBerhe/gvdb
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: gvdb-0.15.0-py3-none-any.whl
  - Subject digest: f2ec28c794686274415779fdeed09b97f68ca2239b6faec1f3309c6ab9ec32f0
  - Sigstore transparency entry: 1317486458
  - Sigstore integration time:
- Permalink: JonathanBerhe/gvdb@6478f6c204f1db8ed809b7fdce4f8b0ab1f4387e
- Branch / Tag: refs/heads/main
- Owner: https://github.com/JonathanBerhe
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release-please.yml@6478f6c204f1db8ed809b7fdce4f8b0ab1f4387e
- Trigger Event: push