Python client library for d-vecDB vector database
Project description
d-vecDB Python Client
A comprehensive Python client library for d-vecDB, providing both synchronous and asynchronous interfaces for vector database operations.
🚀 Features
Multi-Protocol Support
- REST API via HTTP/HTTPS with connection pooling
- gRPC for high-performance binary protocol communication
- Auto-detection with intelligent fallback
Synchronous & Asynchronous
- Sync client for traditional blocking operations
- Async client for high-concurrency applications
- Connection pooling and concurrent batch operations
Type Safety & Validation
- Pydantic models for data validation
- Type hints throughout the codebase
- Comprehensive error handling
Developer Experience
- Intuitive API with simple and advanced methods
- NumPy integration for seamless array handling
- Rich documentation and examples
📦 Installation
Recommended: Install d-vecDB with Python Client
# Install d-vecDB (includes Python client)
pip install d-vecdb
# Install with development dependencies
pip install d-vecdb[dev]
# Install with example dependencies
pip install d-vecdb[examples]
Alternative: Install Python Client Only
# Install just the Python client library
pip install vectordb-client
# Install with development dependencies
pip install vectordb-client[dev]
# Install with example dependencies
pip install vectordb-client[examples]
From Source
git clone https://github.com/rdmurugan/d-vecDB.git
cd d-vecDB/python-client
pip install -e .
🚀 Getting Started After Installation
Step 1: Build and Start the d-vecDB Server
Important: The PyPI package (pip install d-vecdb) only includes the Python client library. You need to build the server separately:
# Clone the repository and build the server
git clone https://github.com/rdmurugan/d-vecDB.git
cd d-vecDB
# Build the server (requires Rust)
cargo build --release
# Start the server
./target/release/vectordb-server --config config.toml
Step 2: Use the Python Client
Once you have a running d-vecDB server, you can use the Python client (installed via pip) to interact with it:
import numpy as np
from vectordb_client import VectorDBClient
# Connect to your d-vecDB server
client = VectorDBClient(host="localhost", port=8080)
# Create a collection
client.create_collection_simple("my_collection", 128, "cosine")
# Insert some vectors
vector = np.random.random(128)
client.insert_simple("my_collection", "vector_1", vector)
# Search for similar vectors
query = np.random.random(128)
results = client.search_simple("my_collection", query, limit=5)
print(f"Found {len(results)} similar vectors")
for result in results:
print(f" - ID: {result.id}, Distance: {result.distance:.4f}")
client.close()
🏃 Quick Start
Synchronous Client
import numpy as np
from vectordb_client import VectorDBClient
# Connect to d-vecDB server
client = VectorDBClient(host="localhost", port=8080)
# Create a collection
client.create_collection_simple(
name="documents",
dimension=128,
distance_metric="cosine"
)
# Insert vectors
vectors = np.random.random((100, 128))
for i, vector in enumerate(vectors):
client.insert_simple(
collection_name="documents",
vector_id=f"doc_{i}",
vector_data=vector,
metadata={"title": f"Document {i}", "category": "example"}
)
# Search for similar vectors
query_vector = np.random.random(128)
results = client.search_simple("documents", query_vector, limit=5)
for result in results:
print(f"ID: {result.id}, Distance: {result.distance:.4f}")
# Clean up
client.close()
Asynchronous Client
import asyncio
import numpy as np
from vectordb_client import AsyncVectorDBClient
async def main():
# Connect to d-vecDB server
async with AsyncVectorDBClient(host="localhost", port=8080) as client:
# Create collection
await client.create_collection_simple(
name="embeddings",
dimension=384,
distance_metric="cosine"
)
# Prepare batch data
batch_data = [
(f"item_{i}", np.random.random(384), {"category": "test"})
for i in range(1000)
]
# Concurrent batch insertion
await client.batch_insert_concurrent(
collection_name="embeddings",
vectors_data=batch_data,
batch_size=50,
max_concurrent_batches=10
)
# Search
query_vector = np.random.random(384)
results = await client.search_simple("embeddings", query_vector, limit=10)
print(f"Found {len(results)} similar vectors")
# Run the async example
asyncio.run(main())
📖 API Reference
Client Initialization
from vectordb_client import VectorDBClient, AsyncVectorDBClient
# Synchronous client
client = VectorDBClient(
host="localhost",
port=8080, # REST port
grpc_port=9090, # gRPC port
protocol="rest", # "rest", "grpc", or "auto"
ssl=False, # Use HTTPS/secure gRPC
timeout=30.0, # Request timeout
)
# Asynchronous client
async_client = AsyncVectorDBClient(
host="localhost",
port=8080,
connection_pool_size=10, # HTTP connection pool size
protocol="rest",
ssl=False,
timeout=30.0,
)
Collection Management
from vectordb_client.types import CollectionConfig, DistanceMetric, IndexConfig
# Advanced collection configuration
config = CollectionConfig(
name="my_collection",
dimension=768,
distance_metric=DistanceMetric.COSINE,
index_config=IndexConfig(
max_connections=32,
ef_construction=400,
ef_search=100,
max_layer=16
)
)
# Create collection
response = client.create_collection(config)
# List all collections
collections = client.list_collections()
print("Collections:", collections.collections)
# Get collection info and stats
collection_info = client.get_collection("my_collection")
stats = client.get_collection_stats("my_collection")
print(f"Vectors: {stats.vector_count}, Memory: {stats.memory_usage} bytes")
# Delete collection
client.delete_collection("my_collection")
Vector Operations
from vectordb_client.types import Vector
import numpy as np
# Create vectors with metadata
vectors = [
Vector(
id="vec_1",
data=np.random.random(128).tolist(),
metadata={"category": "A", "score": 0.95}
),
Vector(
id="vec_2",
data=np.random.random(128).tolist(),
metadata={"category": "B", "score": 0.87}
)
]
# Insert single vector
response = client.insert_vector("my_collection", vectors[0])
# Batch insert
response = client.insert_vectors("my_collection", vectors)
print(f"Inserted {response.inserted_count} vectors")
# Get vector by ID
vector = client.get_vector("my_collection", "vec_1")
print(f"Retrieved vector: {vector.id}")
# Update vector
vectors[0].metadata["updated"] = True
client.update_vector("my_collection", vectors[0])
# Delete vector
client.delete_vector("my_collection", "vec_1")
Vector Search
from vectordb_client.types import SearchRequest
import numpy as np
# Simple search
query_vector = np.random.random(128)
results = client.search_simple("my_collection", query_vector, limit=10)
# Advanced search with parameters
search_request = SearchRequest(
query_vector=query_vector.tolist(),
limit=20,
ef_search=150, # Higher value = better accuracy, slower search
filter={"category": "A"} # Metadata filtering
)
response = client.search("my_collection",
search_request.query_vector,
search_request.limit,
search_request.ef_search,
search_request.filter)
# Process results
for result in response.results:
print(f"ID: {result.id}")
print(f"Distance: {result.distance:.6f}")
print(f"Metadata: {result.metadata}")
print("---")
print(f"Search took {response.query_time_ms}ms")
Server Information
# Health check
health = client.health_check()
print(f"Server healthy: {health.healthy}")
# Server statistics
stats = client.get_server_stats()
print(f"Total vectors: {stats.total_vectors}")
print(f"Collections: {stats.total_collections}")
print(f"Memory usage: {stats.memory_usage} bytes")
print(f"Uptime: {stats.uptime_seconds}s")
# Quick connectivity test
is_reachable = client.ping()
print(f"Server reachable: {is_reachable}")
# Comprehensive info
info = client.get_info()
print("Client info:", info["client"])
print("Server info:", info["server"])
🧪 Advanced Examples
Working with NumPy Arrays
import numpy as np
from vectordb_client import VectorDBClient
from vectordb_client.types import Vector
client = VectorDBClient()
# Create collection for embeddings
client.create_collection_simple("embeddings", 384, "cosine")
# Work directly with NumPy arrays
embeddings = np.random.random((1000, 384))
ids = [f"embedding_{i}" for i in range(1000)]
metadata_list = [{"index": i, "batch": i // 100} for i in range(1000)]
# Batch insert using NumPy
vectors = [
Vector.from_numpy(id=ids[i], data=embeddings[i], metadata=metadata_list[i])
for i in range(len(embeddings))
]
# Insert in batches
batch_size = 100
for i in range(0, len(vectors), batch_size):
batch = vectors[i:i + batch_size]
response = client.insert_vectors("embeddings", batch)
print(f"Inserted batch {i // batch_size + 1}: {response.inserted_count} vectors")
# Search with NumPy array
query_embedding = np.random.random(384)
results = client.search_simple("embeddings", query_embedding, limit=5)
# Convert results back to NumPy if needed
for result in results:
vector = client.get_vector("embeddings", result.id)
vector_array = vector.to_numpy() # Convert to NumPy array
print(f"Vector {result.id} shape: {vector_array.shape}")
Async Batch Processing
import asyncio
import numpy as np
from vectordb_client import AsyncVectorDBClient
async def process_large_dataset():
async with AsyncVectorDBClient() as client:
# Create collection
await client.create_collection_simple("large_dataset", 512, "euclidean")
# Generate large dataset
num_vectors = 10000
dimension = 512
dataset = np.random.random((num_vectors, dimension))
# Prepare batch data
batch_data = [
(f"vec_{i}", dataset[i], {"batch": i // 1000, "index": i})
for i in range(num_vectors)
]
# Concurrent insertion with progress tracking
batch_size = 200
max_concurrent = 20
start_time = asyncio.get_event_loop().time()
responses = await client.batch_insert_concurrent(
collection_name="large_dataset",
vectors_data=batch_data,
batch_size=batch_size,
max_concurrent_batches=max_concurrent
)
end_time = asyncio.get_event_loop().time()
total_inserted = sum(r.inserted_count or 0 for r in responses)
duration = end_time - start_time
rate = total_inserted / duration
print(f"Inserted {total_inserted} vectors in {duration:.2f}s")
print(f"Rate: {rate:.2f} vectors/second")
# Verify with search
query_vector = np.random.random(512)
results = await client.search_simple("large_dataset", query_vector, limit=10)
print(f"Search found {len(results)} results")
# Run the async processing
asyncio.run(process_large_dataset())
Error Handling and Retries
import time
from vectordb_client import VectorDBClient
from vectordb_client.exceptions import (
VectorDBError, ConnectionError, CollectionNotFoundError,
VectorNotFoundError, RateLimitError
)
def robust_insert_with_retry(client, collection_name, vectors, max_retries=3):
"""Insert vectors with automatic retry on failure."""
for attempt in range(max_retries):
try:
response = client.insert_vectors(collection_name, vectors)
print(f"Successfully inserted {response.inserted_count} vectors")
return response
except RateLimitError as e:
if attempt < max_retries - 1:
wait_time = 2 ** attempt # Exponential backoff
print(f"Rate limited, waiting {wait_time}s before retry...")
time.sleep(wait_time)
else:
raise e
except ConnectionError as e:
if attempt < max_retries - 1:
print(f"Connection failed, retrying... ({attempt + 1}/{max_retries})")
time.sleep(1)
else:
raise e
except CollectionNotFoundError:
print(f"Collection '{collection_name}' not found, creating...")
client.create_collection_simple(collection_name, 128, "cosine")
# Retry the insertion
continue
raise VectorDBError(f"Failed to insert after {max_retries} attempts")
# Usage
client = VectorDBClient()
vectors = [Vector(id=f"test_{i}", data=[0.1] * 128) for i in range(10)]
try:
robust_insert_with_retry(client, "test_collection", vectors)
except VectorDBError as e:
print(f"Final error: {e}")
Configuration and Connection Management
from vectordb_client import VectorDBClient
import os
# Configuration from environment variables
client = VectorDBClient(
host=os.getenv("VECTORDB_HOST", "localhost"),
port=int(os.getenv("VECTORDB_PORT", "8080")),
ssl=os.getenv("VECTORDB_SSL", "false").lower() == "true",
timeout=float(os.getenv("VECTORDB_TIMEOUT", "30.0"))
)
# Connection testing and fallback
def get_client_with_fallback():
"""Try multiple connection options."""
# Try primary server
try:
primary_client = VectorDBClient(host="primary.vectordb.com", port=8080)
if primary_client.ping():
return primary_client
primary_client.close()
except Exception:
pass
# Try secondary server
try:
secondary_client = VectorDBClient(host="secondary.vectordb.com", port=8080)
if secondary_client.ping():
return secondary_client
secondary_client.close()
except Exception:
pass
# Fall back to localhost
return VectorDBClient(host="localhost", port=8080)
# Context managers for resource cleanup
with get_client_with_fallback() as client:
# Use client here - automatically closed when leaving context
collections = client.list_collections()
print(f"Available collections: {collections.collections}")
🧪 Testing
# Run unit tests
python -m pytest tests/
# Run with coverage
python -m pytest tests/ --cov=vectordb_client --cov-report=html
# Run integration tests (requires running d-vecDB server)
python -m pytest tests/integration/ -v
# Run performance benchmarks
python -m pytest tests/benchmarks/ -v
🔧 Development
# Setup development environment
git clone https://github.com/rdmurugan/d-vecDB.git
cd d-vecDB/python-client
# Install in development mode
pip install -e .[dev]
# Run code formatting
black vectordb_client/
isort vectordb_client/
# Run type checking
mypy vectordb_client/
# Run linting
flake8 vectordb_client/
📊 Performance Tips
Batch Operations
- Use
insert_vectors()instead of multipleinsert_vector()calls - For async clients, use
batch_insert_concurrent()for maximum throughput - Optimal batch size is typically 100-1000 vectors depending on dimension
Connection Pooling
- Async clients automatically pool HTTP connections
- Increase
connection_pool_sizefor high-concurrency applications - Reuse client instances instead of creating new ones
Search Optimization
- Lower
ef_searchvalues for faster but less accurate search - Use metadata filtering to reduce search space
- Consider the trade-off between speed and recall
Memory Management
- Use NumPy arrays for large vector datasets
- Close clients explicitly or use context managers
- Monitor memory usage with large batch operations
🤝 Contributing
We welcome contributions! Please see our Contributing Guide for details.
Development Setup
- Fork the repository
- Create a feature branch
- Install development dependencies:
pip install -e .[dev] - Make changes and add tests
- Run tests:
pytest - Submit a pull request
📄 License
This project is licensed under the d-vecDB Enterprise License - see the LICENSE file for details.
For Enterprise Use: Commercial usage requires a separate enterprise license. Contact durai@infinidatum.com for licensing terms.
🆘 Support
- Documentation: docs.d-vecdb.com
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Discord: d-vecDB Community
Built with ❤️ by the d-vecDB team
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file vectordb_client-0.1.6.tar.gz.
File metadata
- Download URL: vectordb_client-0.1.6.tar.gz
- Upload date:
- Size: 53.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.5
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
9ae3d856374c6e1bd173acffe5eed612491b9f1123e848fc515bcaef27b6902b
|
|
| MD5 |
a4c32c0fc993ddce207cc206d2148242
|
|
| BLAKE2b-256 |
5e47ee37286a040c9a0a4d6e08058e66500d52b6e3f80aa1f81421a43a305dbe
|
File details
Details for the file vectordb_client-0.1.6-py3-none-any.whl.
File metadata
- Download URL: vectordb_client-0.1.6-py3-none-any.whl
- Upload date:
- Size: 64.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.5
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
edd8e0faff5362c81281c0c00a320aaf1e62c2bb189fd41f1a2e5afd009a6ffc
|
|
| MD5 |
b9a516bb0d6cd612790b26a160981875
|
|
| BLAKE2b-256 |
c6ff4a565c5a689e0646fb8d5c18f1cfa9ea5cd90a73e0535bd1ba275cbd2391
|