High-performance vector database written in Rust with Python client

These details have not been verified by PyPI

Project links

Project description

🚀 VectorDB-RS

A high-performance, production-ready vector database written in Rust

VectorDB-RS is a modern vector database designed for AI applications, semantic search, and similarity matching. Built from the ground up in Rust, it delivers exceptional performance, memory safety, and concurrent processing capabilities.

🎯 Key Features

⚡ Ultra-High Performance

Sub-microsecond vector operations (28-76ns per distance calculation)
HNSW indexing with O(log N) search complexity
Concurrent processing with Rust's fearless concurrency
Memory-mapped storage for efficient large dataset handling

🏗️ Production Architecture

gRPC & REST APIs for universal client compatibility
Write-Ahead Logging (WAL) for ACID durability and crash recovery
Multi-threaded indexing and query processing
Comprehensive error handling and observability

🔧 Developer Experience

Type-safe APIs with Protocol Buffers
Rich metadata support with JSON field storage
Comprehensive benchmarking suite with HTML reports
CLI tools for database management

📊 Enterprise Ready

Horizontal scaling capabilities
Monitoring integration with Prometheus metrics
Flexible deployment (standalone, containerized, embedded)
Cross-platform support (Linux, macOS, Windows)

📈 Benchmark Results

Tested on macOS Darwin 24.6.0 with optimized release builds

Distance Calculations

Operation	Latency	Throughput
Dot Product	28.3 ns	35.4M ops/sec
Euclidean Distance	30.6 ns	32.7M ops/sec
Cosine Similarity	76.1 ns	13.1M ops/sec

HNSW Index Operations

Operation	Performance	Scale
Vector Insertion	7,108 vectors/sec	1,000 vectors benchmark
Vector Search	13,150 queries/sec	5,000 vector dataset
With Metadata	2,560 inserts/sec	Rich JSON metadata

Performance Projections on Higher-End Hardware

Based on our benchmark results, here are conservative performance extrapolations for production hardware:

High-End Server (32-core AMD EPYC, 128GB RAM, NVMe)

Operation	Current (Mac)	Projected (Server)	Improvement
Distance Calculations	35M ops/sec	150M+ ops/sec	4.3x
Vector Insertion	7K vectors/sec	50K+ vectors/sec	7x
Vector Search	13K queries/sec	100K+ queries/sec	7.7x
Concurrent Queries	Single-threaded	500K+ queries/sec	38x

Optimized Cloud Instance (16-core, 64GB RAM, SSD)

Operation	Current (Mac)	Projected (Cloud)	Improvement
Distance Calculations	35M ops/sec	80M+ ops/sec	2.3x
Vector Insertion	7K vectors/sec	25K+ vectors/sec	3.6x
Vector Search	13K queries/sec	45K+ queries/sec	3.5x
Concurrent Queries	Single-threaded	180K+ queries/sec	14x

Projections based on CPU core scaling, memory bandwidth improvements, and storage I/O optimizations

🏗️ Architecture

┌─────────────────────────────────────────────────────────────┐
│                  🎯 VectorDB-RS Stack                       │
├─────────────────────────────────────────────────────────────┤
│  CLI Tool      │  Client SDKs   │  REST + gRPC APIs          │
│  (Management)  │  (Rust/Python) │  (Universal Access)        │
├─────────────────────────────────────────────────────────────┤
│                    Vector Store Engine                      │
│              (Indexing + Storage + Querying)                │
├─────────────────────────────────────────────────────────────┤
│  HNSW Index    │   WAL Storage   │   Memory Mapping          │
│  (O(log N))    │   (Durability)  │   (Performance)           │
└─────────────────────────────────────────────────────────────┘

Core Components

🔍 HNSW Index: Hierarchical Navigable Small World graphs for approximate nearest neighbor search
💾 Storage Engine: Memory-mapped files with write-ahead logging for durability
🌐 API Layer: Both REST (HTTP/JSON) and gRPC (Protocol Buffers) interfaces
📊 Monitoring: Built-in Prometheus metrics and comprehensive logging
🔧 CLI Tools: Database management, collection operations, and administrative tasks

🚀 Quick Start

Installation

Option 1: Install from PyPI (Recommended)

# Install d-vecDB with Python client
pip install d-vecdb

# Or install with development extras
pip install d-vecdb[dev,docs,examples]

Option 2: Install from Source

# Clone the repository
git clone https://github.com/rdmurugan/d-vecDB.git
cd d-vecDB

# Quick install using script
./scripts/install.sh

# Or manual installation
pip install .

Option 3: For Development

# Clone and setup development environment
git clone https://github.com/rdmurugan/d-vecDB.git
cd d-vecDB

# Install in development mode with all extras
./scripts/install.sh dev

# Build Rust server components
./scripts/build-server.sh

Option 4: Using Virtual Environment

# Create isolated environment
./scripts/install.sh venv
source venv/bin/activate  # Linux/macOS
# venv\Scripts\activate   # Windows

Start the Server

# Start with default configuration
./target/release/vectordb-server --config config.toml

# Or with custom settings
./target/release/vectordb-server \
  --host 0.0.0.0 \
  --port 8080 \
  --data-dir /path/to/data \
  --log-level info

Basic Usage

# Create a collection
curl -X POST http://localhost:8080/collections \
  -H "Content-Type: application/json" \
  -d '{
    "name": "documents",
    "dimension": 128,
    "distance_metric": "cosine"
  }'

# Insert vectors
curl -X POST http://localhost:8080/collections/documents/vectors \
  -H "Content-Type: application/json" \
  -d '{
    "id": "doc1",
    "data": [0.1, 0.2, 0.3, ...],
    "metadata": {"title": "Example Document"}
  }'

# Search for similar vectors
curl -X POST http://localhost:8080/collections/documents/search \
  -H "Content-Type: application/json" \
  -d '{
    "query_vector": [0.1, 0.2, 0.3, ...],
    "limit": 10
  }'

🛠️ Development Setup

Prerequisites

Rust 1.70+ (Install Rust)
Protocol Buffers compiler (protoc)
Git for version control

Build Instructions

# Development build
cargo build

# Optimized release build  
cargo build --release

# Run all tests
cargo test

# Run benchmarks
cargo bench --package vectordb-common

# Generate documentation
cargo doc --open

Project Structure

d-vecDB/
├── common/          # Core types, distance functions, utilities
├── index/           # HNSW indexing implementation
├── storage/         # WAL, memory-mapping, persistence
├── vectorstore/     # Main vector store engine
├── server/          # REST & gRPC API servers
├── python-client/   # 🐍 Official Python client library
├── client/          # Additional client SDKs and libraries
├── cli/             # Command-line tools
├── proto/           # Protocol Buffer definitions
└── benchmarks/      # Performance testing suite

📚 Client Libraries

VectorDB-RS provides official client libraries for multiple programming languages:

🐍 Python Client

Full-featured Python client with async support, NumPy integration, and type safety.

🔄 Sync & Async: Both synchronous and asynchronous clients
⚡ High Performance: Concurrent batch operations (1000+ vectors/sec)
🧮 NumPy Native: Direct NumPy array support
🔒 Type Safe: Pydantic models with validation
🌐 Multi-Protocol: REST and gRPC support

# Install from PyPI
pip install vectordb-client

# Quick usage
from vectordb_client import VectorDBClient
import numpy as np

client = VectorDBClient()
client.create_collection_simple("docs", 384, "cosine")
client.insert_simple("docs", "doc_1", np.random.random(384))
results = client.search_simple("docs", np.random.random(384), limit=5)

📖 Complete Python Documentation →

🦀 Rust Client (Native)

Direct access to the native Rust API for maximum performance.

🌐 HTTP/REST API

Language-agnostic REST API with OpenAPI specification.

📖 API Documentation →

🚧 Coming Soon

JavaScript/TypeScript client
Go client
Java client
C++ bindings

📊 Comprehensive Benchmarking

Running Benchmarks

# Core performance benchmarks
cargo bench --package vectordb-common

# Generate HTML reports
cargo bench --package vectordb-common
open target/criterion/report/index.html

# Custom benchmark suite
./scripts/run-comprehensive-benchmarks.sh

Benchmark Categories

🧮 Distance Calculations: Core mathematical operations (cosine, euclidean, dot product)
🗂️ Index Operations: Vector insertion, search, and maintenance
💾 Storage Performance: WAL writes, memory-mapped reads, persistence
🌐 API Throughput: REST and gRPC endpoint performance
📈 Scaling Tests: Performance under load with varying dataset sizes

Hardware Optimization Guide

For Maximum Insertion Throughput:

CPU: High core count (32+ cores) for parallel indexing
RAM: Large memory pool (128GB+) for index caching
Storage: NVMe SSDs for fast WAL writes

For Maximum Query Performance:

CPU: High single-thread performance with many cores
RAM: Fast memory (DDR4-3200+) for index traversal
Network: High bandwidth for concurrent client connections

For Large Scale Deployments:

Distributed Setup: Multiple nodes with load balancing
Storage Tiering: Hot data in memory, warm data on SSD
Monitoring: Comprehensive metrics and alerting

🔧 Configuration

Server Configuration

# config.toml
[server]
host = "0.0.0.0"
port = 8080
grpc_port = 9090
workers = 8

[storage]
data_dir = "./data"
wal_sync_interval = "1s"
memory_map_size = "1GB"

[index]
hnsw_max_connections = 16
hnsw_ef_construction = 200
hnsw_max_layer = 16

[monitoring]
enable_metrics = true
prometheus_port = 9091
log_level = "info"

Performance Tuning

[performance]
# Optimize for insertion throughput
batch_size = 1000
insert_workers = 16

# Optimize for query latency  
query_cache_size = "500MB"
prefetch_enabled = true

# Memory management
gc_interval = "30s"
memory_limit = "8GB"

🌐 API Reference

REST API

Endpoint	Method	Description
`/collections`	POST	Create collection
`/collections/{name}`	GET	Get collection info
`/collections/{name}/vectors`	POST	Insert vectors
`/collections/{name}/search`	POST	Search vectors
`/collections/{name}/vectors/{id}`	DELETE	Delete vector
`/stats`	GET	Server statistics
`/health`	GET	Health check

gRPC Services

service VectorDb {
  rpc CreateCollection(CreateCollectionRequest) returns (CreateCollectionResponse);
  rpc Insert(InsertRequest) returns (InsertResponse);
  rpc BatchInsert(BatchInsertRequest) returns (BatchInsertResponse);
  rpc Query(QueryRequest) returns (QueryResponse);
  rpc Delete(DeleteRequest) returns (DeleteResponse);
  rpc GetStats(GetStatsRequest) returns (GetStatsResponse);
}

Client SDKs

// Rust Client
use vectordb_client::VectorDbClient;

let client = VectorDbClient::new("http://localhost:8080").await?;

// Create collection
client.create_collection("documents", 128, DistanceMetric::Cosine).await?;

// Insert vector
client.insert("documents", "doc1", vec![0.1, 0.2, 0.3], metadata).await?;

// Search
let results = client.search("documents", query_vector, 10).await?;

# Python Client (Coming Soon)
import vectordb

client = vectordb.Client("http://localhost:8080")
client.create_collection("documents", 128, "cosine")
client.insert("documents", "doc1", [0.1, 0.2, 0.3], {"title": "Example"})
results = client.search("documents", query_vector, limit=10)

🔍 Use Cases

🤖 AI & Machine Learning

Embedding storage for transformer models (BERT, GPT, etc.)
Recommendation engines with user/item similarity
Content-based filtering and personalization

🔍 Search & Discovery

Semantic search in documents and knowledge bases
Image/video similarity search and retrieval
Product recommendation in e-commerce platforms

📊 Data Analytics

Anomaly detection in high-dimensional data
Clustering and classification of complex datasets
Feature matching in computer vision applications

🏢 Enterprise Applications

Document similarity in legal and compliance systems
Fraud detection through pattern matching
Customer segmentation and behavioral analysis

🚦 Production Deployment

Docker Deployment

FROM rust:1.70 as builder
COPY . .
RUN cargo build --release

FROM debian:bookworm-slim
COPY --from=builder /target/release/vectordb-server /usr/local/bin/
EXPOSE 8080 9090 9091
CMD ["vectordb-server", "--config", "/etc/vectordb/config.toml"]

# Build and run
docker build -t vectordb-rs .
docker run -p 8080:8080 -p 9090:9090 -v ./data:/data vectordb-rs

Kubernetes Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: vectordb-rs
spec:
  replicas: 3
  selector:
    matchLabels:
      app: vectordb-rs
  template:
    metadata:
      labels:
        app: vectordb-rs
    spec:
      containers:
      - name: vectordb-rs
        image: vectordb-rs:latest
        ports:
        - containerPort: 8080
        - containerPort: 9090
        resources:
          requests:
            memory: "2Gi"
            cpu: "500m"
          limits:
            memory: "8Gi"
            cpu: "4"

Monitoring Integration

# Prometheus configuration
- job_name: 'vectordb-rs'
  static_configs:
  - targets: ['vectordb-rs:9091']
  scrape_interval: 15s
  metrics_path: /metrics

📈 Performance Comparison

vs. Traditional Vector Databases

Feature	VectorDB-RS	Pinecone	Weaviate	Qdrant
Language	Rust	Python/C++	Go	Rust
Memory Safety	✅ Zero-cost	❌ Manual	❌ GC Overhead	✅ Zero-cost
Concurrency	✅ Native	⚠️ Limited	⚠️ GC Pauses	✅ Native
Deployment	✅ Single Binary	❌ Cloud Only	⚠️ Complex	✅ Flexible
Performance	✅ 35M ops/sec	⚠️ Network Bound	⚠️ GC Limited	✅ Comparable

Scaling Characteristics

Dataset Size	Query Latency	Memory Usage	Throughput
1K vectors	<100µs	<10MB	50K+ qps
100K vectors	<500µs	<500MB	25K+ qps
1M vectors	<2ms	<2GB	15K+ qps
10M vectors	<10ms	<8GB	8K+ qps

🤝 Contributing

We welcome contributions! Please see our Contributing Guide for details.

Development Workflow

# Fork and clone the repository
git clone https://github.com/your-username/d-vecDB.git
cd d-vecDB

# Create a feature branch
git checkout -b feature/amazing-feature

# Make changes and test
cargo test
cargo clippy
cargo fmt

# Submit a pull request
git push origin feature/amazing-feature

Areas for Contribution

🚀 Performance optimizations and SIMD implementations
🌐 Additional client SDK languages (Python, JavaScript, Java)
📊 Advanced indexing algorithms (IVF, PQ, LSH)
🔧 Operational tools and monitoring dashboards
📚 Documentation and example applications

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🆘 Support

📧 Email: support@vectordb-rs.com
💬 Discord: VectorDB-RS Community
🐛 Issues: GitHub Issues
📚 Documentation: docs.vectordb-rs.com

🙏 Acknowledgments

Built with ❤️ in Rust
Inspired by modern vector database architectures
Powered by the amazing Rust ecosystem
Community-driven development

⚡ Ready to build the future of AI-powered applications? Get started with VectorDB-RS today!

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

0.2.2

Oct 29, 2025

0.2.1

Oct 28, 2025

0.2.0

Oct 28, 2025

0.1.1

Sep 2, 2025

This version

0.1.0

Sep 2, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

d_vecdb-0.1.0.tar.gz (66.6 kB view details)

Uploaded Sep 2, 2025 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

d_vecdb-0.1.0-py3-none-any.whl (67.1 kB view details)

Uploaded Sep 2, 2025 Python 3

File details

Details for the file d_vecdb-0.1.0.tar.gz.

File metadata

Download URL: d_vecdb-0.1.0.tar.gz
Upload date: Sep 2, 2025
Size: 66.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.5

File hashes

Hashes for d_vecdb-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`2387f223a4774616778d11d1c18d7c27d66a675741a0731f72f2558535e50c6b`
MD5	`e264da4f8e340b32a60069e0bf554fd7`
BLAKE2b-256	`3a7259fde3a0c41cdcc370e1b92a87cdcb82218b8b2e84a044b1567c49c8e717`

See more details on using hashes here.

File details

Details for the file d_vecdb-0.1.0-py3-none-any.whl.

File metadata

Download URL: d_vecdb-0.1.0-py3-none-any.whl
Upload date: Sep 2, 2025
Size: 67.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.5

File hashes

Hashes for d_vecdb-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`54d3588312133d4fe2efe4323ec28c9f37d5659c3c95d310202526e64d07dbe7`
MD5	`a7f0275292ce65364de935d9cd9077e8`
BLAKE2b-256	`39b2172a52db9564f8149f74d5a70a9b76ef72c1e32f02fa41467e5b9201f162`

See more details on using hashes here.

d-vecdb 0.1.0

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

🚀 VectorDB-RS

🎯 Key Features

⚡ Ultra-High Performance

🏗️ Production Architecture

🔧 Developer Experience

📊 Enterprise Ready

📈 Benchmark Results

Distance Calculations

HNSW Index Operations

Performance Projections on Higher-End Hardware

High-End Server (32-core AMD EPYC, 128GB RAM, NVMe)

Optimized Cloud Instance (16-core, 64GB RAM, SSD)

🏗️ Architecture

Core Components

🚀 Quick Start

Installation

Start the Server

Basic Usage

🛠️ Development Setup

Prerequisites

Build Instructions

Project Structure

📚 Client Libraries

🐍 Python Client

🦀 Rust Client (Native)

🌐 HTTP/REST API

🚧 Coming Soon

📊 Comprehensive Benchmarking

Running Benchmarks

Benchmark Categories

Hardware Optimization Guide

For Maximum Insertion Throughput:

For Maximum Query Performance:

For Large Scale Deployments:

🔧 Configuration

Server Configuration

Performance Tuning

🌐 API Reference

REST API

gRPC Services

Client SDKs

🔍 Use Cases

🤖 AI & Machine Learning

🔍 Search & Discovery

📊 Data Analytics

🏢 Enterprise Applications

🚦 Production Deployment

Docker Deployment

Kubernetes Deployment

Monitoring Integration

📈 Performance Comparison

vs. Traditional Vector Databases

Scaling Characteristics

🤝 Contributing

Development Workflow

Areas for Contribution

📄 License

🆘 Support

🙏 Acknowledgments

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details