Skip to main content

High-performance vector database written in Rust with Python client

Project description

๐Ÿš€ VectorDB-RS

Build Status Rust Version License: MIT Docker Crates.io Downloads Documentation Coverage Security Audit Performance

A high-performance, production-ready vector database written in Rust

VectorDB-RS is a modern vector database designed for AI applications, semantic search, and similarity matching. Built from the ground up in Rust, it delivers exceptional performance, memory safety, and concurrent processing capabilities.


๐ŸŽฏ Key Features

โšก Ultra-High Performance

  • Sub-microsecond vector operations (28-76ns per distance calculation)
  • HNSW indexing with O(log N) search complexity
  • Concurrent processing with Rust's fearless concurrency
  • Memory-mapped storage for efficient large dataset handling

๐Ÿ—๏ธ Production Architecture

  • gRPC & REST APIs for universal client compatibility
  • Write-Ahead Logging (WAL) for ACID durability and crash recovery
  • Multi-threaded indexing and query processing
  • Comprehensive error handling and observability

๐Ÿ”ง Developer Experience

  • Type-safe APIs with Protocol Buffers
  • Rich metadata support with JSON field storage
  • Comprehensive benchmarking suite with HTML reports
  • CLI tools for database management

๐Ÿ“Š Enterprise Ready

  • Horizontal scaling capabilities
  • Monitoring integration with Prometheus metrics
  • Flexible deployment (standalone, containerized, embedded)
  • Cross-platform support (Linux, macOS, Windows)

๐Ÿ“ˆ Benchmark Results

Tested on macOS Darwin 24.6.0 with optimized release builds

Distance Calculations

Operation Latency Throughput
Dot Product 28.3 ns 35.4M ops/sec
Euclidean Distance 30.6 ns 32.7M ops/sec
Cosine Similarity 76.1 ns 13.1M ops/sec

HNSW Index Operations

Operation Performance Scale
Vector Insertion 7,108 vectors/sec 1,000 vectors benchmark
Vector Search 13,150 queries/sec 5,000 vector dataset
With Metadata 2,560 inserts/sec Rich JSON metadata

Performance Projections on Higher-End Hardware

Based on our benchmark results, here are conservative performance extrapolations for production hardware:

High-End Server (32-core AMD EPYC, 128GB RAM, NVMe)

Operation Current (Mac) Projected (Server) Improvement
Distance Calculations 35M ops/sec 150M+ ops/sec 4.3x
Vector Insertion 7K vectors/sec 50K+ vectors/sec 7x
Vector Search 13K queries/sec 100K+ queries/sec 7.7x
Concurrent Queries Single-threaded 500K+ queries/sec 38x

Optimized Cloud Instance (16-core, 64GB RAM, SSD)

Operation Current (Mac) Projected (Cloud) Improvement
Distance Calculations 35M ops/sec 80M+ ops/sec 2.3x
Vector Insertion 7K vectors/sec 25K+ vectors/sec 3.6x
Vector Search 13K queries/sec 45K+ queries/sec 3.5x
Concurrent Queries Single-threaded 180K+ queries/sec 14x

Projections based on CPU core scaling, memory bandwidth improvements, and storage I/O optimizations


๐Ÿ—๏ธ Architecture

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚                  ๐ŸŽฏ VectorDB-RS Stack                       โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚  CLI Tool      โ”‚  Client SDKs   โ”‚  REST + gRPC APIs          โ”‚
โ”‚  (Management)  โ”‚  (Rust/Python) โ”‚  (Universal Access)        โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚                    Vector Store Engine                      โ”‚
โ”‚              (Indexing + Storage + Querying)                โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚  HNSW Index    โ”‚   WAL Storage   โ”‚   Memory Mapping          โ”‚
โ”‚  (O(log N))    โ”‚   (Durability)  โ”‚   (Performance)           โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

Core Components

  • ๐Ÿ” HNSW Index: Hierarchical Navigable Small World graphs for approximate nearest neighbor search
  • ๐Ÿ’พ Storage Engine: Memory-mapped files with write-ahead logging for durability
  • ๐ŸŒ API Layer: Both REST (HTTP/JSON) and gRPC (Protocol Buffers) interfaces
  • ๐Ÿ“Š Monitoring: Built-in Prometheus metrics and comprehensive logging
  • ๐Ÿ”ง CLI Tools: Database management, collection operations, and administrative tasks

๐Ÿš€ Quick Start

Installation

Option 1: Install from PyPI (Recommended)

# Install d-vecDB with Python client
pip install d-vecdb

# Or install with development extras
pip install d-vecdb[dev,docs,examples]

Option 2: Install from Source

# Clone the repository
git clone https://github.com/rdmurugan/d-vecDB.git
cd d-vecDB

# Quick install using script
./scripts/install.sh

# Or manual installation
pip install .

Option 3: For Development

# Clone and setup development environment
git clone https://github.com/rdmurugan/d-vecDB.git
cd d-vecDB

# Install in development mode with all extras
./scripts/install.sh dev

# Build Rust server components
./scripts/build-server.sh

Option 4: Using Virtual Environment

# Create isolated environment
./scripts/install.sh venv
source venv/bin/activate  # Linux/macOS
# venv\Scripts\activate   # Windows

Start the Server

# Start with default configuration
./target/release/vectordb-server --config config.toml

# Or with custom settings
./target/release/vectordb-server \
  --host 0.0.0.0 \
  --port 8080 \
  --data-dir /path/to/data \
  --log-level info

Basic Usage

# Create a collection
curl -X POST http://localhost:8080/collections \
  -H "Content-Type: application/json" \
  -d '{
    "name": "documents",
    "dimension": 128,
    "distance_metric": "cosine"
  }'

# Insert vectors
curl -X POST http://localhost:8080/collections/documents/vectors \
  -H "Content-Type: application/json" \
  -d '{
    "id": "doc1",
    "data": [0.1, 0.2, 0.3, ...],
    "metadata": {"title": "Example Document"}
  }'

# Search for similar vectors
curl -X POST http://localhost:8080/collections/documents/search \
  -H "Content-Type: application/json" \
  -d '{
    "query_vector": [0.1, 0.2, 0.3, ...],
    "limit": 10
  }'

๐Ÿ› ๏ธ Development Setup

Prerequisites

  • Rust 1.70+ (Install Rust)
  • Protocol Buffers compiler (protoc)
  • Git for version control

Build Instructions

# Development build
cargo build

# Optimized release build  
cargo build --release

# Run all tests
cargo test

# Run benchmarks
cargo bench --package vectordb-common

# Generate documentation
cargo doc --open

Project Structure

d-vecDB/
โ”œโ”€โ”€ common/          # Core types, distance functions, utilities
โ”œโ”€โ”€ index/           # HNSW indexing implementation
โ”œโ”€โ”€ storage/         # WAL, memory-mapping, persistence
โ”œโ”€โ”€ vectorstore/     # Main vector store engine
โ”œโ”€โ”€ server/          # REST & gRPC API servers
โ”œโ”€โ”€ python-client/   # ๐Ÿ Official Python client library
โ”œโ”€โ”€ client/          # Additional client SDKs and libraries
โ”œโ”€โ”€ cli/             # Command-line tools
โ”œโ”€โ”€ proto/           # Protocol Buffer definitions
โ””โ”€โ”€ benchmarks/      # Performance testing suite

๐Ÿ“š Client Libraries

VectorDB-RS provides official client libraries for multiple programming languages:

๐Ÿ Python Client

PyPI version PyPI downloads Python 3.8+

Full-featured Python client with async support, NumPy integration, and type safety.

  • ๐Ÿ”„ Sync & Async: Both synchronous and asynchronous clients
  • โšก High Performance: Concurrent batch operations (1000+ vectors/sec)
  • ๐Ÿงฎ NumPy Native: Direct NumPy array support
  • ๐Ÿ”’ Type Safe: Pydantic models with validation
  • ๐ŸŒ Multi-Protocol: REST and gRPC support
# Install from PyPI
pip install vectordb-client

# Quick usage
from vectordb_client import VectorDBClient
import numpy as np

client = VectorDBClient()
client.create_collection_simple("docs", 384, "cosine")
client.insert_simple("docs", "doc_1", np.random.random(384))
results = client.search_simple("docs", np.random.random(384), limit=5)

๐Ÿ“– Complete Python Documentation โ†’

๐Ÿฆ€ Rust Client (Native)

Crates.io

Direct access to the native Rust API for maximum performance.

๐ŸŒ HTTP/REST API

Language-agnostic REST API with OpenAPI specification.

๐Ÿ“– API Documentation โ†’

๐Ÿšง Coming Soon

  • JavaScript/TypeScript client
  • Go client
  • Java client
  • C++ bindings

๐Ÿ“Š Comprehensive Benchmarking

Running Benchmarks

# Core performance benchmarks
cargo bench --package vectordb-common

# Generate HTML reports
cargo bench --package vectordb-common
open target/criterion/report/index.html

# Custom benchmark suite
./scripts/run-comprehensive-benchmarks.sh

Benchmark Categories

  1. ๐Ÿงฎ Distance Calculations: Core mathematical operations (cosine, euclidean, dot product)
  2. ๐Ÿ—‚๏ธ Index Operations: Vector insertion, search, and maintenance
  3. ๐Ÿ’พ Storage Performance: WAL writes, memory-mapped reads, persistence
  4. ๐ŸŒ API Throughput: REST and gRPC endpoint performance
  5. ๐Ÿ“ˆ Scaling Tests: Performance under load with varying dataset sizes

Hardware Optimization Guide

For Maximum Insertion Throughput:

  • CPU: High core count (32+ cores) for parallel indexing
  • RAM: Large memory pool (128GB+) for index caching
  • Storage: NVMe SSDs for fast WAL writes

For Maximum Query Performance:

  • CPU: High single-thread performance with many cores
  • RAM: Fast memory (DDR4-3200+) for index traversal
  • Network: High bandwidth for concurrent client connections

For Large Scale Deployments:

  • Distributed Setup: Multiple nodes with load balancing
  • Storage Tiering: Hot data in memory, warm data on SSD
  • Monitoring: Comprehensive metrics and alerting

๐Ÿ”ง Configuration

Server Configuration

# config.toml
[server]
host = "0.0.0.0"
port = 8080
grpc_port = 9090
workers = 8

[storage]
data_dir = "./data"
wal_sync_interval = "1s"
memory_map_size = "1GB"

[index]
hnsw_max_connections = 16
hnsw_ef_construction = 200
hnsw_max_layer = 16

[monitoring]
enable_metrics = true
prometheus_port = 9091
log_level = "info"

Performance Tuning

[performance]
# Optimize for insertion throughput
batch_size = 1000
insert_workers = 16

# Optimize for query latency  
query_cache_size = "500MB"
prefetch_enabled = true

# Memory management
gc_interval = "30s"
memory_limit = "8GB"

๐ŸŒ API Reference

REST API

Endpoint Method Description
/collections POST Create collection
/collections/{name} GET Get collection info
/collections/{name}/vectors POST Insert vectors
/collections/{name}/search POST Search vectors
/collections/{name}/vectors/{id} DELETE Delete vector
/stats GET Server statistics
/health GET Health check

gRPC Services

service VectorDb {
  rpc CreateCollection(CreateCollectionRequest) returns (CreateCollectionResponse);
  rpc Insert(InsertRequest) returns (InsertResponse);
  rpc BatchInsert(BatchInsertRequest) returns (BatchInsertResponse);
  rpc Query(QueryRequest) returns (QueryResponse);
  rpc Delete(DeleteRequest) returns (DeleteResponse);
  rpc GetStats(GetStatsRequest) returns (GetStatsResponse);
}

Client SDKs

// Rust Client
use vectordb_client::VectorDbClient;

let client = VectorDbClient::new("http://localhost:8080").await?;

// Create collection
client.create_collection("documents", 128, DistanceMetric::Cosine).await?;

// Insert vector
client.insert("documents", "doc1", vec![0.1, 0.2, 0.3], metadata).await?;

// Search
let results = client.search("documents", query_vector, 10).await?;
# Python Client (Coming Soon)
import vectordb

client = vectordb.Client("http://localhost:8080")
client.create_collection("documents", 128, "cosine")
client.insert("documents", "doc1", [0.1, 0.2, 0.3], {"title": "Example"})
results = client.search("documents", query_vector, limit=10)

๐Ÿ” Use Cases

๐Ÿค– AI & Machine Learning

  • Embedding storage for transformer models (BERT, GPT, etc.)
  • Recommendation engines with user/item similarity
  • Content-based filtering and personalization

๐Ÿ” Search & Discovery

  • Semantic search in documents and knowledge bases
  • Image/video similarity search and retrieval
  • Product recommendation in e-commerce platforms

๐Ÿ“Š Data Analytics

  • Anomaly detection in high-dimensional data
  • Clustering and classification of complex datasets
  • Feature matching in computer vision applications

๐Ÿข Enterprise Applications

  • Document similarity in legal and compliance systems
  • Fraud detection through pattern matching
  • Customer segmentation and behavioral analysis

๐Ÿšฆ Production Deployment

Docker Deployment

FROM rust:1.70 as builder
COPY . .
RUN cargo build --release

FROM debian:bookworm-slim
COPY --from=builder /target/release/vectordb-server /usr/local/bin/
EXPOSE 8080 9090 9091
CMD ["vectordb-server", "--config", "/etc/vectordb/config.toml"]
# Build and run
docker build -t vectordb-rs .
docker run -p 8080:8080 -p 9090:9090 -v ./data:/data vectordb-rs

Kubernetes Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: vectordb-rs
spec:
  replicas: 3
  selector:
    matchLabels:
      app: vectordb-rs
  template:
    metadata:
      labels:
        app: vectordb-rs
    spec:
      containers:
      - name: vectordb-rs
        image: vectordb-rs:latest
        ports:
        - containerPort: 8080
        - containerPort: 9090
        resources:
          requests:
            memory: "2Gi"
            cpu: "500m"
          limits:
            memory: "8Gi"
            cpu: "4"

Monitoring Integration

# Prometheus configuration
- job_name: 'vectordb-rs'
  static_configs:
  - targets: ['vectordb-rs:9091']
  scrape_interval: 15s
  metrics_path: /metrics

๐Ÿ“ˆ Performance Comparison

vs. Traditional Vector Databases

Feature VectorDB-RS Pinecone Weaviate Qdrant
Language Rust Python/C++ Go Rust
Memory Safety โœ… Zero-cost โŒ Manual โŒ GC Overhead โœ… Zero-cost
Concurrency โœ… Native โš ๏ธ Limited โš ๏ธ GC Pauses โœ… Native
Deployment โœ… Single Binary โŒ Cloud Only โš ๏ธ Complex โœ… Flexible
Performance โœ… 35M ops/sec โš ๏ธ Network Bound โš ๏ธ GC Limited โœ… Comparable

Scaling Characteristics

Dataset Size Query Latency Memory Usage Throughput
1K vectors <100ยตs <10MB 50K+ qps
100K vectors <500ยตs <500MB 25K+ qps
1M vectors <2ms <2GB 15K+ qps
10M vectors <10ms <8GB 8K+ qps

๐Ÿค Contributing

We welcome contributions! Please see our Contributing Guide for details.

Development Workflow

# Fork and clone the repository
git clone https://github.com/your-username/d-vecDB.git
cd d-vecDB

# Create a feature branch
git checkout -b feature/amazing-feature

# Make changes and test
cargo test
cargo clippy
cargo fmt

# Submit a pull request
git push origin feature/amazing-feature

Areas for Contribution

  • ๐Ÿš€ Performance optimizations and SIMD implementations
  • ๐ŸŒ Additional client SDK languages (Python, JavaScript, Java)
  • ๐Ÿ“Š Advanced indexing algorithms (IVF, PQ, LSH)
  • ๐Ÿ”ง Operational tools and monitoring dashboards
  • ๐Ÿ“š Documentation and example applications

๐Ÿ“„ License

This project is licensed under the MIT License - see the LICENSE file for details.


๐Ÿ†˜ Support


๐Ÿ™ Acknowledgments

  • Built with โค๏ธ in Rust
  • Inspired by modern vector database architectures
  • Powered by the amazing Rust ecosystem
  • Community-driven development

โšก Ready to build the future of AI-powered applications? Get started with VectorDB-RS today!

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

d_vecdb-0.1.0.tar.gz (66.6 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

d_vecdb-0.1.0-py3-none-any.whl (67.1 kB view details)

Uploaded Python 3

File details

Details for the file d_vecdb-0.1.0.tar.gz.

File metadata

  • Download URL: d_vecdb-0.1.0.tar.gz
  • Upload date:
  • Size: 66.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.5

File hashes

Hashes for d_vecdb-0.1.0.tar.gz
Algorithm Hash digest
SHA256 2387f223a4774616778d11d1c18d7c27d66a675741a0731f72f2558535e50c6b
MD5 e264da4f8e340b32a60069e0bf554fd7
BLAKE2b-256 3a7259fde3a0c41cdcc370e1b92a87cdcb82218b8b2e84a044b1567c49c8e717

See more details on using hashes here.

File details

Details for the file d_vecdb-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: d_vecdb-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 67.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.5

File hashes

Hashes for d_vecdb-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 54d3588312133d4fe2efe4323ec28c9f37d5659c3c95d310202526e64d07dbe7
MD5 a7f0275292ce65364de935d9cd9077e8
BLAKE2b-256 39b2172a52db9564f8149f74d5a70a9b76ef72c1e32f02fa41467e5b9201f162

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page