VelesDB: The Local Vector Database for Python & AI. Microsecond semantic search, zero network latency.

VelesDB Python

Python bindings for VelesDB - a high-performance vector database for AI applications.

Features

  • Vector Similarity Search: HNSW index with SIMD-optimized distance calculations
  • Multi-Query Fusion: Native support for Multi-Query Generation (MQG) pipelines with RRF/Weighted fusion strategies
  • Multiple Distance Metrics: Cosine, Euclidean, Dot Product, Hamming, Jaccard
  • Persistent Storage: Memory-mapped files for efficient disk I/O
  • Metadata Support: Store and retrieve JSON payloads with vectors
  • NumPy Integration: Native support for NumPy arrays
  • Type Hints: Full .pyi stub file for IDE autocompletion

Installation

pip install velesdb

Quick Start

import velesdb

# Open or create a database
db = velesdb.Database("./my_vectors")

# Create a collection for 768-dimensional vectors (e.g., BERT embeddings)
collection = db.create_collection(
    name="documents",
    dimension=768,
    metric="cosine"  # Options: "cosine", "euclidean", "dot", "hamming", "jaccard"
)

# Insert vectors with metadata
collection.upsert([
    {
        "id": 1,
        "vector": [0.1, 0.2, ...],  # 768-dim vector
        "payload": {"title": "Introduction to AI", "category": "tech"}
    },
    {
        "id": 2,
        "vector": [0.3, 0.4, ...],
        "payload": {"title": "Machine Learning Basics", "category": "tech"}
    }
])

# Search for similar vectors
results = collection.search(
    vector=[0.15, 0.25, ...],  # Query vector
    top_k=5
)

for result in results:
    print(f"ID: {result['id']}, Score: {result['score']:.4f}")
    print(f"Payload: {result['payload']}")
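
The Quick Start elides the vector contents with `...`, so it cannot be pasted and run as-is. The following is a minimal sketch of preparing complete points in the same `{"id", "vector", "payload"}` shape before passing them to `collection.upsert(...)`. `make_point` and `DIMENSION` are hypothetical names introduced for illustration, not part of velesdb, and the random values only stand in for real embeddings.

```python
import random

DIMENSION = 768  # must match the dimension passed to create_collection


def make_point(point_id, vector, payload=None, dimension=DIMENSION):
    """Return a dict in the {"id", "vector", "payload"} shape used above.

    Hypothetical helper: it checks the vector length up front so that a
    mismatched embedding is caught before it reaches the database.
    """
    vector = [float(x) for x in vector]
    if len(vector) != dimension:
        raise ValueError(f"expected {dimension} values, got {len(vector)}")
    point = {"id": point_id, "vector": vector}
    if payload is not None:
        point["payload"] = payload
    return point


# Stand-in embeddings; in practice these come from an embedding model.
rng = random.Random(0)
points = [
    make_point(1, [rng.random() for _ in range(DIMENSION)],
               {"title": "Introduction to AI", "category": "tech"}),
    make_point(2, [rng.random() for _ in range(DIMENSION)],
               {"title": "Machine Learning Basics", "category": "tech"}),
]
# collection.upsert(points)  # uncomment once `collection` exists
```

Validating the length on the Python side is a design choice for clearer error messages; how the library itself reports a dimension mismatch is not described here.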

API Reference

Database

# Create/open database
db = velesdb.Database("./path/to/data")

# List collections
names = db.list_collections()

# Create collection
collection = db.create_collection("name", dimension=768, metric="cosine")

# Get existing collection
collection = db.get_collection("name")

# Delete collection
db.delete_collection("name")
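
The calls above can be combined into an "open if present, otherwise create" pattern, which is convenient when a script runs more than once against the same data directory. This is a sketch only: `get_or_create_collection` is a hypothetical helper, and it assumes `list_collections()` returns an iterable of collection names, as the variable name `names` above suggests.

```python
def get_or_create_collection(db, name, dimension, metric="cosine"):
    """Open `name` if it already exists, otherwise create it.

    Hypothetical helper built only from the Database calls listed above.
    Assumes list_collections() returns an iterable of collection names.
    """
    if name in db.list_collections():
        return db.get_collection(name)
    return db.create_collection(name, dimension=dimension, metric=metric)


# Usage, once velesdb is installed:
# db = velesdb.Database("./path/to/data")
# collection = get_or_create_collection(db, "documents", dimension=768)
```

Whether `create_collection` raises for an existing name is not stated here; checking first sidesteps that question.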

Collection

# Get collection info
info = collection.info()
# {"name": "documents", "dimension": 768, "metric": "cosine", "storage_mode": "full", "point_count": 100}

# Insert/update vectors (with immediate flush)
collection.upsert([
    {"id": 1, "vector": [...], "payload": {"key": "value"}}
])

# Bulk insert (optimized for high-throughput - 3-7x faster)
# Uses parallel HNSW insertion + single flush at the end
collection.upsert_bulk([
    {"id": i, "vector": vectors[i].tolist()} for i in range(10000)
])

# Vector search
results = collection.search(vector=[...], top_k=10)

# Batch search (multiple queries in parallel)
batch_results = collection.batch_search(
    vectors=[[0.1, 0.2, ...], [0.3, 0.4, ...]],
    top_k=5
)

# Multi-query fusion search (MQG pipelines)
from velesdb import FusionStrategy

results = collection.multi_query_search(
    vectors=[query1, query2, query3],  # Multiple reformulations
    top_k=10,
    fusion=FusionStrategy.rrf(k=60)  # RRF, average, maximum, or weighted
)

# Weighted fusion (like SearchXP scoring)
results = collection.multi_query_search(
    vectors=[v1, v2, v3],
    top_k=10,
    fusion=FusionStrategy.weighted(
        avg_weight=0.6,
        max_weight=0.3,
        hit_weight=0.1
    )
)

# Text search (BM25)
results = collection.text_search(query="machine learning", top_k=10)

# Hybrid search (vector + text with RRF fusion)
results = collection.hybrid_search(
    vector=[0.1, 0.2, ...],
    query="machine learning",
    top_k=10,
    vector_weight=0.7  # 0.0 = text only, 1.0 = vector only
)

# Get specific points
points = collection.get([1, 2, 3])

# Delete points
collection.delete([1, 2, 3])

# Check if empty
is_empty = collection.is_empty()

# Flush to disk
collection.flush()

# VelesQL query
results = collection.query(
    "SELECT * FROM vectors WHERE category = 'tech' LIMIT 10"
)

# VelesQL with parameters
results = collection.query(
    "SELECT * FROM vectors WHERE VECTOR NEAR $query LIMIT 5",
    params={"query": [0.1, 0.2, ...]}
)

# Search with metadata filter
results = collection.search_with_filter(
    vector=[0.1, 0.2, ...],
    top_k=10,
    filter={"condition": {"type": "eq", "field": "category", "value": "tech"}}
)
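
To make the `FusionStrategy.rrf(k=60)` option above more concrete, here is a pure-Python sketch of textbook Reciprocal Rank Fusion over ranked ID lists. It illustrates the general technique only; `rrf_fuse` is a hypothetical name, and VelesDB's internal implementation (tie-breaking, score scaling) may differ.

```python
from collections import defaultdict


def rrf_fuse(ranked_lists, k=60, top_k=10):
    """Fuse ranked ID lists with Reciprocal Rank Fusion.

    Each ID scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Returns (id, score) pairs, best first.
    """
    scores = defaultdict(float)
    for ranked in ranked_lists:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by score descending, then by ID for a deterministic order.
    fused = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return fused[:top_k]


# Three reformulations of one question, each returning ranked IDs.
fused = rrf_fuse([[1, 2, 3], [2, 1, 4], [2, 5, 1]])
print([doc_id for doc_id, _ in fused][:2])  # [2, 1]
```

A larger `k` flattens the difference between ranks, so documents that appear in many lists gain relative to documents that rank first in only one.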

Storage Modes (Quantization)

Reduce memory usage with vector quantization:

# Full precision (default) - 4 bytes per dimension
collection = db.create_collection("full", dimension=768, storage_mode="full")

# SQ8 quantization - 1 byte per dimension (4x compression)
collection = db.create_collection("sq8", dimension=768, storage_mode="sq8")

# Binary quantization - 1 bit per dimension (32x compression)
collection = db.create_collection("binary", dimension=768, storage_mode="binary")
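
Mode   | Memory per Vector (768D) | Compression | Best For
-------|--------------------------|-------------|------------------------------
full   | 3,072 bytes              | 1x          | Maximum accuracy
sq8    | 768 bytes                | 4x          | Good accuracy/memory balance
binary | 96 bytes                 | 32x         | Edge/IoT, massive scale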
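
The per-vector sizes above follow directly from the bits stored per dimension. As a rough sizing aid, the sketch below reproduces that arithmetic for any dimension and collection size. It counts raw vector data only; HNSW index structures, payloads, and file overhead are not included, and the function names are hypothetical.

```python
import math

# Bits stored per dimension, per the storage mode descriptions above.
BITS_PER_DIMENSION = {"full": 32, "sq8": 8, "binary": 1}


def bytes_per_vector(dimension, storage_mode):
    """Raw vector size in bytes, ignoring index and payload overhead."""
    return math.ceil(dimension * BITS_PER_DIMENSION[storage_mode] / 8)


def estimate_total_mib(num_vectors, dimension, storage_mode):
    """Raw vector storage in MiB for a whole collection."""
    return num_vectors * bytes_per_vector(dimension, storage_mode) / 2**20


for mode in ("full", "sq8", "binary"):
    print(mode, bytes_per_vector(768, mode),
          round(estimate_total_mib(1_000_000, 768, mode), 1))
```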

Bulk Loading Performance

For large-scale data import, use upsert_bulk() instead of upsert():

Method        | 10k vectors (768D) | Notes
--------------|--------------------|-----------------------------
upsert()      | ~47s               | Flushes after each batch
upsert_bulk() | ~3s                | Single flush + parallel HNSW

# Recommended for bulk import
import numpy as np

vectors = np.random.rand(10000, 768).astype('float32')
points = [{"id": i, "vector": v.tolist()} for i, v in enumerate(vectors)]

collection.upsert_bulk(points)  # single flush + parallel HNSW insertion
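
When the full dataset does not fit comfortably in memory as a list of dicts, one option is to stream it to `upsert_bulk()` in chunks. The sketch below is an assumption-laden illustration: `chunked` and `bulk_load` are hypothetical helpers, and it assumes `upsert_bulk()` may be called repeatedly on the same collection, which is not stated above.

```python
from itertools import islice


def chunked(iterable, size):
    """Yield successive lists of at most `size` items from `iterable`."""
    if size <= 0:
        raise ValueError("size must be positive")
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def bulk_load(collection, points, chunk_size=10_000):
    """Feed points to upsert_bulk() in chunks; returns the number sent.

    Hypothetical helper: assumes upsert_bulk() can be called more than
    once on the same collection.
    """
    total = 0
    for chunk in chunked(points, chunk_size):
        collection.upsert_bulk(chunk)
        total += len(chunk)
    return total
```

Each call presumably ends with its own flush, so larger chunks stay closer to the single-flush behaviour described above, at the cost of more memory per call.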

Distance Metrics

Metric    | Description                 | Use Case
----------|-----------------------------|--------------------------------------
cosine    | Cosine similarity (default) | Text embeddings, normalized vectors
euclidean | Euclidean (L2) distance     | Image features, spatial data
dot       | Dot product                 | When vectors are pre-normalized
hamming   | Hamming distance            | Binary vectors, fingerprints, hashes
jaccard   | Jaccard similarity          | Set similarity, tags, recommendations
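
For reference, the five metrics can be written out in a few lines of plain Python. These are the textbook definitions, offered as a sketch for checking intuition on small inputs; how VelesDB encodes binary vectors or orients its scores (similarity versus distance) is not specified here, and the SIMD-optimized engine does not work this way internally.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    # Undefined for all-zero vectors (division by zero).
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hamming_distance(a, b):
    """Number of positions where two equal-length vectors differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def jaccard_similarity(a, b):
    """Intersection size over union size of the non-zero positions."""
    set_a = {i for i, x in enumerate(a) if x}
    set_b = {i for i, x in enumerate(b) if x}
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 1.0


a, b = [1, 0, 1, 1], [1, 1, 0, 1]
print(dot(a, b), hamming_distance(a, b), jaccard_similarity(a, b))  # 2 2 0.5
```

For unit-length vectors, cosine similarity and dot product coincide, which is why `dot` is suggested for pre-normalized embeddings.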

Performance

VelesDB is built in Rust with explicit SIMD optimizations:

Operation   | Time (768D) | Throughput
------------|-------------|-------------
Cosine      | ~93 ns      | 11M ops/sec
Euclidean   | ~46 ns      | 22M ops/sec
Dot Product | ~36 ns      | 28M ops/sec
Hamming     | ~6 ns       | 164M ops/sec
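
The throughput column is roughly the reciprocal of the per-operation time. The sketch below shows that conversion and, as a back-of-the-envelope use, the time a single-threaded exhaustive scan would need at these latencies. The function names are hypothetical, and small gaps against the table are presumably due to the latencies being rounded.

```python
def ops_per_second(latency_ns):
    """Single-threaded throughput implied by a per-operation latency."""
    return 1e9 / latency_ns


def brute_force_scan_ms(num_vectors, latency_ns):
    """Time to compare one query against every vector, in milliseconds."""
    return num_vectors * latency_ns / 1e6


# Latencies from the table above (768-dimensional vectors).
LATENCY_NS = {"cosine": 93, "euclidean": 46, "dot": 36, "hamming": 6}

for name, ns in LATENCY_NS.items():
    print(f"{name}: {ops_per_second(ns) / 1e6:.1f}M ops/sec, "
          f"100k scan ~{brute_force_scan_ms(100_000, ns):.1f} ms")
```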

Benchmark: VelesDB vs pgvector (HNSW)

Tested on clustered 768D embeddings, chosen to resemble realistic AI workloads:

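Dataset Size | VelesDB Recall | VelesDB P50 | pgvector P50 | Speedup
-------------|----------------|-------------|--------------|--------
1,000        | 100.0%         | 0.5 ms      | 50 ms        | 100x
10,000       | 99.0%          | 2.5 ms      | 50 ms        | 20x
100,000      | 97.8%          | 4.3 ms      | 50 ms        | 12x
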
  • 12-100x faster than pgvector depending on dataset size
  • 97-100% recall across all scales
  • Sub-5ms latency even at 100k vectors
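
The two derived columns can be reproduced with a few lines of arithmetic. The sketch below uses the common definition of recall (the share of exact nearest neighbours that the approximate index also returned) and computes speedup as the ratio of P50 latencies. How this benchmark measured recall is not stated, so the definition is an assumption, and the function names are hypothetical.

```python
def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact top-k neighbours that the index returned."""
    exact = set(exact_ids)
    if not exact:
        raise ValueError("exact_ids must not be empty")
    return len(exact & set(approx_ids)) / len(exact)


def speedup(baseline_ms, candidate_ms):
    """How many times faster the candidate is than the baseline."""
    return baseline_ms / candidate_ms


# P50 latencies from the table above: (VelesDB, pgvector) in ms.
P50_MS = {1_000: (0.5, 50), 10_000: (2.5, 50), 100_000: (4.3, 50)}
for size, (veles_ms, pgvector_ms) in P50_MS.items():
    print(f"{size}: {speedup(pgvector_ms, veles_ms):.1f}x")

# Three of the four true neighbours were found.
print(recall_at_k([1, 2, 3, 9], [1, 2, 3, 4]))  # 0.75
```

The 100,000-vector row works out to about 11.6x, which the table rounds to 12x.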

Requirements

  • Python 3.9+
  • No external dependencies (pure Rust engine)
  • Optional: NumPy for array support

License

Elastic License 2.0 (ELv2)

See LICENSE for details.
