
VelesDB Python

Python bindings for VelesDB - a high-performance vector database for AI applications.

Features

  • Vector Similarity Search: HNSW index with SIMD-optimized distance calculations
  • Multiple Distance Metrics: Cosine, Euclidean, Dot Product, Hamming, Jaccard
  • Persistent Storage: Memory-mapped files for efficient disk I/O
  • Metadata Support: Store and retrieve JSON payloads with vectors
  • NumPy Integration: Native support for NumPy arrays

Installation

pip install velesdb

Quick Start

import velesdb

# Open or create a database
db = velesdb.Database("./my_vectors")

# Create a collection for 768-dimensional vectors (e.g., BERT embeddings)
collection = db.create_collection(
    name="documents",
    dimension=768,
    metric="cosine"  # Options: "cosine", "euclidean", "dot", "hamming", "jaccard"
)

# Insert vectors with metadata
collection.upsert([
    {
        "id": 1,
        "vector": [0.1, 0.2, ...],  # 768-dim vector
        "payload": {"title": "Introduction to AI", "category": "tech"}
    },
    {
        "id": 2,
        "vector": [0.3, 0.4, ...],
        "payload": {"title": "Machine Learning Basics", "category": "tech"}
    }
])

# Search for similar vectors
results = collection.search(
    vector=[0.15, 0.25, ...],  # Query vector
    top_k=5
)

for result in results:
    print(f"ID: {result['id']}, Score: {result['score']:.4f}")
    print(f"Payload: {result['payload']}")
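Because each result carries its payload, metadata post-filtering can be done in plain Python on the returned list. A minimal sketch; the sample results below are illustrative, not real search output:

```python
def filter_by_category(results, category):
    """Keep only results whose payload has the given category."""
    return [r for r in results if r.get("payload", {}).get("category") == category]

# Illustrative results in the dict shape returned by collection.search()
sample = [
    {"id": 1, "score": 0.97, "payload": {"title": "Introduction to AI", "category": "tech"}},
    {"id": 7, "score": 0.91, "payload": {"title": "Gardening 101", "category": "hobby"}},
]
tech_only = filter_by_category(sample, "tech")
```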

API Reference

Database

# Create/open database
db = velesdb.Database("./path/to/data")

# List collections
names = db.list_collections()

# Create collection
collection = db.create_collection("name", dimension=768, metric="cosine")

# Get existing collection
collection = db.get_collection("name")

# Delete collection
db.delete_collection("name")
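A common pattern is get-or-create, composed from the calls above. `ensure_collection` is a hypothetical helper sketched here for convenience, not part of the VelesDB API:

```python
def ensure_collection(db, name, dimension, metric="cosine"):
    """Return the named collection, creating it first if it does not exist.

    `db` is expected to expose list_collections / get_collection /
    create_collection as documented above.
    """
    if name in db.list_collections():
        return db.get_collection(name)
    return db.create_collection(name, dimension=dimension, metric=metric)

# Hypothetical usage:
# collection = ensure_collection(velesdb.Database("./my_vectors"), "documents", 768)
```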

Collection

# Get collection info
info = collection.info()
# {"name": "documents", "dimension": 768, "metric": "cosine", "point_count": 100}

# Insert/update vectors (with immediate flush)
collection.upsert([
    {"id": 1, "vector": [...], "payload": {"key": "value"}}
])

# Bulk insert (optimized for high throughput: 3-7x faster)
# Uses parallel HNSW insertion + single flush at the end
collection.upsert_bulk([
    {"id": i, "vector": vectors[i].tolist()} for i in range(10000)
])

# Search
results = collection.search(vector=[...], top_k=10)

# Get specific points
points = collection.get([1, 2, 3])

# Delete points
collection.delete([1, 2, 3])

# Check if empty
is_empty = collection.is_empty()

# Flush to disk
collection.flush()

Bulk Loading Performance

For large-scale data import, use upsert_bulk() instead of upsert():

| Method | 10k vectors (768D) | Notes |
|---|---|---|
| `upsert()` | ~47s | Flushes after each batch |
| `upsert_bulk()` | ~3s | Single flush + parallel HNSW |

# Recommended for bulk import
import numpy as np

vectors = np.random.rand(10000, 768).astype('float32')
points = [{"id": i, "vector": v.tolist()} for i, v in enumerate(vectors)]

collection.upsert_bulk(points)  # 7x faster than upsert()
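For datasets too large to hold comfortably in a single list, the bulk path can be fed in fixed-size batches. `chunked` is a hypothetical helper sketched here, not part of VelesDB:

```python
def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

# Hypothetical usage, assuming `collection` and `points` as above:
# for batch in chunked(points, 50_000):
#     collection.upsert_bulk(batch)
```

Larger batches mean fewer flushes but more peak memory; tune the batch size to your dataset.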

Distance Metrics

| Metric | Description | Use Case |
|---|---|---|
| `cosine` | Cosine similarity (default) | Text embeddings, normalized vectors |
| `euclidean` | Euclidean (L2) distance | Image features, spatial data |
| `dot` | Dot product | When vectors are pre-normalized |
| `hamming` | Hamming distance | Binary vectors, fingerprints, hashes |
| `jaccard` | Jaccard similarity | Set similarity, tags, recommendations |
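For intuition, here are pure-Python reference versions of these metrics. The engine's actual implementations are SIMD-optimized Rust; these sketches only illustrate the definitions (Jaccard is shown in its set-based form):

```python
import math

def cosine_similarity(a, b):
    """Dot product normalized by both vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def euclidean_distance(a, b):
    """Straight-line (L2) distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def hamming_distance(a, b):
    """Number of positions where two binary vectors differ."""
    return sum(x != y for x, y in zip(a, b))

def jaccard_similarity(a, b):
    """Intersection over union of the two element sets."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0
```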

Performance

VelesDB is built in Rust with explicit SIMD optimizations:

| Operation | Time (768D) | Throughput |
|---|---|---|
| Cosine | ~76 ns | 13M ops/sec |
| Euclidean | ~47 ns | 21M ops/sec |
| Hamming | ~6 ns | 164M ops/sec |

Benchmark: VelesDB vs pgvector (HNSW)

Tested on clustered 768D embeddings, a realistic AI workload:

| Dataset Size | VelesDB Recall | VelesDB P50 | pgvector P50 | Speedup |
|---|---|---|---|---|
| 1,000 | 100.0% | 0.5ms | 50ms | 100x |
| 10,000 | 99.0% | 2.5ms | 50ms | 20x |
| 100,000 | 97.8% | 4.3ms | 50ms | 12x |

  • 12-100x faster than pgvector, depending on dataset size
  • 97-100% recall across all scales
  • Sub-5ms latency even at 100k vectors
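Recall here is the fraction of the true nearest neighbors that the approximate (HNSW) search actually returns. Given exact ground-truth IDs, it can be computed as below; this is an illustrative sketch, not the benchmark harness used for the numbers above:

```python
def recall_at_k(retrieved_ids, ground_truth_ids):
    """Fraction of ground-truth neighbors present in the retrieved set."""
    hits = len(set(retrieved_ids) & set(ground_truth_ids))
    return hits / len(ground_truth_ids)
```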

Requirements

  • Python 3.9+
  • No external dependencies (pure Rust engine)
  • Optional: NumPy for array support

License

Elastic License 2.0 (ELv2)

See LICENSE for details.
