High-performance vector database with support for various indexing algorithms

NusterDB - High-Performance Vector Database

NusterDB is a high-performance vector database built with Rust, designed for similarity search and nearest neighbor queries. It supports multiple indexing algorithms and distance metrics, making it suitable for machine learning applications, recommendation systems, and any use case requiring efficient vector operations.

Features

Multiple Index Types: Support for Flat, HNSW, IVF, LSH, and Annoy indices
Distance Metrics: Euclidean, Cosine, Manhattan, Angular, Jaccard, and Hamming distances
High Performance: Built with Rust for maximum speed and efficiency
Persistence: RocksDB-backed storage with compression options
Snapshots: Create and manage database snapshots for backup and versioning
Metadata Support: Store and query metadata alongside vectors
Python API: Easy-to-use Python interface with numpy compatibility
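To make the listed distance metrics concrete, here is a minimal pure-Python sketch of their textbook definitions. It is not NusterDB's implementation: the function names are hypothetical, and the library's exact conventions (for example how Angular is scaled, or how Jaccard treats dense vectors) are not documented here, so treat those choices as assumptions.

```python
import math

def euclidean(a, b):
    # Straight-line (L2) distance.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    # Sum of absolute coordinate differences (L1).
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine_distance(a, b):
    # 1 - cosine similarity; assumes neither vector is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def angular_distance(a, b):
    # Angle between the vectors, scaled to [0, 1] (assumed convention).
    cos_sim = max(-1.0, min(1.0, 1.0 - cosine_distance(a, b)))
    return math.acos(cos_sim) / math.pi

def hamming(a, b):
    # Number of positions where the vectors differ.
    return sum(1 for x, y in zip(a, b) if x != y)

def jaccard_distance(a, b):
    # Treats non-zero positions as set members (assumed convention).
    set_a = {i for i, x in enumerate(a) if x != 0}
    set_b = {i for i, y in enumerate(b) if y != 0}
    union = set_a | set_b
    if not union:
        return 0.0
    return 1.0 - len(set_a & set_b) / len(union)

print(euclidean([0, 0], [3, 4]), manhattan([0, 0], [3, 4]))  # 5.0 7
```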

Installation

From PyPI (Recommended)

pip install nusterdb

From Source

# Clone the repository
git clone https://github.com/your-org/nusterdb.git
cd nusterdb  # git clone creates this directory; the Python bindings may live in a subdirectory

# Install maturin (build tool for Rust-Python packages)
pip install maturin

# Build and install
maturin develop

Quick Start

import numpy as np
from nusterdb import NusterDB, DatabaseConfig, Vector, IndexType, DistanceMetric

# Create database configuration
config = DatabaseConfig(
    dim=128,
    index_type=IndexType.Hnsw,
    distance_metric=DistanceMetric.Cosine
)

# Initialize database
db = NusterDB("./my_database", config)

# Create and insert vectors (dimensions must match dim=128 in the config)
vector1 = Vector(np.linspace(0.0, 1.0, 128).tolist())  # From a list of floats
vector2 = Vector.random(128, -1.0, 1.0)  # Random vector

# Insert vectors with optional metadata
id1 = db.insert(vector1, {"category": "example", "label": "first"})
id2 = db.insert(vector2, {"category": "random", "label": "second"})

# Search for similar vectors
query = Vector((np.linspace(0.0, 1.0, 128) + 0.1).tolist())  # Slightly shifted copy of vector1
results = db.search(query, k=5)  # Find up to 5 nearest neighbors

print(f"Found {len(results)} similar vectors:")
for vector_id, distance in results:
    print(f"  ID: {vector_id}, Distance: {distance:.4f}")

# Retrieve vectors and metadata
retrieved_vector = db.get(id1)
metadata = db.get_metadata(id1)
print(f"Vector: {retrieved_vector}")
print(f"Metadata: {metadata}")

Advanced Usage

Batch Operations

# Batch insert multiple vectors
vectors = [Vector.random(128, -1.0, 1.0) for _ in range(1000)]
metadata_list = [{"batch": i} for i in range(1000)]

ids = db.batch_insert(vectors, metadata_list)
print(f"Inserted {len(ids)} vectors")
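When the data set is too large to build as one list, one option is to feed batch_insert in fixed-size chunks. The sketch below shows only the chunking logic with plain lists standing in for Vector objects. The helper name chunked and the chunk size are hypothetical, and this README does not say whether batch_insert has a size limit.

```python
def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]

vectors = [[float(i)] * 4 for i in range(10)]      # stand-ins for Vector objects
metadata_list = [{"batch": i} for i in range(10)]

inserted = 0
for vec_chunk, meta_chunk in zip(chunked(vectors, 4), chunked(metadata_list, 4)):
    # With a real database, db.batch_insert(vec_chunk, meta_chunk) would go here.
    inserted += len(vec_chunk)

print(f"Processed {inserted} vectors in chunks of up to 4")
```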

Index Configuration

# HNSW Configuration for high-recall search
hnsw_config = DatabaseConfig(
    dim=256,
    index_type=IndexType.Hnsw,
    distance_metric=DistanceMetric.Euclidean,
    hnsw_max_connections=32,
    hnsw_ef_construction=400,
    hnsw_max_elements=100000
)

# Flat index for exact search
flat_config = DatabaseConfig(
    dim=256,
    index_type=IndexType.Flat,
    distance_metric=DistanceMetric.Cosine,
    flat_use_simd=True,
    flat_batch_size=2000
)
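For intuition about what "exact search" means for a Flat index, the following sketch scans every stored vector and keeps the k closest. It is a conceptual illustration in pure Python, not NusterDB's code: flat_search and the dict-based store are hypothetical, and options such as flat_use_simd and flat_batch_size have no counterpart here.

```python
import heapq
import math

def flat_search(vectors, query, k):
    """Exact k-nearest-neighbor search by exhaustive scan.

    vectors: dict mapping id -> list of floats.
    Returns a list of (id, distance) pairs, closest first.
    """
    scored = ((vid, math.dist(vec, query)) for vid, vec in vectors.items())
    return heapq.nsmallest(k, scored, key=lambda pair: pair[1])

store = {0: [0.0, 0.0], 1: [1.0, 0.0], 2: [5.0, 5.0]}
for vector_id, distance in flat_search(store, [0.9, 0.0], k=2):
    print(f"ID: {vector_id}, Distance: {distance:.4f}")
```

The scan costs time proportional to the number of stored vectors per query, which is why approximate indices such as HNSW are usually preferred as the data set grows.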

Database Management

# Create snapshots
db.snapshot("backup_2024", {"version": "1.0", "description": "Initial backup"})

# List snapshots
snapshots = db.list_snapshots()
print("Available snapshots:", snapshots)

# Get database statistics
stats = db.stats()
print(f"Total vectors: {stats['total_vectors']}")
print(f"Database size: {stats['database_size_bytes'] / 1024 / 1024:.2f} MB")
print(f"Cache hit rate: {stats['cache_hit_rate'] * 100:.2f}%")

# Compact database
db.compact()
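If snapshots are taken regularly, a timestamped name avoids collisions and sorts chronologically. This is only a naming sketch: snapshot_name is a hypothetical helper, and whether NusterDB restricts the characters allowed in snapshot names is not stated here.

```python
from datetime import datetime, timezone

def snapshot_name(prefix, when=None):
    """Build a sortable snapshot name such as 'backup_20250619T120000Z'."""
    if when is None:
        when = datetime.now(timezone.utc)
    else:
        when = when.astimezone(timezone.utc)  # normalize to UTC before formatting
    return f"{prefix}_{when.strftime('%Y%m%dT%H%M%SZ')}"

name = snapshot_name("backup")
print(name)
# With a real database: db.snapshot(name, {"version": "1.0"})
```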

Vector Operations

# Create vectors
v1 = Vector([1.0, 2.0, 3.0])
v2 = Vector([4.0, 5.0, 6.0])

# Vector arithmetic
v3 = v1 + v2  # Addition
v4 = v1 - v2  # Subtraction
v5 = v1 * 2.0  # Scalar multiplication
v6 = v1 / 2.0  # Scalar division

# Vector properties
print(f"Dimension: {v1.dim()}")
print(f"L2 norm: {v1.norm()}")
print(f"L1 norm: {v1.l1_norm()}")
print(f"Dot product: {v1.dot(v2)}")

# Normalization
v_normalized = v1.normalize()  # Returns new vector
v1.normalize_mut()  # In-place normalization
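As a cross-check, the same quantities can be worked out by hand for v1 and v2 with plain Python lists. This sketch uses the standard definitions and does not call the Vector class.

```python
import math

v1 = [1.0, 2.0, 3.0]
v2 = [4.0, 5.0, 6.0]

dot = sum(x * y for x, y in zip(v1, v2))       # 1*4 + 2*5 + 3*6 = 32.0
l1_norm = sum(abs(x) for x in v1)              # 1 + 2 + 3 = 6.0
l2_norm = math.sqrt(sum(x * x for x in v1))    # sqrt(14), about 3.7417
unit = [x / l2_norm for x in v1]               # same direction, length 1

print(dot, l1_norm, round(l2_norm, 4))  # 32.0 6.0 3.7417
```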

API Reference

Classes

`NusterDB`

Main database class for vector operations.

Methods:

__init__(path, config): Initialize database
insert(vector, metadata=None): Insert a vector
search(query, k, ef_search=None): Search for nearest neighbors
get(id): Retrieve vector by ID
get_metadata(id): Retrieve metadata by ID
delete(id): Delete vector by ID
update(id, vector): Update vector data
update_metadata(id, metadata): Update metadata
count(): Get total vector count
batch_insert(vectors, metadata_list=None): Insert multiple vectors
range_search(query, radius): Find vectors within distance threshold
snapshot(name=None, metadata=None): Create snapshot
list_snapshots(): List all snapshots
delete_snapshot(name): Delete snapshot
stats(): Get database statistics
compact(): Compact database
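To illustrate what a radius query such as range_search(query, radius) computes, here is a brute-force sketch over a small in-memory store. It assumes Euclidean distance, an inclusive threshold, and results sorted by distance. None of these details are confirmed for NusterDB, and the function below is a stand-in rather than the library method.

```python
import math

def range_search(vectors, query, radius):
    """Return (id, distance) for every vector within `radius` of `query`."""
    hits = []
    for vid, vec in vectors.items():
        distance = math.dist(vec, query)
        if distance <= radius:  # inclusive threshold (assumption)
            hits.append((vid, distance))
    hits.sort(key=lambda pair: pair[1])
    return hits

store = {0: [0.0, 0.0], 1: [1.0, 0.0], 2: [5.0, 5.0]}
print(range_search(store, [0.0, 0.0], radius=1.0))  # [(0, 0.0), (1, 1.0)]
```

Unlike search with a fixed k, the number of results depends on the data: a small radius may return nothing, and a large one may return every vector.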

`Vector`

Vector class for mathematical operations.

Methods:

__init__(data): Create vector from list
zeros(dim): Create zero vector
ones(dim): Create vector of ones
random(dim, min, max): Create random vector
unit_random(dim): Create random unit vector
dim(): Get dimension
norm(): L2 norm
l1_norm(): L1 norm
normalize(): Return a new vector normalized to unit length
normalize_mut(): Normalize the vector in place
dot(other): Dot product

`DatabaseConfig`

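Configuration class for database settings. Parameters used in the examples above: dim, index_type, distance_metric, hnsw_max_connections, hnsw_ef_construction, hnsw_max_elements, flat_use_simd, flat_batch_size.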

Enums

IndexType: Flat, Hnsw, IVF, LSH, Annoy
DistanceMetric: Euclidean, Cosine, Manhattan, Angular, Jaccard, Hamming
Compression: None, Snappy, LZ4, ZSTD

Performance Tips

Choose the right index:
- Use Flat for exact search on small datasets (< 10K vectors)
- Use HNSW for approximate search on large datasets

Optimize HNSW parameters:
- Increase hnsw_ef_construction for a better-quality graph (slower build)
- Increase hnsw_max_connections for better recall (more memory)
- Pass a larger ef_search to search() for better recall at query time (slower queries)

Use appropriate distance metric:
- Cosine when only the direction of a vector matters; on unit-normalized vectors it ranks neighbors the same way Euclidean does
- Euclidean for general purpose
- Manhattan for sparse vectors

Enable SIMD for the Flat index (flat_use_simd=True) when possible
Adjust cache size based on available memory
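When tuning HNSW parameters, it helps to measure recall against exact results, for example the IDs returned by a Flat index for the same queries. The helper below is a minimal sketch of recall@k. The name recall_at_k and the sample ID lists are made up for illustration.

```python
def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true nearest neighbors that the approximate search found."""
    exact = set(exact_ids)
    if not exact:
        return 1.0
    return len(exact & set(approx_ids)) / len(exact)

exact = [3, 7, 1, 9, 4]    # e.g. IDs from an exact (Flat) search
approx = [3, 1, 7, 8, 2]   # e.g. IDs from an approximate (HNSW) search
print(recall_at_k(approx, exact))  # 3 of 5 true neighbors found -> 0.6
```

Averaging this value over a representative set of queries gives a single number to compare parameter settings against build time and memory use.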

Requirements

Python >= 3.8
numpy >= 1.19.0

License

MIT License. See LICENSE file for details.

Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.

Changelog

0.1.0

Initial release
Support for Flat and HNSW indices
Python bindings with PyO3
Basic vector operations and database management

Release history

2.1.3: Aug 4, 2025
2.1.2: Aug 3, 2025
2.1.1: Aug 3, 2025
2.1.0: Aug 3, 2025
0.1.6: Jul 3, 2025
0.1.5: Jul 3, 2025
0.1.2: Jun 20, 2025
0.1.1: Jun 19, 2025
0.1.0: Jun 19, 2025 (this version)

Download files

Source Distributions

No source distribution files available for this release.

Built Distribution

nusterdb-0.1.0-cp312-cp312-macosx_11_0_arm64.whl (269.1 kB)

Uploaded Jun 19, 2025; CPython 3.12, macOS 11.0+ ARM64

File details

Details for the file nusterdb-0.1.0-cp312-cp312-macosx_11_0_arm64.whl.

File metadata

Download URL: nusterdb-0.1.0-cp312-cp312-macosx_11_0_arm64.whl
Upload date: Jun 19, 2025
Size: 269.1 kB
Tags: CPython 3.12, macOS 11.0+ ARM64
Uploaded using Trusted Publishing? No
Uploaded via: maturin/1.8.7

File hashes

Hashes for nusterdb-0.1.0-cp312-cp312-macosx_11_0_arm64.whl
Algorithm	Hash digest
SHA256	`2414ec652071cd7dd4c3d263fc699040289f5917c67cc35f4b5aa515d1aec27c`
MD5	`01777d673c1243908492e95be3009180`
BLAKE2b-256	`58650602328bdb1735c1737f7c31ae77103ec848e706cacdbff957d4e5b548bd`
