High-performance vector database with support for various indexing algorithms
NusterDB - High-Performance Vector Database
NusterDB is a high-performance vector database built with Rust, designed for similarity search and nearest neighbor queries. It supports multiple indexing algorithms and distance metrics, making it suitable for machine learning applications, recommendation systems, and any use case requiring efficient vector operations.
Features
- Multiple Index Types: Support for Flat, HNSW, IVF, LSH, and Annoy indices
- Distance Metrics: Euclidean, Cosine, Manhattan, Angular, Jaccard, and Hamming distances
- High Performance: Core engine written in Rust, with optional SIMD acceleration for the flat index
- Persistence: RocksDB-backed storage with compression options
- Snapshots: Create and manage database snapshots for backup and versioning
- Metadata Support: Store and query metadata alongside vectors
- Python API: Easy-to-use Python interface with numpy compatibility
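The distance metrics listed above have standard textbook definitions. The sketch below writes them out in plain Python so their behavior can be compared side by side. It assumes the usual conventions: cosine distance is 1 minus cosine similarity, angular distance is measured in radians, Jaccard is computed over the sets of nonzero components, and Hamming counts differing positions. NusterDB's internal choices may differ.

```python
import math

# Textbook definitions of the six metrics (assumed conventions, not NusterDB's code).

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine(a, b):
    # Cosine distance = 1 - cosine similarity
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb)

def angular(a, b):
    # Angle between the vectors in radians, clamped for floating-point safety
    sim = 1.0 - cosine(a, b)
    return math.acos(max(-1.0, min(1.0, sim)))

def jaccard(a, b):
    # Binary Jaccard distance: nonzero components are treated as set members
    sa = {i for i, x in enumerate(a) if x != 0}
    sb = {i for i, y in enumerate(b) if y != 0}
    union = sa | sb
    if not union:
        return 0.0
    return 1.0 - len(sa & sb) / len(union)

def hamming(a, b):
    # Number of positions where the components differ
    return sum(1 for x, y in zip(a, b) if x != y)
```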
Installation
From PyPI (Recommended)
pip install nusterdb
From Source
# Clone the repository
git clone https://github.com/your-org/nusterdb.git
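# Enter the Python package directory (assumed to be nusterdb-python inside the clone; adjust if the layout differs)
cd nusterdb/nusterdb-python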
# Install maturin (build tool for Rust-Python packages)
pip install maturin
# Build the Rust extension and install it into the active virtual environment
maturin develop
Quick Start
import numpy as np
from nusterdb import NusterDB, DatabaseConfig, Vector, IndexType, DistanceMetric
# Create database configuration; every vector must have exactly `dim` components
config = DatabaseConfig(
    dim=128,
    index_type=IndexType.Hnsw,
    distance_metric=DistanceMetric.Cosine
)
# Initialize database
db = NusterDB("./my_database", config)
# Create vectors matching the configured dimension (128)
base = np.arange(1.0, 129.0)  # 1.0, 2.0, ..., 128.0
vector1 = Vector(base.tolist())  # Direct from a Python list
vector2 = Vector.random(128, -1.0, 1.0)  # Random components between -1.0 and 1.0
# Insert vectors with optional metadata; each insert returns the new vector's ID
id1 = db.insert(vector1, {"category": "example", "label": "first"})
id2 = db.insert(vector2, {"category": "random", "label": "second"})
# Search for similar vectors, using a slightly shifted copy of vector1 as the query
query = Vector((base + 0.1).tolist())
results = db.search(query, k=5)  # Up to 5 nearest neighbors (only 2 are stored here)
print(f"Found {len(results)} similar vectors:")
for vector_id, distance in results:
    print(f"  ID: {vector_id}, Distance: {distance:.4f}")
# Retrieve vectors and metadata
retrieved_vector = db.get(id1)
metadata = db.get_metadata(id1)
print(f"Vector: {retrieved_vector}")
print(f"Metadata: {metadata}")
Advanced Usage
Batch Operations
# Batch insert multiple vectors
vectors = [Vector.random(128, -1.0, 1.0) for _ in range(1000)]
metadata_list = [{"batch": i} for i in range(1000)]
ids = db.batch_insert(vectors, metadata_list)
print(f"Inserted {len(ids)} vectors")
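For very large loads, one option is to split the data into fixed-size chunks and call batch_insert once per chunk, for example to report progress or to keep each write batch bounded. The helper below is a hypothetical sketch, not part of NusterDB: it accepts any callable shaped like db.batch_insert(vectors, metadata_list) and is exercised here with a stand-in function so it runs without the library.

```python
def insert_in_chunks(batch_insert, vectors, metadata_list=None, chunk_size=1000):
    """Call batch_insert on aligned slices of vectors/metadata and collect all IDs.

    Hypothetical helper: `batch_insert` is any callable shaped like
    db.batch_insert(vectors, metadata_list) that returns a list of IDs.
    """
    if metadata_list is not None and len(metadata_list) != len(vectors):
        raise ValueError("metadata_list must have one entry per vector")
    ids = []
    for start in range(0, len(vectors), chunk_size):
        end = start + chunk_size
        chunk_meta = None if metadata_list is None else metadata_list[start:end]
        ids.extend(batch_insert(vectors[start:end], chunk_meta))
    return ids


# Stand-in for db.batch_insert so the sketch runs without NusterDB:
# it records each chunk's size and hands out sequential IDs.
chunk_sizes = []


def fake_batch_insert(vectors, metadata_list):
    start = sum(chunk_sizes)
    chunk_sizes.append(len(vectors))
    return list(range(start, start + len(vectors)))


vectors = [[0.0] * 4 for _ in range(2500)]
metadata_list = [{"batch": i} for i in range(2500)]
ids = insert_in_chunks(fake_batch_insert, vectors, metadata_list, chunk_size=1000)
print(f"Inserted {len(ids)} vectors in chunks of {chunk_sizes}")
```

With a real database the call would be insert_in_chunks(db.batch_insert, vectors, metadata_list). Slicing both lists with the same bounds keeps each vector paired with its metadata.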
Index Configuration
# HNSW Configuration for high-recall search
hnsw_config = DatabaseConfig(
    dim=256,
    index_type=IndexType.Hnsw,
    distance_metric=DistanceMetric.Euclidean,
    hnsw_max_connections=32,
    hnsw_ef_construction=400,
    hnsw_max_elements=100000
)
# Flat index for exact search
flat_config = DatabaseConfig(
    dim=256,
    index_type=IndexType.Flat,
    distance_metric=DistanceMetric.Cosine,
    flat_use_simd=True,
    flat_batch_size=2000
)
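A Flat index performs exact search by comparing the query against every stored vector, so each query costs O(n·d) distance work for n vectors of dimension d. The pure-Python sketch below illustrates the idea with cosine distance. The FlatIndex class is hypothetical and is not NusterDB's implementation, which is written in Rust and can use SIMD. It returns (id, distance) pairs sorted by ascending distance, the same shape the Quick Start loop unpacks from db.search.

```python
import heapq
import math


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


class FlatIndex:
    """Brute-force (exact) k-NN index: scores the query against every stored vector."""

    def __init__(self, distance=cosine_distance):
        self.distance = distance
        self.vectors = {}  # id -> list of floats
        self.next_id = 0

    def insert(self, vector):
        vid = self.next_id
        self.vectors[vid] = list(vector)
        self.next_id += 1
        return vid

    def search(self, query, k):
        # Score every vector, then keep the k smallest distances
        scored = ((self.distance(query, v), vid) for vid, v in self.vectors.items())
        return [(vid, d) for d, vid in heapq.nsmallest(k, scored)]


index = FlatIndex()
for v in ([1.0, 0.0], [0.0, 1.0], [1.0, 1.0]):
    index.insert(v)
results = index.search([1.0, 0.1], k=2)
for vector_id, distance in results:
    print(f"ID: {vector_id}, Distance: {distance:.4f}")
```

Because every vector is scored, recall is always exact; the cost grows linearly with the collection, which is why approximate indices such as HNSW are suggested for large datasets.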
Database Management
# Create snapshots
db.snapshot("backup_2024", {"version": "1.0", "description": "Initial backup"})
# List snapshots
snapshots = db.list_snapshots()
print("Available snapshots:", snapshots)
# Get database statistics
stats = db.stats()
print(f"Total vectors: {stats['total_vectors']}")
print(f"Database size: {stats['database_size_bytes'] / 1024 / 1024:.2f} MiB")
print(f"Cache hit rate: {stats['cache_hit_rate'] * 100:.2f}%")
# Compact database
db.compact()
Vector Operations
# Create vectors
v1 = Vector([1.0, 2.0, 3.0])
v2 = Vector([4.0, 5.0, 6.0])
# Vector arithmetic
v3 = v1 + v2 # Addition
v4 = v1 - v2 # Subtraction
v5 = v1 * 2.0 # Scalar multiplication
v6 = v1 / 2.0 # Scalar division
# Vector properties
print(f"Dimension: {v1.dim()}")
print(f"L2 norm: {v1.norm()}")
print(f"L1 norm: {v1.l1_norm()}")
print(f"Dot product: {v1.dot(v2)}")
# Normalization
v_normalized = v1.normalize() # Returns new vector
v1.normalize_mut() # In-place normalization
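Under the usual definitions, with v1 = [1, 2, 3] and v2 = [4, 5, 6] the operations above give v1 + v2 = [5, 7, 9], v1.dot(v2) = 32, v1.l1_norm() = 6 and v1.norm() = √14 ≈ 3.742. The Vec class below is a hypothetical plain-Python stand-in that mirrors those operations for checking expected values; it is not NusterDB's Vector class, and its normalize() assumes a nonzero vector.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Vec:
    """Hypothetical immutable stand-in mirroring the Vector operations above."""
    data: tuple

    def __add__(self, other):
        return Vec(tuple(x + y for x, y in zip(self.data, other.data)))

    def __sub__(self, other):
        return Vec(tuple(x - y for x, y in zip(self.data, other.data)))

    def __mul__(self, scalar):
        return Vec(tuple(x * scalar for x in self.data))

    def __truediv__(self, scalar):
        return Vec(tuple(x / scalar for x in self.data))

    def dim(self):
        return len(self.data)

    def norm(self):
        # L2 (Euclidean) norm
        return math.sqrt(sum(x * x for x in self.data))

    def l1_norm(self):
        # Sum of absolute values
        return sum(abs(x) for x in self.data)

    def dot(self, other):
        return sum(x * y for x, y in zip(self.data, other.data))

    def normalize(self):
        # Returns a new unit-length vector; assumes the vector is nonzero
        return self / self.norm()


v1 = Vec((1.0, 2.0, 3.0))
v2 = Vec((4.0, 5.0, 6.0))
print(v1 + v2, v1.dot(v2), v1.l1_norm(), round(v1.norm(), 3))
```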
API Reference
Classes
NusterDB
Main database class for vector operations.
Methods:
- __init__(path, config): Initialize database
- insert(vector, metadata=None): Insert a vector
- search(query, k, ef_search=None): Search for nearest neighbors
- get(id): Retrieve vector by ID
- get_metadata(id): Retrieve metadata by ID
- delete(id): Delete vector by ID
- update(id, vector): Update vector data
- update_metadata(id, metadata): Update metadata
- count(): Get total vector count
- batch_insert(vectors, metadata_list=None): Insert multiple vectors
- range_search(query, radius): Find vectors within distance threshold
- snapshot(name=None, metadata=None): Create snapshot
- list_snapshots(): List all snapshots
- delete_snapshot(name): Delete snapshot
- stats(): Get database statistics
- compact(): Compact database
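Unlike search, which returns up to k neighbors, range_search(query, radius) returns every vector within a distance threshold, so the result size depends on the data. A minimal exhaustive sketch of those semantics follows, using Euclidean distance via math.dist (Python 3.8+). The function is hypothetical, and whether NusterDB's threshold is inclusive and whether its results are sorted are assumptions here.

```python
import math


def range_search(vectors, query, radius, distance=math.dist):
    """Return (id, distance) pairs for every vector within `radius` of `query`.

    Hypothetical sketch: `vectors` maps id -> sequence of floats. The scan is
    exhaustive, the threshold is inclusive (<=), and hits are sorted by distance.
    """
    hits = []
    for vid, vec in vectors.items():
        d = distance(query, vec)
        if d <= radius:
            hits.append((vid, d))
    hits.sort(key=lambda pair: pair[1])
    return hits


store = {1: [0.0, 0.0], 2: [3.0, 4.0], 3: [1.0, 1.0]}
hits = range_search(store, [0.0, 0.0], radius=2.0)
print(hits)
```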
Vector
Vector class for mathematical operations.
Methods:
- __init__(data): Create vector from list
- zeros(dim): Create zero vector
- ones(dim): Create vector of ones
- random(dim, min, max): Create random vector
- unit_random(dim): Create random unit vector
- dim(): Get dimension
- norm(): L2 norm
- normalize(): Normalize to unit length
- dot(other): Dot product
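Vector.unit_random(dim) produces a random unit vector. One standard way to do this, sketched below, is to draw independent standard normal components and divide by the norm: because the Gaussian is rotationally symmetric, the resulting direction is uniform on the sphere. Normalizing a vector with uniform components in a box (for example Vector.random followed by normalize) would instead bias directions toward the box's corners. Whether NusterDB uses this method is not documented; the function here is a hypothetical illustration.

```python
import math
import random


def unit_random(dim, rng=random):
    """Sample a direction uniformly from the unit sphere in `dim` dimensions."""
    while True:
        # i.i.d. standard normal components give a rotationally symmetric draw
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        n = math.sqrt(sum(x * x for x in v))
        if n > 0.0:  # redraw in the (vanishingly unlikely) all-zero case
            return [x / n for x in v]


u = unit_random(128, random.Random(42))
print(len(u), math.sqrt(sum(x * x for x in u)))
```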
DatabaseConfig
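Configuration class for database settings. Parameters used in the examples above include dim, index_type, distance_metric, hnsw_max_connections, hnsw_ef_construction, hnsw_max_elements, flat_use_simd and flat_batch_size.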
Enums
- IndexType: Flat, Hnsw, IVF, LSH, Annoy
- DistanceMetric: Euclidean, Cosine, Manhattan, Angular, Jaccard, Hamming
- Compression: None, Snappy, LZ4, ZSTD
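The LSH index type is not documented beyond its name. For orientation, the sketch below shows the classic random-hyperplane LSH scheme for cosine similarity: each vector is hashed to a bit signature recording which side of each random hyperplane it falls on, so vectors separated by a small angle tend to share a bucket. The HyperplaneLSH class and its parameters (num_planes, seed) are hypothetical, and NusterDB's LSH may use a different hash family.

```python
import random
from collections import defaultdict


class HyperplaneLSH:
    """Random-hyperplane LSH for cosine similarity (hypothetical sketch)."""

    def __init__(self, dim, num_planes=8, seed=0):
        rng = random.Random(seed)
        # Each hyperplane is defined by a random Gaussian normal vector
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_planes)]
        self.buckets = defaultdict(list)  # signature -> list of ids

    def signature(self, vector):
        # Bit i is 1 when the vector lies on the non-negative side of plane i
        return tuple(
            1 if sum(p * x for p, x in zip(plane, vector)) >= 0.0 else 0
            for plane in self.planes
        )

    def insert(self, vid, vector):
        self.buckets[self.signature(vector)].append(vid)

    def candidates(self, query):
        # Only the query's own bucket is returned; a fuller index would use
        # several tables and rerank candidates with the exact distance.
        return list(self.buckets.get(self.signature(query), []))


lsh = HyperplaneLSH(dim=4, num_planes=6, seed=1)
lsh.insert("a", [1.0, 0.2, 0.0, 0.1])
lsh.insert("b", [-1.0, -0.2, 0.0, -0.1])
found = lsh.candidates([2.0, 0.4, 0.0, 0.2])  # same direction as "a"
print(found)
```

Scaling a vector by a positive factor never changes its signature, which matches cosine similarity's insensitivity to magnitude. Scanning only one bucket trades recall for speed.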
Performance Tips
- Choose the right index:
  - Use Flat for exact search on small datasets (< 10K vectors)
  - Use HNSW for approximate search on large datasets
- Optimize HNSW parameters:
  - Increase hnsw_ef_construction for a higher-quality index (slower build)
  - Increase hnsw_max_connections for better recall (more memory)
- Use an appropriate distance metric:
  - Cosine for normalized vectors
  - Euclidean for general-purpose use
  - Manhattan for sparse vectors
- Enable SIMD for the flat index (flat_use_simd=True) when possible
- Adjust cache size based on available memory
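The advice to use Cosine for normalized vectors rests on a standard identity: for unit vectors a and b, ‖a − b‖² = ‖a‖² + ‖b‖² − 2 a·b = 2 − 2 cos θ. Euclidean distance is therefore a monotone function of cosine distance, and both metrics rank neighbors identically once vectors are normalized. The check below is plain Python, independent of NusterDB, and verifies the identity and the ranking on random unit vectors.

```python
import math
import random


def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def cosine_distance(a, b):
    # For unit vectors the dot product equals the cosine similarity
    return 1.0 - sum(x * y for x, y in zip(a, b))


rng = random.Random(7)
query = normalize([rng.uniform(-1.0, 1.0) for _ in range(16)])
data = [normalize([rng.uniform(-1.0, 1.0) for _ in range(16)]) for _ in range(50)]

# Largest deviation from ||a - b||^2 == 2 * cosine_distance(a, b) over the dataset
max_error = max(abs(math.dist(query, v) ** 2 - 2.0 * cosine_distance(query, v)) for v in data)

# Neighbor order under each metric
by_euclidean = sorted(range(len(data)), key=lambda i: math.dist(query, data[i]))
by_cosine = sorted(range(len(data)), key=lambda i: cosine_distance(query, data[i]))
print(f"max identity error: {max_error:.2e}, same ranking: {by_euclidean == by_cosine}")
```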
Requirements
- Python >= 3.8
- numpy >= 1.19.0
License
MIT License. See LICENSE file for details.
Contributing
Contributions are welcome! Please see CONTRIBUTING.md for guidelines.
Changelog
0.1.0
- Initial release
- Support for Flat and HNSW indices
- Python bindings with PyO3
- Basic vector operations and database management
File details
Details for the file nusterdb-0.1.0-cp312-cp312-macosx_11_0_arm64.whl.
File metadata
- Download URL: nusterdb-0.1.0-cp312-cp312-macosx_11_0_arm64.whl
- Upload date:
- Size: 269.1 kB
- Tags: CPython 3.12, macOS 11.0+ ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.8.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2414ec652071cd7dd4c3d263fc699040289f5917c67cc35f4b5aa515d1aec27c |
| MD5 | 01777d673c1243908492e95be3009180 |
| BLAKE2b-256 | 58650602328bdb1735c1737f7c31ae77103ec848e706cacdbff957d4e5b548bd |
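To confirm that a downloaded wheel matches the published digest, hash it locally and compare the result with the SHA256 value in the file hashes table. A minimal sketch using Python's standard hashlib module follows. The function names are illustrative, and the expected digest is the SHA256 listed for nusterdb-0.1.0-cp312-cp312-macosx_11_0_arm64.whl.

```python
import hashlib

# SHA256 published for nusterdb-0.1.0-cp312-cp312-macosx_11_0_arm64.whl
EXPECTED_SHA256 = "2414ec652071cd7dd4c3d263fc699040289f5917c67cc35f4b5aa515d1aec27c"


def file_sha256(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_wheel(path, expected=EXPECTED_SHA256):
    """True only if the file's SHA256 matches the expected hex digest."""
    return file_sha256(path) == expected.lower()
```

For example, verify_wheel("nusterdb-0.1.0-cp312-cp312-macosx_11_0_arm64.whl") returns True only for an unmodified copy of that file. pip can enforce the same check at install time through its hash-checking mode, using a --hash=sha256:... option on a requirements-file entry.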