High-performance exact vector similarity search with Rust backend
NSeekFS
High-Performance Vector Similarity Search with Rust Backend
Fast and exact cosine similarity search for Python. Built with Rust for performance, designed for production use.
pip install nseekfs
Quick Start
import nseekfs
import numpy as np
# Create test vectors
embeddings = np.random.randn(10000, 384).astype(np.float32)
query = np.random.randn(384).astype(np.float32)
# Build index and search
index = nseekfs.from_embeddings(embeddings, normalized=True)
results = index.query(query, top_k=10)
# results is a list of dicts with 'idx' and 'score' keys
print(f"Found {len(results)} results")
print(f"Best match: vector {results[0]['idx']} (similarity: {results[0]['score']:.3f})")
Core Features
Exact Vector Search
# Basic search
index = nseekfs.from_embeddings(embeddings, normalized=True)
results = index.query(query, top_k=10)
# Access results
for item in results:
    print(f"Vector {item['idx']}: {item['score']:.6f}")
# Index properties
print(f"Index contains {index.rows} vectors x {index.dims} dimensions")
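To make "exact" concrete, here is a minimal pure-Python reference that scores every vector against the query and keeps the best `top_k`. It is a sketch for sanity-checking results on small inputs, not the library's implementation; the names `cosine_similarity` and `brute_force_top_k` are hypothetical, and returning 0.0 for zero vectors is a convention chosen here.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length sequences of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for this sketch, not taken from nseekfs
    return dot / (norm_a * norm_b)


def brute_force_top_k(vectors, query, top_k):
    """Score every vector (no approximation) and keep the top_k highest scores."""
    scored = (
        {"idx": i, "score": cosine_similarity(v, query)}
        for i, v in enumerate(vectors)
    )
    return heapq.nlargest(top_k, scored, key=lambda item: item["score"])


vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
reference = brute_force_top_k(vectors, [1.0, 0.0], top_k=2)
print([item["idx"] for item in reference])  # prints [0, 2]
```

On a small sample, the `idx` values returned by `index.query` can be compared against this reference, keeping in mind that float32 rounding may reorder near-ties.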
Batch Processing
# Process multiple queries efficiently
queries = np.random.randn(50, 384).astype(np.float32)
batch_results = index.query_batch(queries, top_k=5)
print(f"Processed {len(queries)} queries")
for i, query_results in enumerate(batch_results):
    print(f"Query {i}: {len(query_results)} results")
    # Each query_results is a list of {'idx': int, 'score': float}
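Because batch results are nested lists of dicts, it is often convenient to flatten them into rows before logging or writing them out. The helper below is a small sketch; `flatten_batch_results` is a hypothetical name, and the sample data is hand-written to mimic the documented `{'idx': int, 'score': float}` shape.

```python
def flatten_batch_results(batch_results):
    """Turn per-query result lists into flat (query_number, idx, score) rows."""
    rows = []
    for query_number, query_results in enumerate(batch_results):
        for item in query_results:
            rows.append((query_number, item["idx"], item["score"]))
    return rows


# Hand-written stand-in for the output of index.query_batch(queries, top_k=2)
example_batch = [
    [{"idx": 7, "score": 0.91}, {"idx": 3, "score": 0.80}],
    [{"idx": 1, "score": 0.65}, {"idx": 7, "score": 0.42}],
]
rows = flatten_batch_results(example_batch)
print(rows[0])  # prints (0, 7, 0.91)
```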
Advanced Query Options
# Simple format (default) - returns list of dicts
results = index.query(query, top_k=10, format='simple')
# Detailed format - returns QueryResult object with timing
result_obj = index.query(query, top_k=10, format='detailed')
print(f"Query took {result_obj.query_time_ms:.2f}ms")
# Return a (results, timing) tuple
results, timing = index.query(query, top_k=10, return_timing=True)
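The built-in timing reports what the library measures. To capture wall-clock time as seen from Python, including call overhead, a generic wrapper can be used. This is a sketch: `timed_call` is a hypothetical helper, and `fake_query` is a stand-in so the example runs without an index.

```python
import time


def timed_call(fn, *args, **kwargs):
    """Call fn and return (result, elapsed_ms) using a monotonic clock."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms


def fake_query(query, top_k=10):
    """Stand-in for index.query, returning the documented result shape."""
    return [{"idx": i, "score": 1.0 - 0.1 * i} for i in range(top_k)]


results, elapsed_ms = timed_call(fake_query, [0.0, 1.0], top_k=3)
print(f"{len(results)} results in {elapsed_ms:.3f}ms")
```

With a real index the call would be `timed_call(index.query, query, top_k=10)`.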
Index Persistence
# Load existing index
index = nseekfs.from_bin("my_vectors.nseek")
print(f"Loaded index with {index.rows} vectors x {index.dims} dimensions")
Performance Monitoring
# Get detailed performance metrics
metrics = index.get_performance_metrics()
print(f"Total queries: {metrics['total_queries']}")
print(f"Average time: {metrics['avg_query_time_ms']:.2f}ms")
print(f"SIMD queries: {metrics['simd_queries']}")
print(f"Queries per second: {metrics['queries_per_second']:.0f}")
Built-in Benchmark
# Run performance benchmark
nseekfs.benchmark(vectors=1000, dims=384, queries=100, verbose=True)
API Reference
Index Creation
# Basic usage
index = nseekfs.from_embeddings(
    embeddings,       # numpy array of float32 vectors
    normalized=True,  # normalize vectors (default: False)
    verbose=False     # show progress (default: False)
)
# Load existing index
index = nseekfs.from_bin("path/to/index.nseek")
Query Methods
# Simple query (returns list of dicts)
results = index.query(query_vector, top_k=10)
# Returns: [{'idx': int, 'score': float}, ...]
# Detailed query (returns QueryResult object)
result = index.query_detailed(query_vector, top_k=10)
# Simple query explicitly
results = index.query_simple(query_vector, top_k=10)
# Batch queries
batch_results = index.query_batch(queries_array, top_k=10)
# Returns: List of lists of dicts
Index Properties
print(f"Vectors: {index.rows}")
print(f"Dimensions: {index.dims}")
print(f"Index path: {index.index_path}")
print(f"Config: {index.config}")
Architecture Highlights
SIMD Optimizations
- AVX2 support: 8 float32 values processed per instruction on compatible CPUs
- Automatic fallback to scalar operations on older hardware
- Runtime detection of CPU capabilities
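The idea behind lane-wise processing with a scalar fallback can be illustrated in plain Python. This is only a conceptual sketch of the pattern (eight independent accumulators plus a scalar tail for leftover elements), not the Rust backend's code; `dot_in_lanes` is a hypothetical name.

```python
LANES = 8  # a 256-bit AVX2 register holds eight 32-bit floats


def dot_in_lanes(a, b, lanes=LANES):
    """Dot product computed in fixed-width chunks plus a scalar tail."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    accumulators = [0.0] * lanes
    full = len(a) - len(a) % lanes  # length covered by whole chunks
    for start in range(0, full, lanes):
        # In real SIMD code this inner loop is a single multiply-add instruction.
        for lane in range(lanes):
            accumulators[lane] += a[start + lane] * b[start + lane]
    total = sum(accumulators)  # horizontal reduction of the lanes
    for i in range(full, len(a)):  # scalar tail for the remaining elements
        total += a[i] * b[i]
    return total


a = [float(i) for i in range(19)]
b = [1.0] * 19
print(dot_in_lanes(a, b))  # prints 171.0
```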
Memory Management
- Memory mapping for efficient data access
- Thread-local buffers for zero-allocation queries
- Cache-aligned data structures for optimal performance
Batch Processing
- Intelligent batching strategies based on query size
- SIMD vectorization across multiple queries
- Optimized memory access patterns
Installation
# From PyPI
pip install nseekfs
# Verify installation
python -c "import nseekfs; print('NSeekFS installed successfully')"
Technical Details
- Precision: Float32 optimized for standard ML embeddings
- Memory: Efficient memory usage with optimized data structures
- Performance: Rust backend with SIMD optimizations where available
- Compatibility: Python 3.8+ on Windows, macOS, and Linux
- Thread Safety: Safe concurrent access from multiple threads
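Given the thread-safety statement above, queries can be issued from a thread pool. The sketch below takes any query callable so it runs without an index; `query_concurrently` and `fake_query` are hypothetical names, and whether threads actually improve throughput depends on how the backend handles the GIL, which is not stated here.

```python
from concurrent.futures import ThreadPoolExecutor


def query_concurrently(query_fn, queries, top_k=10, max_workers=4):
    """Run query_fn(query, top_k=top_k) per query on a thread pool, keeping input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda q: query_fn(q, top_k=top_k), queries))


def fake_query(query, top_k=10):
    """Stand-in for index.query, returning the documented result shape."""
    return [{"idx": i, "score": float(query[0])} for i in range(top_k)]


all_results = query_concurrently(fake_query, [[0.1], [0.2], [0.3]], top_k=2)
print(len(all_results))  # prints 3
```

With a real index, `query_fn` would be `index.query`. For many queries known up front, `index.query_batch` is the documented route.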
Performance Tips
# Pre-normalize vectors if using cosine similarity
embeddings = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
index = nseekfs.from_embeddings(embeddings, normalized=False)
# Use appropriate data types
embeddings = embeddings.astype(np.float32)
# Keep top_k as small as the application allows
results = index.query(query, top_k=10)  # rather than top_k=1000
# Use batch processing for multiple queries
batch_results = index.query_batch(queries, top_k=10)
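The first tip works because, for unit-length vectors, cosine similarity reduces to a plain dot product. The pure-Python sketch below checks that identity on a small example. The helper names are hypothetical, and leaving zero vectors unchanged is a choice made here: the NumPy one-liner above would produce NaN for an all-zero row.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Scale v to unit length; zero vectors are returned unchanged (sketch convention)."""
    norm = math.sqrt(dot(v, v))
    if norm == 0.0:
        return list(v)
    return [x / norm for x in v]


def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a = [3.0, 4.0]
b = [4.0, 3.0]
# Both expressions evaluate to 24/25 up to floating-point rounding.
print(cosine(a, b), dot(normalize(a), normalize(b)))
```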
License
MIT License - see LICENSE file for details.
Fast, exact cosine similarity search for Python.
Built with Rust for performance, designed for Python developers.
Download files
File details
Details for the file nseekfs-1.0.0.tar.gz.
File metadata
- Download URL: nseekfs-1.0.0.tar.gz
- Upload date:
- Size: 81.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1dab000309adb212fc1317b714a6413e7668385ae602d54f020f3f0b2c117423 |
| MD5 | 4c03ee9271e8070dab82d58fd9607313 |
| BLAKE2b-256 | a7e0e85dc02236aba46fefd7288c2b2030e051d6b14ec4fd0c3c75f20103257a |
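A downloaded file can be checked against its published SHA256 digest with the standard library. The helpers below are a sketch with hypothetical names; the demo hashes a temporary file so it runs anywhere, whereas in practice `path` would point at the downloaded archive and `expected_hex` would be the SHA256 value listed for that file.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex string."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Demo on a throwaway file; replace with the real download path and digest.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"example bytes")
    demo_path = tmp.name
demo_ok = matches_published_hash(demo_path, hashlib.sha256(b"example bytes").hexdigest())
os.remove(demo_path)
print(demo_ok)  # prints True
```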
File details
Details for the file nseekfs-1.0.0-cp38-abi3-win_amd64.whl.
File metadata
- Download URL: nseekfs-1.0.0-cp38-abi3-win_amd64.whl
- Upload date:
- Size: 237.1 kB
- Tags: CPython 3.8+, Windows x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 175aeaac6cb48341289e0a4d3639e2ab75f6627ce898a30bc686b69e31d6e8f6 |
| MD5 | 08fb1beddc76fa8af480e9103a79f962 |
| BLAKE2b-256 | e030ed4a2ff50c30a4d91a5a52f7aac44d6eb8f575c7cc934be4055489b4d6bc |
File details
Details for the file nseekfs-1.0.0-cp38-abi3-manylinux_2_34_x86_64.whl.
File metadata
- Download URL: nseekfs-1.0.0-cp38-abi3-manylinux_2_34_x86_64.whl
- Upload date:
- Size: 357.0 kB
- Tags: CPython 3.8+, manylinux: glibc 2.34+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d38d94471ba1943604ea6a5b601f2ffba130f827eef9495def37cbd3d5c53bb2 |
| MD5 | daeb64fddf6ac7d4940d65c25a81ba3e |
| BLAKE2b-256 | c805575f55d9c07a91a180900c04eaa13730acac6050fe3cf865612f29ba4448 |
File details
Details for the file nseekfs-1.0.0-cp38-abi3-macosx_11_0_arm64.whl.
File metadata
- Download URL: nseekfs-1.0.0-cp38-abi3-macosx_11_0_arm64.whl
- Upload date:
- Size: 281.4 kB
- Tags: CPython 3.8+, macOS 11.0+ ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 74658a13df4e7cf5d110b1697917c33b7030dde200c064a737225051cb442c38 |
| MD5 | 4c2b6d658dc429000fa3b75f9f1812be |
| BLAKE2b-256 | b29cde4ff51d98075ad61489f3861b2f8e4575750cd7f21755dcc37a6aead17f |
File details
Details for the file nseekfs-1.0.0-cp38-abi3-macosx_10_12_x86_64.whl.
File metadata
- Download URL: nseekfs-1.0.0-cp38-abi3-macosx_10_12_x86_64.whl
- Upload date:
- Size: 307.2 kB
- Tags: CPython 3.8+, macOS 10.12+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 65d6cbad984f478dfcc6be14eb524cfc546d7221d49b69b05c112b8e4b30901e |
| MD5 | 813e802c40c429f0c13e31d802e7126b |
| BLAKE2b-256 | 1f40273be01da99918d88c7ab8c1192d3ef4584fdf31e3081e2a8ed345b2bc48 |