High-performance exact vector similarity search with Rust backend
NSeekFS
High-Performance Exact Vector Search with Rust Backend
Fast and exact vector similarity search for Python. Built with Rust for performance, designed for production use.
NSeekFS combines the safety and performance of Rust with a clean Python API.
This release supports exact vector search with multiple similarity metrics:
- cosine (requires normalized vectors)
- dot
- euclidean
Upcoming releases will expand support to:
- Approximate Nearest Neighbor (ANN) search
- Additional precision levels and memory optimizations
Our goal: deliver a fast, reliable, and production-ready search engine that evolves with your needs.
pip install nseekfs
Quick Start
import nseekfs
import numpy as np
# Create some test vectors
embeddings = np.random.randn(10000, 384).astype(np.float32)
query = np.random.randn(384).astype(np.float32)
# Choose metric: "cosine", "dot", or "euclidean"
metric = "cosine"
# Normalize only if using cosine
if metric == "cosine":
    embeddings = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    query = query / np.linalg.norm(query)
# Build index and run a search
index = nseekfs.from_embeddings(embeddings, metric=metric, normalized=(metric == "cosine"))
results = index.query(query, top_k=10)
print(f"Found {len(results)} results")
print(f"Best match: idx={results[0]['idx']} score={results[0]['score']:.3f}")
Core Features
Exact Search
# Simple query
results = index.query(query, top_k=10)
# Access results
for item in results:
    print(f"Vector {item['idx']}: {item['score']:.6f}")
Batch Queries
queries = np.random.randn(50, 384).astype(np.float32)
if metric == "cosine":
    queries = queries / np.linalg.norm(queries, axis=1, keepdims=True)
batch_results = index.query_batch(queries, top_k=5)
print(f"Processed {len(batch_results)} queries")
Query Options
# Simple query
results = index.query_simple(query, top_k=10)
# Detailed query
result = index.query_detailed(query, top_k=10)
print(f"Query took {result.query_time_ms:.2f} ms, top1 idx={result.results[0]['idx']}")
Index Persistence
# Build and save index
index = nseekfs.from_embeddings(embeddings, metric=metric, normalized=(metric == "cosine"))
print("Index saved at:", index.index_path)
# Later, reload from file
index2 = nseekfs.from_bin(index.index_path)
print(f"Reloaded index: {index2.rows} vectors x {index2.dims} dims")
API Reference
Index
- from_embeddings(embeddings, metric="cosine", normalized=True, verbose=False)
- from_bin(path)
Queries
- query(query_vector, top_k=10)
- query_simple(query_vector, top_k=10)
- query_detailed(query_vector, top_k=10)
- query_batch(queries, top_k=10)
Properties
- index.rows
- index.dims
- index.config
Metric Guide
| Metric | Normalization required | Typical use case |
|---|---|---|
| cosine | Yes | Semantic embeddings (e.g. sentence-transformers, OpenAI) |
| dot | No | Raw model outputs where scale carries meaning |
| euclidean | No | Geometric distance or clustering tasks |
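To make the three metrics concrete, here is a small NumPy sketch of how each score could be computed for a pair of vectors. The Euclidean score is written as the negative squared L2 distance, matching the "−L2²" convention mentioned under Architecture Highlights, so a higher score means a closer match for all three metrics. `score` is a hypothetical helper, not part of the NSeekFS API, and unlike the library it normalizes inside the cosine branch for convenience.

```python
import numpy as np

def score(metric, a, b):
    """Similarity between two 1-D vectors; higher means more similar."""
    if metric == "cosine":
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    if metric == "dot":
        return float(np.dot(a, b))
    if metric == "euclidean":
        diff = a - b
        return float(-np.dot(diff, diff))  # negative squared L2 distance
    raise ValueError(f"unknown metric: {metric}")

a = np.array([3.0, 4.0], dtype=np.float32)
b = np.array([6.0, 8.0], dtype=np.float32)  # same direction, twice the length

for metric in ("cosine", "dot", "euclidean"):
    print(metric, score(metric, a, b))
```

The example pair points in the same direction but differs in length: cosine ignores the length difference, while dot and euclidean are both sensitive to it.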
Architecture Highlights
- Similarity Metrics: cosine (with enforced normalization), dot product, and Euclidean (−L2²).
- Batch Query Engine: adaptive selection between full matrix GEMM, chunked streaming, and parallel Rayon fallback.
- SIMD Acceleration: custom `wide::f32x8` kernels for dot/L2, with `matrixmultiply::sgemm` for block GEMM.
- Memory Mapping: compact binary format (header + float data) using `memmap2` for zero-copy loading.
- Streaming Index Build: chunked writes, runtime memory estimation, safe normalization, and atomic file finalization.
- Cross-Platform Optimizations: runtime SIMD detection (AVX2/AVX/SSE4.2/NEON) and environment-controlled tuning (`NSEEK_THREADS`, `NSEEK_QBLOCK`, `NSEEK_DBLOCK`).
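The tuning variables named above are ordinary environment variables, so one way to set them is from Python before the library is imported. The sketch below assumes they are read at import time or on first use and that they accept integer values. Both the timing and the example values are assumptions, not documented behavior.

```python
import os

# Hypothetical values for illustration only; check the project source
# for the accepted ranges and defaults.
tuning = {
    "NSEEK_THREADS": "4",
    "NSEEK_QBLOCK": "64",
    "NSEEK_DBLOCK": "4096",
}
for name, value in tuning.items():
    # setdefault keeps any value already exported in the shell
    os.environ.setdefault(name, value)

# import nseekfs  # import only after the environment is configured
print({name: os.environ[name] for name in tuning})
```

Using `setdefault` rather than plain assignment lets a value exported in the shell take precedence over the defaults hard-coded in the script.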
Installation
# From PyPI
pip install nseekfs
# Verify installation
python -c "import nseekfs; print('NSeekFS installed successfully')"
Technical Details
- Similarity Metrics: Cosine (with enforced normalization), Dot Product, Euclidean (−L2²).
- Precision: Float32 core; future-ready hooks for f16, f8, f64 levels.
- Index Format: Compact binary (12-byte header + contiguous float data), memory-mapped via `memmap2`.
- Batch Queries: Adaptive engine with GEMM (matrixmultiply), SIMD kernels (`wide::f32x8`), or parallel Rayon fallback.
- Memory: Streaming index build with chunking, safe normalization, and runtime memory estimation.
- Performance: AVX2/AVX/SSE4.2/NEON runtime detection; env vars (`NSEEK_THREADS`, `NSEEK_QBLOCK`, `NSEEK_DBLOCK`) for tuning.
- Thread Safety: Parallel query execution via Rayon; thread-safe index loading.
- Compatibility: Python 3.8+ on Windows, macOS, Linux.
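The "header + contiguous float data" layout lends itself to memory mapping, which the following NumPy sketch illustrates. The header layout used here (three little-endian unsigned 32-bit integers for rows, dims, and a flags field) is purely an assumption chosen to total 12 bytes; the real NSeekFS format may differ, so this code is not expected to read actual NSeekFS index files. `write_index` and `open_index` are hypothetical helper names.

```python
import os
import struct
import tempfile

import numpy as np

# Hypothetical header: rows, dims, flags as little-endian uint32 (12 bytes)
HEADER = struct.Struct("<III")

def write_index(path, vectors):
    """Write the header followed by the row-major float32 payload."""
    vectors = np.ascontiguousarray(vectors, dtype=np.float32)
    rows, dims = vectors.shape
    with open(path, "wb") as f:
        f.write(HEADER.pack(rows, dims, 0))
        f.write(vectors.tobytes())

def open_index(path):
    """Read the header, then map the float payload without copying it."""
    with open(path, "rb") as f:
        rows, dims, _flags = HEADER.unpack(f.read(HEADER.size))
    # Pages are loaded lazily by the OS as rows are accessed
    return np.memmap(path, dtype=np.float32, mode="r",
                     offset=HEADER.size, shape=(rows, dims))

vectors = np.arange(12, dtype=np.float32).reshape(3, 4)
path = os.path.join(tempfile.mkdtemp(), "demo.bin")
write_index(path, vectors)
mapped = open_index(path)
print(mapped.shape, mapped[2])
```

Because the payload is mapped rather than read, opening the file costs roughly the same regardless of index size, and only the rows actually touched are paged into memory.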
Performance Tips
# Cosine similarity: normalize embeddings and queries
embeddings = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
index = nseekfs.from_embeddings(embeddings, metric="cosine", normalized=True)
# Dot or Euclidean: use raw embeddings, no normalization
index = nseekfs.from_embeddings(embeddings, metric="dot", normalized=False)
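Dividing by the row norm, as in the cosine tip above, produces NaN for any all-zero row. A guarded variant is sketched below; `normalize_rows` is a hypothetical helper, and leaving zero rows as zeros is one reasonable choice among several (dropping them is another).

```python
import numpy as np

def normalize_rows(x, eps=1e-12):
    """Scale each row to unit L2 norm; all-zero rows are left as zeros."""
    x = np.asarray(x, dtype=np.float32)
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    # Clamp tiny norms so zero rows divide cleanly instead of yielding NaN
    return x / np.maximum(norms, eps)

raw = np.array([[3.0, 4.0], [0.0, 0.0]], dtype=np.float32)
unit = normalize_rows(raw)
print(unit)
```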
License
MIT License - see LICENSE file for details.
Fast, exact vector similarity search for Python.
Built with Rust for performance, designed for Python developers.
Source: github.com/NSeek-AI/nseekfs
Download files
File details
Details for the file nseekfs-1.0.4.tar.gz.
File metadata
- Download URL: nseekfs-1.0.4.tar.gz
- Upload date:
- Size: 79.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 33e819edf64bf6a78e09a765eed9913edebc48c4b27751d83b640c19f88f88ea |
| MD5 | a1ea6559669f250cbe780bfb974f4734 |
| BLAKE2b-256 | 8110d70cf362854ca4283467f0df508d23d2567cda3fcf2b916791c63fd7307f |
File details
Details for the file nseekfs-1.0.4-cp38-abi3-win_amd64.whl.
File metadata
- Download URL: nseekfs-1.0.4-cp38-abi3-win_amd64.whl
- Upload date:
- Size: 227.2 kB
- Tags: CPython 3.8+, Windows x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e5848dd1356ed90ff70fb6582f2ef91bca37fbe88be195b93291dd7fe3ca7c3e |
| MD5 | 53195bd9af0549ce720162e96478318c |
| BLAKE2b-256 | 44c6122a8e1e4fd6cfbb11625c7b706df7d80345013721cca1fd277e89eda7b4 |
File details
Details for the file nseekfs-1.0.4-cp38-abi3-manylinux_2_34_x86_64.whl.
File metadata
- Download URL: nseekfs-1.0.4-cp38-abi3-manylinux_2_34_x86_64.whl
- Upload date:
- Size: 347.9 kB
- Tags: CPython 3.8+, manylinux: glibc 2.34+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f347eba0786fd3cc66a5a2a207492d11453de68a24bca6b5257a9457ec193d6d |
| MD5 | b7e247aabbaf251eb353bd8bd9ab28ae |
| BLAKE2b-256 | 20e5da159cc6c26b5a42585f5450a641619df862592ffd49ae2d9ce9d8379bb7 |
File details
Details for the file nseekfs-1.0.4-cp38-abi3-macosx_11_0_arm64.whl.
File metadata
- Download URL: nseekfs-1.0.4-cp38-abi3-macosx_11_0_arm64.whl
- Upload date:
- Size: 279.5 kB
- Tags: CPython 3.8+, macOS 11.0+ ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4bac2596f1d94057410c7d5b0ccb6661d5ffa247b265b2558441fab1dc41e252 |
| MD5 | c6505e9ca0c1ea222461563f223870f7 |
| BLAKE2b-256 | 61a8866ab9e0fe6825c77ead927a6a7e7c7d57eb3b9eb424c0d1b45b84c213bd |
File details
Details for the file nseekfs-1.0.4-cp38-abi3-macosx_10_12_x86_64.whl.
File metadata
- Download URL: nseekfs-1.0.4-cp38-abi3-macosx_10_12_x86_64.whl
- Upload date:
- Size: 299.6 kB
- Tags: CPython 3.8+, macOS 10.12+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | db055eeb420ca15e29afe69e8fd534926e5e626433ce50706b66a827d96fbc19 |
| MD5 | d6197a636fd91933648549264cf8dce6 |
| BLAKE2b-256 | fb84881226666d4374da217226d245e919da8c8f5d61e1f852d1f43a50cea406 |