High-performance exact vector similarity search with Rust backend

NSeekFS

High-Performance Exact Vector Search with Rust Backend

Fast and exact vector similarity search for Python. Built with Rust for performance, designed for production use, with a split between a low-overhead search path and a richer audit/detailed path.

NSeekFS combines the safety and performance of Rust with a clean Python API.
This release supports exact vector search with multiple similarity metrics:

cosine (on normalized vectors: normalize them yourself, or pass normalized=False to have the index build normalize them)
dot
euclidean

Upcoming releases will expand support to:

Approximate Nearest Neighbor (ANN) search
Additional precision levels and memory optimizations

Our goal: deliver a fast, reliable, and production-ready search engine that evolves with your needs.

pip install nseekfs

Quick Start

import nseekfs
import numpy as np

# Create some test vectors
embeddings = np.random.randn(10000, 384).astype(np.float32)
query = np.random.randn(384).astype(np.float32)

# Choose metric: "cosine", "dot", or "euclidean"
metric = "cosine"

# Normalize only if using cosine
if metric == "cosine":
    embeddings = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    query = query / np.linalg.norm(query)

# Build index and run a search
# normalized=True is only correct for cosine with pre-normalized rows
index = nseekfs.from_embeddings(embeddings, metric=metric, normalized=(metric == "cosine"))
results = index.query(query, top_k=10)

print(f"Found {len(results)} results")
print(f"Best match: idx={results[0]['idx']} score={results[0]['score']:.3f}")

Core Features

Exact Search

# Simple query
results = index.query(query, top_k=10)

# Access results
for item in results:
    print(f"Vector {item['idx']}: {item['score']:.6f}")

Batch Queries

queries = np.random.randn(50, 384).astype(np.float32)
if metric == "cosine":
    queries = queries / np.linalg.norm(queries, axis=1, keepdims=True)

batch_results = index.query_batch(queries, top_k=5)
print(f"Processed {len(batch_results)} queries")

Result Layers

NSeekFS now exposes two result layers on top of the exact engine:

simple for the lowest-overhead ranked results
detailed for engineering, audit, margins, replay, and debugging

Use simple when you care about latency. Use detailed when you need traceability.

# Fast path: minimal result shaping
results = index.query_simple(query, top_k=10)
raw = index.query_simple_arrays(query, top_k=10)

# Detailed path: ranking + audit metadata + margins
detail = index.query_detailed(query, top_k=10)
print(detail.results[0]["rank"], detail.results[0]["idx"], detail.results[0]["score"])
print(detail.margins)
print(detail.audit["index_hash"])

# Batch detailed path
batch_detail = index.query_batch(queries, top_k=10, format="detailed")

What the detailed layer adds:

deterministic ranking with rank
score on every result
margins such as margin_to_next and margin_top1_to_last
audit metadata: engine_version, metric, dims, rows, index_hash, query_hash
optional replay support through exported audit JSON

Query Options

# Simple query
results = index.query_simple(query, top_k=10)

# Fast array query for engineers
raw = index.query_simple_arrays(query, top_k=10)
print(raw["indices"], raw["scores"])

# Detailed query
result = index.query_detailed(query, top_k=10)
print(f"Query took {result.query_time_ms:.2f} ms, top1 rank={result.results[0]['rank']}, idx={result.results[0]['idx']}")
print(result.margins["margin_top1_to_last"])
print(result.audit["query_hash"])

Provably Optimal Exact Search (Certified)

# Exact search with deterministic pruning and optimality certificate
certified = index.query_exact_certified(
    query,
    top_k=10,
    block_size=64,
    enable_pruning=True,
    return_certificate=True,
)

print(certified.results[0])
print(certified.certificate["safe"])  # must be True
print(certified.certificate["pruned_candidates"])

This mode keeps exact semantics while pruning candidates using safe bounds. Every certified query returns metadata proving pruning safety for the final top-k.

Certified Guarantee

Same ranking as brute force exact search for the same metric and inputs.
Deterministic tie-break rule: score order + idx ascending as final tie-break.
Certificate includes pruning counts, final kth_score, bound type, and pruned bound values.
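The brute-force reference that the guarantee compares against can be sketched in NumPy. This is an illustration of the stated ordering rule (score descending, then idx ascending), not the Rust implementation. The function name brute_force_top_k and the 1-based rank are assumptions.

```python
import numpy as np

def brute_force_top_k(scores, top_k):
    """Rank by score descending; equal scores are ordered by ascending idx."""
    scores = np.asarray(scores)
    idx = np.arange(scores.shape[0])
    # np.lexsort sorts by the LAST key first: primary = -scores, secondary = idx.
    order = np.lexsort((idx, -scores))[:top_k]
    # rank starts at 1 here; that numbering is an assumption.
    return [{"rank": r + 1, "idx": int(i), "score": float(scores[i])}
            for r, i in enumerate(order)]

scores = np.array([0.5, 0.9, 0.9, 0.1, 0.5], dtype=np.float32)
top = brute_force_top_k(scores, top_k=4)
print([t["idx"] for t in top])  # [1, 2, 0, 4]
```

Rows 1 and 2 tie at 0.9 and rows 0 and 4 tie at 0.5; in both cases the smaller idx wins.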

Assumptions and Contracts

Data type: float32.
cosine:
If normalized=True, embeddings are expected normalized by caller.
If normalized=False, index build normalizes embeddings internally.
Query vector can be any norm (ranking is invariant to positive scaling).
Optional strict mode: strict_query_normalized=True in query_exact_certified.
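The claim that cosine ranking does not depend on the query norm can be checked directly: multiplying the query by a positive constant multiplies every score by the same constant, so the order is unchanged. A small NumPy check, independent of NSeekFS:

```python
import numpy as np

rng = np.random.default_rng(1)
emb = rng.standard_normal((500, 32))
emb /= np.linalg.norm(emb, axis=1, keepdims=True)  # unit-norm rows
q = rng.standard_normal(32)

# Rank the same index with the raw query, a rescaled query, and a unit-norm query.
order_raw = np.argsort(-(emb @ q), kind="stable")
order_scaled = np.argsort(-(emb @ (7.5 * q)), kind="stable")
order_unit = np.argsort(-(emb @ (q / np.linalg.norm(q))), kind="stable")

same_ranking = (np.array_equal(order_raw, order_scaled)
                and np.array_equal(order_raw, order_unit))
```

Only the score values change with the query norm. If downstream logic compares scores against fixed thresholds, normalizing the query keeps them on the cosine scale.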

When It Speeds Up

More acceleration when early blocks are discriminative and top_k is small.
Less acceleration on highly tied datasets or when all vectors are similarly close.
Correctness does not depend on speedup; pruning may be low and still certified.
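To make the pruning idea concrete, here is a minimal sketch for the dot metric. The README does not say which bounds NSeekFS uses. This example substitutes a Cauchy-Schwarz bound on the dimensions not yet visited, and the names certified_top_k_dot, slack, and bound_type are hypothetical. A candidate is dropped only when its best possible final score is strictly below the current kth score.

```python
import heapq
import numpy as np

def certified_top_k_dot(emb, q, top_k, block_size=64, slack=1e-9):
    """Exact dot-product top-k with block-wise pruning via a Cauchy-Schwarz bound."""
    n, d = emb.shape
    starts = list(range(0, d, block_size))
    # Norm of the not-yet-visited tail after each block, per row and for the query.
    emb_tail = np.stack(
        [np.linalg.norm(emb[:, s + block_size:], axis=1) for s in starts], axis=1)
    q_tail = np.array([np.linalg.norm(q[s + block_size:]) for s in starts])

    heap, pruned = [], 0  # min-heap of (score, -idx); heap[0] is the worst kept item
    for i in range(n):
        if len(heap) == top_k:
            partial, dropped = 0.0, False
            for b, s in enumerate(starts):
                partial += float(emb[i, s:s + block_size] @ q[s:s + block_size])
                upper = partial + emb_tail[i, b] * q_tail[b]  # score cannot exceed this
                if upper < heap[0][0] - slack:
                    pruned, dropped = pruned + 1, True
                    break
            if dropped:
                continue
        item = (float(emb[i] @ q), -i)
        if len(heap) < top_k:
            heapq.heappush(heap, item)
        elif item > heap[0]:
            heapq.heapreplace(heap, item)

    ranked = sorted(heap, key=lambda t: (-t[0], -t[1]))  # score desc, idx asc
    results = [{"idx": -neg_i, "score": s} for s, neg_i in ranked]
    certificate = {"pruned_candidates": pruned, "kth_score": ranked[-1][0],
                   "bound_type": "cauchy_schwarz"}
    return results, certificate

rng = np.random.default_rng(2)
decay = np.exp(-np.arange(256) / 32.0)  # most energy sits in the early blocks
emb = rng.standard_normal((2000, 256)) * decay
q = rng.standard_normal(256) * decay
results, cert = certified_top_k_dot(emb, q, top_k=5)
```

The synthetic data concentrates energy in the first block, so the bound on the remaining dimensions is tight and many candidates are dropped after one block. With flat energy across dimensions the same code still returns the exact top-k but prunes far less, which mirrors the behaviour described above.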

Audit and Replay

export_audit stores certificate and replay metadata (block_order, block_size, simd_path, compile_flags, bounds).
replay validates ranking and certificate invariants (counts and bound condition).
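The replay idea can be sketched without NSeekFS: record hashes of the inputs alongside the ranking, then recompute and compare. Everything in this example is an assumption made for illustration, including the function names make_audit and replay_audit, the choice of SHA-256 over raw float32 bytes, and the JSON layout. The real audit file produced by export_audit may differ.

```python
import hashlib
import json
import numpy as np

def sha256_of(array):
    """Hash the raw float32 bytes so any change to the data changes the digest."""
    data = np.ascontiguousarray(array, dtype=np.float32).tobytes()
    return hashlib.sha256(data).hexdigest()

def top_k(emb, q, k):
    scores = emb @ q
    order = np.lexsort((np.arange(len(scores)), -scores))[:k]  # score desc, idx asc
    return [{"idx": int(i), "score": float(scores[i])} for i in order]

def make_audit(emb, q, k):
    return json.dumps({
        "index_hash": sha256_of(emb), "query_hash": sha256_of(q),
        "rows": emb.shape[0], "dims": emb.shape[1], "top_k": k,
        "results": top_k(emb, q, k),
    })

def replay_audit(audit_json, emb, q):
    audit = json.loads(audit_json)
    same_inputs = (audit["index_hash"] == sha256_of(emb)
                   and audit["query_hash"] == sha256_of(q))
    # Only rerun the search when the inputs are byte-identical to the recorded ones.
    same_ranking = same_inputs and audit["results"] == top_k(emb, q, audit["top_k"])
    return {"ok": bool(same_inputs and same_ranking)}

rng = np.random.default_rng(3)
emb = rng.standard_normal((200, 16)).astype(np.float32)
q = rng.standard_normal(16).astype(np.float32)

audit_json = make_audit(emb, q, k=5)
ok_same = replay_audit(audit_json, emb, q)["ok"]
ok_changed = replay_audit(audit_json, emb + 1.0, q)["ok"]  # modified index
```

Hashing the inputs first means a replay fails fast when the index or query has changed, before any ranking comparison is attempted.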

Index Persistence

# Build and save index
index = nseekfs.from_embeddings(embeddings, metric=metric, normalized=True)
print("Index saved at:", index.index_path)

# Later, reload from file
index2 = nseekfs.from_bin(index.index_path)
print(f"Reloaded index: {index2.rows} vectors x {index2.dims} dims")

API Reference

Index

from_embeddings(embeddings, metric="cosine", normalized=True, verbose=False)
from_bin(path)

Queries

query(query_vector, top_k=10)
query_simple(query_vector, top_k=10)
query_simple_arrays(query_vector, top_k=10)
query_detailed(query_vector, top_k=10)
query_batch(queries, top_k=10)
query_batch_detailed(queries, top_k=10)
query_batch_arrays(queries, top_k=10)
query_exact_audit(query_vector, top_k=10)
query_exact_certified(query_vector, top_k=10, block_size=64, enable_pruning=True, return_certificate=True)

Audit and Replay

# Export an audit JSON from a detailed query
result = index.query_detailed(query, top_k=10)
audit = nseekfs.export_audit(result, "audit.json")

# Replay later using the saved audit
replayed = nseekfs.replay("audit.json")
print(replayed["ok"])

This layer is intentionally separate from query_simple. It is for:

deterministic reruns
offline verification
debugging ranking differences
chatbot confidence logic based on margins and ties
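As an illustration of margin-based confidence logic, the sketch below returns the top match only when it clearly beats the runner-up. The function name answer_or_abstain and the 0.05 threshold are hypothetical; a sensible threshold depends on the embedding model and metric.

```python
def answer_or_abstain(results, min_margin=0.05):
    """Return the top idx only when it clearly beats the runner-up; otherwise None."""
    if not results:
        return None
    if len(results) == 1:
        return results[0]["idx"]
    margin = results[0]["score"] - results[1]["score"]
    return results[0]["idx"] if margin >= min_margin else None

clear = [{"idx": 7, "score": 0.92}, {"idx": 3, "score": 0.71}]
tied = [{"idx": 2, "score": 0.80}, {"idx": 5, "score": 0.80}]

confident_pick = answer_or_abstain(clear)   # 7
uncertain_pick = answer_or_abstain(tied)    # None: top two are tied
```

The input uses the same idx and score keys shown in the result dictionaries above, so the list returned by a query can be passed in directly.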

Properties

index.rows
index.dims
index.config

Metric Guide

Metric	Normalization required	Typical use case
cosine	Yes	Semantic embeddings (e.g. sentence-transformers, OpenAI)
dot	No	Raw model outputs where scale carries meaning
euclidean	No	Geometric distance or clustering tasks

The primary optimized path in this release is cosine with already-normalized embeddings. That is the path to use when you want the lowest overhead and the simplest engineering contract. dot and euclidean remain available for compatibility and debugging. Ordering is deterministic: equal scores are broken by smaller idx.
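As a point of reference, the three scores can be written in plain NumPy. This is a minimal sketch of the definitions, not the NSeekFS Rust kernels. The helper name reference_scores is hypothetical, and the Euclidean score is taken to be the negative squared L2 distance (−L2²) listed under Architecture Highlights, so that a higher score always means a closer match.

```python
import numpy as np

def reference_scores(embeddings, query, metric):
    """Return one score per row; higher always means more similar."""
    if metric == "cosine":
        # Assumes rows of `embeddings` are unit-norm; the query is normalized here.
        return embeddings @ (query / np.linalg.norm(query))
    if metric == "dot":
        return embeddings @ query
    if metric == "euclidean":
        diff = embeddings - query
        return -np.einsum("ij,ij->i", diff, diff)  # negative squared L2 distance
    raise ValueError(f"unknown metric: {metric}")

rng = np.random.default_rng(0)
emb = rng.standard_normal((1000, 64)).astype(np.float32)
emb /= np.linalg.norm(emb, axis=1, keepdims=True)
q = rng.standard_normal(64).astype(np.float32)

cos = reference_scores(emb, q, "cosine")
euc = reference_scores(emb, q / np.linalg.norm(q), "euclidean")
# For unit vectors, ||x - q||^2 = 2 - 2*cos, so the two metrics rank identically.
```

The identity in the last comment is why cosine and Euclidean agree on normalized data, and why dot is the metric to pick when vector length carries meaning.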

Architecture Highlights

Similarity Metrics: cosine (with enforced normalization), dot product, and Euclidean (scored as the negative squared L2 distance, −L2², so higher is better).
Result Layers: low-overhead simple, engineering-oriented detailed, and audit/replay metadata for deterministic validation.
Batch Query Engine: adaptive selection between full matrix GEMM, chunked streaming, and parallel Rayon fallback.
SIMD Acceleration: custom wide::f32x8 kernels for dot/L2, with matrixmultiply::sgemm for block GEMM.
Memory Mapping: compact binary format (header + float data) using memmap2 for zero-copy loading.
Streaming Index Build: chunked writes, runtime memory estimation, safe normalization, and atomic file finalization.
Cross-Platform Optimizations: runtime SIMD detection (AVX2/AVX/SSE4.2/NEON) and environment-controlled mode overrides (NSEEK_THREADS, NSEEK_SINGLE_MODE, NSEEK_BATCH_MODE).
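The chunked streaming strategy for batch queries can be sketched in NumPy: score one slice of the index per matrix multiply, then merge it into a running top-k. This illustrates the general technique only. The function name batch_top_k and the chunk_rows parameter are hypothetical, and the actual engine selects between strategies adaptively in Rust.

```python
import numpy as np

def batch_top_k(emb, queries, top_k, chunk_rows=4096):
    """Exact batch top-k for dot/cosine scores, streaming over row chunks of the index."""
    n_q = queries.shape[0]
    best_scores = np.full((n_q, 0), -np.inf)
    best_idx = np.zeros((n_q, 0), dtype=np.int64)
    for start in range(0, emb.shape[0], chunk_rows):
        block = emb[start:start + chunk_rows]
        scores = queries @ block.T  # one GEMM per chunk
        idx = np.broadcast_to(np.arange(start, start + block.shape[0]), scores.shape)
        # Merge the chunk with the running candidates, then keep the best top_k per query.
        all_scores = np.concatenate([best_scores, scores], axis=1)
        all_idx = np.concatenate([best_idx, idx], axis=1)
        # Stable sort keeps earlier (smaller idx) entries first when scores tie.
        keep = np.argsort(-all_scores, axis=1, kind="stable")[:, :top_k]
        best_scores = np.take_along_axis(all_scores, keep, axis=1)
        best_idx = np.take_along_axis(all_idx, keep, axis=1)
    return best_idx, best_scores

rng = np.random.default_rng(5)
emb = rng.standard_normal((1000, 32))
queries = rng.standard_normal((8, 32))
top_idx, top_scores = batch_top_k(emb, queries, top_k=5, chunk_rows=128)
```

Peak memory is bounded by one chunk of scores plus the running top-k, rather than the full queries-by-rows score matrix.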

Installation

# From PyPI
pip install nseekfs

# Verify installation
python -c "import nseekfs; print('NSeekFS installed successfully')"

Technical Details

Similarity Metrics: Cosine (with enforced normalization), Dot Product, Euclidean (scored as the negative squared L2 distance, −L2², so higher is better).
Result Layers: simple for low overhead, detailed for margins/audit, plus replay support for deterministic validation.
Precision: Float32 core; future-ready hooks for f16, f8, f64 levels.
Index Format: Compact binary (12-byte header + contiguous float data), memory-mapped via memmap2.
Batch Queries: Adaptive engine with GEMM (matrixmultiply), SIMD kernels (wide::f32x8), or parallel Rayon fallback.
Memory: Streaming index build with chunking, safe normalization, and runtime memory estimation.
Performance: AVX2/AVX/SSE4.2/NEON runtime detection; env vars (NSEEK_THREADS, NSEEK_QBLOCK, NSEEK_DBLOCK) for tuning.
Thread Safety: Parallel query execution via Rayon; thread-safe index loading.
Compatibility: Python 3.8+ on Windows, macOS, Linux.
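The "header plus contiguous float data" layout and zero-copy loading can be illustrated with NumPy. The field layout of the real 12-byte header is not documented here, so the three little-endian uint32 fields below (dims, rows, flags) are purely hypothetical and this code should not be used to read actual NSeekFS index files. It only shows how a fixed header, an atomic rename, and a memory map fit together.

```python
import os
import struct
import tempfile
import numpy as np

HEADER = struct.Struct("<III")  # hypothetical layout: dims, rows, flags (12 bytes)

def write_index(path, emb):
    emb = np.ascontiguousarray(emb, dtype=np.float32)
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        f.write(HEADER.pack(emb.shape[1], emb.shape[0], 0))
        f.write(emb.tobytes())
    os.replace(tmp, path)  # atomic rename so readers never see a half-written file

def open_index(path):
    with open(path, "rb") as f:
        dims, rows, _flags = HEADER.unpack(f.read(HEADER.size))
    # np.memmap maps the float payload lazily instead of copying it into memory.
    return np.memmap(path, dtype=np.float32, mode="r",
                     offset=HEADER.size, shape=(rows, dims))

rng = np.random.default_rng(4)
emb = rng.standard_normal((100, 8)).astype(np.float32)
path = os.path.join(tempfile.mkdtemp(), "demo_index.bin")
write_index(path, emb)
loaded = open_index(path)
```

Because the payload is row-major float32 at a fixed offset, the operating system can page rows in on demand, which is what makes loading large indexes cheap.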

Performance Tips

# Cosine similarity: normalize embeddings (and queries) once, then pass normalized=True
unit_embeddings = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
cosine_index = nseekfs.from_embeddings(unit_embeddings, metric="cosine", normalized=True)

# Dot or Euclidean: use the raw, unnormalized embeddings
dot_index = nseekfs.from_embeddings(embeddings, metric="dot", normalized=False)

Competitive Benchmarking

Use the competitive benchmark to compare NSeekFS with similar tools on the same data:

python bench/benchmark_competitors.py --rows 100000 --dims 384 --queries 200 --top-k 10 --metric cosine

Notes:

Ground truth is exact brute force from NSeekFS.
FAISS/HNSWlib are optional; script reports unavailable libraries gracefully.
Report includes latency and recall@k vs exact ground truth.
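For readers unfamiliar with the metric, recall@k is the fraction of the exact top-k neighbors that a competing method also returns, averaged over queries. A minimal sketch follows; the function name recall_at_k is illustrative and is not taken from the benchmark script.

```python
import numpy as np

def recall_at_k(approx_idx, exact_idx):
    """Mean fraction of exact top-k neighbors that the candidate method also returned."""
    hits = [len(set(a) & set(e)) / len(e) for a, e in zip(approx_idx, exact_idx)]
    return float(np.mean(hits))

exact = np.array([[0, 1, 2, 3], [4, 5, 6, 7]])   # ground truth, one row per query
approx = np.array([[0, 1, 2, 9], [4, 5, 6, 7]])  # first query misses neighbor 3
recall = recall_at_k(approx, exact)  # (3/4 + 4/4) / 2 = 0.875
```

An exact engine scores 1.0 by construction, so the interesting comparison is latency at a given recall for the approximate libraries.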

Run the standardized scenario suite:

python bench/run_competitive_suite.py --scenarios bench/competitive_scenarios.json --out-dir benchmark_artifacts

Publication template and rules: docs/COMPETITIVE_BENCHMARKS.md

Product Positioning

NSeekFS is optimized for exact, deterministic, auditable retrieval workflows.
Positioning details: docs/POSITIONING.md

License

MIT License - see LICENSE file for details.

Fast, exact vector similarity search for Python.

Built with Rust for performance, designed for Python developers.
Source: github.com/NSeek-AI/nseekfs


Release history

Version	Released
1.1.0 (this version)	Apr 29, 2026
1.0.4	Sep 24, 2025
1.0.3	Sep 9, 2025
1.0.2	Sep 7, 2025
1.0.1	Sep 5, 2025
1.0.0	Sep 4, 2025

Download files

Source Distribution

nseekfs-1.1.0.tar.gz (126.0 kB), uploaded Apr 29, 2026, Source

Built Distributions

nseekfs-1.1.0-cp38-abi3-win_amd64.whl (359.2 kB), uploaded Apr 29, 2026, CPython 3.8+, Windows x86-64
nseekfs-1.1.0-cp38-abi3-manylinux_2_34_x86_64.whl (483.8 kB), uploaded Apr 29, 2026, CPython 3.8+, manylinux: glibc 2.34+ x86-64
nseekfs-1.1.0-cp38-abi3-macosx_11_0_arm64.whl (389.9 kB), uploaded Apr 29, 2026, CPython 3.8+, macOS 11.0+ ARM64
nseekfs-1.1.0-cp38-abi3-macosx_10_12_x86_64.whl (434.4 kB), uploaded Apr 29, 2026, CPython 3.8+, macOS 10.12+ x86-64

File details

Details for the file nseekfs-1.1.0.tar.gz.

File metadata

Download URL: nseekfs-1.1.0.tar.gz
Upload date: Apr 29, 2026
Size: 126.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for nseekfs-1.1.0.tar.gz
Algorithm	Hash digest
SHA256	`0c9b04779038bc41c558d8fafbf224a7d7ee249c6e4dde8ad24fc33e5a512ffe`
MD5	`9d756504f7b072952365a9aa1c745be2`
BLAKE2b-256	`0229ffb200234aee0d2fd986bff1ec3d32a0c5c9e129905bf414a3d58a129ce6`


File details

Details for the file nseekfs-1.1.0-cp38-abi3-win_amd64.whl.

File metadata

Download URL: nseekfs-1.1.0-cp38-abi3-win_amd64.whl
Upload date: Apr 29, 2026
Size: 359.2 kB
Tags: CPython 3.8+, Windows x86-64
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for nseekfs-1.1.0-cp38-abi3-win_amd64.whl
Algorithm	Hash digest
SHA256	`43e90b272618537078fabbcc67baf8e087acb8e46b4c7dd1d5de0ee9a8ace9de`
MD5	`1bed7e598fdfedf4a9a0b69ac8e83af6`
BLAKE2b-256	`aa6796b6566daa9c8c01a9c96f539b62405b8236bab5e60aa5d02b461f034cac`


File details

Details for the file nseekfs-1.1.0-cp38-abi3-manylinux_2_34_x86_64.whl.

File metadata

Download URL: nseekfs-1.1.0-cp38-abi3-manylinux_2_34_x86_64.whl
Upload date: Apr 29, 2026
Size: 483.8 kB
Tags: CPython 3.8+, manylinux: glibc 2.34+ x86-64
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for nseekfs-1.1.0-cp38-abi3-manylinux_2_34_x86_64.whl
Algorithm	Hash digest
SHA256	`cd8605105b226407d926bd1499886e5c74fa4c810d165fd4bb3e9128ced9735b`
MD5	`0dff4c6cb0a9cee96ebb9444a2f5bf2b`
BLAKE2b-256	`293d550e0219ff76e9c84720852d164598397d52057a270a8643fc786f39a491`


File details

Details for the file nseekfs-1.1.0-cp38-abi3-macosx_11_0_arm64.whl.

File metadata

Download URL: nseekfs-1.1.0-cp38-abi3-macosx_11_0_arm64.whl
Upload date: Apr 29, 2026
Size: 389.9 kB
Tags: CPython 3.8+, macOS 11.0+ ARM64
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for nseekfs-1.1.0-cp38-abi3-macosx_11_0_arm64.whl
Algorithm	Hash digest
SHA256	`bd8c57c9db857b8289196019c7f979306c0004822b58aabc8caf83742e5ca980`
MD5	`bcd067637987a91254edc1ccdb4aec1c`
BLAKE2b-256	`b93ba4367cda75983af89c0f4c3d2b089b528189e6ee2628fd55171a4a7b61e0`


File details

Details for the file nseekfs-1.1.0-cp38-abi3-macosx_10_12_x86_64.whl.

File metadata

Download URL: nseekfs-1.1.0-cp38-abi3-macosx_10_12_x86_64.whl
Upload date: Apr 29, 2026
Size: 434.4 kB
Tags: CPython 3.8+, macOS 10.12+ x86-64
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for nseekfs-1.1.0-cp38-abi3-macosx_10_12_x86_64.whl
Algorithm	Hash digest
SHA256	`3685724bba10ce01ec17c7a47bb312a9d2bb793a5568d719481f5d12fba1e13c`
MD5	`41e1083d674e2d55435ccdacbb6ec79a`
BLAKE2b-256	`f6fc6d446f8e094bfe74913f46a9a4f9ededcc1c5b5be85d84e538c99b384363`

