
High-performance exact vector similarity search with Rust backend


NSeekFS


High-Performance Exact Vector Search with Rust Backend

Fast and exact vector similarity search for Python. Built with Rust for performance, designed for production use.


NSeekFS combines the safety and performance of Rust with a clean Python API.
This release supports exact vector search with multiple similarity metrics:

  • cosine (requires normalized vectors)
  • dot
  • euclidean
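To make the three metrics concrete, here is a plain NumPy brute-force reference that scores every stored vector and keeps the best k. It is a sketch for cross-checking results, not NSeekFS's Rust implementation. The function name `exact_top_k` is hypothetical. The result shape (a list of dicts with `idx` and `score`) mirrors the examples below, and Euclidean is scored as −L2² (negative squared distance), following the Technical Details, so that a higher score is better for every metric.

```python
import numpy as np

def exact_top_k(embeddings: np.ndarray, query: np.ndarray, k: int, metric: str = "cosine"):
    """Hypothetical brute-force exact top-k; a higher score is always better."""
    X = np.asarray(embeddings, dtype=np.float32)
    q = np.asarray(query, dtype=np.float32)
    if metric in ("cosine", "dot"):
        # For cosine, X and q are assumed to be unit-normalized already,
        # so cosine similarity reduces to a plain dot product.
        scores = X @ q
    elif metric == "euclidean":
        # Negative squared L2 distance (-L2^2): nearer vectors score higher.
        diff = X - q
        scores = -np.einsum("ij,ij->i", diff, diff)
    else:
        raise ValueError(f"unknown metric: {metric!r}")
    k = min(k, scores.shape[0])
    # argpartition isolates the k best in linear time; then sort only those k.
    top = np.argpartition(-scores, k - 1)[:k]
    top = top[np.argsort(-scores[top], kind="stable")]
    return [{"idx": int(i), "score": float(scores[i])} for i in top]
```

Because the search is exact, the indices NSeekFS returns should agree with this reference, apart from possible reordering among exactly tied scores.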

Upcoming releases will expand support to:

  • Approximate Nearest Neighbor (ANN) search
  • Additional precision levels and memory optimizations

Our goal: deliver a fast, reliable, and production-ready search engine that evolves with your needs.

pip install nseekfs

Quick Start

import nseekfs
import numpy as np

# Create some test vectors
embeddings = np.random.randn(10000, 384).astype(np.float32)
query = np.random.randn(384).astype(np.float32)

# Choose metric: "cosine", "dot", or "euclidean"
metric = "cosine"

# Normalize only if using cosine
if metric == "cosine":
    embeddings = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    query = query / np.linalg.norm(query)

# Build index and run a search
index = nseekfs.from_embeddings(embeddings, metric=metric, normalized=(metric == "cosine"))
results = index.query(query, top_k=10)

print(f"Found {len(results)} results")
print(f"Best match: idx={results[0]['idx']} score={results[0]['score']:.3f}")
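The Quick Start assumes float32 inputs with matching dimensions and, for cosine, unit-norm vectors. The page does not say whether NSeekFS validates these itself, so a pre-flight check can catch mistakes early. One example is the NaN rows produced by dividing an all-zero vector by its zero norm. The helper `check_inputs` and its tolerance are hypothetical. The float32 requirement is an assumption based on the examples' `.astype(np.float32)` and the "Float32 core" note.

```python
import numpy as np

def check_inputs(embeddings: np.ndarray, query: np.ndarray, metric: str, atol: float = 1e-3) -> None:
    """Hypothetical pre-flight check; raises ValueError on suspicious inputs."""
    if embeddings.ndim != 2 or query.ndim != 1:
        raise ValueError("expected embeddings of shape (n, d) and query of shape (d,)")
    if embeddings.shape[1] != query.shape[0]:
        raise ValueError(f"dimension mismatch: {embeddings.shape[1]} vs {query.shape[0]}")
    if embeddings.dtype != np.float32 or query.dtype != np.float32:
        raise ValueError("expected float32 arrays")
    if not (np.isfinite(embeddings).all() and np.isfinite(query).all()):
        # Typical cause: normalizing an all-zero vector (0 / 0 -> NaN).
        raise ValueError("NaN or inf found in inputs")
    if metric == "cosine":
        row_norms = np.linalg.norm(embeddings, axis=1)
        if not np.allclose(row_norms, 1.0, atol=atol) or not np.isclose(np.linalg.norm(query), 1.0, atol=atol):
            raise ValueError("cosine expects unit-norm vectors; normalize first")
```

Call it as `check_inputs(embeddings, query, metric)` just before `nseekfs.from_embeddings(...)`.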

Core Features

Exact Search

# Simple query
results = index.query(query, top_k=10)

# Access results
for item in results:
    print(f"Vector {item['idx']}: {item['score']:.6f}")
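One way to confirm that results are exact is to compare the returned indices with a brute-force top-k and measure recall. The helper below is a hypothetical sketch. For an exact engine, recall should be 1.0, except when float32 ties at the k-th position let a different but equally scored index in.

```python
from typing import Iterable, Sequence

def recall_at_k(returned: Iterable[int], reference: Sequence[int]) -> float:
    """Fraction of reference top-k indices that also appear in `returned`."""
    ref = set(reference)
    if not ref:
        return 1.0
    return len(ref & set(returned)) / len(ref)
```

For example: `recall_at_k([item["idx"] for item in results], reference_ids)`, where `reference_ids` comes from any brute-force scorer.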

Batch Queries

queries = np.random.randn(50, 384).astype(np.float32)
if metric == "cosine":
    queries = queries / np.linalg.norm(queries, axis=1, keepdims=True)

batch_results = index.query_batch(queries, top_k=5)
print(f"Processed {len(batch_results)} queries")
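The Architecture Highlights mention a "full matrix GEMM" strategy for batches. The idea is to score every query against every vector with one matrix product and then select the top k per row. Here is a NumPy sketch of that idea for dot/cosine (inputs assumed normalized for cosine). It is not NSeekFS's code, and `batch_top_k_dot` is a hypothetical name.

```python
import numpy as np

def batch_top_k_dot(X: np.ndarray, Q: np.ndarray, k: int):
    """Exact top-k for many queries at once via a single matrix product."""
    scores = Q @ X.T                                   # (n_queries, n_vectors)
    k = min(k, X.shape[0])
    # Per row: take the k largest scores (unordered), then sort just those.
    part = np.argpartition(-scores, k - 1, axis=1)[:, :k]
    part_scores = np.take_along_axis(scores, part, axis=1)
    order = np.argsort(-part_scores, axis=1, kind="stable")
    idx = np.take_along_axis(part, order, axis=1)
    return idx, np.take_along_axis(part_scores, order, axis=1)
```

The score matrix costs n_queries × n_vectors floats. That size is the usual reason an engine falls back to chunked processing for large batches or collections.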

Query Options

# Simple query
results = index.query_simple(query, top_k=10)

# Detailed query
result = index.query_detailed(query, top_k=10)
print(f"Query took {result.query_time_ms:.2f} ms, top1 idx={result.results[0]['idx']}")
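To compare `query_time_ms` against wall-clock time seen from Python, a small timing helper using the median of several runs is enough. This is a generic sketch with a hypothetical name. The two numbers need not agree exactly, since the page does not say which span `query_time_ms` measures.

```python
import statistics
import time
from typing import Callable

def median_ms(fn: Callable[[], object], repeat: int = 5, warmup: int = 1) -> float:
    """Median wall-clock time of fn() in milliseconds."""
    for _ in range(warmup):
        fn()  # warm caches / lazy page-ins before measuring
    samples = []
    for _ in range(repeat):
        t0 = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - t0) * 1000.0)
    return statistics.median(samples)
```

Usage: `median_ms(lambda: index.query(query, top_k=10))`.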

Index Persistence

# Build and save index
index = nseekfs.from_embeddings(embeddings, metric=metric, normalized=(metric == "cosine"))
print("Index saved at:", index.index_path)

# Later, reload from file
index2 = nseekfs.from_bin(index.index_path)
print(f"Reloaded index: {index2.rows} vectors x {index2.dims} dims")
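The Architecture Highlights list "atomic file finalization" for index builds. The usual pattern behind that phrase is: write to a temporary file in the same directory, flush it to disk, then rename it over the target, so readers never see a half-written file. The sketch below shows that generic pattern in Python, which is useful if you copy or ship index files yourself. It is not NSeekFS's code, and `atomic_write_bytes` is a hypothetical name.

```python
import os
import tempfile

def atomic_write_bytes(path: str, data: bytes) -> None:
    """Write data to path so readers never observe a partially written file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Temp file in the same directory keeps the final rename on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # ensure bytes reach disk before the rename
        os.replace(tmp_path, path)  # atomic on POSIX; best effort on Windows
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```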

API Reference

Index

  • from_embeddings(embeddings, metric="cosine", normalized=True, verbose=False)
  • from_bin(path)

Queries

  • query(query_vector, top_k=10)
  • query_simple(query_vector, top_k=10)
  • query_detailed(query_vector, top_k=10)
  • query_batch(queries, top_k=10)

Properties

  • index.rows
  • index.dims
  • index.config

Metric Guide

Metric      Normalization required   Typical use case
cosine      Yes                      Semantic embeddings (e.g. sentence-transformers, OpenAI)
dot         No                       Raw model outputs where scale carries meaning
euclidean   No                       Geometric distance or clustering tasks
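Normalization also explains how the metrics relate. For unit vectors, ‖q − x‖² = ‖q‖² + ‖x‖² − 2 q·x = 2 − 2 q·x. So the −L2² score equals 2·(q·x) − 2, and cosine, dot, and Euclidean all rank normalized data identically. The NumPy check below verifies the identity numerically; `check_unit_equivalence` is a hypothetical helper.

```python
import numpy as np

def check_unit_equivalence(X: np.ndarray, q: np.ndarray) -> bool:
    """For unit-norm rows X and unit q, check -||q - x||^2 == 2 * (q . x) - 2."""
    dot = X @ q
    neg_l2_sq = -np.sum((X - q) ** 2, axis=1)
    return bool(np.allclose(neg_l2_sq, 2.0 * dot - 2.0))

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 64))
X /= np.linalg.norm(X, axis=1, keepdims=True)
q = rng.standard_normal(64)
q /= np.linalg.norm(q)
print(check_unit_equivalence(X, q))  # True
```

Without normalization the identity breaks, which is why dot and Euclidean can give different rankings on raw embeddings.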

Architecture Highlights

  • Similarity Metrics: cosine (with enforced normalization), dot product, and Euclidean, scored as −L2² (negative squared L2 distance) so that a higher score always means a closer match.
  • Batch Query Engine: adaptive selection between full matrix GEMM, chunked streaming, and parallel Rayon fallback.
  • SIMD Acceleration: custom wide::f32x8 kernels for dot/L2, with matrixmultiply::sgemm for block GEMM.
  • Memory Mapping: compact binary format (header + float data) using memmap2 for zero-copy loading.
  • Streaming Index Build: chunked writes, runtime memory estimation, safe normalization, and atomic file finalization.
  • Cross-Platform Optimizations: runtime SIMD detection (AVX2/AVX/SSE4.2/NEON) and environment-controlled tuning (NSEEK_THREADS, NSEEK_QBLOCK, NSEEK_DBLOCK).
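The "chunked streaming" strategy keeps memory bounded. The engine scans the stored vectors block by block and merges each block's scores into a running top-k. The NumPy sketch below illustrates that general technique for dot products. The `block` parameter is only a hypothetical analogue of a database block size such as NSEEK_DBLOCK; its real meaning and defaults in NSeekFS are not documented here.

```python
import numpy as np

def streaming_top_k_dot(X: np.ndarray, q: np.ndarray, k: int, block: int = 4096):
    """Exact top-k by dot product, scanning X in row blocks of size `block`."""
    best_idx = np.empty(0, dtype=np.int64)
    best_scores = np.empty(0, dtype=np.float64)
    for start in range(0, X.shape[0], block):
        chunk_scores = X[start:start + block] @ q
        # Merge this block's candidates with the running top-k, then trim to k.
        cand_idx = np.concatenate([best_idx, np.arange(start, start + chunk_scores.shape[0])])
        cand_scores = np.concatenate([best_scores, chunk_scores])
        keep = np.argsort(-cand_scores, kind="stable")[:k]
        best_idx, best_scores = cand_idx[keep], cand_scores[keep]
    return best_idx, best_scores
```

Peak extra memory is O(k + block) instead of O(n), and the result is still exact because every vector is scored once.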

Installation

# From PyPI
pip install nseekfs

# Verify installation
python -c "import nseekfs; print('NSeekFS installed successfully')"

Technical Details

  • Similarity Metrics: Cosine (with enforced normalization), Dot Product, Euclidean (−L2²).
  • Precision: Float32 core; future-ready hooks for f16, f8, f64 levels.
  • Index Format: Compact binary (12-byte header + contiguous float data), memory-mapped via memmap2.
  • Batch Queries: Adaptive engine with GEMM (matrixmultiply), SIMD kernels (wide::f32x8), or parallel Rayon fallback.
  • Memory: Streaming index build with chunking, safe normalization, and runtime memory estimation.
  • Performance: AVX2/AVX/SSE4.2/NEON runtime detection; env vars (NSEEK_THREADS, NSEEK_QBLOCK, NSEEK_DBLOCK) for tuning.
  • Thread Safety: Parallel query execution via Rayon; thread-safe index loading.
  • Compatibility: Python 3.8+ on Windows, macOS, Linux.
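A "compact binary header + contiguous float data" file can be memory-mapped without copying. The sketch below shows the general technique in NumPy. The header layout here (three little-endian uint32 fields: version, dims, rows) is a HYPOTHETICAL 12-byte layout chosen only for illustration. The page does not document NSeekFS's real header fields, so do not use this to parse NSeekFS index files.

```python
import struct
import numpy as np

# HYPOTHETICAL header: 3 little-endian uint32 (version, dims, rows) = 12 bytes,
# followed by rows * dims float32 values in row-major order.
HEADER = struct.Struct("<III")

def write_vectors(path: str, vectors: np.ndarray, version: int = 1) -> None:
    data = np.ascontiguousarray(vectors, dtype="<f4")
    rows, dims = data.shape
    with open(path, "wb") as f:
        f.write(HEADER.pack(version, dims, rows))
        f.write(data.tobytes())

def open_vectors(path: str) -> np.memmap:
    with open(path, "rb") as f:
        _version, dims, rows = HEADER.unpack(f.read(HEADER.size))
    # Map the float payload directly; the OS pages it in lazily on access.
    return np.memmap(path, dtype="<f4", mode="r", offset=HEADER.size, shape=(rows, dims))
```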

Performance Tips

# Cosine similarity: normalize the embeddings (and every query) first
unit_embeddings = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
index = nseekfs.from_embeddings(unit_embeddings, metric="cosine", normalized=True)

# Dot or Euclidean: pass the raw, unnormalized embeddings
index = nseekfs.from_embeddings(embeddings, metric="dot", normalized=False)
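The two tips above can be folded into one helper that returns vectors together with the matching `normalized` flag, so the flag can never disagree with the data. This is a hypothetical helper, not part of NSeekFS. It also guards against dividing by zero for all-zero rows.

```python
import numpy as np

def prepare_for_metric(embeddings: np.ndarray, metric: str):
    """Hypothetical helper: return (vectors, normalized_flag) for `metric`."""
    x = np.asarray(embeddings, dtype=np.float32)
    if metric == "cosine":
        norms = np.linalg.norm(x, axis=1, keepdims=True)
        # All-zero rows stay zero instead of becoming NaN.
        return x / np.maximum(norms, 1e-12), True
    if metric in ("dot", "euclidean"):
        return x, False  # keep raw magnitudes; scale matters for these metrics
    raise ValueError(f"unsupported metric: {metric!r}")
```

Usage: `vecs, flag = prepare_for_metric(embeddings, metric)`, then `nseekfs.from_embeddings(vecs, metric=metric, normalized=flag)`.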

License

MIT License - see LICENSE file for details.


Source: github.com/NSeek-AI/nseekfs



Download files

Download the file for your platform.

Source Distribution

nseekfs-1.0.4.tar.gz (79.4 kB)

Uploaded Source

Built Distributions


nseekfs-1.0.4-cp38-abi3-win_amd64.whl (227.2 kB)

Uploaded CPython 3.8+, Windows x86-64

nseekfs-1.0.4-cp38-abi3-manylinux_2_34_x86_64.whl (347.9 kB)

Uploaded CPython 3.8+, manylinux: glibc 2.34+ x86-64

nseekfs-1.0.4-cp38-abi3-macosx_11_0_arm64.whl (279.5 kB)

Uploaded CPython 3.8+, macOS 11.0+ ARM64

nseekfs-1.0.4-cp38-abi3-macosx_10_12_x86_64.whl (299.6 kB)

Uploaded CPython 3.8+, macOS 10.12+ x86-64

File details

Details for the file nseekfs-1.0.4.tar.gz.

File metadata

  • Download URL: nseekfs-1.0.4.tar.gz
  • Upload date:
  • Size: 79.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for nseekfs-1.0.4.tar.gz
Algorithm Hash digest
SHA256 33e819edf64bf6a78e09a765eed9913edebc48c4b27751d83b640c19f88f88ea
MD5 a1ea6559669f250cbe780bfb974f4734
BLAKE2b-256 8110d70cf362854ca4283467f0df508d23d2567cda3fcf2b916791c63fd7307f
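To check a downloaded file against the SHA256 digests listed on this page, stream it through `hashlib` and compare the hex digest. The helper name is hypothetical; `hashlib.sha256` is the standard-library API. pip can also enforce digests automatically in hash-checking mode (`--require-hashes` with `--hash=sha256:...` entries in a requirements file).

```python
import hashlib

def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream the file through SHA-256 and return the lowercase hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

For example, `sha256_of_file("nseekfs-1.0.4.tar.gz")` should equal the SHA256 value listed above for the source distribution.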


File details

Details for the file nseekfs-1.0.4-cp38-abi3-win_amd64.whl.

File metadata

  • Download URL: nseekfs-1.0.4-cp38-abi3-win_amd64.whl
  • Upload date:
  • Size: 227.2 kB
  • Tags: CPython 3.8+, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for nseekfs-1.0.4-cp38-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 e5848dd1356ed90ff70fb6582f2ef91bca37fbe88be195b93291dd7fe3ca7c3e
MD5 53195bd9af0549ce720162e96478318c
BLAKE2b-256 44c6122a8e1e4fd6cfbb11625c7b706df7d80345013721cca1fd277e89eda7b4


File details

Details for the file nseekfs-1.0.4-cp38-abi3-manylinux_2_34_x86_64.whl.


File hashes

Hashes for nseekfs-1.0.4-cp38-abi3-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 f347eba0786fd3cc66a5a2a207492d11453de68a24bca6b5257a9457ec193d6d
MD5 b7e247aabbaf251eb353bd8bd9ab28ae
BLAKE2b-256 20e5da159cc6c26b5a42585f5450a641619df862592ffd49ae2d9ce9d8379bb7


File details

Details for the file nseekfs-1.0.4-cp38-abi3-macosx_11_0_arm64.whl.


File hashes

Hashes for nseekfs-1.0.4-cp38-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 4bac2596f1d94057410c7d5b0ccb6661d5ffa247b265b2558441fab1dc41e252
MD5 c6505e9ca0c1ea222461563f223870f7
BLAKE2b-256 61a8866ab9e0fe6825c77ead927a6a7e7c7d57eb3b9eb424c0d1b45b84c213bd


File details

Details for the file nseekfs-1.0.4-cp38-abi3-macosx_10_12_x86_64.whl.


File hashes

Hashes for nseekfs-1.0.4-cp38-abi3-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 db055eeb420ca15e29afe69e8fd534926e5e626433ce50706b66a827d96fbc19
MD5 d6197a636fd91933648549264cf8dce6
BLAKE2b-256 fb84881226666d4374da217226d245e919da8c8f5d61e1f852d1f43a50cea406

