High-performance exact vector similarity search with Rust backend

NSeekFS

High-Performance Exact Vector Search with Rust Backend

Fast and exact cosine similarity search for Python. Built with Rust for performance, designed for production use.


NSeekFS combines the safety and performance of Rust with a clean Python API.
This first release focuses on exact cosine search with SIMD acceleration, providing predictable and reproducible results for ML workloads.

Upcoming releases will expand support to:

  • Euclidean distance
  • Approximate Nearest Neighbor (ANN) search
  • Additional precision levels and memory optimizations

Our goal: deliver a fast, reliable, and production-ready search engine that evolves with your needs.

pip install nseekfs

Quick Start

import nseekfs
import numpy as np

# Create some test vectors
embeddings = np.random.randn(10000, 384).astype(np.float32)
query = np.random.randn(384).astype(np.float32)

# Normalize embeddings and query
embeddings = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
query = query / np.linalg.norm(query)

# Build index and run a search
# By default, from_embeddings assumes the vectors are already normalized (normalized=True).
# If they are not, pass normalized=False and NSeekFS will normalize them internally.
index = nseekfs.from_embeddings(embeddings, normalized=True)
results = index.query(query, top_k=10)

print(f"Found {len(results)} results")
print(f"Best match: idx={results[0]['idx']} score={results[0]['score']:.3f}")
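
The `normalized` flag matters because, for unit-length vectors, cosine similarity reduces to a plain dot product. The sketch below illustrates that identity with the standard library only; `l2_normalize`, `dot`, and `cosine` are hypothetical helper names for illustration, not part of the NSeekFS API.

```python
import math

def l2_normalize(vec):
    """Return vec scaled to unit L2 norm; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]

def dot(a, b):
    """Plain dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    """Cosine similarity computed directly from raw (unnormalized) vectors."""
    denom = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
    return dot(a, b) / denom if denom else 0.0

a, b = [3.0, 4.0], [4.0, 3.0]
# Both routes agree: cosine on raw vectors equals dot product on unit vectors.
assert math.isclose(cosine(a, b), dot(l2_normalize(a), l2_normalize(b)))
```

Note the zero-norm guard: dividing a NumPy array by a norm of zero, as in the snippet above, yields NaN values, so it may be worth filtering out all-zero rows before building an index.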

Core Features

Exact Search

# Basic query
results = index.query(query, top_k=10)

# Access results
for item in results:
    print(f"Vector {item['idx']}: {item['score']:.6f}")
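
"Exact" here means every stored vector is scored against the query, with no approximation. A minimal brute-force reference in pure Python can be handy for cross-checking results on small inputs. This is a sketch, not the NSeekFS implementation: `brute_force_top_k` is a hypothetical name, the result dictionaries simply mirror the `idx` and `score` keys used above, and tie-breaking may differ from the library's.

```python
import heapq
import math

def brute_force_top_k(vectors, query, top_k=10):
    """Score every vector against the query and return the top_k best matches."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb) if na and nb else 0.0

    # Pair each score with its row index, then keep only the largest top_k.
    scored = ((cos(v, query), i) for i, v in enumerate(vectors))
    best = heapq.nlargest(top_k, scored)
    return [{"idx": i, "score": s} for s, i in best]

vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
hits = brute_force_top_k(vectors, [1.0, 0.0], top_k=2)
for item in hits:
    print(f"Vector {item['idx']}: {item['score']:.6f}")
```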

Batch Queries

queries = np.random.randn(50, 384).astype(np.float32)
batch_results = index.query_batch(queries, top_k=5)
print(f"Processed {len(batch_results)} queries")
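
For very large query sets, it may help to feed `query_batch` in fixed-size chunks so that intermediate results stay bounded. The sketch below assumes the batch function returns one result per query, in input order (an assumption, not something this page confirms); `chunked` and `query_in_chunks` are hypothetical helpers.

```python
def chunked(items, size):
    """Yield consecutive slices of `items` holding at most `size` elements each."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]

def query_in_chunks(batch_fn, queries, chunk_size, top_k=5):
    """Call batch_fn(chunk, top_k=top_k) per chunk and concatenate the results."""
    results = []
    for chunk in chunked(queries, chunk_size):
        results.extend(batch_fn(chunk, top_k=top_k))
    return results

# Stand-in batch function so the sketch runs without an index.
def fake_batch(chunk, top_k):
    return [sum(q) for q in chunk]

print(query_in_chunks(fake_batch, [[1], [2], [3]], chunk_size=2))
```

With a real index, `batch_fn` would be `index.query_batch`; slicing works the same way on a 2-D NumPy array of queries.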

Query Options

# Simple query (alias for query with format="simple")
results = index.query_simple(query, top_k=10)

# Detailed query with timing and diagnostics
result = index.query_detailed(query, top_k=10)
print(f"Query took {result.query_time_ms:.2f} ms, top1 idx={result.results[0]['idx']}")
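
To compare the reported `query_time_ms` against end-to-end wall-clock time (including Python call overhead), a small timing wrapper is enough. `timed_ms` is a hypothetical helper built on `time.perf_counter`, shown here with a built-in function so it runs on its own.

```python
import time

def timed_ms(fn, *args, **kwargs):
    """Run fn(*args, **kwargs) and return (result, elapsed milliseconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms

total, ms = timed_ms(sum, range(1000))
print(f"sum={total} took {ms:.3f} ms")
```

With an index in hand, the equivalent call would be `timed_ms(index.query, query, top_k=10)`.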

Index Persistence

# Build and save index
index = nseekfs.from_embeddings(embeddings, normalized=True)
print("Index saved at:", index.index_path)

# Later, reload from file
index2 = nseekfs.from_bin(index.index_path)
print(f"Reloaded index: {index2.rows} vectors x {index2.dims} dims")

## API Reference

### Index

* `from_embeddings(embeddings, normalized=True, verbose=False)`
* `from_bin(path)`

### Queries

* `query(query_vector, top_k=10)`
* `query_simple(query_vector, top_k=10)`
* `query_detailed(query_vector, top_k=10)`
* `query_batch(queries, top_k=10)`

### Properties

* `index.rows`
* `index.dims`
* `index.config`

## Architecture Highlights

### SIMD Optimizations
- AVX2 support: 256-bit registers process eight float32 values per instruction on compatible CPUs
- Automatic fallback to scalar operations on older hardware  
- Runtime detection of CPU capabilities
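
As a conceptual illustration of the points above, the sketch below models how a dot product is split into eight independent accumulators (one per float32 lane of a 256-bit register) plus a scalar tail for leftover elements. It is pure Python, so it shows only the access pattern, not the speedup, and it is not the Rust kernel used by NSeekFS; `dot_lanewise` is a hypothetical name.

```python
LANES = 8  # float32 values that fit in one 256-bit AVX2 register

def dot_lanewise(a, b, lanes=LANES):
    """Dot product using `lanes` independent accumulators plus a scalar tail."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    acc = [0.0] * lanes
    full = len(a) - len(a) % lanes  # largest multiple of `lanes` that fits
    for start in range(0, full, lanes):
        # One "vector instruction": multiply-accumulate across all lanes.
        for lane in range(lanes):
            acc[lane] += a[start + lane] * b[start + lane]
    total = sum(acc)  # horizontal reduction of the lane accumulators
    for i in range(full, len(a)):
        total += a[i] * b[i]  # scalar fallback for the remaining elements
    return total

a = [float(i) for i in range(19)]
b = [1.0] * 19
print(dot_lanewise(a, b))
```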

### Memory Management
- Memory mapping for efficient data access
- Thread-local buffers for zero-allocation queries
- Cache-aligned data structures for optimal performance
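
Memory mapping lets a process read individual rows from a file on disk without loading the whole file first. The sketch below demonstrates the idea with Python's `mmap` and `struct` modules on a hypothetical headerless layout of little-endian float32 rows. The actual NSeekFS `.bin` format is not documented on this page and may differ.

```python
import mmap
import os
import struct
import tempfile

DIMS = 4
ROW_BYTES = DIMS * 4  # each float32 occupies 4 bytes

def write_rows(path, rows):
    """Write rows back to back as little-endian float32 values (toy layout)."""
    with open(path, "wb") as f:
        for row in rows:
            f.write(struct.pack(f"<{DIMS}f", *row))

def read_row(path, row_index):
    """Map the file read-only and decode a single row at its byte offset."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return struct.unpack_from(f"<{DIMS}f", mm, row_index * ROW_BYTES)

path = os.path.join(tempfile.mkdtemp(), "vectors.bin")
write_rows(path, [[0.0, 1.0, 2.0, 3.0], [4.0, 5.0, 6.0, 7.0]])
second = read_row(path, 1)
print(second)
```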

### Batch Processing
- Intelligent batching strategies based on query size
- SIMD vectorization across multiple queries
- Optimized memory access patterns

## Installation

```bash
# From PyPI
pip install nseekfs

# Verify installation
python -c "import nseekfs; print('NSeekFS installed successfully')"
```

Technical Details

  • Precision: Float32 optimized for standard ML embeddings
  • Memory: Memory-mapped index data, thread-local query buffers, and cache-aligned data structures
  • Performance: Rust backend with SIMD (AVX2 when detected at runtime, scalar fallback otherwise)
  • Compatibility: Python 3.8+ on Windows, macOS, and Linux
  • Thread Safety: Safe concurrent access from multiple threads
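
Since concurrent access is described as safe, queries can be dispatched from a thread pool. The sketch below uses a stand-in search function so it runs without an index; `query_concurrently` and `stand_in_search` are hypothetical names. Whether threads yield a real speedup depends on whether the native extension releases the GIL during a query, which this page does not state.

```python
from concurrent.futures import ThreadPoolExecutor

def query_concurrently(search_fn, queries, max_workers=4):
    """Run search_fn(q) for each query on a thread pool, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(search_fn, queries))

def stand_in_search(q):
    """Placeholder for a real query: return the position of the largest value."""
    return max(range(len(q)), key=q.__getitem__)

answers = query_concurrently(stand_in_search, [[0.1, 0.9], [0.7, 0.3]])
print(answers)
```

With a real index, `search_fn` could be `lambda q: index.query(q, top_k=10)`.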

Performance Tips

# Pre-normalize vectors if using cosine similarity
embeddings = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
index = nseekfs.from_embeddings(embeddings, normalized=True)

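# Store embeddings as float32, the precision NSeekFS is optimized for
embeddings = embeddings.astype(np.float32)

# Request only as many results as you need: a small top_k (e.g. 10) is cheaper than a large one (e.g. 1000)
results = index.query(query, top_k=10)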

# Use batch processing for multiple queries
batch_results = index.query_batch(queries, top_k=10)

License

MIT License - see LICENSE file for details.


Fast, exact cosine similarity search for Python.

Built with Rust for performance, designed for Python developers. Source: github.com/NSeek-AI/nseekfs

Download files

Source Distribution

  • nseekfs-1.0.3.tar.gz (80.1 kB): Source

Built Distributions

  • nseekfs-1.0.3-cp38-abi3-win_amd64.whl (226.6 kB): CPython 3.8+, Windows x86-64
  • nseekfs-1.0.3-cp38-abi3-manylinux_2_34_x86_64.whl (347.6 kB): CPython 3.8+, manylinux (glibc 2.34+), x86-64
  • nseekfs-1.0.3-cp38-abi3-macosx_11_0_arm64.whl (274.2 kB): CPython 3.8+, macOS 11.0+, ARM64
  • nseekfs-1.0.3-cp38-abi3-macosx_10_12_x86_64.whl (297.8 kB): CPython 3.8+, macOS 10.12+, x86-64

File details

Details for the file nseekfs-1.0.3.tar.gz.

File metadata

  • Download URL: nseekfs-1.0.3.tar.gz
  • Size: 80.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for nseekfs-1.0.3.tar.gz
Algorithm Hash digest
SHA256 06b132757eb6334d3a180f7da3a4a3d22ec89181262253ce245dab0ecb960409
MD5 52cebf006043219313114cf13ffecd9f
BLAKE2b-256 df9ae6f7998a3a2608f9fd92d6705948edb8a709332080c5aa8cd43378c75627
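
A downloaded file can be checked against the SHA256 digest listed above with Python's `hashlib`. The sketch below is a generic verification routine, not a PyPI or pip feature; `sha256_of` and `matches` are hypothetical helper names, and the demo hashes a small temporary file so the example runs on its own.

```python
import hashlib
import os
import tempfile

def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()

def matches(path, expected_hex):
    """Compare a file's SHA256 digest with an expected hex string."""
    return sha256_of(path) == expected_hex.strip().lower()

# Self-contained demo on a temporary file with known contents.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.bin")
with open(demo_path, "wb") as f:
    f.write(b"abc")
demo_ok = matches(demo_path, hashlib.sha256(b"abc").hexdigest())
print("digest matches:", demo_ok)
```

For the source distribution, the call would be `matches("nseekfs-1.0.3.tar.gz", "<SHA256 from the table above>")`.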

File details

Details for the file nseekfs-1.0.3-cp38-abi3-win_amd64.whl.

File metadata

  • Download URL: nseekfs-1.0.3-cp38-abi3-win_amd64.whl
  • Size: 226.6 kB
  • Tags: CPython 3.8+, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for nseekfs-1.0.3-cp38-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 28af5cc658d2acfa4f5940a1790b3edecc4779b7901a8701c2cf008ef2c0d114
MD5 190e846308fa374ebe121aa9cdb6f47b
BLAKE2b-256 8c910150e74819bfb8fc170bb65d3ac40f33dda1f6a38cdd81a29938bed367d7

File details

Details for the file nseekfs-1.0.3-cp38-abi3-manylinux_2_34_x86_64.whl.

File hashes

Hashes for nseekfs-1.0.3-cp38-abi3-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 bafbfa8c6b98eaf589e0346eaa4128817ae9ef3c0b3de8b54d3a3656f5f4fc63
MD5 b01f00b435a7514f3acc30283530396c
BLAKE2b-256 72145087c6e5f9d83a9246fad143878767ddb1ea16f9c5c78f38d4fe5c390705

File details

Details for the file nseekfs-1.0.3-cp38-abi3-macosx_11_0_arm64.whl.

File hashes

Hashes for nseekfs-1.0.3-cp38-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 5a20846372d51abae2daf21c3ef6dff3ded1bbf6996a4e70560545b742565e0d
MD5 8d63e0c713fc73817c8418eb9f3cf853
BLAKE2b-256 3de3d3dd9b46cfb981c635b12b7564055fe6e925008c2826aff8bb64fdf749bb

File details

Details for the file nseekfs-1.0.3-cp38-abi3-macosx_10_12_x86_64.whl.

File hashes

Hashes for nseekfs-1.0.3-cp38-abi3-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 f9613a492d736fc1ea2b32f7c78046f11425253f1392b392a91b076363d625af
MD5 72e55625f9dd7410e050a57a7669f5a3
BLAKE2b-256 d61aaa8a922afa07d5a8ce86f7930b6d047b86b656bb128e47098e33f32b4130
