High-performance exact vector similarity search with Rust backend

NSeekFS

High-Performance Vector Similarity Search with Rust Backend

Fast and exact cosine similarity search for Python. Built with Rust for performance, designed for production use.

pip install nseekfs

Quick Start

import nseekfs
import numpy as np

# Create test vectors
embeddings = np.random.randn(10000, 384).astype(np.float32)
query = np.random.randn(384).astype(np.float32)

# Build index and search
index = nseekfs.from_embeddings(embeddings, normalized=True)
results = index.query(query, top_k=10)

# Results is a list of dictionaries
print(f"Found {len(results)} results")
print(f"Best match: vector {results[0]['idx']} (similarity: {results[0]['score']:.3f})")

Core Features

Exact Vector Search

# Basic search
index = nseekfs.from_embeddings(embeddings, normalized=True)
results = index.query(query, top_k=10)

# Access results
for item in results:
    print(f"Vector {item['idx']}: {item['score']:.6f}")

# Index properties
print(f"Index contains {index.rows} vectors x {index.dims} dimensions")
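For cross-checking results on small datasets, a plain NumPy reference can be handy. The sketch below is not NSeekFS code: `exact_cosine_top_k` is a hypothetical helper that brute-forces cosine similarity and returns results in the same `{'idx', 'score'}` shape shown above.

```python
import numpy as np

def exact_cosine_top_k(embeddings, query, top_k=10):
    """Brute-force cosine similarity search; returns [{'idx', 'score'}, ...]."""
    emb = np.asarray(embeddings, dtype=np.float32)
    q = np.asarray(query, dtype=np.float32)
    # Cosine similarity = dot product divided by the product of the norms.
    emb_norms = np.linalg.norm(emb, axis=1)
    q_norm = np.linalg.norm(q)
    denom = np.maximum(emb_norms * q_norm, 1e-12)  # guard against zero vectors
    scores = (emb @ q) / denom
    k = min(top_k, len(scores))
    # argpartition selects the k largest scores; only those k are then sorted.
    top = np.argpartition(-scores, k - 1)[:k]
    top = top[np.argsort(-scores[top])]
    return [{"idx": int(i), "score": float(scores[i])} for i in top]

rng = np.random.default_rng(0)
embeddings = rng.standard_normal((1000, 384)).astype(np.float32)
query = embeddings[42].copy()  # a vector that is already in the set
reference = exact_cosine_top_k(embeddings, query, top_k=5)
```

Because the search is exact, the indices from `index.query` should agree with this reference up to floating-point ties; scores may differ in the last few digits.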

Batch Processing

# Process multiple queries efficiently
queries = np.random.randn(50, 384).astype(np.float32)
batch_results = index.query_batch(queries, top_k=5)

print(f"Processed {len(queries)} queries")
for i, query_results in enumerate(batch_results):
    print(f"Query {i}: {len(query_results)} results")
    # Each query_results is a list of {'idx': int, 'score': float}

Advanced Query Options

# Simple format (default) - returns list of dicts
results = index.query(query, top_k=10, format='simple')

# Detailed format - returns QueryResult object with timing
result_obj = index.query(query, top_k=10, format='detailed')
print(f"Query took {result_obj.query_time_ms:.2f}ms")

# With timing tuple
results, timing = index.query(query, top_k=10, return_timing=True)

Index Persistence

# Load existing index
index = nseekfs.from_bin("my_vectors.nseek")
print(f"Loaded index with {index.rows} vectors x {index.dims} dimensions")

Performance Monitoring

# Get detailed performance metrics
metrics = index.get_performance_metrics()
print(f"Total queries: {metrics['total_queries']}")
print(f"Average time: {metrics['avg_query_time_ms']:.2f}ms")
print(f"SIMD queries: {metrics['simd_queries']}")
print(f"Queries per second: {metrics['queries_per_second']:.0f}")

Built-in Benchmark

# Run performance benchmark
nseekfs.benchmark(vectors=1000, dims=384, queries=100, verbose=True)

API Reference

Index Creation

# Basic usage
index = nseekfs.from_embeddings(
    embeddings,        # numpy array of float32 vectors
    normalized=True,   # normalize vectors (default: False)
    verbose=False      # show progress (default: False)
)

# Load existing index
index = nseekfs.from_bin("path/to/index.nseek")

Query Methods

# Simple query (returns list of dicts)
results = index.query(query_vector, top_k=10)
# Returns: [{'idx': int, 'score': float}, ...]

# Detailed query (returns QueryResult object)
result = index.query_detailed(query_vector, top_k=10)

# Simple query explicitly
results = index.query_simple(query_vector, top_k=10)

# Batch queries
batch_results = index.query_batch(queries_array, top_k=10)
# Returns: List of lists of dicts

Index Properties

print(f"Vectors: {index.rows}")
print(f"Dimensions: {index.dims}")
print(f"Index path: {index.index_path}")
print(f"Config: {index.config}")

Architecture Highlights

SIMD Optimizations

  • AVX2 support: 8 float32 values processed per instruction on compatible CPUs
  • Automatic fallback to scalar operations on older hardware
  • Runtime detection of CPU capabilities
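To make the "8 lanes plus scalar fallback" idea concrete, here is a rough NumPy illustration. It only mimics the shape of a SIMD dot product (chunks of 8 float32 values, then a scalar tail); the function name `dot_in_lanes` is hypothetical, and the actual Rust kernels are not shown in this document.

```python
import numpy as np

LANES = 8  # a 256-bit AVX2 register holds 8 float32 values

def dot_in_lanes(a, b, lanes=LANES):
    """Dot product computed chunk by chunk, the way a SIMD loop is laid out."""
    a = np.asarray(a, dtype=np.float32)
    b = np.asarray(b, dtype=np.float32)
    n_full = (len(a) // lanes) * lanes
    # "Vector" part: keep `lanes` partial sums, updated one chunk at a time.
    acc = np.zeros(lanes, dtype=np.float32)
    for start in range(0, n_full, lanes):
        acc += a[start:start + lanes] * b[start:start + lanes]
    total = float(acc.sum())  # horizontal sum of the lane accumulators
    # Scalar tail: elements that do not fill a whole chunk.
    for i in range(n_full, len(a)):
        total += float(a[i]) * float(b[i])
    return total

rng = np.random.default_rng(1)
a = rng.standard_normal(387).astype(np.float32)  # 48 full chunks + 3 tail elements
b = rng.standard_normal(387).astype(np.float32)
lane_result = dot_in_lanes(a, b)
```

The result matches `np.dot(a, b)` up to float32 rounding, since the lanes only change the order in which the products are summed.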

Memory Management

  • Memory mapping for efficient data access
  • Thread-local buffers for zero-allocation queries
  • Cache-aligned data structures for optimal performance
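As an illustration of what memory mapping buys, the sketch below uses `numpy.memmap` on a raw float32 dump. This is an assumption-laden stand-in: the file written here is a plain array dump, not the `.nseek` format, whose layout is not described in this document.

```python
import os
import tempfile
import numpy as np

rows, dims = 1000, 64
rng = np.random.default_rng(2)
vectors = rng.standard_normal((rows, dims)).astype(np.float32)

# Write raw float32 data to disk (a plain dump, not the .nseek format).
path = os.path.join(tempfile.mkdtemp(), "vectors.f32")
vectors.tofile(path)

# Map the file instead of reading it: pages are loaded lazily on access.
mapped = np.memmap(path, dtype=np.float32, mode="r", shape=(rows, dims))

query = vectors[7]
scores = mapped @ query  # touches the mapped pages on demand
best = int(np.argmax(scores))
```

Opening the mapping is cheap regardless of file size, and the operating system can share and evict the pages as needed, which is the usual motivation for this design.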

Batch Processing

  • Intelligent batching strategies based on query size
  • SIMD vectorization across multiple queries
  • Optimized memory access patterns
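One common way to batch exact search is to score every query against every vector in a single matrix product and then pick the top k per row. The NumPy sketch below shows that idea under the assumption that all rows are L2-normalized; `batch_top_k` and `l2_normalize` are hypothetical helpers, not part of the NSeekFS API, and the library's own batching strategy may differ.

```python
import numpy as np

def batch_top_k(embeddings, queries, top_k=5):
    """Score all queries at once; rows are assumed to be L2-normalized."""
    scores = queries @ embeddings.T  # shape: (n_queries, n_vectors)
    k = min(top_k, scores.shape[1])
    # Unordered top-k per row, then sort only those k columns.
    part = np.argpartition(-scores, k - 1, axis=1)[:, :k]
    part_scores = np.take_along_axis(scores, part, axis=1)
    order = np.argsort(-part_scores, axis=1)
    idx = np.take_along_axis(part, order, axis=1)
    top_scores = np.take_along_axis(part_scores, order, axis=1)
    return idx, top_scores

def l2_normalize(x):
    """Scale each row to unit length (rows must be non-zero)."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

rng = np.random.default_rng(3)
embeddings = l2_normalize(rng.standard_normal((2000, 128)).astype(np.float32))
queries = embeddings[:50]  # 50 queries taken from the set itself
idx, top_scores = batch_top_k(embeddings, queries, top_k=5)
```

Here `idx[i]` holds the indices of the five best matches for query `i`, best first, which mirrors the list-of-lists shape returned by `index.query_batch`.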

Installation

# From PyPI
pip install nseekfs

# Verify installation
python -c "import nseekfs; print('NSeekFS installed successfully')"

Technical Details

  • Precision: Float32 optimized for standard ML embeddings
  • Memory: Efficient memory usage with optimized data structures
  • Performance: Rust backend with SIMD optimizations where available
  • Compatibility: Python 3.8+ on Windows, macOS, and Linux
  • Thread Safety: Safe concurrent access from multiple threads
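Since the index is described as safe for concurrent access, queries can be issued from a thread pool. The helper below is a minimal sketch: `query_concurrently` is a hypothetical name, and it relies only on the `index.query(vector, top_k=...)` call shown earlier. Whether threads actually speed things up depends on whether the native code releases the GIL, which this document does not state.

```python
from concurrent.futures import ThreadPoolExecutor

def query_concurrently(index, queries, top_k=10, workers=4):
    """Run index.query for each query on a thread pool, preserving order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the same order as `queries`.
        return list(pool.map(lambda q: index.query(q, top_k=top_k), queries))
```

For example, `all_results = query_concurrently(index, list(queries), top_k=5)` returns one result list per query. For large homogeneous batches, `index.query_batch` is likely the simpler choice.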

Performance Tips

# Pre-normalize vectors if using cosine similarity
embeddings = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
index = nseekfs.from_embeddings(embeddings, normalized=False)

# Use appropriate data types
embeddings = embeddings.astype(np.float32)

# Keep top_k as small as your application needs
results = index.query(query, top_k=10)  # rather than top_k=1000

# Use batch processing for multiple queries
batch_results = index.query_batch(queries, top_k=10)
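The first two tips can be combined into one preparation step. The sketch below is a hypothetical helper (`prepare_embeddings` is not part of NSeekFS) that converts to float32, normalizes each row, and avoids NaNs for all-zero rows, which the one-line division above would produce. Returning a C-contiguous array is an assumption about what a native extension typically prefers.

```python
import numpy as np

def prepare_embeddings(vectors, eps=1e-12):
    """Return a C-contiguous float32 array with unit-length rows."""
    arr = np.ascontiguousarray(vectors, dtype=np.float32)
    if arr.ndim != 2:
        raise ValueError(f"expected a 2-D array, got shape {arr.shape}")
    norms = np.linalg.norm(arr, axis=1, keepdims=True)
    # Leave all-zero rows as zeros instead of producing NaNs.
    return arr / np.maximum(norms, eps)

raw = np.array([[3.0, 4.0], [0.0, 0.0], [1.0, 0.0]])  # float64 input with a zero row
clean = prepare_embeddings(raw)
```

The output can then be passed to `nseekfs.from_embeddings(clean, normalized=False)` as in the first tip.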

License

MIT License - see LICENSE file for details.


Fast, exact cosine similarity search for Python.

Built with Rust for performance, designed for Python developers.


Download files

Source Distribution

  • nseekfs-1.0.0.tar.gz (81.4 kB): source

Built Distributions

  • nseekfs-1.0.0-cp38-abi3-win_amd64.whl (237.1 kB): CPython 3.8+, Windows x86-64
  • nseekfs-1.0.0-cp38-abi3-manylinux_2_34_x86_64.whl (357.0 kB): CPython 3.8+, manylinux (glibc 2.34+), x86-64
  • nseekfs-1.0.0-cp38-abi3-macosx_11_0_arm64.whl (281.4 kB): CPython 3.8+, macOS 11.0+, ARM64
  • nseekfs-1.0.0-cp38-abi3-macosx_10_12_x86_64.whl (307.2 kB): CPython 3.8+, macOS 10.12+, x86-64
