A lightweight library for nearest neighbor search in Rust.


Overview

NilVec is a high-performance, memory-efficient vector search library designed to handle both embeddings and associated metadata without compromising query accuracy or speed. By decoupling metadata from the core embedding data during distance calculations, NilVec ensures that search accuracy remains high while keeping memory overhead minimal.

In our benchmarks against Chroma (10-dimensional vectors, 100 insertions, 10 queries with metadata filtering), NilVec showed a 95.5% improvement in query latency, which makes it a good candidate for real-time applications and large-scale search deployments.

Key Features

  • Memory Efficiency: NilVec stores vectors in a contiguous block of memory and tracks metadata separately, avoiding unnecessary duplication and overhead.

  • High Performance: Benchmarked at a 95.5% improvement in query latency over Chroma (see Benchmarks for the setup).

  • Flexible and Ergonomic API: Built in Rust with a Python interface, NilVec supports simple operations for inserting vectors, searching, and bulk index creation—all while handling metadata seamlessly.

How It Works

NilVec separates the embedding components from metadata so that only the core vector elements contribute to distance calculations. Metadata is stored in parallel and associated via a schema that maps attribute names (as Strings) to their corresponding positions in the metadata array. This design guarantees that metadata does not interfere with the accuracy of nearest neighbor searches.
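
The layout described above can be sketched in plain Python. This is only an illustration of the idea, not NilVec's actual Rust implementation: `TinyIndex`, its fields, and its methods are hypothetical names, and a brute-force scan stands in for the HNSW graph.

```python
from array import array


class TinyIndex:
    """Toy index: vectors in one flat buffer, metadata in parallel rows."""

    def __init__(self, dim, schema):
        self.dim = dim
        # Schema maps attribute name -> position in each metadata row.
        self.schema = {name: pos for pos, name in enumerate(schema)}
        self.vectors = array("f")  # contiguous float32 storage
        self.metadata = []         # one row per vector, in insertion order

    def insert(self, vector, metadata):
        if len(vector) != self.dim:
            raise ValueError("wrong dimension")
        row = [None] * len(self.schema)
        for name, value in metadata:
            row[self.schema[name]] = value
        self.vectors.extend(vector)
        self.metadata.append(row)

    def _vector(self, i):
        return self.vectors[i * self.dim:(i + 1) * self.dim]

    def search(self, query, k, flt=None):
        hits = []
        for i, row in enumerate(self.metadata):
            # Metadata is used only to filter candidates.
            if flt is not None and row[self.schema[flt[0]]] != flt[1]:
                continue
            # Only vector components enter the inner product.
            score = sum(q * v for q, v in zip(query, self._vector(i)))
            hits.append((score, i))
        hits.sort(reverse=True)  # larger inner product = closer match
        return hits[:k]


index = TinyIndex(3, ["color", "size"])
index.insert([1.0, 0.0, 0.0], [("color", "blue"), ("size", 42)])
index.insert([0.0, 1.0, 0.0], [("color", "red"), ("size", 7)])
index.insert([0.5, 0.5, 0.0], [("color", "blue"), ("size", 3)])
top = index.search([1.0, 0.0, 0.0], k=2, flt=("color", "blue"))
print(top)
```

Because the metadata rows never touch the score, adding or removing attributes in this sketch changes which candidates are considered but not how they are ranked.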

Benchmarks

Our benchmarks compare NilVec with Chroma using the following setup:

  • Configuration:
    • Dimension: 10
    • Number of insertions: 100 vectors
    • Number of queries: 10 queries with metadata filtering
  • Results:
    • NilVec demonstrated a 95.5% improvement in query latency compared to Chroma.
    • Insertion latency is also highly optimized, ensuring minimal overhead during data ingestion.
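
A dataset matching this configuration could be generated as follows. This is a minimal sketch, not the project's benchmark code: `make_dataset` and `CATEGORIES` are hypothetical names, and the `[("category", value)]` metadata shape is an assumption based on the usage example later in this README.

```python
import random

# Assumed category values, mirroring the benchmark excerpt.
CATEGORIES = ["news", "blog", "report"]


def make_dataset(num_vectors=100, dim=10, seed=0):
    """Return a list of (vector, metadata) pairs with random contents."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    dataset = []
    for _ in range(num_vectors):
        vector = [rng.random() for _ in range(dim)]
        metadata = [("category", rng.choice(CATEGORIES))]
        dataset.append((vector, metadata))
    return dataset


dataset = make_dataset()
print(len(dataset), len(dataset[0][0]), dataset[0][1])
```

Feeding the same seeded dataset to both systems keeps the comparison on equal inputs.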

Below is an excerpt from our benchmark script:

import time
import random
import numpy as np
import nilvec
import chromadb

# Configuration
dim = 10
num_inserts = 100
num_queries = 10
categories = ["news", "blog", "report"]

# --- Chroma Benchmark ---
chroma_query_times = []
for i in range(num_queries):
    query = [random.random() for _ in range(dim)]
    filter_category = random.choice(categories)
    start_time = time.perf_counter()
    # Execute query on Chroma...
    elapsed = time.perf_counter() - start_time
    chroma_query_times.append(elapsed)

# --- NilVec Benchmark ---
nilvec_query_times = []
hnsw = nilvec.PyHNSW(dim, None, None, None, None, "inner_product", ["category"])
# Note: inserting the num_inserts vectors into the index is not shown in this excerpt.
for i in range(num_queries):
    query = [random.random() for _ in range(dim)]
    filter_category = random.choice(categories)
    start_time = time.perf_counter()
    results = hnsw.search(query, 5, ("category", filter_category))
    elapsed = time.perf_counter() - start_time
    nilvec_query_times.append(elapsed)
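
The excerpt stops after collecting the timings. One way to reduce the two lists to a single percentage is to compare mean latencies. The formula behind the reported figure is not shown in the excerpt, so the helper below (`latency_improvement`, a hypothetical name) is an assumption about how such a number could be derived, and the sample timings are made up.

```python
from statistics import mean


def latency_improvement(baseline_times, candidate_times):
    """Percentage reduction in mean latency relative to the baseline."""
    baseline = mean(baseline_times)
    candidate = mean(candidate_times)
    if baseline <= 0:
        raise ValueError("baseline latency must be positive")
    return (baseline - candidate) / baseline * 100.0


# Made-up timings in seconds, for illustration only.
example_baseline = [0.004, 0.006, 0.005]
example_candidate = [0.001, 0.002, 0.0015]
improvement = latency_improvement(example_baseline, example_candidate)
print(f"{improvement:.1f}% lower mean latency")
```

With only 10 queries per system, a single slow call can shift the mean noticeably, so reporting the median alongside it may give a steadier picture.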

Usage

Installation

NilVec is distributed as a Python package via its PyO3 bindings. You can install it using pip:

pip install nilvec

To build NilVec from source, follow these steps:

  1. Clone the repository:

    git clone https://github.com/cldrake01/nilvec.git
    cd nilvec
    
  2. Build the project:

    maturin develop --release
    

Examples

Below is a quick example of how to use NilVec in your Python project:

import nilvec

# Create an index with dimension 128 using inner product as the metric.
# Optionally, you can provide a schema for metadata.
index = nilvec.PyHNSW(128, None, None, None, None, "inner_product", ["color", "size"])

# Insert a vector with associated metadata.
vector = [0.1] * 128
metadata = [("color", "blue"), ("size", 42)]
index.insert(vector, metadata)

# Perform a search query with metadata filtering.
query = [0.1] * 128
results = index.search(query, k=5, filter=("color", "blue"))
for distance, vector in results:
    print("Distance:", distance, "Vector:", vector)

# Alternatively, bulk-create an index from a list of vectors.
vectors = [
    [0.1] * 128,
    [0.2] * 128,
    [0.3] * 128
]
index.create(vectors)

Testing

To run the NilVec test suite, execute:

cargo test

Download files

Source Distribution

nilvec-0.1.7.tar.gz (528.0 kB)

Uploaded Source

Built Distribution

nilvec-0.1.7-cp312-cp312-macosx_11_0_arm64.whl (346.1 kB)

Uploaded CPython 3.12, macOS 11.0+ ARM64

File details

Details for the file nilvec-0.1.7.tar.gz.

File metadata

  • Download URL: nilvec-0.1.7.tar.gz
  • Upload date:
  • Size: 528.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: maturin/1.8.2

File hashes

Hashes for nilvec-0.1.7.tar.gz
Algorithm    Hash digest
SHA256       69e57aafdfa1e7b2b069dd7f27c33228ddd550292ee596981246eca2e93be6bb
MD5          6a0af929bbbdf8ad4a0dcecbe2a5b617
BLAKE2b-256  06c768f6af8152a4b5f5cfa97ba6a03ded6d716466f9c86f2267e88995b89645

File details

Details for the file nilvec-0.1.7-cp312-cp312-macosx_11_0_arm64.whl.

File hashes

Hashes for nilvec-0.1.7-cp312-cp312-macosx_11_0_arm64.whl
Algorithm    Hash digest
SHA256       59561dc874204661cfea7e8f6cc90ea42305d1283bd4126ab4c79d0e4c48e16f
MD5          fa003312f55054b5e8424f95f090da85
BLAKE2b-256  0977364fc7c8329e1f548bf499e7e92f16da3582e245670240ca09ae1f2ac74b
