A lightweight library for nearest neighbor search in Rust.

Overview

NilVec is a high-performance, memory-efficient vector search library designed to handle both embeddings and associated metadata without compromising query accuracy or speed. By decoupling metadata from the core embedding data during distance calculations, NilVec ensures that search accuracy remains high while keeping memory overhead minimal.

In our benchmarks against Chroma (100 inserted vectors of dimension 10, 10 metadata-filtered queries), NilVec achieved a 95.5% improvement in query latency, which makes it a candidate for real-time applications and large-scale search deployments.

Key Features

  • Memory Efficiency: NilVec stores vectors in a contiguous block of memory and tracks metadata separately, avoiding unnecessary duplication and overhead.

  • High Performance: Benchmarked at a 95.5% improvement in query latency over Chroma (see Benchmarks), for fast search responses.

  • Flexible and Ergonomic API: Built in Rust with a Python interface, NilVec supports inserting vectors, searching, and bulk index creation, with metadata handled alongside each vector.

How It Works

NilVec separates the embedding components from metadata so that only the core vector elements contribute to distance calculations. Metadata is stored in parallel and associated via a schema that maps attribute names (as Strings) to their corresponding positions in the metadata array. This design guarantees that metadata does not interfere with the accuracy of nearest neighbor searches.
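The layout described above can be illustrated with a small pure-Python sketch. This is not NilVec's implementation: `ToyIndex` is a hypothetical class invented for this example, it scans every vector by brute force instead of using an HNSW graph, and it only mirrors the idea of a contiguous vector buffer, a parallel metadata list, and a schema that maps attribute names to positions.

```python
from array import array


class ToyIndex:
    """Hypothetical illustration: flat vector storage with metadata kept in parallel."""

    def __init__(self, dim, schema):
        self.dim = dim
        # The schema maps each attribute name to its position in a metadata row.
        self.schema = {name: pos for pos, name in enumerate(schema)}
        self.vectors = array("d")  # one contiguous buffer of floats
        self.metadata = []         # metadata[i] belongs to vector i

    def insert(self, vector, metadata):
        if len(vector) != self.dim:
            raise ValueError("vector has wrong dimension")
        # Build the metadata row first so a bad attribute name leaves the index unchanged.
        row = [None] * len(self.schema)
        for name, value in metadata:
            row[self.schema[name]] = value
        self.vectors.extend(vector)
        self.metadata.append(row)

    def search(self, query, k, filter=None):
        scored = []
        for i, row in enumerate(self.metadata):
            if filter is not None:
                name, wanted = filter
                if row[self.schema[name]] != wanted:
                    continue
            start = i * self.dim
            stored = self.vectors[start:start + self.dim]
            # Only embedding values enter the score; metadata never does.
            score = sum(q * v for q, v in zip(query, stored))
            scored.append((score, i))
        # A larger inner product means a closer match.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


index = ToyIndex(3, ["color", "size"])
index.insert([1.0, 0.0, 0.0], [("color", "blue"), ("size", 42)])
index.insert([0.9, 0.1, 0.0], [("color", "red"), ("size", 7)])
index.insert([0.0, 1.0, 0.0], [("color", "blue"), ("size", 3)])
hits = index.search([1.0, 0.0, 0.0], k=2, filter=("color", "blue"))
print(hits)
```

Because the filter is checked against the metadata row and the score is computed from the vector buffer alone, adding or changing attributes cannot alter the ranking of the vectors that pass the filter.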

Benchmarks

Our benchmarks compare NilVec with Chroma using the following setup:

  • Configuration:
    • Dimension: 10
    • Number of insertions: 100 vectors
    • Number of queries: 10 queries with metadata filtering
  • Results:
    • NilVec demonstrated a 95.5% improvement in query latency compared to Chroma.
    • Insertion latency is also highly optimized, ensuring minimal overhead during data ingestion.

Below is an excerpt from our benchmark script:

import time
import random
import numpy as np
import nilvec
import chromadb

# Configuration
dim = 10
num_inserts = 100
num_queries = 10
categories = ["news", "blog", "report"]

# --- Chroma Benchmark ---
chroma_query_times = []
for i in range(num_queries):
    query = [random.random() for _ in range(dim)]
    filter_category = random.choice(categories)
    start_time = time.perf_counter()
    # Execute query on Chroma...
    elapsed = time.perf_counter() - start_time
    chroma_query_times.append(elapsed)

# --- NilVec Benchmark ---
# Inserting the num_inserts vectors into the index is not shown in this excerpt.
nilvec_query_times = []
hnsw = nilvec.PyHNSW(dim, None, None, None, None, "inner_product", ["category"])
for _ in range(num_queries):
    query = [random.random() for _ in range(dim)]
    filter_category = random.choice(categories)
    # Time only the search call, not query generation.
    start_time = time.perf_counter()
    results = hnsw.search(query, 5, ("category", filter_category))
    elapsed = time.perf_counter() - start_time
    nilvec_query_times.append(elapsed)
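The excerpt above does not show how the two timing lists are turned into a single percentage. One common definition, assumed here and not confirmed by the benchmark script, is the relative reduction in mean latency. The helper name `latency_improvement` and the sample timings are hypothetical and are not measured results.

```python
from statistics import mean


def latency_improvement(baseline_times, candidate_times):
    """Percent reduction in mean latency of the candidate relative to the baseline."""
    baseline = mean(baseline_times)
    candidate = mean(candidate_times)
    return (baseline - candidate) / baseline * 100.0


# Made-up timings in seconds, for illustration only (not benchmark data).
example_baseline = [0.004, 0.004, 0.004]
example_candidate = [0.001, 0.001, 0.001]
improvement = latency_improvement(example_baseline, example_candidate)
print(f"{improvement:.1f}% improvement")
```

With the lists from the benchmark script, the call would be `latency_improvement(chroma_query_times, nilvec_query_times)`.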

Usage

Installation

NilVec is distributed as a Python package via its PyO3 bindings. You can install it using pip:

pip install nilvec

Examples

Below is a quick example of how to use NilVec in your Python project:

import nilvec

# Create an index with dimension 128 using inner product as the metric.
# Optionally, you can provide a schema for metadata.
index = nilvec.PyHNSW(128, None, None, None, None, "inner_product", ["color", "size"])

# Insert a vector with associated metadata.
vector = [0.1] * 128
metadata = [("color", "blue"), ("size", 42)]
index.insert(vector, metadata)

# Perform a search query with metadata filtering.
# Arguments are passed positionally (query, k, filter), as in the benchmark script.
query = [0.1] * 128
results = index.search(query, 5, ("color", "blue"))
for distance, vector in results:
    print("Distance:", distance, "Vector:", vector)

# Alternatively, bulk-create an index from a list of vectors.
vectors = [
    [0.1] * 128,
    [0.2] * 128,
    [0.3] * 128
]
index.create(vectors)

Testing

To run the NilVec test suite, execute:

cargo test

