
Proxi: Accelerating nearest-neighbor search for high-dimensional data!


Proxiss: Fast Vector Similarity Search


Proxiss is a high-performance C++ library with Python bindings for fast vector similarity search over high-dimensional data. It provides nearest-neighbor search for applications such as semantic search, recommendation systems, and machine learning, and is currently optimized for Linux.

Key Features

  • High Performance: Optimized C++ implementation with OpenMP parallelization for fast k-NN searches
  • Multiple Distance Metrics: Supports common distance functions:
    • Euclidean (L2)
    • Manhattan (L1)
    • Cosine Similarity
  • Two Search Modes:
    • ProxiFlat: Vector-only indexing for pure similarity search
    • ProxiKNN: Classification-focused search with label storage
  • Python Integration: Clean Python API powered by pybind11
  • Batched Operations: Efficient batch processing for multiple queries
  • Automatic Dependencies: CMake automatically downloads and configures required dependencies
  • Lightweight Design: Focused on core vector search functionality
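For reference, the three metrics listed above are usually defined as below. This is a plain-Python sketch of the textbook formulas, not Proxiss's C++ code. Whether Proxiss uses squared L2 internally, or reports cosine as a similarity rather than the distance 1 - cos(theta), is not stated here and is an assumption.

```python
import math
from typing import Sequence

# Inputs are assumed to be equal-length sequences of floats.


def l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean (L2) distance: square root of the summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def l1(a: Sequence[float], b: Sequence[float]) -> float:
    """Manhattan (L1) distance: sum of absolute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine distance: 1 - cos(angle between a and b); smaller means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


print(l2([0.0, 0.0], [3.0, 4.0]))               # 5.0
print(l1([0.0, 0.0], [3.0, 4.0]))               # 7.0
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))  # 1.0 (orthogonal vectors)
```

Cosine ignores vector length, so it suits embeddings whose direction carries the meaning; L2 and L1 also account for magnitude.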

Why Proxiss?

Vector similarity search is fundamental to many modern applications, but traditional methods can be slow and resource-intensive. Proxiss addresses this by:

  • Providing optimized C++ implementations with parallel processing
  • Offering clean, simple APIs that hide implementation complexity
  • Focusing on core functionality without unnecessary overhead
  • Supporting both pure vector search and classification use cases

Installation

Proxiss builds from source with automatic dependency management.

Prerequisites

  • Linux environment (Ubuntu, Debian, CentOS, etc.)
  • Python 3.10 or higher
  • CMake 3.16 or higher
  • uv package manager

Note: The build system automatically installs clang++, OpenMP, and pybind11 if not found.
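Before building, the prerequisites above can be checked with a small script such as the following. It is a sketch: the version thresholds come from the list above, the function names are hypothetical, and it only checks that `cmake` and `uv` are on `PATH` and that the CMake version is recent enough.

```python
import re
import shutil
import subprocess
import sys

# Thresholds taken from the prerequisites list above.
MIN_PYTHON = (3, 10)
MIN_CMAKE = (3, 16)


def parse_cmake_version(output: str) -> tuple[int, ...]:
    """Extract (major, minor, patch) from `cmake --version` output such as 'cmake version 3.28.3'."""
    match = re.search(r"cmake version (\d+)\.(\d+)(?:\.(\d+))?", output)
    if match is None:
        raise ValueError(f"unrecognised cmake --version output: {output!r}")
    # A missing patch component (e.g. '3.16') is treated as 0.
    return tuple(int(part) if part else 0 for part in match.groups())


def check_prerequisites() -> dict[str, bool]:
    """Report which build prerequisites appear to be satisfied on this machine."""
    results = {
        "linux": sys.platform.startswith("linux"),
        "python": sys.version_info[:2] >= MIN_PYTHON,
        "uv": shutil.which("uv") is not None,
        "cmake": False,
    }
    cmake = shutil.which("cmake")
    if cmake is not None:
        try:
            proc = subprocess.run([cmake, "--version"], capture_output=True, text=True, check=True)
            results["cmake"] = parse_cmake_version(proc.stdout) >= MIN_CMAKE
        except (OSError, subprocess.CalledProcessError, ValueError):
            pass  # an unusable or unparsable cmake counts as missing


    return results


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name:>6}: {'ok' if ok else 'missing or too old'}")
```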

Building from Source

  1. Clone the repository:

    git clone https://github.com/BiradarSiddhant02/Proxiss.git
    cd Proxiss
    
  2. Install uv (if not already installed):

    curl -LsSf https://astral.sh/uv/install.sh | sh
    
  3. Create virtual environment and install:

    uv venv
    source .venv/bin/activate
    uv pip install . -v
    

Quick Start

ProxiFlat: Vector Similarity Search

from proxiss import ProxiFlat
import numpy as np

# Sample data
embeddings = np.array([
    [0.0, 0.0],
    [1.0, 1.0], 
    [2.0, 2.0],
    [3.0, 3.0]
], dtype=np.float32)

# Initialize ProxiFlat
px = ProxiFlat(k=2, num_threads=2, objective_function="l2")

# Index your vectors
px.index_data(embeddings)

# Query for nearest neighbors
query = np.array([1.5, 1.5], dtype=np.float32)
indices = px.find_indices(query)
print(f"Nearest neighbor indices: {indices}")

# Batch queries
queries = np.array([[0.5, 0.5], [2.5, 2.5]], dtype=np.float32)
batch_indices = px.find_indices_batched(queries)
print(f"Batch results: {batch_indices}")

# Save and load index
px.save_state("index.bin")
px_loaded = ProxiFlat(k=2, num_threads=2, objective_function="l2")
px_loaded.load_state("index.bin")
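To sanity-check ProxiFlat results on small inputs, an exhaustive (brute-force) reference is easy to write. The helper `knn_indices` below is hypothetical and assumes L2 distance with ties broken by lower index; Proxiss may return a NumPy array rather than a list and may order tied neighbours differently.

```python
import math


def knn_indices(data, query, k):
    """Exact k-NN: indices of the k rows of `data` closest to `query` under L2."""
    dists = [math.dist(row, query) for row in data]
    # sorted() is stable, so rows at equal distance keep their original index order.
    return sorted(range(len(data)), key=dists.__getitem__)[:k]


embeddings = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
print(knn_indices(embeddings, [1.5, 1.5], k=2))  # [1, 2]
print([knn_indices(embeddings, q, k=2) for q in ([0.5, 0.5], [2.5, 2.5])])  # [[0, 1], [2, 3]]
```

Note that in the Quick Start data, [1.5, 1.5] is equidistant from rows 1 and 2, so any tie-breaking rule that Proxiss applies will show up here.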

ProxiKNN: Classification Search

from proxiss import ProxiKNN
import numpy as np

# Sample data with labels
features = np.array([
    [0.0, 0.0], [1.0, 1.0],
    [5.0, 5.0], [6.0, 6.0]
], dtype=np.float32)
labels = np.array([0, 0, 1, 1], dtype=np.float32)

# Initialize and train
knn = ProxiKNN(n_neighbours=2, n_jobs=2, distance_function="l2")
knn.fit(features, labels)

# Predict
query = np.array([0.5, 0.5], dtype=np.float32)
prediction = knn.predict([query])
print(f"Predicted class: {prediction}")

# Save and load model
knn.save_state("model_dir")
knn_loaded = ProxiKNN(n_neighbours=2, n_jobs=2, distance_function="l2")
knn_loaded.load_state("model_dir")
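The README does not say how ProxiKNN turns neighbour labels into a prediction. Standard k-NN classification takes a majority vote among the k nearest training points; the hypothetical `knn_predict` below assumes that rule and L2 distance, so treat it as a reference for checking results, not as Proxiss's implementation.

```python
import math
from collections import Counter


def knn_predict(features, labels, query, k):
    """Predict a label by majority vote among the k nearest training points (L2)."""
    order = sorted(range(len(features)), key=lambda i: math.dist(features[i], query))
    votes = Counter(labels[i] for i in order[:k])
    # most_common(1) returns the most frequent label; on a tie, Counter keeps the
    # label first encountered, i.e. the one belonging to the nearer neighbour.
    return votes.most_common(1)[0][0]


features = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [6.0, 6.0]]
labels = [0.0, 0.0, 1.0, 1.0]
print(knn_predict(features, labels, [0.5, 0.5], k=2))  # 0.0
print(knn_predict(features, labels, [5.5, 5.5], k=2))  # 1.0
```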

Benchmarking

Proxiss includes benchmarking scripts to evaluate performance.

1. Generate Test Data

Create synthetic datasets for benchmarking:

python scripts/make_data.py --N 10000 --D 128 --X_path scripts/X.npy

2. Benchmark ProxiFlat

Test vector similarity search performance:

python scripts/bench_proxiss_flat.py --X_path scripts/X.npy -k 5 --threads 4 --objective l2
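When comparing numbers across libraries, it helps to measure every candidate the same way. The harness below is a generic sketch (the name `measure_qps` is hypothetical and not taken from the bench scripts): it times a search callable over a list of queries and keeps the best of several runs to reduce noise.

```python
import time


def measure_qps(search, queries, repeats=3):
    """Time `search(q)` for every query; return the best-of-`repeats` queries per second."""
    if repeats < 1 or not queries:
        raise ValueError("need at least one repeat and one query")
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for q in queries:
            search(q)
        best = min(best, time.perf_counter() - start)
    return len(queries) / best if best > 0 else float("inf")


# Stand-in search function; with Proxiss this could be px.find_indices.
queries = [[float(i), float(i)] for i in range(1000)]
print(f"{measure_qps(lambda q: sum(q), queries):.0f} queries/s")
```

Single-query timing like this mostly reflects per-call overhead; batched calls such as `find_indices_batched` should be timed separately.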

3. Benchmark ProxiKNN

Test classification performance:

python scripts/bench_proxiss_knn.py --X_path scripts/X.npy -k 5 --threads 4 --objective l2

4. Compare with FAISS

Install FAISS and compare performance:

uv pip install faiss-cpu
python scripts/bench_faiss.py --X_path scripts/X.npy -k 5 --threads 4 --objective l2
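Speed is only half of the comparison: both libraries should also return the same neighbours. A common check is recall@k, the fraction of each query's true top-k that appears in the returned top-k. The function below is a sketch and assumes per-query lists of indices; for two exact searches, recall should be 1.0 apart from ties and float32 rounding.

```python
def recall_at_k(predicted, ground_truth, k):
    """Average fraction of the true top-k neighbours that appear in the predicted top-k."""
    if len(predicted) != len(ground_truth) or len(predicted) == 0:
        raise ValueError("need one non-empty result list per query")
    total = 0.0
    for pred, truth in zip(predicted, ground_truth):
        total += len(set(pred[:k]) & set(truth[:k])) / k
    return total / len(predicted)


# Query 1 matches fully, query 2 shares one of two neighbours.
print(recall_at_k([[1, 2], [0, 3]], [[1, 2], [0, 1]], k=2))  # 0.75
```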

5. Compare with scikit-learn

Install scikit-learn and compare KNN classification performance:

uv pip install scikit-learn
python scripts/bench_sklearn_knn.py --X_path scripts/X.npy -k 5 --threads 4 --objective l2
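For the classification comparison, a simple correctness signal is how often the two classifiers agree. The hypothetical `agreement_rate` below assumes both produce one label per query in the same order; disagreements can come from tie-breaking between equally distant neighbours.

```python
def agreement_rate(preds_a, preds_b):
    """Fraction of queries on which two classifiers predict the same label."""
    if len(preds_a) != len(preds_b) or len(preds_a) == 0:
        raise ValueError("need two non-empty prediction sequences of equal length")
    return sum(a == b for a, b in zip(preds_a, preds_b)) / len(preds_a)


# Float labels (as in the ProxiKNN example) compare equal to integer labels.
print(agreement_rate([0.0, 1.0, 1.0, 0.0], [0, 1, 0, 0]))  # 0.75
```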

Example Usage

Interactive Inference

The examples/inference.py script demonstrates similarity search on real embeddings:

python examples/inference.py --embeddings examples/embeddings.npy --words examples/words.npy -k 5

This script loads pre-computed embeddings and allows interactive similarity search.
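The core of such a lookup can be sketched in a few lines. This assumes, without confirmation from the script, that `words.npy` holds one word per row of `embeddings.npy` in the same order and that similarity is cosine; the function `most_similar` and the toy vectors are hypothetical.

```python
import math


def most_similar(word, words, embeddings, k=5):
    """Return the k words whose embeddings have the highest cosine similarity to `word`."""
    def normalise(v):
        n = math.sqrt(sum(x * x for x in v))  # assumes non-zero vectors
        return [x / n for x in v]

    unit = [normalise(v) for v in embeddings]
    target = unit[words.index(word)]
    # On unit vectors, the dot product equals cosine similarity.
    scored = [(sum(a * b for a, b in zip(target, u)), w) for u, w in zip(unit, words) if w != word]
    scored.sort(reverse=True)
    return [w for _, w in scored[:k]]


words = ["cat", "dog", "car", "truck"]
embeddings = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]
print(most_similar("cat", words, embeddings, k=2))  # ['dog', 'truck']
```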

Development

Project Structure

  • Core C++ Implementation:

    • src/proxi_flat.cc, include/proxi_flat.h - Vector similarity search
    • src/proxi_knn.cc, include/proxi_knn.h - KNN classification
    • src/priority_queue.cc, include/priority_queue.h - Custom priority queue
    • include/distance.hpp - Distance function implementations
  • Python Bindings:

    • bindings/proxi_flat_binding.cc - ProxiFlat Python interface
    • bindings/proxi_knn_binding.cc - ProxiKNN Python interface
    • proxiss/ProxiFlat.py - Python wrapper for ProxiFlat
    • proxiss/ProxiKNN.py - Python wrapper for ProxiKNN
  • Build System:

    • CMakeLists.txt - C++ build configuration with automatic dependencies
    • pyproject.toml - Python package configuration
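The source tree includes a custom priority queue. A typical use of one in k-NN search, assumed here rather than confirmed from `priority_queue.cc`, is bounded top-k selection: keep only the k closest candidates seen so far in a max-heap, which costs O(n log k) instead of sorting all n distances. A Python sketch using the standard `heapq` module:

```python
import heapq


def top_k_smallest(distances, k):
    """Return (distance, index) pairs for the k smallest distances, nearest first.

    heapq is a min-heap, so distances are stored negated to get a max-heap:
    heap[0] is then the farthest candidate kept so far.
    """
    if k <= 0:
        return []
    heap = []  # entries are (-distance, index)
    for i, d in enumerate(distances):
        if len(heap) < k:
            heapq.heappush(heap, (-d, i))
        elif -d > heap[0][0]:  # d is closer than the farthest kept candidate
            heapq.heapreplace(heap, (-d, i))
    return sorted((-neg_d, i) for neg_d, i in heap)


print(top_k_smallest([4.5, 0.5, 0.5, 4.5, 2.0], k=3))  # [(0.5, 1), (0.5, 2), (2.0, 4)]
```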

Running Tests

# Install test dependencies
uv pip install pytest

# Run all tests
python -m pytest tests/ -v

# Run specific tests
python -m pytest tests/test_proxi_flat.py -v
python -m pytest tests/test_proxi_knn.py -v

Building for Development

# Set up development environment
uv venv
source .venv/bin/activate

# Install development dependencies
uv pip install -r requirements.txt

# Reinstall after C++ changes
uv pip install -e . --force-reinstall --no-deps

API Reference

ProxiFlat Methods

  • __init__(k, num_threads, objective_function) - Initialize index
  • index_data(embeddings) - Index vector data
  • find_indices(query) - Find nearest neighbor indices
  • find_indices_batched(queries) - Batch query processing
  • save_state(filepath) - Save index to file
  • load_state(filepath) - Load index from file

ProxiKNN Methods

  • __init__(n_neighbours, n_jobs, distance_function) - Initialize classifier
  • fit(features, labels) - Train on labeled data
  • predict(features) - Predict class labels
  • save_state(directory) - Save model to directory
  • load_state(directory) - Load model from directory

License

Proxiss is licensed under the Apache License, Version 2.0. See LICENSE.txt for details.

Contributing

Contributions are welcome! Please feel free to submit issues, feature requests, or pull requests.





Download files


Source Distributions

No source distribution files are available for this release.

Built Distributions


  • proxiss-0.3.0-cp314-cp314t-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (886.0 kB): CPython 3.14t; manylinux glibc 2.27+ and 2.28+; x86-64
  • proxiss-0.3.0-cp314-cp314-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (875.4 kB): CPython 3.14; manylinux glibc 2.27+ and 2.28+; x86-64
  • proxiss-0.3.0-cp313-cp313-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (875.1 kB): CPython 3.13; manylinux glibc 2.27+ and 2.28+; x86-64
  • proxiss-0.3.0-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (875.0 kB): CPython 3.12; manylinux glibc 2.27+ and 2.28+; x86-64
  • proxiss-0.3.0-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (870.6 kB): CPython 3.11; manylinux glibc 2.27+ and 2.28+; x86-64
  • proxiss-0.3.0-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (864.0 kB): CPython 3.10; manylinux glibc 2.27+ and 2.28+; x86-64
  • proxiss-0.3.0-cp39-cp39-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (864.2 kB): CPython 3.9; manylinux glibc 2.27+ and 2.28+; x86-64

File hashes

  • proxiss-0.3.0-cp314-cp314t-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
    SHA256: f9a8615cfc3ff3ef456c0377573051a2084bb63cd336018207e16a1393518f58
    MD5: 6bc42e620a6c509c7073b703ee576dba
    BLAKE2b-256: 0672b57a3891b8ff924faa269b2cfb9fc20e667f2f56c7b8be7286e008c7866f
  • proxiss-0.3.0-cp314-cp314-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
    SHA256: aa47826da037645ea05d61b54c7a762097075c7992ea9203e797cb0bb94285b2
    MD5: fb355e2a61f4aa3ddbbf022cd3f6614f
    BLAKE2b-256: 6275a2dbf1f0b46c59cb360c7ee2f89aec30add16b90ded2b37c32925ffde3d2
  • proxiss-0.3.0-cp313-cp313-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
    SHA256: 755f0cb4051ac0c53edd519c0b3bb75d4682f76baab998176ca090e5ed9825f0
    MD5: 40f3d53c64d299d8e17756ddca7ef169
    BLAKE2b-256: 8d980bc4126f8eb83abcad200f84a55167a3709c5cef3147c891ecced2caa7a8
  • proxiss-0.3.0-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
    SHA256: f0095d59befa028efed3fc404721e606011eb7a17f7fc5b7f10241b5a24b30e7
    MD5: 42549b968464aad33ea80c698750bcdf
    BLAKE2b-256: d616e4c5017faf749baa77e9f72fddce0e5070b5c349d0e7465b779f8e560471
  • proxiss-0.3.0-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
    SHA256: d4b98f9e4e2b95e63ddab220cf447c519bd115cd1f4ca96fd297108708d7767e
    MD5: d2055e4f2e784023d37c5f8f6dd4b8b8
    BLAKE2b-256: 6659b255e91cffd1949c7b56cea23eb8a3da0b2424bf34ba99d031364f91bcef
  • proxiss-0.3.0-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
    SHA256: 239c473cd1b193de06ed179905193a48dc090358ca69ba421bb9f77151ec6da8
    MD5: 406c3c6e1ee27393d3f7d0caf785e86e
    BLAKE2b-256: a2b16b34bc297536028cd35581cd6bf75ec698085be4c9beef4bed468d5fa376
  • proxiss-0.3.0-cp39-cp39-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
    SHA256: 7603ee43368d10760eedd4c084f28796654a6dabd98046ee8c5add7696cc9d0f
    MD5: 15658d056f2dcd72cdf4533b0687c0ca
    BLAKE2b-256: e95f6bc594460bce96969f5091bcc165127c92ba52605b690ba2fe6099a3e2b0
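To confirm that a downloaded wheel matches its published SHA256 digest, the file can be hashed locally with Python's standard `hashlib`. This is a minimal sketch; the function names are hypothetical, and the path is wherever the wheel was saved.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Compute the hex SHA256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_sha256):
    """Return True if the file's SHA256 matches the published digest (case-insensitive)."""
    return sha256_of(path) == expected_sha256.strip().lower()
```

For example, `verify("proxiss-0.3.0-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", "f0095d59befa028efed3fc404721e606011eb7a17f7fc5b7f10241b5a24b30e7")` should return True for an intact download of the CPython 3.12 wheel.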
