Skip to main content

A lightweight fast exact radius query algorithm

Project description

SNN: Fast and exact fixed-radius neighbor search

C/C++ CI !pypi License: MIT DOI

SNN is a fast and exact fixed-radius nearest neighbor search algorithm [1]. It uses the first principal component of the data to prune the search space and speeds up Euclidean distance computations using high-level BLAS routines. SNN is implemented in native Python. On many problems, SNN is faster than KDtree and Balltree in the scikit-learn package. There is also a C++ implementation of SNN.

Reproducibility

To reproduce the experiments from the paper [1], see the instructions and code in the exp subfolder.

Python installation

The native Python implementation of SNN can be installed by:

pip install snnpy

Usage

import numpy as np
from snnpy import *
from time import time

n_samples = 100000
n_dim =  100
radius = 3.5
rng = np.random.RandomState(0)
X = rng.random_sample((n_samples, n_dim))  

# build SNN model
st = time()
snn_model = build_snn_model(X)  
print("SNN index time:", time()-st)
# will be faster if return_dist is False, then no distance information come out

# query neighbors of X[0]
st = time()
ind,dist = snn_model.query_radius(X[0], radius, return_distance=True)
# If remove the returning of the associated distance, use: ind, dist = snn_model.query_radius(X[0], radius, return_distance=False)
sort_ind = np.argsort(dist)
print("SNN query time:", time()-st)

# print total number and top five indices
print("number of neighbors:", len(ind))
print("indices of closest five:", ", ".join([str(i) for i in ind[sort_ind][:5]]))

# EXAMPLE OUTPUT
# SNN index time: 0.2224433422088623
# SNN query time: 0.009207725524902344
# number of neighbors: 550
# indices of closest five: 0, 27279, 69983, 65906, 97095

Compare this to sklearn's KDTree:

from sklearn.neighbors import KDTree
st = time()
tree = KDTree(X)    
print("KDTree index time:", time()-st)
st = time()
ind2 = tree.query_radius(X[0].reshape(1, -1), radius)
print("KDTree query time:", time()-st)
print("number of neighbors:", len(ind2[0]))

# KDTree index time: 7.597502946853638
# KDTree query time: 0.08962678909301758
# number of neighbors: 550

We also support multi-point queries which exploit single-threaded BLAS-3 (multi-threading is under development):

ind = snn_model.radius_batch_query(X[:10], radius) 

snn_model.radius_batch_query uses Numba for further speedup. For the first run, Numba has to go through the code and optimize it, adding extra overhead. Each subsequent run of batch queries will be much faster.

License

All the content in this repository is licensed under the MIT License.

Reference

Chen X, Güttel S. 2024. Fast and exact fixed-radius neighbor search based on sorting. PeerJ Computer Science 10:e1929 https://doi.org/10.7717/peerj-cs.1929


Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

snnpy-0.0.8.tar.gz (8.8 kB view details)

Uploaded Source

File details

Details for the file snnpy-0.0.8.tar.gz.

File metadata

  • Download URL: snnpy-0.0.8.tar.gz
  • Upload date:
  • Size: 8.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.4

File hashes

Hashes for snnpy-0.0.8.tar.gz
Algorithm Hash digest
SHA256 0159b8b8254b334c6fcf1c486a10336cd03fe86181cd383689857232556e0cde
MD5 79f82291467d7533ecb750eea3e0285c
BLAKE2b-256 7f5a3a95c549a23e557a7094c94f07d283abb5bc6bf2b4105ad58a700295b5a6

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page