Skip to main content

Nearest neighbors search using locality-sensitive hashing

Project description

lshashing

Build Status

python library to perform Locality-Sensitive Hashing to search for nearest neighbors in high dimensional data.

For now it only supports random projections but future versions will support more methods and techniques.

Implementation

pip install lshashing
from lshashing import LSHRandom
import numpy as np

sample_data = np.random.randint(size = (20, 20), low = 0, high = 100)
point = np.random.randint(size = (1,20), low = 0, high = 100)

lshashing = LSHRandom(sample_data, hash_len = 4, num_tables = 2)

print(lshashing.tables[0].hash_table)
#{225: [0, 12],
# 121: [1, 2, 3, 4, 5, 7, 8, 9, 10, 13, 15, 16, 19],
# 81: [6, 11],
# 196: [14],
# 100: [17, 18]}

print(lshashing.knn_search(sample_data, point[0], k = 4, buckets = 3, radius = 2))
#[Neighbor(index=7, distance=159.8217757378512, value=[[78 35 94]...]),
# Neighbor(index=13, distance=174.1551032843999, value=[[86 48 32]...]),
# Neighbor(index=19, distance=174.5737666432159, value=[[53 52 22]...]),
# Neighbor(index=16, distance=180.87564789103038, value=[[81 91 70]...])]

I also made some comparison between lshashing, linear method to get KNNs and scikit-learn's BallTree and KDTree and here are the results.

python examples/lshashing_compare.py

# lshashing module
# Sample data shape:  (10000, 10000)

# query point
# (10000,)

# Start comparison in searching for 4 NNs
# ##### search knn traditionaly
# time to perform:  46.329522132873535

# ##### Search with lshashing package:
# time to construct lsh:  1.1631906032562256
# time to perform:  8.50294828414917

# ##### Now with Scikit Learn
# time to construct ball_tree:  14.541776418685913
# time to perform:  0.1249244213104248

# ##### With sklearn KDTree
# time to construct the tree:  21.14064049720764
# time to perform:  0.1249234676361084

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

lshashing-0.0.1.tar.gz (5.0 kB view hashes)

Uploaded Source

Built Distribution

lshashing-0.0.1-py3-none-any.whl (6.2 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page