Skip to main content

A 1&2-stable Locality-Sensitive Hashing implementation in Python

Project description

P-stable-LSH

The package is one implementation of paper Locality-Sensitive Hashing Scheme Based on p-Stable Distributions in SCG’2014. P-stable-lsh a novel Locality-Sensitive Hashing scheme for the Approximate Nearest Neighbor Problem under $L_p$ norm, based on p-stable distributions.

Note: This code is used as the practice of the paper, and there are few optimizations. Sharing is for communication and learning. If it is a high-performance scenario, please optimize as appropriate.

Installation

pip install p-stable-lsh-python

Usage

Example

The following example shows all features of the package, If you want to know the details, please refer to the source code and comments.

import numpy as np
import p_stable_lsh.pstable as psl

dim = 100 # vector dimension
data = [np.random.random(dim) for _ in range(2)] # generate two vectors

r = 50.0 # the parameter $r$ in paper

m1 = psl.pstable(r, dim, metric_dim=1)
m1.lsh(data[0])
m2 = psl.pstable(r, dim, metric_dim=1)
m2.lsh(data[1])
print(m1.md(m2)) # estimate value
print(m1.p(np.average(sum(np.abs(data[0]-data[1]))))) # theoretical(true) value

m1 = psl.pstable(r, dim, metric_dim=2)
m1.lsh(data[0])
m2 = psl.pstable(r, dim, metric_dim=2)
m2.lsh(data[1])
print(m1.md(m2)) # estimate value
print(m1.p(np.sqrt(sum([i**2 for i in data[0]-data[1]])))) # theoretical(true) value

Use case

Define the parameter $r$ and vector dimension for preparing the test data:

import numpy as np
import p_stable_lsh.pstable as psl
r = 50.0 # the parameter $r$ in paper
dim = 100 # vector dimension
data = [np.random.random(dim) for _ in range(3)] # generate two vectors

Instantiate pstable object with specific dimension space. ($L_1$ for example)

m1 = psl.pstable(r, dim, metric_dim=1)
m2 = psl.pstable(r, dim, metric_dim=1)

Hash vectors with p-stable LSH function.

m1.lsh(data[0])
m2.lsh(data[1])

Estimate distance between two object.

m1.md(m2) # estimate value

Show the ground truth distance probability using integration shown in paper.

l1_distance = np.average(sum(np.abs(data[0]-data[1])))
m1.p(l1_distance)

Another way to instantiate pstable object with hash values.

m3 = psl.pstable(r, dim, metric_dim=1, hashvalues=m1.hashvalues)
m3.md(m2)

Update object hash values with different vector.

m2.lsh(data[2])
m3.md(m2)

Parameters

The parameter $r$ in paper determines the relationship between the distance (in metric space) between vectors and the probability of collision (the .md method). The following shows different probability results varied by $r$.

L1 metric space

Data points code

import p_stable_lsh.pstable as psl
r_list = [10, 30, 50, 100, 300, ]
result = []
for r in r_list:
    tmp = []
    m = psl.pstable(r, dim, metric_dim=1)
    for i in range(1, 800):
        tmp.append(m.p(i))
    result.append(tmp)

benchmark in L1 distance

L2 metric space

benchmark in L2 distance

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

p-stable-lsh-python-0.0.3.tar.gz (4.9 kB view details)

Uploaded Source

Built Distribution

p_stable_lsh_python-0.0.3-py3-none-any.whl (5.1 kB view details)

Uploaded Python 3

File details

Details for the file p-stable-lsh-python-0.0.3.tar.gz.

File metadata

  • Download URL: p-stable-lsh-python-0.0.3.tar.gz
  • Upload date:
  • Size: 4.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.7

File hashes

Hashes for p-stable-lsh-python-0.0.3.tar.gz
Algorithm Hash digest
SHA256 82b405bd24094332056f1a8d045493c229c30eddfd6201e5b0270ac6e7b22ecf
MD5 339a37530a19bd199bc6687dbb708f30
BLAKE2b-256 c640b88d5cc2ae1d1a782e9a13c79d3c3a6441c0e1f8b9ab9308f0923f15dfb7

See more details on using hashes here.

File details

Details for the file p_stable_lsh_python-0.0.3-py3-none-any.whl.

File metadata

  • Download URL: p_stable_lsh_python-0.0.3-py3-none-any.whl
  • Upload date:
  • Size: 5.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.7

File hashes

Hashes for p_stable_lsh_python-0.0.3-py3-none-any.whl
Algorithm Hash digest
SHA256 39a149faaceedc610959120445c87d95ed36b3a17db0e0f9e917d1c92d8521ab
MD5 2c9a177361d6f60d4db8a7b329ad4b82
BLAKE2b-256 07b80657461225d1e676f6afc0a84b2963e4fc931873d947a3079e067bcc8d00

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page