A 1&2-stable Locality-Sensitive Hashing implementation in Python
Project description
P-stable-LSH
The package is one implementation of paper Locality-Sensitive Hashing Scheme Based on p-Stable Distributions in SCG’2014. P-stable-lsh a novel Locality-Sensitive Hashing scheme for the Approximate Nearest Neighbor Problem under $L_p$ norm, based on p-stable distributions.
Note: This code is used as the practice of the paper, and there are few optimizations. Sharing is for communication and learning. If it is a high-performance scenario, please optimize as appropriate.
Installation
pip install p-stable-lsh-python
Usage
Example
The following example shows all features of the package, If you want to know the details, please refer to the source code and comments.
import numpy as np
import p_stable_lsh.pstable as psl
dim = 100 # vector dimension
data = [np.random.random(dim) for _ in range(2)] # generate two vectors
r = 50.0 # the parameter $r$ in paper
m1 = psl.pstable(r, dim, metric_dim=1)
m1.lsh(data[0])
m2 = psl.pstable(r, dim, metric_dim=1)
m2.lsh(data[1])
print(m1.md(m2)) # estimate value
print(m1.p(np.average(sum(np.abs(data[0]-data[1]))))) # theoretical(true) value
m1 = psl.pstable(r, dim, metric_dim=2)
m1.lsh(data[0])
m2 = psl.pstable(r, dim, metric_dim=2)
m2.lsh(data[1])
print(m1.md(m2)) # estimate value
print(m1.p(np.sqrt(sum([i**2 for i in data[0]-data[1]])))) # theoretical(true) value
Use case
Define the parameter $r$ and vector dimension for preparing the test data:
import numpy as np
import p_stable_lsh.pstable as psl
r = 50.0 # the parameter $r$ in paper
dim = 100 # vector dimension
data = [np.random.random(dim) for _ in range(3)] # generate two vectors
Instantiate pstable
object with specific dimension space. ($L_1$ for example)
m1 = psl.pstable(r, dim, metric_dim=1)
m2 = psl.pstable(r, dim, metric_dim=1)
Hash vectors with p-stable LSH function.
m1.lsh(data[0])
m2.lsh(data[1])
Estimate distance between two object.
m1.md(m2) # estimate value
Show the ground truth distance probability using integration shown in paper.
l1_distance = np.average(sum(np.abs(data[0]-data[1])))
m1.p(l1_distance)
Another way to instantiate pstable
object with hash values.
m3 = psl.pstable(r, dim, metric_dim=1, hashvalues=m1.hashvalues)
m3.md(m2)
Update object hash values with different vector.
m2.lsh(data[2])
m3.md(m2)
Parameters
The parameter $r$ in paper determines the relationship between the distance (in metric space) between vectors and the probability of collision (the .md
method). The following shows different probability results varied by $r$.
L1 metric space
Data points code
import p_stable_lsh.pstable as psl
r_list = [10, 30, 50, 100, 300, ]
result = []
for r in r_list:
tmp = []
m = psl.pstable(r, dim, metric_dim=1)
for i in range(1, 800):
tmp.append(m.p(i))
result.append(tmp)
L2 metric space
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file p-stable-lsh-python-0.0.3.tar.gz
.
File metadata
- Download URL: p-stable-lsh-python-0.0.3.tar.gz
- Upload date:
- Size: 4.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.7
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 82b405bd24094332056f1a8d045493c229c30eddfd6201e5b0270ac6e7b22ecf |
|
MD5 | 339a37530a19bd199bc6687dbb708f30 |
|
BLAKE2b-256 | c640b88d5cc2ae1d1a782e9a13c79d3c3a6441c0e1f8b9ab9308f0923f15dfb7 |
File details
Details for the file p_stable_lsh_python-0.0.3-py3-none-any.whl
.
File metadata
- Download URL: p_stable_lsh_python-0.0.3-py3-none-any.whl
- Upload date:
- Size: 5.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.7
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 39a149faaceedc610959120445c87d95ed36b3a17db0e0f9e917d1c92d8521ab |
|
MD5 | 2c9a177361d6f60d4db8a7b329ad4b82 |
|
BLAKE2b-256 | 07b80657461225d1e676f6afc0a84b2963e4fc931873d947a3079e067bcc8d00 |