pyLSHash
A fast Python implementation of locality sensitive hashing.
I was using kayzhu/LSHash, but it has not been updated since 2013.
So I maintain this version myself, and I have made many improvements on top of it.
Highlights
- Fast hash calculation for large amounts of high-dimensional data through the use of numpy arrays.
- Built-in support for persistence through Redis.
- Support for multiple hash indexes.
- Built-in support for common distance/objective functions for ranking outputs.
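As a rough illustration of the first highlight, the sketch below hashes vectors with random hyperplanes using numpy. It is a generic random-projection example, not pyLSHash's actual code; the helper names `make_planes` and `hash_point` and the fixed seed are assumptions made for illustration.

```python
import numpy as np


def make_planes(hash_size, input_dim, seed=0):
    """Draw hash_size random hyperplanes (one per row) in input_dim dimensions."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((hash_size, input_dim))


def hash_point(planes, point):
    """Return a bit string: '1' where the point lies on the positive side of a plane."""
    projections = planes @ np.asarray(point, dtype=float)
    return "".join("1" if p > 0 else "0" for p in projections)


planes = make_planes(hash_size=6, input_dim=8)
key_a = hash_point(planes, [1, 2, 3, 4, 5, 6, 7, 8])
# Same direction as the first point, so every projection keeps its sign.
key_b = hash_point(planes, [2, 4, 6, 8, 10, 12, 14, 16])
print(key_a, key_b)
```

Vectors that point in similar directions tend to fall on the same side of most planes, so they share a bucket key and become candidates for each other at query time.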
Installation
pyLSHash depends on the following libraries:
- numpy
- redis (if persistence through Redis is needed)
To install:
$ pip install pyLSHash
Quickstart
To create 6-bit hashes for input data of 8 dimensions:
from pyLSHash import LSHash
lsh = LSHash(hash_size=6, input_dim=8)
lsh.index([1, 2, 3, 4, 5, 6, 7, 8])
lsh.index([2, 3, 4, 5, 6, 7, 8, 9])
# attach extra_data
lsh.index([2, 3, 4, 5, 6, 7, 8, 9], extra_data="some vector info")
lsh.index([10, 12, 99, 1, 5, 31, 2, 3])
res = lsh.query([1, 2, 3, 4, 5, 6, 7, 7])
print(res)
# Sample output:
# [((1, 2, 3, 4, 5, 6, 7, 8), 1.0), ((2, 3, 4, 5, 6, 7, 8, 9), 11)]
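The two distances in the sample output above (1.0 and 11) are consistent with a squared Euclidean distance between the query and each returned vector. That is an inference from the numbers, not a documented statement about the library's default; the check below uses a hypothetical helper `squared_euclidean` and plain numpy.

```python
import numpy as np

query = np.array([1, 2, 3, 4, 5, 6, 7, 7])
candidates = [
    (1, 2, 3, 4, 5, 6, 7, 8),
    (2, 3, 4, 5, 6, 7, 8, 9),
]


def squared_euclidean(x, y):
    """Sum of squared coordinate differences (no square root taken)."""
    diff = np.asarray(x) - np.asarray(y)
    return int(np.dot(diff, diff))


distances = [squared_euclidean(query, c) for c in candidates]
print(distances)  # [1, 11]
```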
User defined distance function
def l1norm_dist(x, y):
    return sum(abs(x - y))
res2 = lsh.query([1, 2, 3, 4, 5, 6, 7, 7], dist_func=l1norm_dist)
print(res2)
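A distance function written as `sum(abs(x - y))` only works if both arguments are numpy arrays, since subtracting two plain lists raises a `TypeError`. Whether the library always passes arrays is an assumption, so a defensive variant converts its inputs first. The function names `l1_dist` and `cosine_dist` below are illustrative, not part of pyLSHash.

```python
import numpy as np


def l1_dist(x, y):
    """Manhattan distance; accepts lists, tuples, or numpy arrays."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.abs(x - y).sum())


def cosine_dist(x, y):
    """1 - cosine similarity; undefined if either vector is all zeros."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(1.0 - (x @ y) / (np.linalg.norm(x) * np.linalg.norm(y)))


print(l1_dist([1, 2, 3, 4, 5, 6, 7, 7], (1, 2, 3, 4, 5, 6, 7, 8)))
print(cosine_dist([1, 0], [0, 1]))

# Passed to a query the same way as l1norm_dist above:
# res = lsh.query([1, 2, 3, 4, 5, 6, 7, 7], dist_func=l1_dist)
```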
Use Redis
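from pyLSHash import LSHash
from pyLSHash.storage import RedisStorage  # assumed import path, same module as StorageBase

redis_config = {'host': 'localhost', 'port': 6379, 'decode_responses': True}
lsh = LSHash(hash_size=6, input_dim=8,
             storage_instance=RedisStorage(redis_config))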
lsh.index([1, 2, 3, 4, 5, 6, 7, 8])
lsh.index([2, 3, 4, 5, 6, 7, 8, 9])
# attach extra_data
lsh.index([2, 3, 4, 5, 6, 7, 8, 9], extra_data="some vector info")
lsh.index([10, 12, 99, 1, 5, 31, 2, 3])
lsh.index([10, 12, 99, 1, 5, 31, 2, 3])
res = lsh.query([1, 2, 3, 4, 5, 6, 7, 7])
Use another database as storage
from pyLSHash import LSHash
from pyLSHash.storage import StorageBase
import redis
import json
class MyStorage(StorageBase):
    def __init__(self):
        self.storage = redis.StrictRedis(host='localhost', port=6379, decode_responses=True)

    def keys(self, pattern="*"):
        return self.storage.keys(pattern)

    def set_val(self, key, val):
        self.storage.set(key, val)

    def get_val(self, key):
        return self.storage.get(key)

    def append_val(self, key, val):
        self.storage.rpush(key, json.dumps(val))

    def get_list(self, key):
        res_list = [json.loads(val) for val in self.storage.lrange(key, 0, -1)]
        return tuple((tuple(item[0]), item[1]) for item in res_list)

    def clear(self):
        for key in self.storage.keys():
            self.storage.delete(key)

lsh = LSHash(hash_size=6, input_dim=8,
             storage_instance=MyStorage())
lsh.index([1, 2, 3, 4, 5, 6, 7, 8])
lsh.index([2, 3, 4, 5, 6, 7, 8, 9])
lsh.index([2, 3, 4, 5, 6, 7, 8, 9], extra_data="some vector info")
lsh.index([10, 12, 99, 1, 5, 31, 2, 3])
lsh.index([10, 12, 99, 1, 5, 31, 2, 3])
res = lsh.query([1, 2, 3, 4, 5, 6, 7, 7])
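To try the storage interface without a running database, a dict-backed stand-in with the same six method names as `MyStorage` above can be useful. The sketch below is hypothetical: `DictStorage` is not part of pyLSHash, it does not subclass `StorageBase` so that it runs on its own, and whether `StorageBase` requires exactly these methods is an assumption taken from the example.

```python
from fnmatch import fnmatchcase


class DictStorage:
    """In-memory stand-in mirroring the method names of the MyStorage example."""

    def __init__(self):
        self.storage = {}

    def keys(self, pattern="*"):
        # Glob-style matching, similar in spirit to Redis KEYS patterns.
        return [k for k in self.storage if fnmatchcase(k, pattern)]

    def set_val(self, key, val):
        self.storage[key] = val

    def get_val(self, key):
        return self.storage.get(key)

    def append_val(self, key, val):
        # Each bucket key maps to a list of (vector, extra_data) entries.
        self.storage.setdefault(key, []).append(val)

    def get_list(self, key):
        return tuple(self.storage.get(key, []))

    def clear(self):
        self.storage.clear()


store = DictStorage()
store.append_val("010110", ((1, 2, 3), None))
store.append_val("010110", ((2, 3, 4), "some vector info"))
print(store.get_list("010110"))
```

To plug a class like this into `LSHash`, it would presumably need to inherit from `StorageBase` as `MyStorage` does.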
Save and load the model
lsh.save_uniform_planes("filename.pkl")
lsh.load_uniform_planes("filename.pkl")
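Persisting the planes matters because bucket keys are only reproducible with the planes that produced them: data indexed into a persistent store with one set of random planes cannot be found with a freshly drawn set. The sketch below shows the idea with plain `pickle` and numpy. It is not the library's implementation, and the file layout that `save_uniform_planes` actually writes is not assumed here.

```python
import os
import pickle
import tempfile

import numpy as np

rng = np.random.default_rng(42)
planes = rng.standard_normal((6, 8))  # 6 hyperplanes for 8-dimensional input


def hash_point(planes, point):
    """Bit string with '1' where the point is on the positive side of a plane."""
    bits = (planes @ np.asarray(point, dtype=float)) > 0
    return "".join("1" if b else "0" for b in bits)


# Round-trip the planes through a pickle file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "planes.pkl")
with open(path, "wb") as f:
    pickle.dump(planes, f)
with open(path, "rb") as f:
    restored = pickle.load(f)

point = [1, 2, 3, 4, 5, 6, 7, 8]
same_key = hash_point(planes, point) == hash_point(restored, point)
print(same_key)
```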
Clear indexed data
lsh.clear_storage()