Skip to main content

A Python implementation of locality sensitive hashing.

Project description

pyLSHash

PyPI Python package codecov License Python Platform stars

A fast Python implementation of locality sensitive hashing.

I was using kayzhu/LSHash, but it stopped updating since 2013.
So I maintain it myself, and I have made a lot of improvement based on it.

Highlights

  • Fast hash calculation for large amount of high dimensional data through the use of numpy arrays.
  • Built-in support for persistency through Redis.
  • Multiple hash indexes support.
  • Built-in support for common distance/objective functions for ranking outputs.

Installation

pyLSHash depends on the following libraries:

  • numpy
  • redis (if persistency through Redis is needed)

To install:

$ pip install pyLSHash

Quickstart

To create 6-bit hashes for input data of 8 dimensions:

from pyLSHash import LSHash

lsh = LSHash(hash_size=6, input_dim=8)
lsh.index([1, 2, 3, 4, 5, 6, 7, 8])
lsh.index([2, 3, 4, 5, 6, 7, 8, 9])
# attach extra_data
lsh.index([2, 3, 4, 5, 6, 7, 8, 9], extra_data="some vector info")
lsh.index([10, 12, 99, 1, 5, 31, 2, 3])

res = lsh.query([1, 2, 3, 4, 5, 6, 7, 7])

[((1, 2, 3, 4, 5, 6, 7, 8), 1.0), ((2, 3, 4, 5, 6, 7, 8, 9), 11)]

User defined distance function

def l1norm_dist(x, y):
    return sum(abs(x - y))


res2 = lsh.query([1, 2, 3, 4, 5, 6, 7, 7], dist_func=l1norm_dist)

print(res2)

Use Redis

from pyLSHash import LSHash

lsh = LSHash(hash_size=6, input_dim=8
             , storage_instance=RedisStorage({'host': 'localhost', 'port': 6379, 'decode_responses': True}))

lsh.index([1, 2, 3, 4, 5, 6, 7, 8])
lsh.index([2, 3, 4, 5, 6, 7, 8, 9])
# attach extra_data
lsh.index([2, 3, 4, 5, 6, 7, 8, 9], extra_data="some vector info")
lsh.index([10, 12, 99, 1, 5, 31, 2, 3])
lsh.index([10, 12, 99, 1, 5, 31, 2, 3])

res = lsh.query([1, 2, 3, 4, 5, 6, 7, 7])

Use other database as storage

from pyLSHash import LSHash
from pyLSHash.storage import StorageBase
import redis
import json


class MyStorage(StorageBase):
    def __init__(self):
        self.storage = redis.StrictRedis(host='localhost', port=6379, decode_responses=True)

    def keys(self, pattern="*"):
        return self.storage.keys(pattern)

    def set_val(self, key, val):
        self.storage.set(key, val)

    def get_val(self, key):
        return self.storage.get(key)

    def append_val(self, key, val):
        self.storage.rpush(key, json.dumps(val))

    def get_list(self, key):
        res_list = [json.loads(val) for val in self.storage.lrange(key, 0, -1)]
        return tuple((tuple(item[0]), item[1]) for item in res_list)

    def clear(self):
        for key in self.storage.keys():
            self.storage.delete(key)


lsh = LSHash(hash_size=6, input_dim=8
             , storage_instance=MyStorage())

lsh.index([1, 2, 3, 4, 5, 6, 7, 8])
lsh.index([2, 3, 4, 5, 6, 7, 8, 9])
lsh.index([2, 3, 4, 5, 6, 7, 8, 9], extra_data="some vector info")
lsh.index([10, 12, 99, 1, 5, 31, 2, 3])
lsh.index([10, 12, 99, 1, 5, 31, 2, 3])

res = lsh.query([1, 2, 3, 4, 5, 6, 7, 7])

save&load model

lsh.save_uniform_planes("filename.pkl")
lsh.load_uniform_planes("filename.pkl")

clear indexed data

lsh.clear_storage()

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pyLSHash-0.1.1.tar.gz (5.6 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

pyLSHash-0.1.1-py3-none-any.whl (6.8 kB view details)

Uploaded Python 3

File details

Details for the file pyLSHash-0.1.1.tar.gz.

File metadata

  • Download URL: pyLSHash-0.1.1.tar.gz
  • Upload date:
  • Size: 5.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/3.10.0 pkginfo/1.6.1 requests/2.24.0 requests-toolbelt/0.9.1 tqdm/4.50.2 CPython/3.8.5

File hashes

Hashes for pyLSHash-0.1.1.tar.gz
Algorithm Hash digest
SHA256 727f981209f7834cdc7cbf94b8429b7c314d838145e24272982d7e227871e7e4
MD5 f36c3ca1fe990df9a6929ff8cf0a89c0
BLAKE2b-256 8e3e50437dc80b83ead54a7ccc339918fce29a13d14645a7790bb022a0ff305c

See more details on using hashes here.

File details

Details for the file pyLSHash-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: pyLSHash-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 6.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/3.10.0 pkginfo/1.6.1 requests/2.24.0 requests-toolbelt/0.9.1 tqdm/4.50.2 CPython/3.8.5

File hashes

Hashes for pyLSHash-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 14059beb372c84d103d23a8c5c1001e82d70a0feddcadea610936af3964336db
MD5 014f75aa1ec350828fe4f7db8e7ab20c
BLAKE2b-256 ffeb6b0adf04d7978ac33904b9f7c84c421613e26d56771c6295ddea543ac78f

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page