A Python implementation of locality sensitive hashing.

LSHash

A fast Python implementation of locality sensitive hashing.

This package is based on https://github.com/kayzhu/LSHash, which has not been updated since 2013, so I maintain this version myself.

Highlights

  • Fast hash calculation for large amounts of high-dimensional data through the use of numpy arrays.
  • Built-in support for persistence through Redis.
  • Support for multiple hash indexes.
  • Built-in support for common distance/objective functions for ranking outputs.

Installation

LSHash depends on the following libraries:

  • numpy
  • redis (if persistence through Redis is needed)
  • bitarray (if Hamming distance is used as the distance function)

To install:

$ pip install py_lsh

Quickstart

To create 6-bit hashes for input data of 8 dimensions:

from py_lsh import LSHash

lsh = LSHash(6, 8)
lsh.index([1, 2, 3, 4, 5, 6, 7, 8])
lsh.index([2, 3, 4, 5, 6, 7, 8, 9])
lsh.index([10, 12, 99, 1, 5, 31, 2, 3])
lsh.query([1, 2, 3, 4, 5, 6, 7, 7])

[((1, 2, 3, 4, 5, 6, 7, 8), 1.0), ((2, 3, 4, 5, 6, 7, 8, 9), 11)]
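
For intuition about what a 6-bit hash of an 8-dimensional point is, the sketch below hashes a point with random hyperplanes: each bit records which side of one random plane the point falls on. This is a standard-library illustration of the general technique, not the code py_lsh runs; the helper names make_planes and hash_point and the seed argument are made up for this example.

```python
import random


def make_planes(hash_size, input_dim, seed=0):
    """Draw hash_size random hyperplanes (hypothetical helper, not py_lsh API)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(input_dim)]
            for _ in range(hash_size)]


def hash_point(planes, point):
    """Return one bit per plane: '1' if the point lies on its positive side."""
    bits = []
    for plane in planes:
        dot = sum(w * x for w, x in zip(plane, point))
        bits.append("1" if dot > 0 else "0")
    return "".join(bits)


planes = make_planes(6, 8)           # 6-bit hashes for 8-dimensional input
point = [1, 2, 3, 4, 5, 6, 7, 8]
key = hash_point(planes, point)      # a string of six '0'/'1' characters
```

Points that lie in a similar direction tend to fall on the same side of most planes, so they tend to share a key and land in the same bucket.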

Main Interface

To initialize an LSHash instance:

LSHash(hash_size, input_dim, num_hashtables=1, storage=None)

parameters:

  • hash_size: The length of the resulting binary hash.
  • input_dim: The dimension of the input vector.
  • num_hashtables = 1: (optional) The number of hash tables to build; a query looks up every table.
  • storage = None: (optional) The name of the storage backend used for the index. Options include "redis".
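
To show how hash_size, input_dim, and num_hashtables could fit together, here is a minimal sketch in which every table has its own set of random hyperplanes and a lookup collects the matching bucket from each table. TinyLSH, its candidates method, and the seed argument are hypothetical stand-ins written for this example; this is not the py_lsh class, and it keeps everything in memory rather than in Redis.

```python
import random
from collections import defaultdict


class TinyLSH:
    """Hypothetical in-memory stand-in, not the py_lsh implementation."""

    def __init__(self, hash_size, input_dim, num_hashtables=1, seed=0):
        rng = random.Random(seed)
        # One independent set of hash_size random planes per table.
        self.planes = [
            [[rng.gauss(0.0, 1.0) for _ in range(input_dim)]
             for _ in range(hash_size)]
            for _ in range(num_hashtables)
        ]
        self.tables = [defaultdict(list) for _ in range(num_hashtables)]

    @staticmethod
    def _key(planes, point):
        # One bit per plane, set when the dot product is positive.
        return "".join(
            "1" if sum(w * x for w, x in zip(plane, point)) > 0 else "0"
            for plane in planes
        )

    def index(self, point):
        # Store the point in one bucket of every table.
        for planes, table in zip(self.planes, self.tables):
            table[self._key(planes, point)].append(tuple(point))

    def candidates(self, point):
        # Union of the buckets the point hashes to, across all tables.
        found = set()
        for planes, table in zip(self.planes, self.tables):
            found.update(table.get(self._key(planes, point), []))
        return found


lsh_sketch = TinyLSH(6, 8, num_hashtables=3)
lsh_sketch.index([1, 2, 3, 4, 5, 6, 7, 8])
```

More tables give a near neighbour more chances to share a bucket with the query, at the cost of more memory and more candidates to rank.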

To index a data point in a given LSHash instance, e.g., lsh:

lsh.index(input_point, extra_data=None)

parameters:

  • input_point: The data point to index, given as an array or tuple of numbers with input_dim elements.
  • extra_data = None: (optional) Extra data to be added along with the input_point.
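
One way to picture extra_data is as a label that travels with the point inside its bucket. The sketch below assumes a point is stored as a plain tuple when no extra data is given and as a (point, extra_data) pair otherwise; that layout, the store helper, and the bucket key "010110" are assumptions made for illustration, not confirmed details of py_lsh.

```python
from collections import defaultdict

# Hypothetical bucket layout: hash key -> list of stored entries.
bucket = defaultdict(list)


def store(bucket, key, input_point, extra_data=None):
    """Append a point, paired with its extra data if any (illustrative helper)."""
    point = tuple(input_point)
    entry = point if extra_data is None else (point, extra_data)
    bucket[key].append(entry)
    return entry


# "010110" is a made-up 6-bit key used only for this example.
store(bucket, "010110", [1, 2, 3, 4, 5, 6, 7, 8], extra_data="doc-1")
store(bucket, "010110", [2, 3, 4, 5, 6, 7, 8, 9])
```

Under this assumption, a label such as a document ID can be recovered from whatever a lookup returns without keeping a separate mapping from vectors to IDs.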

To query a data point against a given LSHash instance, e.g., lsh:

lsh.query(query_point, num_results=None, distance_func="euclidean")

parameters:

  • query_point: The point to search for, given as an array or tuple of numbers with input_dim elements.
  • num_results = None: (optional) The number of query results to return, in ranked order. By default, all results are returned.
  • distance_func = "euclidean": (optional) The distance function used to rank the candidates. By default, the Euclidean distance function is used.
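
The ranking step can be sketched as follows: score each candidate against the query point, sort by distance, and keep the first num_results. The distances in the Quickstart output (1.0 and 11) match a squared Euclidean distance, so this sketch assumes that metric; rank and squared_euclidean are hypothetical helpers, not functions exported by py_lsh.

```python
def squared_euclidean(a, b):
    """Sum of squared coordinate differences (assumed metric, see above)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def rank(query_point, candidates, num_results=None, distance=squared_euclidean):
    """Return (candidate, distance) pairs, closest first (illustrative helper)."""
    scored = [(c, distance(query_point, c)) for c in candidates]
    scored.sort(key=lambda pair: pair[1])
    return scored if num_results is None else scored[:num_results]


query_point = (1, 2, 3, 4, 5, 6, 7, 7)
candidates = [
    (2, 3, 4, 5, 6, 7, 8, 9),
    (1, 2, 3, 4, 5, 6, 7, 8),
]
ranked = rank(query_point, candidates)
top_one = rank(query_point, candidates, num_results=1)
```

Here ranked lists the point (1, 2, 3, 4, 5, 6, 7, 8) first with distance 1, then (2, 3, 4, 5, 6, 7, 8, 9) with distance 11, and top_one keeps only the first pair.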
