
A library for performing shingling and locality-sensitive hashing (LSH) in Python.
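Shingling turns a document into the set of its overlapping substrings, so that document similarity can be measured as set overlap (Jaccard similarity). The following is a minimal sketch assuming character-level shingles of a hypothetical length `k`. The `shingles` and `jaccard` helpers are illustrative names, not this library's API. The library's own shingling code is compiled from Cython/C++ and may differ in details such as shingle length or word vs. character shingles:

```python
from typing import Set


def shingles(text: str, k: int = 5) -> Set[str]:
    """Return the set of overlapping character k-grams ("shingles") of text.

    Hypothetical helper for illustration; not part of the hlwy-lsh API.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    if len(text) < k:
        # Too short for a full shingle: treat the whole text as one shingle.
        return {text} if text else set()
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Exact Jaccard similarity: size of intersection over size of union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


print(sorted(shingles("abcdef", k=3)))  # ['abc', 'bcd', 'cde', 'def']
print(jaccard(shingles("abcd", 3), shingles("abce", 3)))  # 1 shared of 3 total
```

Computing exact Jaccard for every pair of documents is quadratic in the collection size, which is the cost that MinHash signatures and LSH banding are meant to avoid.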

Project description


LSH is a Python implementation of locality-sensitive hashing with MinHash. It is useful for detecting near-duplicate documents.
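The general MinHash + LSH technique works like this. Each document is reduced to a short signature that stores, for each of several seeded hash functions, the minimum hash over its shingles. The fraction of matching signature slots estimates Jaccard similarity. Signatures are then split into bands, and any two documents that share an identical band become a candidate near-duplicate pair. This is a stdlib-only sketch and not the library's API. Every function name and parameter here (`num_seeds`, `bands`, `k`) is hypothetical. The sketch substitutes keyed `hashlib.blake2b` for the MurmurHash3 hashing the library uses:

```python
import hashlib
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Set, Tuple


def seeded_hash(token: str, seed: int) -> int:
    """64-bit hash of token, varied by seed (blake2b keyed with the seed).

    Stand-in for MurmurHash3; chosen only to keep the sketch stdlib-only.
    """
    digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8,
                             key=seed.to_bytes(8, "little")).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(text: str, num_seeds: int = 100, k: int = 5) -> List[int]:
    """For each seed, keep the minimum hash over the text's character k-shingles."""
    grams = {text[i:i + k] for i in range(max(len(text) - k + 1, 1))}
    return [min(seeded_hash(g, seed) for g in grams) for seed in range(num_seeds)]


def estimate_jaccard(sig_a: List[int], sig_b: List[int]) -> float:
    """The fraction of matching signature slots approximates Jaccard similarity."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


def lsh_candidates(signatures: Dict[str, List[int]], bands: int) -> Set[Tuple[str, str]]:
    """Banding: documents that share any identical band become candidate pairs."""
    buckets: Dict[Tuple[int, Tuple[int, ...]], List[str]] = defaultdict(list)
    for doc_id, sig in signatures.items():
        if len(sig) % bands:
            raise ValueError("signature length must be divisible by bands")
        rows = len(sig) // bands
        for b in range(bands):
            # Key on the band index too, so equal values in different bands don't collide.
            buckets[(b, tuple(sig[b * rows:(b + 1) * rows]))].append(doc_id)
    pairs: Set[Tuple[str, str]] = set()
    for ids in buckets.values():
        for x, y in combinations(sorted(set(ids)), 2):
            pairs.add((x, y))
    return pairs


docs = {
    "a": "the quick brown fox jumps over the lazy dog",
    "b": "the quick brown fox jumps over the lazy dog!",
    "c": "lorem ipsum dolor sit amet consectetur",
}
sigs = {doc_id: minhash_signature(text) for doc_id, text in docs.items()}
pairs = lsh_candidates(sigs, bands=20)
print(sorted(pairs))
print(estimate_jaccard(sigs["a"], sigs["b"]))
```

With 100 hashes split into 20 bands of 5 rows, a pair with Jaccard similarity s shares at least one band with probability 1 - (1 - s^5)^20. Near-duplicates such as "a" and "b" are therefore almost always paired, while unrelated documents almost never are. Only the candidate pairs then need an exact comparison.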

The implementation uses the MurmurHash v3 library to create document fingerprints.
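MurmurHash3 is a fast, non-cryptographic, seedable hash, which makes it convenient for generating the many independent hash functions MinHash needs. The sketch below is a pure-Python port of the reference 32-bit x86 variant and shows how a seeded fingerprint is computed. The library calls the C++ MurmurHash3 code rather than anything like this. Which MurmurHash3 variant it uses (32- or 128-bit) is an assumption not stated here, so treat this as illustrative only:

```python
MASK32 = 0xFFFFFFFF


def _rotl32(x: int, r: int) -> int:
    """Rotate a 32-bit integer left by r bits."""
    return ((x << r) | (x >> (32 - r))) & MASK32


def murmur3_32(data: bytes, seed: int = 0) -> int:
    """MurmurHash3 x86_32 of data, returned as an unsigned 32-bit int.

    Illustrative port of the reference algorithm; not the library's binding.
    """
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h1 = seed & MASK32
    n = len(data)
    body_end = n - (n % 4)

    # Body: mix each 4-byte little-endian block into the running hash.
    for i in range(0, body_end, 4):
        k1 = int.from_bytes(data[i:i + 4], "little")
        k1 = _rotl32((k1 * c1) & MASK32, 15)
        k1 = (k1 * c2) & MASK32
        h1 = _rotl32(h1 ^ k1, 13)
        h1 = (h1 * 5 + 0xE6546B64) & MASK32

    # Tail: mix in the remaining 1-3 bytes, if any.
    tail = data[body_end:]
    if tail:
        k1 = int.from_bytes(tail, "little")
        k1 = _rotl32((k1 * c1) & MASK32, 15)
        k1 = (k1 * c2) & MASK32
        h1 ^= k1

    # Finalization ("fmix32"): fold in the length and avalanche all bits.
    h1 ^= n & MASK32
    h1 ^= h1 >> 16
    h1 = (h1 * 0x85EBCA6B) & MASK32
    h1 ^= h1 >> 13
    h1 = (h1 * 0xC2B2AE35) & MASK32
    h1 ^= h1 >> 16
    return h1


print(hex(murmur3_32(b"", 1)))                  # 0x514e28b7
print(hex(murmur3_32(b"\x00\x00\x00\x00", 0)))  # 0x2362f9de
```

Changing only the seed yields an effectively independent hash function, so a MinHash signature of length N can be built by hashing each shingle with seeds 0 to N - 1.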

Cython is needed only if you want to regenerate the .cpp files for the hashing and shingling code. By default, the setup script uses the pregenerated .cpp sources; you can change this with the USE_CYTHON flag in

NumPy is needed to run the code.

The MurmurHash3 library is distributed under the MIT license. More information


$ pip install hlwy-lsh



Download files


Files for hlwy-lsh, version 0.3.6:
hlwy-lsh-0.3.6.tar.gz (126.4 kB), file type: Source, Python version: None
