Skip to main content

A library for performing shingling and LSH for python.

Project description

lsh

LSH is a Python implementation of locality sensitive hashing with minhash. It is very useful for detecting near duplicate documents.

The implementation uses the MurmurHash v3 library to create document finger prints.

Cython is needed if you want to regenerate the .cpp files for the hashing and shingling code. By default the setup script uses the pregenerated .cpp sources, you can change this with the USE_CYTHON flag in setup.py

NumPy is needed to run the code.

The MurmurHash3 library is distributed under the MIT license. More information https://github.com/aappleby/smhasher

installation

> git clone https://github.com/mattilyra/LSH
> cd LSH
> python setup.py install

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

hlwy-lsh-0.3.0.tar.gz (126.3 kB view hashes)

Uploaded Source

Built Distribution

hlwy_lsh-0.3.0-cp37-cp37m-macosx_10_14_x86_64.whl (79.8 kB view hashes)

Uploaded CPython 3.7m macOS 10.14+ x86-64

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page