A Python library for shingling and locality-sensitive hashing (LSH).

Project description

lsh

lsh is a Python implementation of locality-sensitive hashing (LSH) with MinHash. It is useful for detecting near-duplicate documents.
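To make "near duplicate" concrete, here is a minimal sketch of character shingling and the exact Jaccard similarity that MinHash approximates. It is plain Python and does not use this package's API; the function names `shingles` and `jaccard`, the shingle width `k=5`, and the sample documents are all assumptions chosen for illustration.

```python
def shingles(text, k=5):
    """Return the set of overlapping character k-grams in text."""
    text = " ".join(text.split())  # collapse runs of whitespace
    if len(text) <= k:
        return {text}
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical sample documents: doc_b is a near duplicate of doc_a.
doc_a = "the quick brown fox jumps over the lazy dog"
doc_b = "the quick brown fox jumped over the lazy dog"
doc_c = "an entirely different sentence about hashing"

sim_ab = jaccard(shingles(doc_a), shingles(doc_b))
sim_ac = jaccard(shingles(doc_a), shingles(doc_c))
print(f"a vs b: {sim_ab:.2f}, a vs c: {sim_ac:.2f}")
```

Comparing every pair of documents this way is quadratic in the number of documents, which is the cost that MinHash fingerprints combined with LSH are meant to avoid.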

The implementation uses the MurmurHash v3 library to create document fingerprints.
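The general idea behind MinHash fingerprints and LSH banding can be sketched as follows. This is not this package's code or API: it uses `hashlib.blake2b` from the standard library as a stand-in for MurmurHash3, and the names `fingerprint`, `estimate_jaccard`, `candidate_pairs`, the 100 seeds, and the 20 bands are assumptions made for illustration.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def shingles(text, k=5):
    """Return the set of overlapping character k-grams in text."""
    text = " ".join(text.split())
    if len(text) <= k:
        return {text}
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def seeded_hash(shingle, seed):
    """64-bit seeded hash. Assumption: blake2b stands in for MurmurHash3."""
    digest = hashlib.blake2b(
        shingle.encode("utf-8"),
        digest_size=8,
        salt=seed.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little")


def fingerprint(text, num_seeds=100, k=5):
    """MinHash fingerprint: the minimum hash over all shingles, per seed."""
    sh = shingles(text, k)
    return [min(seeded_hash(s, seed) for s in sh) for seed in range(num_seeds)]


def estimate_jaccard(fp_a, fp_b):
    """Fraction of matching positions estimates the Jaccard similarity."""
    return sum(x == y for x, y in zip(fp_a, fp_b)) / len(fp_a)


def candidate_pairs(fingerprints, num_bands=20):
    """Bucket documents by fingerprint band; co-bucketed docs are candidates."""
    buckets = defaultdict(set)
    for doc_id, fp in fingerprints.items():
        rows = len(fp) // num_bands
        for band in range(num_bands):
            key = (band, tuple(fp[band * rows:(band + 1) * rows]))
            buckets[key].add(doc_id)
    pairs = set()
    for ids in buckets.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


# Hypothetical sample documents.
docs = {
    "a": "the quick brown fox jumps over the lazy dog",
    "b": "the quick brown fox jumped over the lazy dog",
    "c": "an entirely different sentence about hashing",
}
fps = {doc_id: fingerprint(text) for doc_id, text in docs.items()}
print(estimate_jaccard(fps["a"], fps["b"]), candidate_pairs(fps))
```

Using more bands with fewer rows per band makes it more likely that moderately similar documents share a bucket, at the cost of more false candidates that then need an exact comparison.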

Cython is needed only if you want to regenerate the .cpp files for the hashing and shingling code. By default, the setup script uses the pregenerated .cpp sources; you can change this with the USE_CYTHON flag in setup.py.

NumPy is needed to run the code.

The MurmurHash3 library is distributed under the MIT license. More information: https://github.com/aappleby/smhasher

installation

> git clone https://github.com/mattilyra/LSH
> cd LSH
> python setup.py install

Download files

Source Distribution

hlwy-lsh-0.3.0.tar.gz (126.3 kB view details)

Uploaded Source

Built Distribution

hlwy_lsh-0.3.0-cp37-cp37m-macosx_10_14_x86_64.whl (79.8 kB view details)

Uploaded CPython 3.7m, macOS 10.14+ x86-64

File details

Details for the file hlwy-lsh-0.3.0.tar.gz.

File metadata

  • Download URL: hlwy-lsh-0.3.0.tar.gz
  • Upload date:
  • Size: 126.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.14.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.35.0 CPython/3.7.4

File hashes

Hashes for hlwy-lsh-0.3.0.tar.gz
Algorithm Hash digest
SHA256 c058e5d09ca2a5acdf39fc09ab8688ebeee57ff94db1bacd1f8e0840e67a71ff
MD5 f0d55602bc3abe41aa63c5d4067644a9
BLAKE2b-256 fba549e4a661479eeffefb83867afa2211f863106ff510c5baa9793281d2756a

File details

Details for the file hlwy_lsh-0.3.0-cp37-cp37m-macosx_10_14_x86_64.whl.

File metadata

  • Download URL: hlwy_lsh-0.3.0-cp37-cp37m-macosx_10_14_x86_64.whl
  • Upload date:
  • Size: 79.8 kB
  • Tags: CPython 3.7m, macOS 10.14+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.14.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.35.0 CPython/3.7.4

File hashes

Hashes for hlwy_lsh-0.3.0-cp37-cp37m-macosx_10_14_x86_64.whl
Algorithm Hash digest
SHA256 845cdd8573b28edd077bb027fe19a4abd10b8fee9c6e29e1f07e254a3b90c1e9
MD5 77613cf34a81e368c4a5785bb019df2a
BLAKE2b-256 5e26e2a4776f8d57b523105ddc09a6146bb0cba309cbf1061d66f7dda155162c
