A library for performing shingling and LSH in Python.

Project description

hlwy-lsh

LSH is a Python implementation of locality-sensitive hashing with minhash. It is useful for detecting near-duplicate documents.

The implementation uses the MurmurHash v3 library to create document fingerprints.
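To illustrate the idea, here is a minimal pure-Python sketch of shingling, minhash fingerprints, and LSH banding. It is not this package's API: every function name (`shingles`, `seeded_hash`, `fingerprint`, `estimate_jaccard`, `candidate_pairs`) and every parameter is an assumption made for the example, and BLAKE2b from the standard library stands in for MurmurHash3.

```python
import hashlib


def shingles(text, k=5):
    """Return the set of overlapping k-character shingles of `text`."""
    return {text[i:i + k] for i in range(max(len(text) - k + 1, 1))}


def seeded_hash(shingle, seed):
    """Seeded 64-bit hash. BLAKE2b is a standard-library stand-in here;
    it is NOT MurmurHash3, which the package itself uses."""
    digest = hashlib.blake2b(
        shingle.encode("utf-8"),
        digest_size=8,
        salt=seed.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little")


def fingerprint(text, num_seeds=100, k=5):
    """Minhash fingerprint: for each seed, the smallest hash over all shingles."""
    grams = shingles(text, k)
    return [min(seeded_hash(g, seed) for g in grams) for seed in range(num_seeds)]


def estimate_jaccard(fp_a, fp_b):
    """The fraction of matching positions estimates the Jaccard similarity."""
    return sum(x == y for x, y in zip(fp_a, fp_b)) / len(fp_a)


def candidate_pairs(fingerprints, bands=20):
    """LSH banding: documents that agree on every row of some band share a bucket."""
    buckets = {}
    for doc_id, fp in fingerprints.items():
        rows = len(fp) // bands
        for band in range(bands):
            key = (band, tuple(fp[band * rows:(band + 1) * rows]))
            buckets.setdefault(key, set()).add(doc_id)
    pairs = set()
    for ids in buckets.values():
        ordered = sorted(ids)
        pairs.update((a, b) for i, a in enumerate(ordered) for b in ordered[i + 1:])
    return pairs


docs = {
    "a": "the quick brown fox jumps over the lazy dog",
    "b": "the quick brown fox jumps over the lazy dog!",
    "c": "an entirely unrelated sentence about databases",
}
fps = {name: fingerprint(text) for name, text in docs.items()}
print(candidate_pairs(fps))                  # pairs worth comparing in full
print(estimate_jaccard(fps["a"], fps["b"]))  # close to 1 for near duplicates
```

Only documents that collide in at least one band are compared further, which avoids scoring every pair. More bands with fewer rows each catch lower-similarity pairs at the cost of more false candidates.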

Cython is needed only if you want to regenerate the .cpp files for the hashing and shingling code. By default, the setup script uses the pregenerated .cpp sources; you can change this with the USE_CYTHON flag in setup.py.
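A flag like USE_CYTHON is commonly wired up as in the sketch below. This is an illustration of the general pattern, not a copy of this package's setup.py: the module paths under `example_pkg/` and the helper names `extension_sources` and `build_extensions` are hypothetical.

```python
# Hypothetical module stems; the package's real file layout may differ.
MODULE_STEMS = ("example_pkg/hashing", "example_pkg/shingling")


def extension_sources(use_cython, stems=MODULE_STEMS):
    """Pick .pyx sources when regenerating with Cython, else pregenerated .cpp."""
    suffix = ".pyx" if use_cython else ".cpp"
    return [stem + suffix for stem in stems]


def build_extensions(use_cython):
    """Build the extension list that a setup() call would receive as ext_modules."""
    from setuptools import Extension  # imported lazily; only needed at build time

    extensions = [
        Extension(src.rsplit(".", 1)[0].replace("/", "."), [src], language="c++")
        for src in extension_sources(use_cython)
    ]
    if use_cython:
        from Cython.Build import cythonize  # requires Cython to be installed

        extensions = cythonize(extensions)
    return extensions
```

With the flag off, the build compiles the shipped .cpp files directly, so installing the package does not require Cython.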

NumPy is needed to run the code.

The MurmurHash3 library is distributed under the MIT license. More information: https://github.com/aappleby/smhasher

Installation

$ pip install hlwy-lsh
…

✨🍰✨

Download files

Source Distribution

hlwy-lsh-0.3.3.tar.gz (126.3 kB)

Uploaded Source

File details

Details for the file hlwy-lsh-0.3.3.tar.gz.

File metadata

  • Download URL: hlwy-lsh-0.3.3.tar.gz
  • Upload date:
  • Size: 126.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.14.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.35.0 CPython/3.7.4

File hashes

Hashes for hlwy-lsh-0.3.3.tar.gz
Algorithm Hash digest
SHA256 eb23df7c631b115032b95f46cad1c5e6258c62c2af80fc33be7f5f8495668ed1
MD5 6a05214daa9bd502192e5ad62485ad61
BLAKE2b-256 c4c7157bac38c83140a29fe9f350e287cb8a03a2e9b097b81ae7976868c16155
