A Python library for shingling and locality sensitive hashing (LSH).
Project description
lsh
LSH is a Python implementation of locality sensitive hashing with MinHash. It is useful for detecting near-duplicate documents.
The implementation uses the MurmurHash v3 library to create document fingerprints.
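To make the idea concrete, here is a minimal pure-Python sketch of shingling, MinHash fingerprints, and LSH banding. It is an illustration under stated assumptions, not this library's API: the function names (`shingles`, `minhash_signature`, `estimated_jaccard`, `candidate_pairs`) and all parameter defaults are hypothetical, and `hashlib.blake2b` with a per-seed salt stands in for MurmurHash3 so that the example needs only the standard library.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def shingles(text, k=5):
    """Return the set of overlapping character k-grams (shingles) of text."""
    if len(text) < k:
        return {text}
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def _seeded_hash(shingle, seed):
    """Seeded 64-bit hash. blake2b is a stand-in for MurmurHash3 here."""
    digest = hashlib.blake2b(
        shingle.encode("utf-8"),
        digest_size=8,
        salt=seed.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(text, num_hashes=100, k=5):
    """Fingerprint a document: for each seed, keep the minimum shingle hash."""
    grams = shingles(text, k)
    return [min(_seeded_hash(g, seed) for g in grams)
            for seed in range(num_hashes)]


def estimated_jaccard(sig_a, sig_b):
    """Fraction of matching positions, an estimate of Jaccard similarity."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


def candidate_pairs(signatures, num_bands=20):
    """Group documents whose signatures agree on at least one whole band.

    signatures maps a document id to its MinHash signature. The signature
    is cut into num_bands equal slices; two documents become a candidate
    pair if any slice is identical.
    """
    buckets = defaultdict(set)
    for doc_id, sig in signatures.items():
        rows = len(sig) // num_bands
        for band in range(num_bands):
            key = (band, tuple(sig[band * rows:(band + 1) * rows]))
            buckets[key].add(doc_id)
    pairs = set()
    for ids in buckets.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs
```

A typical use would be to compute one signature per document, pass the resulting dictionary to `candidate_pairs`, and then confirm each candidate with `estimated_jaccard` (or an exact Jaccard computation on the shingle sets). More bands with fewer rows each make the banding step more permissive; fewer bands with more rows make it stricter.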
Cython is needed only if you want to regenerate the .cpp files for the hashing and shingling code. By default, the setup script uses the pregenerated .cpp sources; you can change this with the USE_CYTHON flag in setup.py.
NumPy is needed to run the code.
The MurmurHash3 library is distributed under the MIT license. More information: https://github.com/aappleby/smhasher
installation
> git clone https://github.com/mattilyra/LSH
> cd LSH
> python setup.py install
File details
Details for the file hlwy-lsh-0.3.0.tar.gz.
File metadata
- Download URL: hlwy-lsh-0.3.0.tar.gz
- Upload date:
- Size: 126.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.14.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.35.0 CPython/3.7.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c058e5d09ca2a5acdf39fc09ab8688ebeee57ff94db1bacd1f8e0840e67a71ff |
| MD5 | f0d55602bc3abe41aa63c5d4067644a9 |
| BLAKE2b-256 | fba549e4a661479eeffefb83867afa2211f863106ff510c5baa9793281d2756a |
File details
Details for the file hlwy_lsh-0.3.0-cp37-cp37m-macosx_10_14_x86_64.whl.
File metadata
- Download URL: hlwy_lsh-0.3.0-cp37-cp37m-macosx_10_14_x86_64.whl
- Upload date:
- Size: 79.8 kB
- Tags: CPython 3.7m, macOS 10.14+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.14.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.35.0 CPython/3.7.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 845cdd8573b28edd077bb027fe19a4abd10b8fee9c6e29e1f07e254a3b90c1e9 |
| MD5 | 77613cf34a81e368c4a5785bb019df2a |
| BLAKE2b-256 | 5e26e2a4776f8d57b523105ddc09a6146bb0cba309cbf1061d66f7dda155162c |