A simhash module in C++ for Python
simhash
A C++ implementation of simhash, packaged as a Python module, with support for large fingerprint dimensions such as 128 bits.
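For readers new to the technique, here is a minimal pure-Python sketch of textbook (Charikar-style) simhash. It is not pysimhash's C++ code. The function name `simhash`, the use of MD5 as the per-token hash, and the unit weights are all assumptions made for illustration. Each token's hash casts a +1 or -1 vote for every bit position, and the fingerprint keeps the bits whose total vote is positive. As a result, documents that share most of their tokens end up with fingerprints that differ in only a few bits.

```python
import hashlib


def simhash(tokens, f=128):
    """Textbook simhash of an iterable of string tokens (illustrative sketch).

    Assumption: each token is hashed with MD5, whose 128-bit digest supplies
    the per-token bits. Only the low f bits are used, so f must be <= 128.
    """
    if not 0 < f <= 128:
        raise ValueError("f must be in 1..128 when using MD5 token hashes")
    counts = [0] * f  # one signed vote counter per fingerprint bit
    for token in tokens:
        h = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16)
        for i in range(f):
            # +1 if bit i of the token hash is set, -1 otherwise (unit weights)
            counts[i] += 1 if (h >> i) & 1 else -1
    # Fingerprint bit i is 1 exactly where the vote total for bit i is positive
    fingerprint = 0
    for i, c in enumerate(counts):
        if c > 0:
            fingerprint |= 1 << i
    return fingerprint


document = "google.com hybridtheory.com youtube.com reddit.com"
fp = simhash(document.split(" "))
print(f"{fp:032x}")  # 128-bit fingerprint as 32 hex digits
```

Unlike the pysimhash example, this sketch hashes the raw tokens itself instead of receiving MD5 hex digests.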
install
```shell
pip install pysimhash
```
or install from GitHub:
```shell
git clone https://github.com/skiloop/simhash
cd simhash
python setup.py install
```
requirements
- boost-python
how to use
example:
```python
import pysimhash
import hashlib
document = "google.com hybridtheory.com youtube.com reddit.com"
tokens = [hashlib.md5(s.encode('utf-8')).hexdigest() for s in document.split(" ")]
s2 = pysimhash.SimHash(128, 16) # f=128, hash_bit=16
s2.build(tokens, base=16)
print(s2.hex())
```
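Simhash fingerprints are usually compared by Hamming distance, the number of bit positions in which two fingerprints differ. pysimhash may offer its own comparison method, but its name is not documented here. The sketch below therefore works only on plain integers and hex strings, and it assumes that `hex()` returns the fingerprint as plain hex digits. The helper names `hamming_distance`, `hamming_distance_hex` and `similarity` are hypothetical.

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions where two non-negative fingerprints differ."""
    return bin(a ^ b).count("1")


def hamming_distance_hex(a_hex: str, b_hex: str) -> int:
    """Hamming distance between fingerprints given as hex strings.

    Assumption: inputs look like the output of a hex() method, i.e. hex
    digits with an optional "0x" prefix (int(..., 16) accepts both).
    """
    return hamming_distance(int(a_hex, 16), int(b_hex, 16))


def similarity(a: int, b: int, f: int = 128) -> float:
    """Fraction of the f fingerprint bits on which a and b agree."""
    return 1.0 - hamming_distance(a, b) / f


print(hamming_distance_hex("ff00", "0f01"))  # ff00 ^ 0f01 = f001, so 5
```

A small distance relative to `f` suggests near-duplicate documents. The cut-off for "near" depends on the application.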
benchmark
With 10,000 builds and 100,000 comparisons (using benchmark.py) on the same Linux machine, the results are as follows:
implementation | build time | comparison time |
---|---|---|
pure python | 1.73s | 222.99s |
pysimhash | 0.14s | 49.89s |
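The table translates into the following speedups of pysimhash over pure Python. This is simple arithmetic on the reported timings, not new measurements:

$$
\text{build speedup} = \frac{1.73\,\text{s}}{0.14\,\text{s}} \approx 12.4,
\qquad
\text{comparison speedup} = \frac{222.99\,\text{s}}{49.89\,\text{s}} \approx 4.47
$$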
Download files
Source Distribution
pysimhash-1.1.1.tar.gz (8.5 kB)
File details
Details for the file pysimhash-1.1.1.tar.gz
File metadata
- Download URL: pysimhash-1.1.1.tar.gz
- Size: 8.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.2 importlib_metadata/4.6.1 pkginfo/1.7.1 requests/2.22.0 requests-toolbelt/0.9.1 tqdm/4.61.1 CPython/3.7.0
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | b05ea5e2b41fd368a1c12d7d505f17cd828d73f340570e01b4348ea2c18a315d |
MD5 | 0ac51bcd79749be9e1310e2ca81cc8d1 |
BLAKE2b-256 | 37588dbdd3eb93f3d33a0f2f3d6e62412f3e26c2bbc66c34fa8ddd6db225fd37 |