a simhash module in cpp for python
Project description
simhash
simhash cpp module for python, a cpp implement of simhash, support for large dimesion such as 128bit
install
pip install pysimhash
or install from github.com
git clone https://github.com/skiloop/simhash
cd simhash
python setup.py install
requirements
- boost-python
how to use
example:
import pysimhash
import hashlib
document = "google.com hybridtheory.com youtube.com reddit.com"
tokens = [hashlib.md5(s.encode('utf-8')).hexdigest() for s in document.split(" ")]
s2 = pysimhash.SimHash(128, 16) # f=128, hash_bit=16
s2.build(tokens, base=16)
print(s2.hex())
benchmark
With 10000 creating and 100,000 comparing(using benchmark.py) on the same linux, results go as follow
implement | build time | comparison time |
---|---|---|
pure python | 1.73s | 222.99s |
pysimhash | 0.14s | 49.89s |
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
pysimhash-1.1.1.tar.gz
(8.5 kB
view hashes)