
a SimHash module for Python, implemented in C++

Project description


pysimhash is a C++ implementation of SimHash exposed as a Python module. It supports large fingerprint dimensions, such as 128 bits.


pip install pysimhash

or install from source:

git clone
cd simhash
python setup.py install


requirements

  • boost-python

how to use


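import pysimhash
import hashlib

# Text to fingerprint (left empty in this example).
document = ""
# Hash each space-separated token to a hex md5 string.
tokens = [hashlib.md5(s.encode('utf-8')).hexdigest() for s in document.split(" ")]
# f=128, hash_bit=16, base=16
s2 = pysimhash.SimHash(128, 16)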
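For intuition about what a 128-bit SimHash fingerprint is, here is a minimal pure-Python sketch of the general technique. It is not pysimhash's implementation or API: the function name `simhash_sketch` is hypothetical, and using md5 as the per-token hash with equal token weights is an assumption made for illustration.

```python
import hashlib


def simhash_sketch(tokens, f=128):
    """Return an f-bit SimHash fingerprint (as an int) for a list of string tokens.

    Hypothetical helper for illustration only; f must be <= 128 because
    md5 produces a 128-bit digest.
    """
    counts = [0] * f
    for token in tokens:
        # Hash the token to a 128-bit integer.
        h = int.from_bytes(hashlib.md5(token.encode("utf-8")).digest(), "big")
        # Each token votes +1 for every set bit and -1 for every clear bit.
        for i in range(f):
            counts[i] += 1 if (h >> i) & 1 else -1
    # A fingerprint bit is 1 where the positive votes outnumber the negative ones.
    fingerprint = 0
    for i in range(f):
        if counts[i] > 0:
            fingerprint |= 1 << i
    return fingerprint


fp = simhash_sketch("the quick brown fox".split(" "))
print(f"{fp:032x}")
```

Documents that share most of their tokens tend to produce fingerprints that differ in only a few bits, which is what makes the fingerprints useful for near-duplicate detection.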


Building 10,000 fingerprints and running 100,000 comparisons on the same Linux machine gave the following results:

implementation   build time   comparison time
pure python      1.73s        222.99s
pysimhash        0.14s        49.89s
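The benchmark script itself is not included here. As a rough sketch of how the comparison step might be timed in pure Python, the example below measures 100,000 Hamming-distance comparisons over 10,000 fingerprints. The random 128-bit integers stand in for real fingerprints, and the helper names `hamming_distance` and `time_comparisons` are hypothetical, so the timings will not match the figures above.

```python
import random
import time


def hamming_distance(a, b):
    """Number of bit positions in which two integer fingerprints differ."""
    return bin(a ^ b).count("1")


def time_comparisons(fingerprints, n_comparisons, seed=0):
    """Time n_comparisons Hamming-distance computations on random pairs."""
    rng = random.Random(seed)
    # Choose the pairs up front so that only the comparisons are timed.
    pairs = [
        (rng.choice(fingerprints), rng.choice(fingerprints))
        for _ in range(n_comparisons)
    ]
    start = time.perf_counter()
    total = 0
    for a, b in pairs:
        total += hamming_distance(a, b)
    elapsed = time.perf_counter() - start
    return elapsed, total


rng = random.Random(42)
# Stand-in data: random 128-bit integers instead of real SimHash fingerprints.
fingerprints = [rng.getrandbits(128) for _ in range(10_000)]
elapsed, total = time_comparisons(fingerprints, 100_000)
print(f"100,000 comparisons took {elapsed:.2f}s")
```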

Download files


Source Distribution

pysimhash-1.1.1.tar.gz (8.5 kB)
