
A SimHash module in C++ for Python



pysimhash is a C++ implementation of SimHash exposed as a Python module. It supports large fingerprint dimensions such as 128 bits.
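To show what an f-bit SimHash fingerprint is, here is a minimal pure-Python sketch of the general SimHash technique. It is not pysimhash's C++ code or API; the function name `simhash` is hypothetical. It assumes each token is hashed with MD5 (as in the usage example further down), which limits f to at most 128 bits. Each token hash votes +1 or -1 on every bit position, and the fingerprint keeps a 1 wherever the summed vote is positive.

```python
import hashlib


def simhash(tokens, f=128):
    """Hypothetical helper: return an f-bit SimHash fingerprint (as an int) for string tokens."""
    if not 0 < f <= 128:
        raise ValueError("this sketch derives token hashes from MD5, so f must be in 1..128")
    weights = [0] * f  # one signed counter per fingerprint bit
    for token in tokens:
        # 128-bit MD5 of the token; only its low f bits are used below
        h = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16)
        for i in range(f):
            # +1 if bit i of the token hash is set, -1 otherwise
            weights[i] += 1 if (h >> i) & 1 else -1
    # bit i of the fingerprint is 1 where the summed vote is positive
    fingerprint = 0
    for i, w in enumerate(weights):
        if w > 0:
            fingerprint |= 1 << i
    return fingerprint


doc = "the quick brown fox jumps over the lazy dog"
fp = simhash(doc.split(" "), 128)
print(f"{fp:032x}")  # the 128-bit fingerprint as 32 hex digits
```

Similar documents share most tokens, so most bit votes agree and their fingerprints differ in only a few bits. The per-bit inner loop is exactly the kind of work a C++ implementation speeds up.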


pip install pysimhash

or install from source:

git clone
cd simhash
python setup.py install


Dependencies:

  • boost-python

how to use


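import hashlib
import pysimhash

# sample text; each space-separated word becomes one token
document = "the quick brown fox jumps over the lazy dog"
# hash every token with MD5 and keep its hex digest (a base-16 string)
tokens = [hashlib.md5(s.encode('utf-8')).hexdigest() for s in document.split(" ")]
s2 = pysimhash.SimHash(128, 16)  # f=128 (fingerprint size in bits), hash_bit=16, base=16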
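SimHash fingerprints are usually compared by Hamming distance: the number of bit positions in which they differ. The sketch below is a plain-Python illustration of that comparison on integer fingerprints. It is not part of the pysimhash API. Both helper names are hypothetical, and the default threshold `max_distance=3` is an assumed value you would tune for your fingerprint size and data.

```python
def hamming_distance(a: int, b: int) -> int:
    """Hypothetical helper: number of bit positions in which two fingerprints differ."""
    # XOR leaves a 1 exactly where the bits differ; count those 1s
    return bin(a ^ b).count("1")


def is_near_duplicate(a: int, b: int, max_distance: int = 3) -> bool:
    """Hypothetical helper: fingerprints within max_distance bits count as near-duplicates."""
    return hamming_distance(a, b) <= max_distance


fp1 = 0b1011_0000
fp2 = 0b1001_0001
print(hamming_distance(fp1, fp2))  # 2
print(is_near_duplicate(fp1, fp2))  # True
```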


Benchmark: building 10,000 fingerprints and performing 100,000 comparisons, with both implementations run on the same Linux machine, gave the following results:

implementation   build time   comparison time
pure Python      1.73 s       222.99 s
pysimhash        0.14 s       49.89 s
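The benchmark code behind these numbers is not shown here. The sketch below is one hypothetical way to time a pure-Python comparison step with `time.perf_counter`. It uses random integers in place of real fingerprints and a hypothetical `time_comparisons` helper. It measures something narrower than the original benchmark, so its timings should not be expected to match the table.

```python
import random
import time


def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two integer fingerprints."""
    return bin(a ^ b).count("1")


def time_comparisons(n_fingerprints=10_000, n_comparisons=100_000, f=128, seed=0):
    """Hypothetical harness: time n_comparisons Hamming-distance checks on random f-bit values."""
    rng = random.Random(seed)
    # random integers stand in for real SimHash fingerprints
    fingerprints = [rng.getrandbits(f) for _ in range(n_fingerprints)]
    pairs = [(rng.randrange(n_fingerprints), rng.randrange(n_fingerprints))
             for _ in range(n_comparisons)]
    start = time.perf_counter()
    total = 0
    for i, j in pairs:
        total += hamming_distance(fingerprints[i], fingerprints[j])
    elapsed = time.perf_counter() - start
    return elapsed, total  # total is returned so the loop's work is observable


elapsed, _ = time_comparisons()
print(f"pure-Python comparisons: {elapsed:.2f}s")
```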

Download files


Source Distribution

pysimhash-1.1.1.tar.gz (8.5 kB)

