
A simhash module in C++ for Python


simhash

https://pypi.python.org/pypi/pysimhash https://github.com/skiloop/simhash/actions?query=workflow%3ACodeQL

A C++ implementation of simhash packaged as a Python module, with support for large fingerprint dimensions such as 128 bits.

install

pip install pysimhash

or install from GitHub:

git clone https://github.com/skiloop/simhash
cd simhash
python setup.py install

requirements

  • boost-python

how to use

example:

import pysimhash
import hashlib
document = "google.com hybridtheory.com youtube.com reddit.com"
# Hash each space-separated token to an MD5 hex digest.
tokens = [hashlib.md5(s.encode('utf-8')).hexdigest() for s in document.split(" ")]
s2 = pysimhash.SimHash(128, 16) # f=128, hash_bit=16
s2.build(tokens, base=16) # tokens are passed as base-16 (hex) strings
print(s2.hex())
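
For readers new to simhash, the general technique behind a fingerprint like the one printed above can be sketched in pure Python. This is an illustration only, not pysimhash's C++ implementation: the function name `simhash_sketch` is hypothetical, every token is given equal weight, and the output is not guaranteed to match `s2.hex()`.

```python
import hashlib


def simhash_sketch(tokens, f=128):
    """Illustrative simhash over hex-string tokens (hypothetical helper).

    Each token is parsed as an integer; only its low f bits are used.
    """
    weights = [0] * f
    for tok in tokens:
        value = int(tok, 16)
        for i in range(f):
            # A set bit votes +1 for position i, a clear bit votes -1.
            weights[i] += 1 if (value >> i) & 1 else -1
    fingerprint = 0
    for i, w in enumerate(weights):
        # Keep bit i only when the positive votes win.
        if w > 0:
            fingerprint |= 1 << i
    return fingerprint


document = "google.com hybridtheory.com youtube.com reddit.com"
tokens = [hashlib.md5(s.encode("utf-8")).hexdigest() for s in document.split(" ")]
print(format(simhash_sketch(tokens), "032x"))
```

Because each bit of the fingerprint is decided by a majority vote across tokens, documents that share most of their tokens tend to end up with fingerprints that differ in only a few bits.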

benchmark

Creating 10,000 fingerprints and running 100,000 comparisons (using benchmark.py) on the same Linux machine gives the following results:

implementation   build time   comparison time
pure python      1.73s        222.99s
pysimhash        0.14s        49.89s
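
Comparing two simhash fingerprints usually means counting the bits in which they differ (the Hamming distance). The sketch below assumes fingerprints are available as hex strings, such as the output of `hex()` in the example above. The helper `hamming_distance` is hypothetical and is not part of pysimhash or benchmark.py.

```python
def hamming_distance(hex_a, hex_b):
    """Number of differing bits between two hex-encoded fingerprints."""
    # XOR leaves a 1 in every bit position where the two values differ.
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


a = "0123456789abcdef0123456789abcdef"
b = "0123456789abcdef0123456789abcdee"
print(hamming_distance(a, b))
```

A small distance suggests near-duplicate documents; the threshold to use depends on the fingerprint size and the data.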


