Fast Hamming distance calculation for hexadecimal strings
What does it do?
This module performs a fast bitwise hamming distance of two hexadecimal strings.
This looks like:
DEADBEEF = 11011110101011011011111011101111 00000000 = 00000000000000000000000000000000 XOR = 11011110101011011011111011101111 Hamming = number of ones in DEADBEEF ^ 00000000 = 24
This essentially amounts to
>>> import gmpy >>> gmpy.popcount(0xdeadbeef ^ 0x00000000) 24
except with Python strings, so
>>> import gmpy >>> gmpy.popcount(int("deadbeef", 16) ^ int("00000000", 16)) 24
A few assumptions are made and enforced:
- this is a valid hexadecimal string (i.e., [a-fA-F0-9]+)
- the strings are the same length
- the strings do not begin with "0x"
Why yet another Hamming distance library?
There are a lot of fantastic (python) libraries that offer methods to calculate various edit distances, including Hamming distances: Distance, textdistance, scipy, jellyfish, etc.
In this case, I needed a hamming distance library that worked on hexadecimal strings (i.e., a Python str) and performed blazingly fast. Furthermore, I often did not care about hex strings greater than 256 bits. That length constraint is different vs all the other libraries and enabled me to explore vectorization techniques via numba, numpy, and SSE/AVX intrinsics.
Lastly, I wanted to minimize dependencies, meaning you do not need to install numpy, gmpy, cython, pypy, pythran, etc.
Eventually, after playing around with gmpy.popcount, numba.jit, pythran.run, numpy, I decided to write what I wanted in essentially raw C. At this point, I’m using raw char* and int*, so exploring re-writing this in Fortran makes little sense.
To install, ensure you have Python 2.7 or 3.4+. Run
pip install hexhamming
or to install from source
git clone https://github.com/mrecachinas/hexhamming cd hexhamming python setup.py install # or pip install .
If you want to contribute to hexhamming, you should install the dev dependencies
pip install -r requirements-dev.txt
and make sure the tests pass with
Using hexhamming is as simple as
>>> from hexhamming import hamming_distance >>> hamming_distance("deadbeef", "00000000") 24
Below is a benchmark using pytest-benchmark with hexhamming==v1.3.2 my 2020 2.0 GHz quad-core Intel Core i5 16 GB 3733 MHz LPDDR4 macOS Catalina (10.15.5) with Python 3.7.3 and Apple clang version 11.0.3 (clang-1188.8.131.52).
|Name||Mean (ns)||Std (ns)||Median (ns)||Rounds||Iterations|
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
|Filename, size||File type||Python version||Upload date||Hashes|
|Filename, size hexhamming-1.4.0-cp27-cp27m-macosx_10_15_x86_64.whl (6.2 kB)||File type Wheel||Python version cp27||Upload date||Hashes View|
|Filename, size hexhamming-1.4.0-cp37-cp37m-macosx_10_14_x86_64.whl (6.3 kB)||File type Wheel||Python version cp37||Upload date||Hashes View|
|Filename, size hexhamming-1.4.0.tar.gz (7.0 kB)||File type Source||Python version None||Upload date||Hashes View|
Hashes for hexhamming-1.4.0-cp27-cp27m-macosx_10_15_x86_64.whl
Hashes for hexhamming-1.4.0-cp37-cp37m-macosx_10_14_x86_64.whl