Skip to main content

Fast Hamming distance calculation for hexadecimal strings

Project description

Pip Prs Github

What does it do?

This module performs a fast bitwise hamming distance of two hexadecimal strings.

This looks like:

DEADBEEF = 11011110101011011011111011101111
00000000 = 00000000000000000000000000000000
XOR      = 11011110101011011011111011101111
Hamming  = number of ones in DEADBEEF ^ 00000000 = 24

This essentially amounts to

>>> import gmpy
>>> gmpy.popcount(0xdeadbeef ^ 0x00000000)
24

except with Python strings, so

>>> import gmpy
>>> gmpy.popcount(int("deadbeef", 16) ^ int("00000000", 16))
24

A few assumptions are made and enforced:

  • this is a valid hexadecimal string (i.e., [a-fA-F0-9]+)

  • the strings are the same length

  • the strings do not begin with "0x"

Why yet another Hamming distance library?

There are a lot of fantastic (python) libraries that offer methods to calculate various edit distances, including Hamming distances: Distance, textdistance, scipy, jellyfish, etc.

In this case, I needed a hamming distance library that worked on hexadecimal strings (i.e., a Python str) and performed blazingly fast. Furthermore, I often did not care about hex strings greater than 256 bits. That length constraint is different vs all the other libraries and enabled me to explore vectorization techniques via numba, numpy, and SSE/AVX intrinsics.

Lastly, I wanted to minimize dependencies, meaning you do not need to install numpy, gmpy, cython, pypy, pythran, etc.

Eventually, after playing around with gmpy.popcount, numba.jit, pythran.run, numpy, I decided to write what I wanted in essentially raw C. At this point, I’m using raw char* and int*, so exploring re-writing this in Fortran makes little sense.

Installation

To install, ensure you have Python 3.6+. Run

pip install hexhamming

or to install from source

git clone https://github.com/mrecachinas/hexhamming
cd hexhamming
python setup.py install # or pip install .

If you want to contribute to hexhamming, you should install the dev dependencies

pip install -r requirements-dev.txt

and make sure the tests pass with

python -m pytest -vls .

Example

Using hexhamming is as simple as

>>> from hexhamming import hamming_distance_string
>>> hamming_distance_string("deadbeef", "00000000")
24

New in v2.0.0 : hexhamming now supports byte`s via ``hamming_distance_bytes`. You use it in the exact same way as before, except you pass in a byte string.

>>> from hexhamming import hamming_distance_bytes
>>> hamming_distance_bytes(b"\xde\xad\xbe\xef", b"\x00\x00\x00\x00")
24

Benchmark

Below is a benchmark using pytest-benchmark with hexhamming==v1.3.2 my 2020 2.0 GHz quad-core Intel Core i5 16 GB 3733 MHz LPDDR4 macOS Catalina (10.15.5) with Python 3.7.3 and Apple clang version 11.0.3 (clang-1103.0.32.62).

Name

Mean (ns)

Std (ns)

Median (ns)

Rounds

Iterations

test_hamming_distance_bench_3

93.8

10.5

94.3

53268

200

test_hamming_distance_bench_3_same

94.2

15.2

94.9

102146

100

test_check_hexstrings_within_dist_bench

231.9

104.2

216.5

195122

22

test_hamming_distance_bench_256

97.5

34.1

94.0

195122

22

test_hamming_distance_bench_1000

489.8

159.4

477.5

94411

20

test_hamming_distance_bench_1000_same

497.8

87.8

496.6

18971

20

test_hamming_distance_bench_1024

509.9

299.5

506.7

18652

10

test_hamming_distance_bench_1024_same

467.4

205.9

450.4

181819

10

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

hexhamming-2.2.3.tar.gz (15.7 kB view hashes)

Uploaded source

Built Distributions

hexhamming-2.2.3-cp310-cp310-win_amd64.whl (13.2 kB view hashes)

Uploaded cp310

hexhamming-2.2.3-cp39-cp39-win_amd64.whl (13.2 kB view hashes)

Uploaded cp39

hexhamming-2.2.3-cp38-cp38-win_amd64.whl (13.2 kB view hashes)

Uploaded cp38

hexhamming-2.2.3-cp37-cp37m-win_amd64.whl (13.2 kB view hashes)

Uploaded cp37

hexhamming-2.2.3-cp36-cp36m-win_amd64.whl (13.2 kB view hashes)

Uploaded cp36

Supported by

AWS AWS Cloud computing Datadog Datadog Monitoring Facebook / Instagram Facebook / Instagram PSF Sponsor Fastly Fastly CDN Google Google Object Storage and Download Analytics Huawei Huawei PSF Sponsor Microsoft Microsoft PSF Sponsor NVIDIA NVIDIA PSF Sponsor Pingdom Pingdom Monitoring Salesforce Salesforce PSF Sponsor Sentry Sentry Error logging StatusPage StatusPage Status page