Fast Hamming distance calculation for hexadecimal strings

What does it do?

This module quickly computes the bitwise Hamming distance between two hexadecimal strings.

This looks like:

DEADBEEF = 11011110101011011011111011101111
00000000 = 00000000000000000000000000000000
XOR      = 11011110101011011011111011101111
Hamming  = number of ones in DEADBEEF ^ 00000000 = 24

This essentially amounts to

>>> import gmpy
>>> gmpy.popcount(0xdeadbeef ^ 0x00000000)
24

except with Python strings, so

>>> import gmpy
>>> gmpy.popcount(int("deadbeef", 16) ^ int("00000000", 16))
24

A few assumptions are made and enforced:

  • this is a valid hexadecimal string (i.e., [a-fA-F0-9]+)

  • the strings are the same length

  • the strings do not begin with "0x"

Why yet another Hamming distance library?

There are a lot of fantastic Python libraries that offer methods to calculate various edit distances, including Hamming distances: Distance, textdistance, scipy, jellyfish, etc.

In this case, I needed a Hamming distance library that worked on hexadecimal strings (i.e., a Python str) and was as fast as possible. Furthermore, I often did not need to handle hex strings longer than 256 bits. That length constraint sets this library apart from the others and enabled me to explore vectorization techniques via SSE/AVX and NEON intrinsics.

Lastly, I wanted to minimize dependencies, meaning you do not need to install numpy, gmpy, cython, pypy, pythran, etc.

As of v3.0.0, hexhamming is written in Rust using PyO3 and maturin, providing memory safety, GIL release during computation, and free-threaded Python support while maintaining the same SIMD-accelerated performance (SSE4.1, AVX2, NEON).

Installation

To install, ensure you have Python 3.10+. Run

pip install hexhamming

or to install from source (requires Rust toolchain)

git clone https://github.com/mrecachinas/hexhamming
cd hexhamming
pip install .

If you want to contribute to hexhamming, you should install the dev dependencies

pip install -r requirements-dev.txt

and make sure the tests pass with

python -m pytest -vls .

Example

Using hexhamming is as simple as

>>> from hexhamming import hamming_distance_string
>>> hamming_distance_string("deadbeef", "00000000")
24

New in v2.0.0: hexhamming now supports bytes via hamming_distance_bytes. You use it in exactly the same way as before, except you pass in a byte string.

>>> from hexhamming import hamming_distance_bytes
>>> hamming_distance_bytes(b"\xde\xad\xbe\xef", b"\x00\x00\x00\x00")
24

We also provide a method for a quick boolean check of whether two hexadecimal strings are within a given Hamming distance.

>>> from hexhamming import check_hexstrings_within_dist
>>> check_hexstrings_within_dist("ffff", "fffe", 2)
True
>>> check_hexstrings_within_dist("ffff", "0000", 2)
False

Similarly, hexhamming supports a quick check on byte strings via check_bytes_within_dist, whose API mirrors check_hexstrings_within_dist except that it expects bytes objects.

The API described above targets comparing two individual records and calculating their Hamming distance quickly. Many applications instead need to compare one record against an array of other records and find the elements within a given Hamming distance of the search record. For these cases hexhamming provides a set of array APIs. Because these operations are often speed-critical and require preparing the data anyway, they are only available for bytes objects, not for hex strings.

They all share the same signature: two bytes objects and the max_dist to consider. The first bytes object should be the concatenation of the records to compare against, so its length must be a multiple of the length of the second bytes object (the search record).

There are three functions, each returning a different result depending on what the application needs.

check_bytes_arrays_first_within_dist returns the index of the first element whose Hamming distance is at most max_dist. In the example below, the element at index 1 (0xbbbb) has a distance of exactly 4 from 0xffff.

>>> from hexhamming import check_bytes_arrays_first_within_dist
>>> check_bytes_arrays_first_within_dist(b"\xaa\xaa\xbb\xbb\xcc\xcc\xdd\xdd\xee\xee\xff\xff", b"\xff\xff", 4)
1

check_bytes_arrays_best_within_dist returns a tuple with the distance and the index of the element with the lowest Hamming distance, provided that distance is at most max_dist, or (-1, -1) if no element qualifies.

>>> from hexhamming import check_bytes_arrays_best_within_dist
>>> check_bytes_arrays_best_within_dist(b"\xaa\xaa\xbb\xbb\xcc\xcc\xdd\xdd\xee\xee\xff\xff", b"\xff\xff", 4)
(0, 5)

>>> check_bytes_arrays_best_within_dist(b"\xaa\xaa\xbb\xbb\xcc\xcc\xdd\xdd\xee\xee\xff\xff", b"\xef\xfe", 4)
(2, 4)

check_bytes_arrays_all_within_dist returns a list of (distance, index) tuples for every element whose Hamming distance is at most max_dist, or [] if no element qualifies.

>>> from hexhamming import check_bytes_arrays_all_within_dist
>>> check_bytes_arrays_all_within_dist(b"\xaa\xaa\xbb\xbb\xcc\xcc\xdd\xdd\xee\xee\xff\xff", b"\xff\xff", 4)
[(4, 1), (4, 3), (4, 4), (0, 5)]

Tip: When you’re assembling the long array of records to compare against, don’t concatenate the individual bytes objects one at a time. Because bytes objects are immutable, each concatenation copies the data, which is very slow. Use a bytearray instead, and convert it to bytes at the end. See https://www.guyrutenberg.com/2020/04/04/fast-bytes-concatenation-in-python/ for more info and tests.
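As a small illustration of the tip, the sketch below accumulates records in a bytearray and converts once at the end. The helper name `build_haystack` and its length check are hypothetical, added only for this example.

```python
def build_haystack(records, record_len: int) -> bytes:
    """Hypothetical helper: pack fixed-length records into one bytes object."""
    buf = bytearray()
    for rec in records:
        if len(rec) != record_len:
            raise ValueError("every record must have the same length")
        buf += rec  # appends in place; no new bytes object per iteration
    return bytes(buf)  # single conversion at the end


haystack = build_haystack([b"\xaa\xaa", b"\xbb\xbb", b"\xff\xff"], 2)
print(haystack.hex())  # aaaabbbbffff
```

When all records are already available in a list, `b"".join(records)` is another standard way to avoid repeated concatenation.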

Benchmark

Below is a benchmark using pytest-benchmark with hexhamming v3.0.0 on Apple M-series (ARM64) with Python 3.14 and rustc 1.85.

String and bytes hamming distance

==========================================  =========  ========
Name                                        Mean (ns)  Std (ns)
==========================================  =========  ========
hamming_distance_string [3 chars, same]     48.8       10.1
hamming_distance_string [3 chars, diff]     48.4       4.4
hamming_distance_string [64 chars, diff]    88.2       16.0
hamming_distance_string [1000 chars, same]  754.7      251.2
hamming_distance_string [1000 chars, diff]  762.3      75.7
hamming_distance_string [1024 chars, same]  775.1      62.8
hamming_distance_string [1024 chars, diff]  785.0      137.1
hamming_distance_bytes [3 bytes, same]      48.5       5.5
hamming_distance_bytes [3 bytes, diff]      49.0       5.2
hamming_distance_bytes [64 bytes, diff]     50.3       8.4
hamming_distance_bytes [1000 bytes, same]   64.9       5.4
hamming_distance_bytes [1000 bytes, diff]   64.9       11.7
hamming_distance_bytes [1024 bytes, same]   63.2       35.3
hamming_distance_bytes [1024 bytes, diff]   69.1       16.0
check_bytes_within_dist [16 bytes]          52.1       8.9
check_bytes_within_dist [64 bytes]          51.2       23.2
check_bytes_within_dist [127 bytes]         53.4       5.3
==========================================  =========  ========
