Skip to main content

Constant-time anagram hashing using packed bit-width representation

Project description

Packed Anagram Hash

Constant-time anagram comparison using packed bit-width representation.

DOI PyPI version License: CC0-1.0

Why?

Standard anagram detection methods have hidden costs:

Method Hash Cost Compare Cost Key Size
Sorted String O(n log n) O(n) n bytes
Count Array O(n) O(26) 104 bytes
Packed Bit-Width O(n) O(1) 8 bytes

This package provides 26× faster comparison than count arrays and n× faster than sorted strings, with 6-13× smaller keys.

Installation

pip install packed-anagram-hash

Quick Start

from packed_anagram_hash import PackedAnagramHasher

# Initialize with your corpus
corpus = ["store", "rotes", "tores", "stare", "rates", "tears"]
hasher = PackedAnagramHasher(corpus)

# Hash words - anagrams produce identical hashes
h1 = hasher.hash("store")
h2 = hasher.hash("rotes")
h3 = hasher.hash("stare")

assert h1 == h2  # True - same letters
assert h1 != h3  # True - different letters

# Group anagrams
groups = hasher.group_anagrams(corpus)
# {hash1: ["store", "rotes", "tores"], hash2: ["stare", "rates", "tears"]}

How It Works

  1. Pre-compute maximum letter frequency per character in your corpus
  2. Allocate minimal bits per letter (e.g., 'e' max 4 occurrences → 3 bits)
  3. Pack into single 64-bit register (~50-60 bits for English)
  4. Hash by accumulating: h += (1 << offset[letter])

Addition commutes, so anagrams produce identical hashes. Comparison is single CPU instruction.

API Reference

PackedAnagramHasher(corpus)

Initialize hasher with a corpus to determine bit-widths.

Parameters:

  • corpus: Iterable of strings to analyze for letter frequencies

Attributes:

  • bit_widths: List of bits allocated per letter (a-z)
  • offsets: Bit offset for each letter in the packed hash
  • total_bits: Total bits used (must be ≤64 for single-register operation)

hasher.hash(word) -> int

Compute packed hash for a word.

Parameters:

  • word: String to hash (case-insensitive, non-alphabetic characters ignored)

Returns:

  • 64-bit integer hash

hasher.group_anagrams(words) -> dict[int, list[str]]

Group words by anagram equivalence.

Parameters:

  • words: Iterable of strings to group

Returns:

  • Dictionary mapping hash → list of anagram words

hasher.are_anagrams(word1, word2) -> bool

Check if two words are anagrams.

Advanced: Multi-Register Extension

For corpora requiring >64 bits (extremely long words or large alphabets), the algorithm extends to multiple registers. See the paper for details.

Citation

If you use this in academic work:

@software{brown_2025_packed_anagram,
  author       = {Brown, Nicholas David},
  title        = {Packed Bit-Width Anagram Hashing: A Constant-Time 
                  Comparison Algorithm for Anagram Equivalence},
  year         = 2025,
  publisher    = {Zenodo},
  doi          = {10.5281/zenodo.18167975},
  url          = {https://doi.org/10.5281/zenodo.18167975}
}

License

CC0 1.0 Universal - Public Domain. Use freely.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

packed_anagram_hash-1.0.0.tar.gz (7.6 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

packed_anagram_hash-1.0.0-py3-none-any.whl (7.8 kB view details)

Uploaded Python 3

File details

Details for the file packed_anagram_hash-1.0.0.tar.gz.

File metadata

  • Download URL: packed_anagram_hash-1.0.0.tar.gz
  • Upload date:
  • Size: 7.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for packed_anagram_hash-1.0.0.tar.gz
Algorithm Hash digest
SHA256 62079f549cf601b27e6532e1240d75a3950619631ef4abf7015cdbb3a546118c
MD5 2736125fc12bb67f49e782ac18055433
BLAKE2b-256 3c3e0cd36858d46ce9852cdc5803107a636e3988c7d136a7be38b9e9ec60dff4

See more details on using hashes here.

File details

Details for the file packed_anagram_hash-1.0.0-py3-none-any.whl.

File metadata

File hashes

Hashes for packed_anagram_hash-1.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 36dd0ecaf3aa4a6d7c430c9a2d1afb6febf4b8a2c8648c2304184dc37a01567d
MD5 2d23903d02e3a3427bdf1ed1ef7f466a
BLAKE2b-256 45e9eae4c42c237cae94c74019c244fe8d95ceb7a67e76031a535e56772ae48b

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page