Skip to main content

Constant-time anagram hashing using packed bit-width representation

Project description

Packed Anagram Hash

Constant-time anagram comparison using packed bit-width representation.

DOI PyPI version License: CC0-1.0

Why?

Standard anagram detection methods have hidden costs:

Method Hash Cost Compare Cost Key Size
Sorted String O(n log n) O(n) n bytes
Count Array O(n) O(26) 104 bytes
Packed Bit-Width O(n) O(1) 8 bytes

This package provides 26× faster comparison than count arrays and n× faster than sorted strings, with 6-13× smaller keys.

Installation

pip install packed-anagram-hash

Quick Start

from packed_anagram_hash import PackedAnagramHasher

# Initialize with your corpus
corpus = ["store", "rotes", "tores", "stare", "rates", "tears"]
hasher = PackedAnagramHasher(corpus)

# Hash words - anagrams produce identical hashes
h1 = hasher.hash("store")
h2 = hasher.hash("rotes")
h3 = hasher.hash("stare")

assert h1 == h2  # True - same letters
assert h1 != h3  # True - different letters

# Group anagrams
groups = hasher.group_anagrams(corpus)
# {hash1: ["store", "rotes", "tores"], hash2: ["stare", "rates", "tears"]}

How It Works

  1. Pre-compute maximum letter frequency per character in your corpus
  2. Allocate minimal bits per letter (e.g., 'e' max 4 occurrences → 3 bits)
  3. Pack into single 64-bit register (~50-60 bits for English)
  4. Hash by accumulating: h += (1 << offset[letter])

Addition commutes, so anagrams produce identical hashes. Comparison is single CPU instruction.

API Reference

PackedAnagramHasher(corpus)

Initialize hasher with a corpus to determine bit-widths.

Parameters:

  • corpus: Iterable of strings to analyze for letter frequencies

Attributes:

  • bit_widths: List of bits allocated per letter (a-z)
  • offsets: Bit offset for each letter in the packed hash
  • total_bits: Total bits used (must be ≤64 for single-register operation)

hasher.hash(word) -> int

Compute packed hash for a word.

Parameters:

  • word: String to hash (case-insensitive, non-alphabetic characters ignored)

Returns:

  • 64-bit integer hash

hasher.group_anagrams(words) -> dict[int, list[str]]

Group words by anagram equivalence.

Parameters:

  • words: Iterable of strings to group

Returns:

  • Dictionary mapping hash → list of anagram words

hasher.are_anagrams(word1, word2) -> bool

Check if two words are anagrams.

Advanced: Multi-Register Extension

For corpora requiring >64 bits (extremely long words or large alphabets), the algorithm extends to multiple registers. See the paper for details.

Citation

If you use this in academic work:

@software{brown_2025_packed_anagram,
  author       = {Brown, Nicholas David},
  title        = {Packed Bit-Width Anagram Hashing: A Constant-Time 
                  Comparison Algorithm for Anagram Equivalence},
  year         = 2025,
  publisher    = {Zenodo},
  doi          = {10.5281/zenodo.18168195},
  url          = {https://doi.org/10.5281/zenodo.18168195}
}

License

CC0 1.0 Universal - Public Domain. Use freely.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

packed_anagram_hash-1.0.1.tar.gz (7.6 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

packed_anagram_hash-1.0.1-py3-none-any.whl (7.8 kB view details)

Uploaded Python 3

File details

Details for the file packed_anagram_hash-1.0.1.tar.gz.

File metadata

  • Download URL: packed_anagram_hash-1.0.1.tar.gz
  • Upload date:
  • Size: 7.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for packed_anagram_hash-1.0.1.tar.gz
Algorithm Hash digest
SHA256 2453cc172e23efb50d616c677dd8e4609dd2c3dde2c8efd39a4e6443cc4f3922
MD5 73e01f5eeb900fb45ecb8d0862d41fe8
BLAKE2b-256 349348f434f51097eed2a17e21f14a2fe857793992e5b4562ff52f020f2429e5

See more details on using hashes here.

File details

Details for the file packed_anagram_hash-1.0.1-py3-none-any.whl.

File metadata

File hashes

Hashes for packed_anagram_hash-1.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 cc51459f99896938ec7af6f42475ca52ef6370172afbdeabd5e0be9bddf04db5
MD5 162214744dbe5f2d991b11bddeafb1e2
BLAKE2b-256 080b956464806c7b91c11c541143faf26da0d264b0ffc3d95d351d11c24a1586

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page