Constant-time anagram hashing using packed bit-width representation
Project description
Packed Anagram Hash
Constant-time anagram comparison using packed bit-width representation.
Why?
Standard anagram detection methods have hidden costs:
| Method | Hash Cost | Compare Cost | Key Size |
|---|---|---|---|
| Sorted String | O(n log n) | O(n) | n bytes |
| Count Array | O(n) | O(26) | 104 bytes |
| Packed Bit-Width | O(n) | O(1) | 8 bytes |
This package provides 26× faster comparison than count arrays and n× faster than sorted strings, with 6-13× smaller keys.
Installation
pip install packed-anagram-hash
Quick Start
from packed_anagram_hash import PackedAnagramHasher
# Initialize with your corpus
corpus = ["store", "rotes", "tores", "stare", "rates", "tears"]
hasher = PackedAnagramHasher(corpus)
# Hash words - anagrams produce identical hashes
h1 = hasher.hash("store")
h2 = hasher.hash("rotes")
h3 = hasher.hash("stare")
assert h1 == h2 # True - same letters
assert h1 != h3 # True - different letters
# Group anagrams
groups = hasher.group_anagrams(corpus)
# {hash1: ["store", "rotes", "tores"], hash2: ["stare", "rates", "tears"]}
How It Works
- Pre-compute maximum letter frequency per character in your corpus
- Allocate minimal bits per letter (e.g., 'e' max 4 occurrences → 3 bits)
- Pack into single 64-bit register (~50-60 bits for English)
- Hash by accumulating:
h += (1 << offset[letter])
Addition commutes, so anagrams produce identical hashes. Comparison is single CPU instruction.
API Reference
PackedAnagramHasher(corpus)
Initialize hasher with a corpus to determine bit-widths.
Parameters:
corpus: Iterable of strings to analyze for letter frequencies
Attributes:
bit_widths: List of bits allocated per letter (a-z)offsets: Bit offset for each letter in the packed hashtotal_bits: Total bits used (must be ≤64 for single-register operation)
hasher.hash(word) -> int
Compute packed hash for a word.
Parameters:
word: String to hash (case-insensitive, non-alphabetic characters ignored)
Returns:
- 64-bit integer hash
hasher.group_anagrams(words) -> dict[int, list[str]]
Group words by anagram equivalence.
Parameters:
words: Iterable of strings to group
Returns:
- Dictionary mapping hash → list of anagram words
hasher.are_anagrams(word1, word2) -> bool
Check if two words are anagrams.
Advanced: Multi-Register Extension
For corpora requiring >64 bits (extremely long words or large alphabets), the algorithm extends to multiple registers. See the paper for details.
Citation
If you use this in academic work:
@software{brown_2025_packed_anagram,
author = {Brown, Nicholas David},
title = {Packed Bit-Width Anagram Hashing: A Constant-Time
Comparison Algorithm for Anagram Equivalence},
year = 2025,
publisher = {Zenodo},
doi = {10.5281/zenodo.18168195},
url = {https://doi.org/10.5281/zenodo.18168195}
}
License
CC0 1.0 Universal - Public Domain. Use freely.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file packed_anagram_hash-1.0.1.tar.gz.
File metadata
- Download URL: packed_anagram_hash-1.0.1.tar.gz
- Upload date:
- Size: 7.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
2453cc172e23efb50d616c677dd8e4609dd2c3dde2c8efd39a4e6443cc4f3922
|
|
| MD5 |
73e01f5eeb900fb45ecb8d0862d41fe8
|
|
| BLAKE2b-256 |
349348f434f51097eed2a17e21f14a2fe857793992e5b4562ff52f020f2429e5
|
File details
Details for the file packed_anagram_hash-1.0.1-py3-none-any.whl.
File metadata
- Download URL: packed_anagram_hash-1.0.1-py3-none-any.whl
- Upload date:
- Size: 7.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
cc51459f99896938ec7af6f42475ca52ef6370172afbdeabd5e0be9bddf04db5
|
|
| MD5 |
162214744dbe5f2d991b11bddeafb1e2
|
|
| BLAKE2b-256 |
080b956464806c7b91c11c541143faf26da0d264b0ffc3d95d351d11c24a1586
|