Hash rope: a BB[2/7] balanced binary tree with polynomial hash metadata for O(log n) concat, split, substring hashing, and O(log q) repetition encoding.
Project description
hashrope
A BB[2/7] weight-balanced binary tree augmented with polynomial hash metadata at every node.
Zero dependencies. Pure Python. Python 3.10+.
What it does
hashrope is a rope data structure where every node carries a polynomial hash. This enables:
- O(k log w) concat and split with strict BB[2/7] rebalancing via Adams-style balanced join
- O(k log q) repetition encoding via RepeatNode -- represents "repeat this subtree q times" without materializing the data
- O(k log w) substring hashing without allocating new nodes
- O(1) whole-sequence hash at any point during streaming ingestion
- Bounded-memory sliding window that evicts old data while maintaining a running hash of everything seen
All arithmetic uses Mersenne prime modular reduction (bit-shift, no division).
Install
pip install hashrope
Quick start
from hashrope import PolynomialHash, Leaf, rope_concat, rope_split, rope_substr_hash, rope_hash
h = PolynomialHash()
# Build a rope
a = Leaf(b"hello ", h)
b = Leaf(b"world", h)
ab = rope_concat(a, b, h)
# Whole-string hash matches direct computation
assert rope_hash(ab) == h.hash(b"hello world")
# Split at any byte position
left, right = rope_split(ab, 6, h)
# Substring hash without allocation
sub_h = rope_substr_hash(ab, 0, 5, h)
assert sub_h == h.hash(b"hello")
RepeatNode
Encode repetitions in O(log q) time and O(1) space:
from hashrope import PolynomialHash, Leaf, rope_repeat, rope_hash
h = PolynomialHash()
pattern = Leaf(b"ab", h)
repeated = rope_repeat(pattern, 1000, h) # "ab" x 1000
# Hash computed in O(log 1000), not O(2000)
assert rope_hash(repeated) == h.hash(b"ab" * 1000)
SlidingWindow
Bounded-memory streaming hash with copy-from-window support:
from hashrope import PolynomialHash, SlidingWindow
h = PolynomialHash()
sw = SlidingWindow(d_max=1024, m_max=256)
# Stream data in
sw.append_bytes(b"hello ")
sw.append_bytes(b"world")
# Hash of everything seen so far
assert sw.current_hash() == h.hash(b"hello world")
# Copy-from-window (like LZ77 back-references)
sw.append_copy(offset=5, length=5) # copies "world" again
assert sw.final_hash() == h.hash(b"hello worldworld")
API reference
Polynomial hash
| Function / Class | Description |
|---|---|
PolynomialHash(prime, base) |
Hash function over Z/pZ. Default: p = 2^61 - 1, base = 131 |
PolynomialHash.hash(data) |
H(data) with +1 byte offset to prevent zero collisions |
PolynomialHash.hash_concat(h_a, len_b, h_b) |
H(A || B) from H(A), |B|, H(B) in O(1) |
PolynomialHash.hash_repeat(h_s, d, q) |
H(S^q) from H(S) and |S| in O(log q) |
PolynomialHash.hash_overlap(h_p, d, l, h_prefix) |
H(P^q || P[0..r-1]) for overlapping references |
phi(q, alpha, p) |
Geometric accumulator: sum of alpha^i for i in [0, q) in O(log q) |
mersenne_mod(a, p) |
Bit-shift reduction mod Mersenne prime |
mersenne_mul(a, b, p) |
Modular multiply via mersenne_mod |
MERSENNE_61 |
2^61 - 1 (default prime) |
MERSENNE_127 |
2^127 - 1 (high-security prime) |
Rope nodes
| Class | Description |
|---|---|
Leaf(data, h) |
Immutable leaf storing a byte sequence. Weight = 1 |
Internal(left, right, h) |
Internal node. Hash = H(left) * x^len(right) + H(right) |
RepeatNode(child, reps, h) |
Represents child^reps. Hash via geometric accumulator |
Rope operations
| Function | Time | Description |
|---|---|---|
rope_concat(left, right, h) |
O(k |h_L - h_R|) | Concatenate with Adams-style balanced join and BB[2/7] rebalancing |
rope_split(node, pos, h) |
O(k log w) | Split at byte position |
rope_repeat(node, q, h) |
O(k log q) | Create RepeatNode |
rope_substr_hash(node, start, length, h) |
O(k log w) | Substring hash without allocation |
rope_len(node) |
O(1) | Total byte length |
rope_hash(node) |
O(1) | Hash of represented string |
rope_from_bytes(data, h) |
O(n) | Create Leaf from bytes |
rope_to_bytes(node) |
O(n) | Reconstruct bytes (for testing) |
validate_rope(node) |
O(w) | Assert all 9 invariants hold |
SlidingWindow
| Method | Description |
|---|---|
SlidingWindow(d_max, m_max, prime, base) |
Create window. W = d_max + m_max |
.append_bytes(data) |
Append literal bytes |
.append_copy(offset, length) |
Copy from window (handles overlapping) |
.current_hash() |
H(everything seen) in O(1) |
.final_hash() |
Same as current_hash (alias) |
.window_len |
Current window size in bytes |
.pos |
Total bytes ingested |
Mathematical foundation
The data structure and its correctness proofs come from the compressed-domain hashing framework (24 theorems, 14 lemmas, 5 corollaries). Key results used by hashrope:
| Result | Statement |
|---|---|
| Theorem 1 (Concatenation) | H(A || B) = H(A) * x^{|B|} + H(B) |
| Theorem 2 (Repetition) | H(S^q) = H(S) * Phi(q, x^d) |
| Theorem 3 (Phi computation) | Phi(q, alpha) computable in O(log q) without modular inverse |
| Theorem 6 (Join) | Adams-style balanced join with BB[2/7] rebalancing |
| Theorem 7 (Split) | Split in O(k log w) with RepeatNode splittability |
| Theorem 8 (Repeat) | RepeatNode in O(k log q), O(1) space |
| Theorem 9 (SubstrHash) | Allocation-free substring hashing in O(k log w) |
| Theorem 11 (Sliding window) | Bounded-memory streaming with eviction |
Changelog
See CHANGELOG.md for version history and details on the v0.2.1 balance bugfix.
Provenance
Extracted from UHC (Unified Hash-Compression Engine). The rope and polynomial hash modules are verbatim copies with import paths updated. The sliding window is adapted from UHC's SlidingRopeState with the LZ77 dependency removed.
License
MIT
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file hashrope-0.2.1.tar.gz.
File metadata
- Download URL: hashrope-0.2.1.tar.gz
- Upload date:
- Size: 23.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.2
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
3d3b2f4a3bd1c025f8334f4f443fabd25c78f44f1e663f788544bd3f0a8240e8
|
|
| MD5 |
ef1c5025a2931d1f2a05542a2556e129
|
|
| BLAKE2b-256 |
90b640f8f9854b29bea97dc2a173d10fed0364f02b426095d2a0d5e11928fb66
|
File details
Details for the file hashrope-0.2.1-py3-none-any.whl.
File metadata
- Download URL: hashrope-0.2.1-py3-none-any.whl
- Upload date:
- Size: 15.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.2
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
a737ce9ce83ea8452e2cd0e22651ba7f0113339c9d2dbae97283255b9a862300
|
|
| MD5 |
9c8394e42f8e701460ef1902524df0c8
|
|
| BLAKE2b-256 |
47f3fdf15092ec65098bd8054adec6b95c4a7c7804490636fc4ec640da9d4007
|