Cross-port-stable extensions to Python imagehash, mirroring the 5 ports of rosetta-squint-hash
rosetta_squint_hash — Python port (extensions over imagehash)
A thin wrapper around the Python imagehash package (4.3.2+) that adds the cross-port-stable extensions implemented across the 5 ports of rosetta-squint-hash (Rust, Go, Java, JS/TS, Swift).
The v1 surface that's actually different from upstream imagehash is one function: whash_db4_robust. Everything else (phash, dhash, whash(mode='haar'), whash(mode='db4'), phash_simple, dhash_vertical, colorhash, crop_resistant_hash, hex round-trip) is re-exported from upstream unchanged.
Why this package exists
Python imagehash.whash(mode='db4') and our 5 ports' whashDb4 produce the same hash for ~93% of test fixtures. The other ~7% are pathological synthetic inputs (checkerboards, high-contrast geometric icons) where the algorithm computes a wavelet LL band whose values are mathematically exactly 0 — but float64 rounding produces tiny noise (~1e-17). The "is coef > median" comparison then depends on which side of zero the noise lands, which depends on the order of float additions, which depends on SIMD/FMA and compiler.
PyWavelets' C+SIMD inner loop, Rust's f64 accumulation, Java's double loops, Go's column-first traversal, JS's Number arithmetic, and Swift's Double accumulation can disagree by ~10 bits out of 256 on those inputs. Every implementation is "right" per IEEE 754 — the input is at a tie point.
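The order sensitivity is easy to reproduce without any wavelet code. The sketch below is plain Python and is not taken from any of the ports. It sums the same four float64 values in two different orders. The values add up to exactly 0 in real arithmetic. The small terms are exactly half an ulp of the large ones, so round-half-to-even drops them at one position but not at the other. As a result, the leftover lands on opposite sides of zero.

```python
import math
import operator
from functools import reduce


def left_to_right(values):
    # Plain sequential float64 accumulation. Built-in sum() is avoided on
    # purpose: on Python 3.12+ it uses compensated summation for floats.
    return reduce(operator.add, values, 0.0)


big = math.ldexp(1.0, -7)    # 2**-7
tiny = math.ldexp(1.0, -60)  # 2**-60, exactly half an ulp of `big`

# Same multiset of values, two accumulation orders. Both sum to 0 exactly.
order_a = [big, tiny, -big, -tiny]
order_b = [-big, -tiny, big, tiny]

a = left_to_right(order_a)  # big + tiny rounds back to big; -tiny survives
b = left_to_right(order_b)  # -big - tiny rounds back to -big; +tiny survives

# A "coef > 0" style comparison now depends on the accumulation order.
print(a > 0.0, b > 0.0)  # False True
```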
whash_db4_robust resolves this by snapping |coef| < 1e-12 to exactly 0 before the median + threshold step. All implementations now agree (cross-port stable) at the cost of those specific inputs producing different hex than imagehash.whash(mode='db4').
For real photos this never triggers; both functions return identical output.
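The snap is a small change in front of the usual median threshold. The following is a minimal pure-Python sketch of only that final step. It assumes the LL band has already been computed and flattened to a list of floats.

`threshold_bits` is a hypothetical helper and does not reproduce `rosetta_squint_hash/_impl.py`. `statistics.median` stands in for the NumPy median that upstream imagehash uses. Both average the two middle values for even-length input. The two "platform" bands are made-up noise values for illustration.

```python
from statistics import median
from typing import List, Optional, Sequence

EPS = 1e-12  # same value as WHASH_DB4_ROBUST_EPS


def threshold_bits(ll_band: Sequence[float], eps: Optional[float] = None) -> List[bool]:
    """Hash bits `coef > median(coefs)` for an already-flattened LL band.

    If eps is given, coefficients with |coef| < eps are first snapped to
    exactly 0.0, so float noise around a true zero cannot choose a side.
    """
    coefs = [float(c) for c in ll_band]
    if eps is not None:
        coefs = [0.0 if abs(c) < eps else c for c in coefs]
    med = median(coefs)
    return [c > med for c in coefs]


# Hypothetical LL bands for a checkerboard-like input: mathematically all 0,
# but two platforms accumulated the float noise differently.
platform_a = [3e-17, -1e-17, 0.0, 2e-17]
platform_b = [-2e-17, 1e-17, 4e-17, -3e-17]

print(threshold_bits(platform_a), threshold_bits(platform_b))            # disagree
print(threshold_bits(platform_a, EPS), threshold_bits(platform_b, EPS))  # agree
```

For coefficients well above 1e-12, as in photographs, the snap changes nothing and the bits match the unsnapped comparison.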
Install
```bash
cd python
pip install -e .
```
Not yet on PyPI.
Version policy
The wrapper pins upstream dependencies to the exact versions our goldens were validated against:
| Dep | Pinned to | Why |
|---|---|---|
| imagehash | ==4.3.2 | Algorithm output is what generates spec/goldens.json. A new upstream release can drift that output, so rosetta-squint-hash is re-released only after the goldens are re-validated. |
| Pillow | ==10.4.* | PIL's Lanczos resize, Image.crop rounding, ImageFilter.GaussianBlur box-radius formula, and grayscale conversion are all involved in our hash pipeline. A Pillow major-version bump can change any of these and silently break parity with the 5 other ports. |
| PyWavelets | >=1.5,<2.0 | db4 filter coefficients are mathematical constants, so drift risk is lower, but the major version is still bounded. |
| numpy | >=1.26,<2.0 | NumPy 2.0 changed type-promotion rules (NEP 50) and some default dtypes; pin to 1.x. |
This is intentional. The upstream-tracker workflow (.github/workflows/upstream-tracker.yml) catches new upstream releases weekly and surfaces the goldens diff so a human can decide whether to bump and re-release.
If you need to share a Python environment with packages requiring different Pillow / imagehash versions, install with the unpinned extra:
```bash
pip install "rosetta-squint-hash[unpinned]"
```
Caveat: with [unpinned] the cross-port byte-exact guarantee is no longer enforced by the wrapper — verify your own output matches spec/goldens.json before relying on it. The 5 non-Python ports are unaffected (they don't link to Pillow at all).
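With [unpinned], a cheap first check is whether the installed versions still fall inside the ranges in the version-policy table. The sketch below uses only `importlib.metadata` from the standard library.

The helper names and the simplified version parsing are assumptions, not part of the package. The parser reads leading numeric components only and does no full PEP 440 handling. This check does not replace comparing your output to spec/goldens.json.

```python
from importlib.metadata import PackageNotFoundError, version


def release_tuple(v: str) -> tuple:
    """Leading numeric release components: '10.4.0' -> (10, 4, 0), '2.0.0rc1' -> (2, 0, 0)."""
    parts = []
    for piece in v.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if len(digits) != len(piece):
            break  # e.g. '0rc1': keep the 0, ignore the pre-release tag
    return tuple(parts)


# Pins from the version-policy table, expressed as predicates on release tuples.
PINS = {
    "ImageHash": ("==4.3.2", lambda r: r[:3] == (4, 3, 2)),
    "Pillow": ("==10.4.*", lambda r: r[:2] == (10, 4)),
    "PyWavelets": (">=1.5,<2.0", lambda r: (1, 5) <= r[:2] and r[0] < 2),
    "numpy": (">=1.26,<2.0", lambda r: (1, 26) <= r[:2] and r[0] < 2),
}


def pin_mismatches(get_version=version):
    """Describe every dependency that is missing or outside its pin."""
    problems = []
    for dist, (spec, ok) in PINS.items():
        try:
            installed = get_version(dist)
        except PackageNotFoundError:
            problems.append(f"{dist}: not installed (want {spec})")
            continue
        release = release_tuple(installed)
        if not release or not ok(release):
            problems.append(f"{dist} {installed}: outside {spec}")
    return problems
```

Calling `pin_mismatches()` with no arguments queries the current environment. An empty list means every dependency is inside its pin.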
Usage
```python
import rosetta_squint_hash as rih
from PIL import Image

img = Image.open("photo.jpg")

# Drop-in upstream imagehash:
h_strict = rih.whash(img, mode="db4")  # matches imagehash exactly

# Our cross-port-stable variant:
h_robust = rih.whash_db4_robust(img, hash_size=8)

print(str(h_strict))  # hex, identical to imagehash.whash output
print(str(h_robust))  # hex, identical to Rust/Go/Java/JS/Swift output

# On a real photo, the two agree:
assert str(h_strict) == str(h_robust)

# On a checkerboard, they can differ:
checker = Image.open("spec/fixtures/checker-256.png")
print(rih.whash(checker, mode="db4"))             # SIMD/compiler-dependent
print(rih.whash_db4_robust(checker, hash_size=8))  # same on every platform
```
When to pick which
| Use case | Function |
|---|---|
| Matching against existing hashes stored from Python imagehash | rih.whash(img, mode="db4") |
| Cross-language storage/lookup where every port (Python, Rust, Go, Java, JS, Swift) must produce the same hex for the same input | rih.whash_db4_robust(img) |
| Hashing photographs (untrusted or trusted) | Either; output is identical for non-pathological inputs |
| Hashing synthetic test inputs or geometric patterns | rih.whash_db4_robust if cross-port matters |
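For cross-language storage, the stored value is just the hex string, so near-duplicate lookup needs no hashing library at query time. Upstream ImageHash objects expose Hamming distance as `h1 - h2`.

`hex_hamming` is a hypothetical stand-alone equivalent that works on two hex strings of equal length. It relies only on every port emitting identical hex for identical bits.

```python
def hex_hamming(a: str, b: str) -> int:
    """Number of differing bits between two equal-length hex hash strings."""
    if len(a) != len(b):
        raise ValueError("hashes must have the same length (same hash_size)")
    # XOR leaves a 1 bit wherever the two hashes disagree; count those bits.
    return bin(int(a, 16) ^ int(b, 16)).count("1")


print(hex_hamming("ffff0000ffff0000", "ffff0000ffff0001"))  # 1
```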
API
```python
import PIL.Image
import imagehash


def whash_db4_robust(
    image: PIL.Image.Image,
    hash_size: int = 8,
    image_scale: int | None = None,
) -> imagehash.ImageHash:
    """Cross-port-stable variant of imagehash.whash(mode='db4').

    See the docstring in rosetta_squint_hash/_impl.py for the full algorithm.
    """


WHASH_DB4_ROBUST_EPS: float = 1e-12  # the ε threshold; fixed across ports
```
All other names re-exported from imagehash are accessible as rosetta_squint_hash.<name>.
Testing
```bash
cd python
pip install -e ".[test]"
pytest
```
Tests verify whash_db4_robust output against spec/goldens.json, the same goldens the 5 non-Python ports test against. Because the Python reference generates the goldens, the Python test run is only a self-consistency (regression) check. The real value of the goldens is in confirming that the other ports match them.
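A golden check has the same shape in any port: load the expected hex per fixture, hash the fixture, and report every mismatch. The sketch below is illustration only.

It assumes that spec/goldens.json is a flat JSON object mapping fixture names to hex strings. The real file's schema is not described here, so the loading step may need adapting. `hash_fixture` is a hypothetical callable, for example one that opens the image and returns `str(rih.whash_db4_robust(img))`.

```python
import json
from typing import Callable, Dict, List


def golden_mismatches(goldens_json: str, hash_fixture: Callable[[str], str]) -> List[str]:
    # ASSUMPTION: goldens_json is a flat {"fixture-name": "hex"} object.
    expected: Dict[str, str] = json.loads(goldens_json)
    failures = []
    for name, want in sorted(expected.items()):
        got = hash_fixture(name)
        if got != want:  # byte-exact: no case folding or normalisation
            failures.append(f"{name}: expected {want}, got {got}")
    return failures
```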
License
BSD-2-Clause.
Download files
File details
Details for the file rosetta_squint_hash-1.0.0.tar.gz.
File metadata
- Download URL: rosetta_squint_hash-1.0.0.tar.gz
- Upload date:
- Size: 15.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b1f15ed183efdccbe533b130b559f6ea2e30e706bc21a8b921447be7e36c45a9 |
| MD5 | d56da82d7d807a5d8470a579e178186e |
| BLAKE2b-256 | 7113cdc098b50d1c719de4d0a4dfa10e265a9687ed2a3d9a525d908544ab14a5 |
File details
Details for the file rosetta_squint_hash-1.0.0-py3-none-any.whl.
File metadata
- Download URL: rosetta_squint_hash-1.0.0-py3-none-any.whl
- Upload date:
- Size: 10.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5040571a0fe7f0ff152d188977a4b83b0ee977ab9ccf7cd6529d11ea4ff837f1 |
| MD5 | 73430eff975c9ac8f2ec578f03607142 |
| BLAKE2b-256 | 37a436569e32e37feed51cc9d47c6eba3638d0bacbd0ea6e0b6c733b3b06b617 |