Cross-port-stable extensions to Python imagehash, mirroring the 5 ports of rosetta-squint-hash

rosetta_squint_hash — Python port (extensions over imagehash)

A thin wrapper around the Python imagehash package (4.3.2+) that adds the cross-port-stable extensions implemented across the 5 ports of rosetta-squint-hash (Rust, Go, Java, JS/TS, Swift).

The v1 surface that's actually different from upstream imagehash is one function: whash_db4_robust. Everything else (phash, dhash, whash(mode='haar'), whash(mode='db4'), phash_simple, dhash_vertical, colorhash, crop_resistant_hash, hex round-trip) is re-exported from upstream unchanged.

Why this package exists

Python imagehash.whash(mode='db4') and our 5 ports' whashDb4 produce the same hash for ~93% of test fixtures. The other ~7% are pathological synthetic inputs (checkerboards, high-contrast geometric icons) for which the wavelet LL band has values that are mathematically exactly 0, but float64 rounding leaves tiny noise (~1e-17) in their place. The "is coef > median" comparison then depends on which side of zero the noise lands. That in turn depends on the order of the float additions, which is determined by SIMD/FMA use and the compiler.

PyWavelets' C+SIMD inner loop, Rust's f64 accumulation, Java's double loops, Go's column-first traversal, JS's Number arithmetic, and Swift's Double accumulation can disagree by ~10 bits out of 256 on those inputs. Every implementation is "right" per IEEE 754 — the input is at a tie point.
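The order dependence can be seen without any wavelet code. The following is a minimal illustration in plain Python, not taken from any of the ports: four terms whose exact sum is zero are added with two different groupings, and the resulting "greater than the threshold" bit differs. The threshold value 0.0 is an assumption chosen to stand in for the median of a band that is mathematically zero.

```python
# Four terms whose exact real-number sum is zero: 0.1 + 0.2 + 0.3 - 0.6.
# Only the grouping of the additions differs between the two computations,
# as it might between two ports that traverse the data in different orders.
left_to_right = ((0.1 + 0.2) + 0.3) - 0.6   # leaves a tiny positive residue
right_to_left = (0.1 + (0.2 + 0.3)) - 0.6   # lands on exactly 0.0

threshold = 0.0  # illustrative stand-in for the median of an all-zero band

bit_a = left_to_right > threshold   # True
bit_b = right_to_left > threshold   # False

print(left_to_right, right_to_left)
print(bit_a, bit_b)
```

Both results are correct float64 arithmetic; the bit differs only because the input sits at a tie point.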

whash_db4_robust resolves this by snapping |coef| < 1e-12 to exactly 0 before the median + threshold step. All implementations now agree (cross-port stable) at the cost of those specific inputs producing different hex than imagehash.whash(mode='db4').
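As a rough sketch of the snapping idea, assuming a flat list of coefficients rather than the real 2-D LL band: the helper `threshold_bits` below is hypothetical and is not the code in rosetta_squint_hash/_impl.py. It only shows why zeroing values with magnitude under 1e-12 before the median and threshold step makes two noisy versions of the same band agree.

```python
from statistics import median

EPS = 1e-12  # same value the README lists as WHASH_DB4_ROBUST_EPS


def threshold_bits(coeffs, eps=None):
    """Return one bit per coefficient: True where coef > median.

    If eps is given, coefficients with |coef| < eps are snapped to
    exactly 0.0 first (hypothetical helper, for illustration only).
    """
    if eps is not None:
        coeffs = [0.0 if abs(c) < eps else c for c in coeffs]
    med = median(coeffs)
    return [c > med for c in coeffs]


# The same band as two ports might compute it: the middle two values are
# mathematically zero, but the rounding noise has opposite signs.
port_a = [4.0, 1e-17, -1e-17, -3.0]
port_b = [4.0, -1e-17, 1e-17, -3.0]

print(threshold_bits(port_a), threshold_bits(port_b))            # bits disagree
print(threshold_bits(port_a, EPS), threshold_bits(port_b, EPS))  # bits agree
```

The snapped result can differ from the unsnapped one on either port, which is the trade-off described above: agreement across ports in exchange for different hex on these specific inputs.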

For real photos this never triggers; both functions return identical output.

Install

pip install -e .              # from python/

Not yet on PyPI.

Version policy

The wrapper pins upstream dependencies to the exact versions our goldens were validated against:

| Dep | Pinned to | Why |
|---|---|---|
| imagehash | ==4.3.2 | Algorithm output is what generates spec/goldens.json. New upstream release → potential drift → release of rosetta-squint-hash after re-validation. |
| Pillow | ==10.4.* | PIL's Lanczos resize, Image.crop rounding, ImageFilter.GaussianBlur box-radius formula, and grayscale conversion are all involved in our hash pipeline. A Pillow major-version bump can change any of these and silently break parity with the 5 other ports. |
| PyWavelets | >=1.5,<2.0 | db4 filter coefficients are mathematical constants; less drift risk, but bound the major version. |
| numpy | >=1.26,<2.0 | NumPy 2.0 changed some default dtypes and behaviors; pin to 1.x. |

This is intentional. The upstream-tracker workflow (.github/workflows/upstream-tracker.yml) catches new upstream releases weekly and surfaces the goldens diff so a human can decide whether to bump and re-release.

If you need to share a Python environment with packages requiring different Pillow / imagehash versions, install with the unpinned extra:

pip install rosetta-squint-hash[unpinned]

Caveat: with [unpinned] the cross-port byte-exact guarantee is no longer enforced by the wrapper — verify your own output matches spec/goldens.json before relying on it. The 5 non-Python ports are unaffected (they don't link to Pillow at all).
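When running with [unpinned], one quick sanity check is to compare the installed versions against the pins listed in the table above. The sketch below uses only the standard library; the helper names `installed_versions`, `major_minor`, and `check_pins` are hypothetical and not part of this package, and the version parsing is deliberately simplistic (it ignores pre-release suffixes).

```python
import re
from importlib.metadata import PackageNotFoundError, version


def installed_versions(names=("imagehash", "Pillow", "PyWavelets", "numpy")):
    """Map each distribution name to its installed version, or None."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def major_minor(ver):
    """Extract (major, minor) from a version string such as '10.4.0'."""
    nums = [int(n) for n in re.findall(r"\d+", ver)]
    return tuple((nums + [0, 0])[:2])


def check_pins(found):
    """Return {name: bool}: does each version satisfy the README's pin?"""
    checks = {
        "imagehash": lambda v: v == "4.3.2",
        "Pillow": lambda v: major_minor(v) == (10, 4),
        "PyWavelets": lambda v: (1, 5) <= major_minor(v) < (2, 0),
        "numpy": lambda v: (1, 26) <= major_minor(v) < (2, 0),
    }
    return {
        name: found.get(name) is not None and ok(found[name])
        for name, ok in checks.items()
    }


print(check_pins(installed_versions()))
```

A False entry does not prove the output has drifted; it only means the byte-exact guarantee is no longer backed by the validated versions, so the goldens comparison is still the deciding check.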

Usage

import rosetta_squint_hash as rih
from PIL import Image

img = Image.open("photo.jpg")

# Drop-in upstream imagehash:
h_strict = rih.whash(img, mode="db4")            # matches imagehash exactly

# Our cross-port-stable variant:
h_robust = rih.whash_db4_robust(img, hash_size=8)

print(str(h_strict))                              # hex, identical to imagehash.whash output
print(str(h_robust))                              # hex, identical to Rust/Go/Java/JS/Swift output

# On a real photo, the two agree:
assert str(h_strict) == str(h_robust)

# On a checkerboard, they will differ:
checker = Image.open("spec/fixtures/checker-256.png")
print(rih.whash(checker, mode="db4"))             # SIMD-dependent
print(rih.whash_db4_robust(checker, hash_size=8)) # same on every platform

When to pick which

| Use case | Function |
|---|---|
| Matching against existing hashes stored from Python imagehash | rih.whash(img, mode="db4") |
| Cross-language storage/lookup where every port (Python, Rust, Go, Java, JS, Swift) must produce the same hex for the same input | rih.whash_db4_robust(img) |
| Hashing photographs (untrusted or trusted) | Either; output is identical for non-pathological inputs |
| Hashing synthetic test inputs or geometric patterns | rih.whash_db4_robust if cross-port matters |

API

def whash_db4_robust(
    image: PIL.Image.Image,
    hash_size: int = 8,
    image_scale: int | None = None,
) -> imagehash.ImageHash:
    """Cross-port-stable variant of imagehash.whash(mode='db4').

    See docstring in rosetta_squint_hash/_impl.py for the full algorithm.
    """

WHASH_DB4_ROBUST_EPS: float = 1e-12  # the ε threshold; fixed across ports

All other names re-exported from imagehash are accessible as rosetta_squint_hash.<name>.

Testing

cd python
pip install -e ".[test]"
pytest

Tests verify whash_db4_robust output against spec/goldens.json — the same goldens the 5 non-Python ports test against. The Python reference produces the goldens, so this is a self-consistency check; the value comes from confirming the per-port implementations match.
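The shape of such a goldens comparison can be sketched as follows. This is an assumption-laden illustration, not the project's test suite: it assumes the goldens reduce to a mapping from fixture path to expected hex string, which may not match the real layout of spec/goldens.json, and `find_mismatches` is a hypothetical helper. The demo uses stand-in data so it runs without the package installed.

```python
def find_mismatches(goldens, compute_hex):
    """Compare computed hex hashes against expected ones.

    goldens:      mapping of fixture path -> expected hex (assumed layout)
    compute_hex:  callable taking a fixture path and returning a hex string
    Returns {fixture: (expected_hex, actual_hex)} for every disagreement.
    """
    mismatches = {}
    for fixture, expected_hex in goldens.items():
        actual_hex = compute_hex(fixture)
        if actual_hex != expected_hex:
            mismatches[fixture] = (expected_hex, actual_hex)
    return mismatches


# Stand-in data; a real run might instead use something like
#   compute_hex = lambda p: str(rih.whash_db4_robust(Image.open(p)))
fake_goldens = {"a.png": "ff00", "b.png": "0f0f"}
fake_hashes = {"a.png": "ff00", "b.png": "0f0e"}

print(find_mismatches(fake_goldens, fake_hashes.get))
```

An empty result means every fixture matched; anything else lists the expected and actual hex side by side for inspection.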

License

BSD-2-Clause.
