Perceptual hash storage protocol for inline-snapshot

inline-snapshot-phash

Features

  • Perceptual hashing for content-based addressing: Images are stored and identified by their perceptual hash rather than exact byte matching
  • Automatic deduplication: Perceptually identical images (e.g., same content in different formats) share a single archived file
  • Fast hash comparison: Test runs compare hash strings without loading the archived images from disk
  • Archived files for inspection: Original images remain available for manual visual comparison when outputs change
  • (Future) Tolerance-based comparison: Support for near-matches within a configurable similarity threshold

Installation

pip install inline-snapshot-phash

Requirements

  • Python 3.8+
  • inline-snapshot >= 0.30.1
  • czkawka >= 0.1.1

Quick Start

Register the storage protocol in your conftest.py:

from inline_snapshot_phash import register_phash_storage

register_phash_storage()

Then use the phash: protocol in your tests:

from pathlib import Path
from inline_snapshot import external

def test_image_output():
    output_path = generate_diagram()  # Returns Path to a .png file
    assert output_path == external("phash:")

On first run with --inline-snapshot=create, this generates:

def test_image_output():
    output_path = generate_diagram()
    assert output_path == external("phash:8LS0tOSwvLQ.png")

The image is archived at .inline-snapshot/phash/8LS0tOSwvLQ.png. Subsequent test runs compute the perceptual hash of the new output and compare it against the hash in the snapshot string, without loading the archived image file.

Demo

A minimal demo test suite is provided in demo/demo_test.py showing the three core behaviors:

  • basic phash snapshot creation
  • different images producing different hashes
    • The test_red_square and test_blue_square tests produce different snapshots.
  • identical images sharing archived storage (one-to-many behavior)
    • The test_red_square and test_red_square_tiny tests produce the same snapshot because the 2px-wide square PNG has the same perceptual hash as the 100px-wide one.

Run pytest --inline-snapshot=create demo/demo_test.py to see it in action.

How It Works

Property-Based Similarity

Traditional snapshot testing assumes deterministic processes that produce byte-identical outputs. The phash: protocol instead snapshots based on perceptual similarity, which is a property of the image content rather than of its exact bytes.

For example, if 10 test functions each generate a red square (as PNG or JPG, at different sizes), they all produce the same perceptual hash. One archived image file serves all 10 tests, and hash comparisons pass without redundant storage.
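To make the idea concrete, here is a toy "average hash" in pure Python. It is a sketch for illustration only and not the algorithm this package uses: the grid-of-grayscale-values input, the helper names average_hash and solid, and the unpadded URL-safe base64 encoding (chosen because it yields 11-character strings shaped like the hashes in the examples) are all assumptions.

```python
import base64


def average_hash(pixels, size=8):
    """Toy average hash of a 2D grid of grayscale values (0-255)."""
    height, width = len(pixels), len(pixels[0])
    # Nearest-neighbour downsample to a size x size grid.
    cells = [
        pixels[y * height // size][x * width // size]
        for y in range(size)
        for x in range(size)
    ]
    mean = sum(cells) / len(cells)
    # One bit per cell: 1 if the cell is brighter than the mean.
    bits = 0
    for value in cells:
        bits = (bits << 1) | (value > mean)
    raw = bits.to_bytes(size * size // 8, "big")
    # Assumed encoding: unpadded URL-safe base64 (11 characters for 64 bits).
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def solid(width, height, value):
    """Build a uniform grayscale image as nested lists (hypothetical helper)."""
    return [[value] * width for _ in range(height)]


big = average_hash(solid(100, 100, 76))  # a flat square, 100px wide
tiny = average_hash(solid(2, 2, 76))     # the same flat colour, 2px wide
split = average_hash([[0] * 4 + [255] * 4 for _ in range(8)])  # half dark, half light
print(big == tiny, big == split)  # prints: True False
```

Because the hash depends only on the downsampled brightness pattern, the two flat squares collapse to one value regardless of size, while an image with different structure gets a different value.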

Storage Flow

  1. You write assert output_path == external("phash:")
  2. inline-snapshot computes the perceptual hash of the image at output_path
  3. The code updates to assert output_path == external("phash:8LS0tOSwvLQ.png")
  4. The original image is stored at .inline-snapshot/phash/8LS0tOSwvLQ.png
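The archiving step can be sketched roughly as follows. This is a simplified stand-in rather than the package's implementation: archive_image is a hypothetical helper, the hash is passed in instead of being computed, and naming the archive after the hash plus the output's own suffix is an assumption based on the example path above.

```python
import shutil
from pathlib import Path


def archive_image(output_path: Path, phash: str, root: Path) -> str:
    """Archive output_path under its hash and return the snapshot string."""
    name = f"{phash}{output_path.suffix}"
    target = root / "phash" / name
    target.parent.mkdir(parents=True, exist_ok=True)
    # Content addressing: if an image with this hash is already archived,
    # keep the existing file instead of writing a second copy.
    if not target.exists():
        shutil.copyfile(output_path, target)
    return f"phash:{name}"
```

Calling archive_image(Path("diagram.png"), "8LS0tOSwvLQ", Path(".inline-snapshot")) would return "phash:8LS0tOSwvLQ.png" and leave the copy at .inline-snapshot/phash/8LS0tOSwvLQ.png. A later call with the same hash returns the same string and leaves the first archived file untouched.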

On subsequent test runs:

  • The perceptual hash of the new output is computed
  • It's compared against 8LS0tOSwvLQ from the snapshot string
  • If they match, the test passes (the archived image is never read; the only file I/O is hashing the new output)
  • If different, inline-snapshot shows a diff and offers to update
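The comparison above amounts to string matching, which a minimal sketch can show. The function names expected_hash and snapshot_matches are hypothetical, and the parsing assumes the snapshot string always has the form phash:<hash><suffix> as in the examples.

```python
from pathlib import PurePosixPath


def expected_hash(snapshot: str) -> str:
    """Extract the hash from a snapshot string such as 'phash:8LS0tOSwvLQ.png'."""
    protocol, _, name = snapshot.partition(":")
    if protocol != "phash" or not name:
        raise ValueError(f"not a populated phash snapshot: {snapshot!r}")
    # Drop the file suffix; what remains is the stored hash.
    return PurePosixPath(name).stem


def snapshot_matches(new_hash: str, snapshot: str) -> bool:
    """Compare a freshly computed hash with the one recorded in the test source."""
    return new_hash == expected_hash(snapshot)
```

With this sketch, snapshot_matches("8LS0tOSwvLQ", "phash:8LS0tOSwvLQ.png") is True, and any other hash value is a mismatch that would lead to the diff-and-update path.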

Why Both Hash and File?

The hash enables fast comparison during test runs: it is just string matching, with no need to load the archived image. The archived file provides a reference for manual visual inspection when test outputs change.

For example, in page dewarping optimization (flattening curved book pages from photos), you want to avoid:

  • Constantly reviewing tests when optimization tweaks change outputs slightly (but imperceptibly)
  • Naively accepting snapshot updates without understanding what changed

The phash approach separates "did perceptual quality change?" (the test assertion) from "what exactly changed?" (manual inspection of archived images).

One-to-Many Behavior

This protocol deliberately deduplicates images that share a perceptual hash. When the output of create_image2() changes, you diff against the archive written by whichever test first generated that hash (e.g., create_image1()'s archive), not against the last run of create_image2().

This is the intended behavior: files with the same phash are treated as identical, similar to git's hash-based content addressing (SHA-1 by default) but for perceptual equivalence rather than byte equality. For more discussion on this design decision and use cases, see inline-snapshot discussion #311.

Contributing

Maintained by lmmx. Contributions welcome!

  1. Issues & Discussions: Please open a GitHub issue for bugs or feature requests. When reporting a bug, please include the version and the error message/traceback if available. For design discussions, see the upstream inline-snapshot discussion #311.
  2. Pull Requests: PRs are welcome!
    • Install the dev environment with uv: uv sync
    • Run tests with $(uv python find) -m pytest, and include updates to docs or examples if relevant.

This is a third-party extension for inline-snapshot.

License

This project is licensed under the MIT License.
