A very compact representation of an image placeholder (thumbhash, RGBA-safe fork)

thash

A modern Python port of the ThumbHash encoder by Evan Wallace. ThumbHash represents an image as ~20 bytes — small enough to inline in HTML, large enough to render a recognizable color/aspect placeholder before the real image loads.

This is an independently published fork of thumbhash by Justin Forlenza. Notable changes vs. upstream:

  • Alpha-channel crash fixed (operator-precedence bug in rgba_to_thumb_hash — see upstream issue #1).
  • NumPy-accelerated backend with cached cosine basis and float32 DCT (~100–140× faster than the reference implementation, byte-identical output).
  • High-level encode() API that accepts paths, bytes, PIL images, NumPy arrays, and OpenCV BGR arrays — pick the input you already have, no boilerplate.
  • Configurable target_size so you can trade hash quality for encoding speed.

Installation

# Pure-Python fallback only (no deps)
pip install thash

# Recommended runtime (NumPy fast path + Pillow decoding)
pip install "thash[all]"   # quoted so shells like zsh don't expand the brackets

If you use uv:

uv add thash --extra all

Requires Python ≥ 3.10.

Quick start

The high-level API takes pretty much any image-shaped thing:

from pathlib import Path

from thash import encode

# From a file path, or from encoded image bytes (e.g. fetched from a URL)
hash_bytes = encode("photo.jpg")
hash_bytes = encode(Path("photo.jpg").read_bytes())  # reads and closes the file

# From a PIL image (already in memory, no re-decode)
from PIL import Image
hash_bytes = encode(Image.open("photo.jpg"))

# From a NumPy array (H,W,3) or (H,W,4) — assumed RGB/RGBA
import numpy as np
arr = np.asarray(Image.open("photo.jpg"))
hash_bytes = encode(arr)

# From an OpenCV BGR array
import cv2
bgr = cv2.imread("photo.jpg")
hash_bytes = encode(bgr, color_order="BGR")

# Grayscale / float arrays in [0, 1] also work — they're normalized for you
hash_bytes = encode(arr.astype(np.float32) / 255.0)
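
Since the hash is only about 20 bytes, one common way to ship it is as a base64 string inside the HTML itself. The sketch below is illustrative only: hash_to_base64, img_tag, and the data-thumbhash attribute name are hypothetical and not part of thash. It assumes the value returned by encode() is bytes or an iterable of ints in 0–255.

```python
import base64
from html import escape


def hash_to_base64(thumb_hash) -> str:
    """Base64-encode a hash given as bytes or as an iterable of ints (0-255)."""
    return base64.b64encode(bytes(thumb_hash)).decode("ascii")


def img_tag(src: str, thumb_hash, alt: str = "") -> str:
    """Build an <img> tag carrying the hash in a (hypothetical) data attribute."""
    return (
        f'<img src="{escape(src, quote=True)}" '
        f'alt="{escape(alt, quote=True)}" '
        f'data-thumbhash="{hash_to_base64(thumb_hash)}">'
    )
```

A call such as img_tag("photo.jpg", encode("photo.jpg")) would then produce markup that a client-side ThumbHash decoder can read the placeholder from.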

Decoding the hash back

from thash import thumb_hash_to_average_rgba, thumb_hash_to_approximate_aspect_ratio

r, g, b, a = thumb_hash_to_average_rgba(hash_bytes)   # values in [0, 1]
aspect = thumb_hash_to_approximate_aspect_ratio(hash_bytes)  # w / h

(Full decoding back to pixels is not implemented here; for that, see the JS reference implementation.)
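
The two helpers above are enough for a flat-color placeholder box. As a minimal sketch, assuming the average color comes back as four floats in [0, 1] and the aspect ratio as width / height (as the comments above state), they can be turned into inline CSS. The function names rgba_to_css and placeholder_style are hypothetical and not part of thash.

```python
def rgba_to_css(r: float, g: float, b: float, a: float) -> str:
    """Convert channels in [0, 1] to a CSS rgba() string (values are clamped)."""
    def to_byte(v: float) -> int:
        return max(0, min(255, round(v * 255)))

    alpha = max(0.0, min(1.0, a))
    return f"rgba({to_byte(r)}, {to_byte(g)}, {to_byte(b)}, {alpha:.3f})"


def placeholder_style(rgba, aspect: float) -> str:
    """Inline style for a box with the image's average color and shape."""
    return f"background-color: {rgba_to_css(*rgba)}; aspect-ratio: {aspect:.4g};"
```

For example, placeholder_style(thumb_hash_to_average_rgba(hash_bytes), thumb_hash_to_approximate_aspect_ratio(hash_bytes)) gives a string suitable for a style attribute.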

Tuning speed vs. quality

target_size sets the longer dimension of the image after thumbnailing (the spec maximum is 100). Smaller values are faster but lower fidelity:

target_size      DCT time    Visual quality
-----------------------------------------------------------------
100 (default)    ~125 μs     Reference / spec-compatible
64               ~85 μs      Indistinguishable in practice
50               ~75 μs      Fine for any placeholder use
32               ~65 μs      Colors correct, details blurred
16               ~45 μs      Average color + rough orientation only

encode("photo.jpg", target_size=50)         # ~1.7× faster DCT per the table above; hash is still spec-valid
encode("photo.jpg", target_size=50, resize=False)  # errors if the image is larger than 50 px

Note: For very large input images the bottleneck is usually PIL decode + resize, not the DCT. target_size only matters once your input is already small (e.g. a tensor in an ML pipeline). For batch processing many photos from disk, parallelize with concurrent.futures.ProcessPoolExecutor before reaching for a GPU.
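
The batch approach in the note can be sketched roughly as follows. This is an illustration, not code shipped with thash: hash_file and hash_files are hypothetical names, and the worker and executor_cls parameters exist only so the pattern can be tried with a different worker or executor. The worker is a top-level function so that it can be pickled and sent to worker processes.

```python
import concurrent.futures as cf


def hash_file(path):
    """Worker: hash a single file (top-level so worker processes can import it)."""
    from thash import encode  # lazy import; assumes `pip install "thash[all]"`
    return encode(path)


def hash_files(paths, worker=hash_file, executor_cls=cf.ProcessPoolExecutor,
               max_workers=None):
    """Map `worker` over `paths` in parallel and return {path: result}."""
    paths = list(paths)
    with executor_cls(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, so zip pairs them correctly.
        return dict(zip(paths, pool.map(worker, paths)))
```

On platforms that start processes by spawning (Windows, macOS), call hash_files from under an if __name__ == "__main__": guard.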

Backends

The package picks the NumPy backend at import time if available, otherwise falls back to a pure-Python reference implementation. You can force one explicitly:

encode(img, backend="numpy")    # default, BLAS-accelerated matmul
encode(img, backend="pure")     # reference Python, no deps

Backend availability is reflected by module flags:

from thash import has_numpy, has_pil

Backend comparison (random RGBA inputs, byte-identical output)

case                 size alpha         pure        numpy
---------------------------------------------------------
tiny-square       10x10   False     300 μs        41 μs
small-square      32x32   False     2.7 ms        66 μs
medium-square     64x64   False    11.4 ms        86 μs
max-square       100x100  False    26.8 ms       124 μs
landscape        100x56   False    11.7 ms        98 μs
max-square+a     100x100   True    28.2 ms       168 μs
HD-720p         1280x720  False        —          48 ms
FHD-1080p       1920x1080 False        —         208 ms
UHD-4K          3840x2160 False        —         516 ms

NumPy is ~100–140× faster than the reference impl on spec-sized inputs (geometric mean ~88×, median ~137×). Three optimizations stack here:

  1. Cosine basis cached by (n, k): the np.cos cost amortizes across calls with shared dimensions (common after thumbnailing).
  2. P and Q channels combined into a single batched 3×3 matmul.
  3. float32 DCT — Bandwidth halved, BLAS sgemm faster than dgemm; verified byte-identical on 490 random inputs across all spec shapes.

The pure-Python fallback is kept so the package works with zero deps. Run uv run python benchmarks/run.py to reproduce.
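
To illustrate the first optimization, here is a pure-Python sketch of caching a cosine basis keyed by (n, k). It is not the package's code: the package uses NumPy as described above, and the names cosine_basis and low_freq_dct are hypothetical. functools.lru_cache stands in for whatever cache the package uses, and the rectangular nx × ny coefficient grid is a simplification chosen for the example.

```python
from functools import lru_cache
from math import cos, pi


@lru_cache(maxsize=None)
def cosine_basis(n: int, k: int):
    """Rows j = 0..k-1 of cos(pi / n * j * (i + 0.5)) for i = 0..n-1.

    Cached by (n, k), so repeated calls with the same dimensions skip
    every cos() evaluation.
    """
    return tuple(
        tuple(cos(pi / n * j * (i + 0.5)) for i in range(n))
        for j in range(k)
    )


def low_freq_dct(channel, w: int, h: int, nx: int, ny: int):
    """Low-frequency cosine coefficients of a row-major flat channel (length w*h)."""
    bx = cosine_basis(w, nx)  # horizontal basis, reused across calls
    by = cosine_basis(h, ny)  # vertical basis, reused across calls
    coeffs = []
    for cy in range(ny):
        row = []
        for cx in range(nx):
            total = 0.0
            for y in range(h):
                fy = by[cy][y]
                for x in range(w):
                    total += channel[x + y * w] * bx[cx][x] * fy
            row.append(total / (w * h))
        coeffs.append(row)
    return coeffs
```

With NumPy, the two bases become matrices and the nested loops collapse into matrix products, which is where BLAS helps.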

Low-level API

The original byte-list API still works for callers who want to manage RGBA themselves:

from thash import rgba_to_thumb_hash, image_to_thumb_hash

# Flat list: [R, G, B, A, R, G, B, A, ...], length = 4 * w * h
hash_bytes = rgba_to_thumb_hash(width, height, flat_rgba_ints)

# Open a file via Pillow, thumbnail to ≤100x100, encode
hash_bytes = image_to_thumb_hash("photo.jpg")

rgba_to_thumb_hash automatically picks the NumPy backend if available, falling back to pure Python otherwise.
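
For callers without Pillow or NumPy, the flat list can be built by hand. The helpers below are a hypothetical sketch (flatten_pixels, solid_rgba, and hash_solid_color are not part of thash) and assume row-major pixel order, which matches the [R, G, B, A, ...] layout shown above but is not stated explicitly by the package.

```python
def flatten_pixels(rows):
    """Flatten rows of (R, G, B, A) int tuples into [R, G, B, A, R, G, B, A, ...]."""
    return [channel for row in rows for pixel in row for channel in pixel]


def solid_rgba(width: int, height: int, rgba):
    """Flat RGBA list for a single-color image, length 4 * width * height."""
    r, g, b, a = rgba
    return [r, g, b, a] * (width * height)


def hash_solid_color(width: int, height: int, rgba):
    """Hash a solid-color image through the low-level API."""
    from thash import rgba_to_thumb_hash  # lazy import; assumes thash is installed
    return rgba_to_thumb_hash(width, height, solid_rgba(width, height, rgba))
```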

Development

git clone https://github.com/Jannchie/thumbhash-py.git
cd thumbhash-py
uv sync --all-extras --all-groups   # full dev env (deps + dev tools + bench)

uv run pytest                       # tests
uv run ruff check thash benchmarks  # lint
uv run python benchmarks/run.py     # benchmark suite
