A very compact representation of an image placeholder (thumbhash, RGBA-safe fork)

thash

A modern Python port of the ThumbHash encoder by Evan Wallace. ThumbHash represents an image as ~20 bytes — small enough to inline in HTML, large enough to render a recognizable color/aspect placeholder before the real image loads.

This is an independently published fork of thumbhash by Justin Forlenza. Notable changes vs. upstream:

  • Alpha-channel crash fixed (operator-precedence bug in rgba_to_thumb_hash — see upstream issue #1).
  • NumPy-accelerated backend with cached cosine basis and float32 DCT (~100–140× faster than the reference implementation, byte-identical output).
  • High-level encode() API that accepts paths, bytes, PIL images, NumPy arrays, and OpenCV BGR arrays — pick the input you already have, no boilerplate.
  • Decoder + CLI for rendering a hash back to a placeholder image (thumb_hash_to_rgba, or thash photo.jpg -o preview.png).
  • Configurable target_size so you can trade hash quality for encoding speed.

Installation

# Pure-Python fallback only (no deps)
pip install thash

# Recommended runtime (NumPy fast path + Pillow decoding)
pip install "thash[all]"   # quotes keep shells like zsh from expanding the brackets

If you use uv:

uv add thash --extra all

Requires Python ≥ 3.10.

Quick start

The high-level API takes pretty much any image-shaped thing:

from thash import encode

# From a file path, or from raw encoded bytes (e.g. fetched from a URL)
hash_bytes = encode("photo.jpg")
with open("photo.jpg", "rb") as f:
    hash_bytes = encode(f.read())

# From a PIL image (already in memory, no re-decode)
from PIL import Image
hash_bytes = encode(Image.open("photo.jpg"))

# From a NumPy array (H,W,3) or (H,W,4) — assumed RGB/RGBA
import numpy as np
arr = np.asarray(Image.open("photo.jpg"))
hash_bytes = encode(arr)

# From an OpenCV BGR array
import cv2
bgr = cv2.imread("photo.jpg")
hash_bytes = encode(bgr, color_order="BGR")

# Grayscale / float arrays in [0, 1] also work — they're normalized for you
hash_bytes = encode(arr.astype(np.float32) / 255.0)

Decoding the hash back

from thash import (
    thumb_hash_to_rgba,
    thumb_hash_to_average_rgba,
    thumb_hash_to_approximate_aspect_ratio,
)

# Render the hash to a small RGBA preview (flat bytes, length 4*w*h)
w, h, rgba = thumb_hash_to_rgba(hash_bytes, base_size=256)

from PIL import Image
Image.frombytes("RGBA", (w, h), rgba).save("preview.png")

# Want a numpy array instead?
import numpy as np
arr = np.frombuffer(rgba, dtype=np.uint8).reshape(h, w, 4)

# Cheaper queries that don't reconstruct pixels:
r, g, b, a = thumb_hash_to_average_rgba(hash_bytes)            # values in [0, 1]
aspect = thumb_hash_to_approximate_aspect_ratio(hash_bytes)    # w / h

base_size is the longer edge of the reconstructed image. ThumbHash only carries ~5×5 / 7×7 frequency coefficients, so the IDCT is evaluated directly at the requested resolution rather than upsampled from a smaller image; values up to a few hundred pixels look smooth without any extra resampling. The aspect ratio comes from the encoded lx / ly (e.g. 7:4 for a landscape, 5:7 for a portrait), so ratios that do not match such a pair are quantized: 1.6, for example, becomes 1.75. This is a spec property, not an implementation choice.
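
To make the quantization concrete, here is a minimal sketch of how an image size could map to an lx / ly pair. It assumes the rule used by the reference ThumbHash encoder: the longer side gets 7 coefficients (5 when the image has alpha) and the shorter side gets a proportional count, rounded to the nearest integer with a minimum of 1. quantized_lx_ly is a hypothetical helper for illustration, not part of the thash API.

```python
def quantized_lx_ly(width: int, height: int, has_alpha: bool = False) -> tuple[int, int]:
    """Return the (lx, ly) coefficient counts for an image of this size.

    Assumption: mirrors the reference encoder's rule; not a thash function.
    """
    limit = 5 if has_alpha else 7
    longer = max(width, height)
    # int(x + 0.5) rounds half up for positive x (Python's round() would
    # round halves to the nearest even integer instead).
    lx = max(1, int(limit * width / longer + 0.5))
    ly = max(1, int(limit * height / longer + 0.5))
    return lx, ly


for w, h in [(160, 100), (100, 100), (500, 700)]:
    lx, ly = quantized_lx_ly(w, h)
    print(f"{w}x{h}: true ratio {w / h:.3f}, encoded {lx}:{ly} = {lx / ly:.3f}")
```

Under these assumptions a 160×100 image (ratio 1.6) is stored as 7:4, which is why the decoder reports 1.75 for it.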

Command-line

Installing the package exposes a thash command (equivalent to python -m thash):

# --- Encoding: print a hash for each input ---
thash photo.jpg                        # base64 hash, one per line
thash --format hex photo.jpg
thash --format bytes photo.jpg
thash photo.jpg cover.png hero.webp    # multi-file: "path<TAB>hash" per line
thash --target-size 64 photo.jpg       # trade quality for encoding speed

# --- Rendering: save a placeholder preview PNG ---
thash photo.jpg -o preview.png                     # encode + decode + save
thash photo.jpg -o preview.png --size 128          # cap the longer edge
thash "2dYJLJSBdoiAiHVoSHZzcBf4iA==" -o p.png      # base64 hash → PNG (no source image needed)
thash d9d6092c94817688808875684876737017f888 -o p.png  # hex hash → PNG
thash a.jpg b.jpg "2dYJ...==" -o out/              # multi input → directory, auto-named

The CLI uses the high-level encode() / thumb_hash_to_rgba() APIs. It needs Pillow for decoding images / writing PNG previews; NumPy is optional (only accelerates the encode / decode). Install with pip install thash[pillow] for the CLI or [all] for the fast path too. Hash inputs are auto-detected: hex strings (even length, hex alphabet) are tried first, then base64 (standard and URL-safe).
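
The detection order described above (hex first, then standard or URL-safe base64) can be sketched with the standard library alone. parse_hash is a hypothetical helper written for this illustration; it is not the CLI's actual code, and the real implementation may differ in edge cases such as padding.

```python
import base64
import binascii
import string

_HEX_CHARS = set(string.hexdigits)


def parse_hash(text: str) -> bytes:
    """Decode a hash string, trying hex first and base64 second (hypothetical)."""
    s = text.strip()
    # Hex candidate: even length and only hex digits.
    if s and len(s) % 2 == 0 and all(c in _HEX_CHARS for c in s):
        return bytes.fromhex(s)
    # URL-safe base64 uses '-' and '_'; map them onto the standard alphabet
    # and restore any padding that was stripped.
    std = s.replace("-", "+").replace("_", "/")
    std += "=" * (-len(std) % 4)
    try:
        return base64.b64decode(std, validate=True)
    except binascii.Error as exc:
        raise ValueError(f"not a hex or base64 hash: {text!r}") from exc


# The two example hashes from the CLI section decode to the same 19 bytes.
a = parse_hash("d9d6092c94817688808875684876737017f888")
b = parse_hash("2dYJLJSBdoiAiHVoSHZzcBf4iA==")
print(a == b, len(a))
```

One consequence of trying hex first: a string such as "abcd" is valid in both alphabets and is read as hex.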

Tuning speed vs. quality

target_size controls the longer dimension of the image after thumbnailing (spec max is 100). Smaller = faster, lower fidelity:

target_size     DCT time   Visual quality
-------------------------------------------------------------
100 (default)   ~125 μs    Reference / spec-compatible
64              ~85 μs     Indistinguishable in practice
50              ~75 μs     Fine for any placeholder use
32              ~65 μs     Colors correct, details blurred
16              ~45 μs     Average color + rough orientation only

encode("photo.jpg", target_size=50)         # ~1.7× faster DCT per the table above, hash is still spec-valid
encode("photo.jpg", target_size=50, resize=False)  # error if the image is larger than 50px

Note: For very large input images the bottleneck is usually PIL decode + resize, not the DCT. target_size only matters once your input is already small (e.g. a tensor in an ML pipeline). For batch processing many photos from disk, parallelize with concurrent.futures.ProcessPoolExecutor before reaching for a GPU.
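
As a rough sketch of that batch pattern, the helper below maps an encoder over many paths with a process pool. hash_many and its parameters are hypothetical names for this example; the encoder is passed in as an argument so the sketch does not assume anything about thash beyond encode() accepting a file path.

```python
from concurrent.futures import Executor, ProcessPoolExecutor
from typing import Callable, Iterable, Optional


def hash_many(
    paths: Iterable[str],
    encode_one: Callable[[str], bytes],
    executor: Optional[Executor] = None,
    chunksize: int = 16,
) -> dict[str, bytes]:
    """Map encode_one over paths in parallel and return {path: hash}.

    Hypothetical helper. With a process pool, encode_one must be picklable,
    i.e. a module-level function rather than a lambda or closure.
    """
    path_list = list(paths)
    pool = executor if executor is not None else ProcessPoolExecutor()
    try:
        # map() preserves input order, so zip pairs each path with its hash.
        results = pool.map(encode_one, path_list, chunksize=chunksize)
        return dict(zip(path_list, results))
    finally:
        if executor is None:
            pool.shutdown()  # only shut down a pool we created ourselves


# Intended use (assumes thash.encode is a picklable module-level function):
#
#     if __name__ == "__main__":
#         from thash import encode
#         hashes = hash_many(["a.jpg", "b.jpg"], encode)
```

Batching work with chunksize reduces inter-process overhead, which matters here because each individual encode call is short.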

Backends

The package picks the NumPy backend at import time if available, otherwise falls back to a pure-Python reference implementation. You can force one explicitly:

encode(img, backend="numpy")    # default, BLAS-accelerated matmul
encode(img, backend="pure")     # reference Python, no deps

Backend availability is reflected by module flags:

from thash import has_numpy, has_pil
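
For readers curious how this kind of detect-then-fall-back selection can work, here is a generic sketch using only the standard library. available_backends and choose_backend are hypothetical names; the package's own logic is exposed through the has_numpy / has_pil flags above and may be implemented differently.

```python
import importlib.util
from typing import Optional


def available_backends() -> list[str]:
    """List usable backend names, fastest first (hypothetical helper)."""
    backends = ["pure"]  # the pure-Python path has no dependencies
    if importlib.util.find_spec("numpy") is not None:
        backends.insert(0, "numpy")  # prefer the fast path when installed
    return backends


def choose_backend(requested: Optional[str] = None) -> str:
    """Return the backend to use, honoring an explicit request if possible."""
    available = available_backends()
    if requested is None:
        return available[0]
    if requested not in ("numpy", "pure"):
        raise ValueError(f"unknown backend: {requested!r}")
    if requested not in available:
        raise RuntimeError(f"backend {requested!r} is not installed")
    return requested


print(choose_backend())        # "numpy" when NumPy is installed, else "pure"
print(choose_backend("pure"))  # forcing the fallback always works
```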

Backend comparison (random RGBA inputs, byte-identical output)

case                 size alpha         pure        numpy
---------------------------------------------------------
tiny-square       10x10   False     300 μs        41 μs
small-square      32x32   False     2.7 ms        66 μs
medium-square     64x64   False    11.4 ms        86 μs
max-square       100x100  False    26.8 ms       124 μs
landscape        100x56   False    11.7 ms        98 μs
max-square+a     100x100   True    28.2 ms       168 μs
HD-720p         1280x720  False        —          48 ms
FHD-1080p       1920x1080 False        —         208 ms
UHD-4K          3840x2160 False        —         516 ms

NumPy is ~100–140× faster than the reference implementation on spec-sized inputs (geometric mean ~88×, median ~137×). Three optimizations stack here:

  1. Cosine basis cached by (n, k): the np.cos cost amortizes across calls with shared dimensions (common after thumbnailing).
  2. P and Q channels combined into a single batched 3×3 matmul.
  3. float32 DCT: memory bandwidth is halved and BLAS sgemm is faster than dgemm; verified byte-identical on 490 random inputs across all spec shapes.
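
The first optimization can be illustrated without NumPy. The sketch below caches a cosine table keyed by (n, k) with functools.lru_cache and reuses it for a one-dimensional DCT. It is a pure-Python illustration under the assumption that the basis is cos(π · j · (x + 0.5) / n), as in the reference ThumbHash encoder; cosine_basis and dct_1d are hypothetical names, not the package's internals.

```python
import math
from functools import lru_cache


@lru_cache(maxsize=None)
def cosine_basis(n: int, k: int) -> tuple[tuple[float, ...], ...]:
    """Rows j = 0..k-1 of cos(pi * j * (x + 0.5) / n) for x = 0..n-1."""
    return tuple(
        tuple(math.cos(math.pi * j * (x + 0.5) / n) for x in range(n))
        for j in range(k)
    )


def dct_1d(signal: list[float], k: int) -> list[float]:
    """First k DCT coefficients of signal, averaged over its length."""
    n = len(signal)
    basis = cosine_basis(n, k)  # computed once per (n, k), then reused
    return [sum(b * s for b, s in zip(row, signal)) / n for row in basis]


# Two calls with the same length share one cached table.
dct_1d([0.5] * 100, 7)
dct_1d([0.25] * 100, 7)
print(cosine_basis.cache_info())
```

Because thumbnailing funnels most inputs to a handful of sizes, the cache stays small while nearly every call after the first is a hit.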

The pure-Python fallback is kept so the package works with zero deps. Run uv run python benchmarks/run.py to reproduce.

Low-level API

The original byte-list API still works for callers who want to manage RGBA themselves:

from thash import rgba_to_thumb_hash, image_to_thumb_hash

# Flat list: [R, G, B, A, R, G, B, A, ...], length = 4 * w * h
hash_bytes = rgba_to_thumb_hash(width, height, flat_rgba_ints)

# Open a file via Pillow, thumbnail to ≤100x100, encode
hash_bytes = image_to_thumb_hash("photo.jpg")

rgba_to_thumb_hash automatically picks the NumPy backend if available, falling back to pure Python otherwise.
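
If your pixels live in nested rows rather than a flat list, a small helper can produce the layout this API expects. flatten_rgba is a hypothetical function written for this example; only the [R, G, B, A, ...] layout and the (width, height, flat list) argument order come from the snippet above.

```python
def flatten_rgba(rows):
    """Flatten rows of (r, g, b, a) tuples into [R, G, B, A, R, G, B, A, ...].

    Returns (width, height, flat_list). Hypothetical helper, not part of thash.
    """
    height = len(rows)
    width = len(rows[0]) if rows else 0
    flat = []
    for row in rows:
        if len(row) != width:
            raise ValueError("all rows must have the same width")
        for r, g, b, a in row:
            flat.extend((r, g, b, a))
    return width, height, flat


# A 3x2 image: red fading from opaque to transparent on each row.
rows = [
    [(255, 0, 0, 255), (255, 0, 0, 128), (255, 0, 0, 0)],
    [(255, 0, 0, 255), (255, 0, 0, 128), (255, 0, 0, 0)],
]
width, height, flat_rgba_ints = flatten_rgba(rows)
print(width, height, len(flat_rgba_ints))
```

The three returned values can then be passed as rgba_to_thumb_hash(width, height, flat_rgba_ints).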

Development

git clone https://github.com/Jannchie/thumbhash-py.git
cd thumbhash-py
uv sync --all-extras --all-groups   # full dev env (deps + dev tools + bench)

uv run pytest                       # tests
uv run ruff check thash benchmarks  # lint
uv run python benchmarks/run.py     # benchmark suite

Credits

