
Multi-dimensional image similarity comparison. Just as a prism decomposes light, imageprism decomposes similarity into independent dimensions.

imageprism

Compare two images across several kinds of similarity in one call. Runs on CPU, no PyTorch, no GPU, no API keys.

"Similar" is ambiguous. Two images can be the same file re-saved, the same kind of scene, the same specific object, or the same person, and those are different questions with different answers. imageprism scores each one as its own dimension and hands back the numbers together, so you choose the dimensions your problem actually needs. Everything runs on CPU through ONNX Runtime, NumPy, and Pillow.

from imageprism import ImagePrism, Dimension

prism = ImagePrism(dimensions=[Dimension.HASH, Dimension.SEMANTIC])
result = prism.compare("a.jpg", "b.jpg")
result.scores  # {"hash": 0.12, "semantic": 0.82}

That is the core of the public surface: one class, one method. The embed and dedup methods covered below live on the same class.

Doing this by hand usually means installing imagehash, a CLIP wrapper, and a face library, then reconciling three preprocessing pipelines and three output formats. imageprism puts them behind one API.

Install

pip install imageprism

ImagePrism() with no arguments uses hashing only, which needs no downloads and runs immediately. Adding a model-backed dimension downloads its model once, caches it locally, and works offline after that.

Dimensions

| Dimension | Answers | Technique | Model |
|---|---|---|---|
| hash | Pixel-level duplicate? | pHash + dHash + aHash | none (pure algorithm) |
| semantic | Same concept or category? | CLIP cosine similarity | CLIP ViT-B/32 quantized, ~89 MB |
| instance | Same specific object? | DINOv2 cosine similarity | DINOv2-small, ~87 MB |
| style | Similar visual style? | MobileNetV2 feature similarity | MobileNetV2, ~14 MB |
| face | Same person? | Face detection + embedding | UltraFace ~1.2 MB + ArcFace ~137 MB (swappable, see below) |

Dimensions can be passed as enum members or plain strings: dimensions=["hash", "semantic"] works.

Reading the scores

Each score is a float, but the scales differ per dimension - 0.5 does not mean "50% similar". Rough calibration, from the benchmarks and spot checks below:

  • hash: fraction of matching hash bits. Above ~0.9 is a near-duplicate. Unrelated images land around 0.5, not 0.
  • semantic: CLIP cosine similarity, which lives in a compressed range. Unrelated images score around 0.5; above ~0.75 usually means the same concept.
  • instance: DINOv2 cosine similarity. The same object re-photographed scores high (0.7+); unrelated images fall near 0.
  • face: ArcFace cosine similarity. On LFW the optimal same-person threshold is about 0.32. The score is None when no face is detected in either image, which is different from 0.0 (faces found, but different people).
  • style: MobileNetV2 feature cosine. Treat as a rough signal; it is not benchmarked yet.
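To see why unrelated images sit near 0.5 on the hash dimension rather than near 0, here is a minimal sketch of a "fraction of matching bits" score. It assumes 64-bit hashes stored as Python integers; `bit_similarity` is a hypothetical helper for illustration, not part of imageprism, and the library's own combination of pHash, dHash, and aHash may differ.

```python
import random


def bit_similarity(hash_a: int, hash_b: int, n_bits: int = 64) -> float:
    """Fraction of bit positions where two n_bits-wide hashes agree."""
    mask = (1 << n_bits) - 1
    differing = bin((hash_a ^ hash_b) & mask).count("1")
    return 1.0 - differing / n_bits


rng = random.Random(0)
original = rng.getrandbits(64)
near_duplicate = original ^ 0b111  # same hash with 3 of 64 bits flipped

print(bit_similarity(original, original))        # 1.0
print(bit_similarity(original, near_duplicate))  # 0.953125

# Two unrelated random hashes agree on about half their bits by chance,
# so the average over many random pairs lands close to 0.5.
random_pairs = [(rng.getrandbits(64), rng.getrandbits(64)) for _ in range(2000)]
mean_unrelated = sum(bit_similarity(a, b) for a, b in random_pairs) / len(random_pairs)
```

The practical consequence is that the useful range of the hash score runs from about 0.5 to 1.0, not from 0 to 1.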

Thresholds always depend on your data, so validate on a sample before hard-coding one.
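One way to run that validation, as a minimal sketch: score a small hand-labeled sample of pairs, then sweep candidate thresholds and keep the one with the best accuracy. `best_threshold` is a hypothetical helper and the sample scores below are made up for illustration; in practice the scores would come from `prism.compare(...)` on your own pairs, with `None` scores filtered out first.

```python
def best_threshold(scores, labels):
    """Return (threshold, accuracy) that maximizes accuracy of `score >= threshold`.

    scores: similarity scores for each labeled pair
    labels: True if the pair should count as a match, else False
    """
    best_t, best_acc = None, -1.0
    for t in sorted(set(scores)):  # every observed score is a candidate cut
        correct = sum((s >= t) == y for s, y in zip(scores, labels))
        acc = correct / len(scores)
        if acc > best_acc:  # strict '>' keeps the lowest threshold on ties
            best_t, best_acc = t, acc
    return best_t, best_acc


# Made-up hash scores for eight labeled pairs (True = known duplicate).
sample_scores = [0.95, 0.91, 0.88, 0.62, 0.55, 0.49, 0.93, 0.51]
sample_labels = [True, True, True, False, False, False, True, False]

threshold, accuracy = best_threshold(sample_scores, sample_labels)
print(threshold, accuracy)  # 0.88 1.0
```

Accuracy is only one criterion. If false matches are costlier than misses, pick the threshold by precision or by a fixed false-accept rate instead.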

Profiles

A profile picks a set of dimensions and blends them into one weighted score, keeping the per-dimension breakdown alongside.

from imageprism import ImagePrism, Profile

prism = ImagePrism(profile=Profile.COPYRIGHT)
result = prism.compare("original.jpg", "suspect.jpg")
result.weighted_score  # 0.58
result.scores          # {"hash": 0.51, "instance": 0.34, "semantic": 0.82}

There are six profiles: ecommerce, copyright, dedup, visual_search, identity, and forgery. The last two use the face dimension, so read the licensing note below before relying on them.

Custom weights and per-dimension config

from imageprism import ImagePrism, Dimension, HashConfig

prism = ImagePrism(
    weights={Dimension.HASH: 0.6, Dimension.SEMANTIC: 0.4},
    config={Dimension.HASH: HashConfig(algorithms=("phash",), hash_size=16)},
)

Weights are normalized to sum to 1, so relative values are all that matter. A dimension that cannot score a pair (face with no face detected) contributes 0 to the weighted score.
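The arithmetic can be sketched as follows. This is an illustration under one stated assumption, not the library's code: "contributes 0" is read here as the unscoreable dimension keeping its weight while its score counts as 0, so it pulls the blend down. `weighted_score` is a hypothetical helper.

```python
def weighted_score(scores, weights):
    """Blend per-dimension scores using weights normalized to sum to 1.

    A score of None (dimension could not score the pair) is counted as 0.0
    while its weight is kept. That is an assumption about the library's behavior.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    blended = 0.0
    for dim, weight in weights.items():
        score = scores.get(dim)
        blended += (weight / total) * (0.0 if score is None else score)
    return blended


scores = {"hash": 0.12, "semantic": 0.82}

# Only the ratio between weights matters: 0.6/0.4 and 3/2 give the same blend,
# 0.6 * 0.12 + 0.4 * 0.82, which is approximately 0.4.
a = weighted_score(scores, {"hash": 0.6, "semantic": 0.4})
b = weighted_score(scores, {"hash": 3, "semantic": 2})

# A face score of None keeps its half of the weight but adds nothing.
c = weighted_score({"hash": 0.8, "face": None}, {"hash": 1, "face": 1})
```

Under this reading, a profile that includes face will score noticeably lower on pairs with no detectable face, which is worth keeping in mind when setting a threshold on the blended score.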

Embeddings and caching

You can pull embeddings out to store in your own index. Repeated comparisons reuse them: the cache is keyed on pixel content, so comparing one image against many others embeds it only once.

emb = prism.embed("a.jpg")          # {"hash": np.array([...]), "semantic": np.array([...])}
prism.compare("a.jpg", "b.jpg")     # a.jpg is embedded here
prism.compare("a.jpg", "c.jpg")     # a.jpg comes from the cache
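What "keyed on pixel content" buys you can be shown with a toy cache. The sketch below assumes the key is a SHA-256 digest of the image size plus raw pixel bytes; that detail, the `ContentKeyedCache` class, and the `toy_embed` stand-in are all hypothetical and only illustrate the idea.

```python
import hashlib


class ContentKeyedCache:
    """Toy embedding cache keyed on decoded pixel bytes, not on the file path."""

    def __init__(self, embed_fn):
        self._embed_fn = embed_fn
        self._store = {}
        self.embed_calls = 0  # counts how often the (expensive) model runs

    def get(self, size, pixels):
        # Include the size so a 2x8 and a 4x4 image with equal bytes do not collide.
        key = hashlib.sha256(repr(size).encode("ascii") + bytes(pixels)).hexdigest()
        if key not in self._store:
            self.embed_calls += 1
            self._store[key] = self._embed_fn(pixels)
        return self._store[key]


def toy_embed(pixels):
    """Stand-in for a model: a one-element 'embedding' holding the mean pixel value."""
    return [sum(pixels) / len(pixels)]


cache = ContentKeyedCache(toy_embed)
image_a = ((2, 2), bytes([10, 20, 30, 40]))
image_a_copy = ((2, 2), bytes([10, 20, 30, 40]))  # same pixels, e.g. a renamed file
image_b = ((2, 2), bytes([200, 210, 220, 230]))

cache.get(*image_a)
cache.get(*image_a_copy)  # cache hit: identical content
cache.get(*image_b)
print(cache.embed_calls)  # 2
```

Because the key depends on content rather than on the path, a renamed or copied file hits the cache, while a file overwritten in place with new pixels does not return a stale embedding.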

Batch dedup

dedup embeds each image once and groups near-duplicates, keeping one representative per group. A typical use is trimming a video down to its distinct frames before running something expensive on each one.

from imageprism import ImagePrism, Dimension

# frames pulled from a video, in order
frames = ["frame_0001.jpg", "frame_0002.jpg", "frame_0003.jpg"]

prism = ImagePrism(dimensions=[Dimension.HASH])
result = prism.dedup(frames, threshold=0.9)

result.unique                     # indices of the distinct frames
result.labels                     # for each frame, the representative it was grouped under
distinct = [frames[i] for i in result.unique]

Each image is embedded once, then compared against the representatives kept so far, so the model work stays linear in the number of images. There is no approximate index yet, so on a large set of mostly-distinct images the number of comparisons grows quadratically.
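The greedy pass described above can be sketched in a few lines. This is an illustration of the general technique, not imageprism's implementation: it assumes each item joins the first representative that clears the threshold, and it uses toy 8-bit hashes with a hypothetical `bit_similarity` helper in place of real embeddings.

```python
def bit_similarity(hash_a, hash_b, n_bits=8):
    """Fraction of matching bits between two small integer hashes."""
    return 1.0 - bin((hash_a ^ hash_b) & ((1 << n_bits) - 1)).count("1") / n_bits


def greedy_dedup(embeddings, similarity, threshold):
    """Group items under the first kept representative they are similar enough to.

    Returns (unique, labels): indices of the representatives, and for each
    item the index of the representative it was grouped under.
    """
    unique, labels = [], []
    for i, emb in enumerate(embeddings):
        for rep in unique:  # compare only against representatives kept so far
            if similarity(embeddings[rep], emb) >= threshold:
                labels.append(rep)
                break
        else:  # no representative matched: this item starts a new group
            unique.append(i)
            labels.append(i)
    return unique, labels


# Four toy "frames": 0, 1, and 3 are near-identical, 2 is a different scene.
frame_hashes = [0b11110000, 0b11110001, 0b00001111, 0b11110000]
unique, labels = greedy_dedup(frame_hashes, bit_similarity, threshold=0.85)
print(unique)  # [0, 2]
print(labels)  # [0, 0, 2, 0]
```

When every item is distinct, item i is compared against all i earlier representatives, which is where the quadratic comparison cost comes from.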

The right threshold depends on the dimension: around 0.9 on hashing catches re-encodes and small edits, while a lower value on semantic groups by content. Configure a profile or weights instead of a single dimension to dedup on a blended score.

Face and model licensing

Face works out of the box, with one caveat. It detects the largest face with UltraFace (MIT) and embeds it with ArcFace by default. Those default ArcFace weights have no clear commercial license, because like most high-accuracy face models they trace back to research-only datasets. The first time you run the face dimension, imageprism prints a warning.

For commercial use, bring your own embedding model:

from imageprism import ImagePrism, Dimension, FaceConfig

prism = ImagePrism(
    dimensions=[Dimension.FACE],
    config={Dimension.FACE: FaceConfig(embed_repo="your-org/your-model", embed_file="model.onnx")},
)

The model needs to accept a 112x112 RGB face crop. Common choices are FaceX (Apache-2.0), InsightFace buffalo_l (MIT code, but the weights need a commercial license), or one you train yourself. The imageprism package itself bundles no face weights (the defaults are downloaded from their original sources), so the choice of what you have rights to is yours.

Benchmarks

The numbers below reproduce with the scripts in benchmarks/.

Hashing, on 200 LFW images under 15 transforms (JPEG, resize, crop, rotation, blur, noise, flip, brightness, contrast):

| Config | AUC | Accuracy |
|---|---|---|
| default (pHash + dHash + aHash, mean) | 0.919 | 0.885 |
| aHash only | 0.937 | 0.889 |
| dHash only | 0.900 | 0.870 |
| pHash only | 0.875 | 0.863 |

JPEG, resize, blur, noise, brightness, and contrast all sit near 1.0 AUC. The weak points are a 50% center crop (about 0.40) and a horizontal flip (about 0.59).

Semantic, retrieval on the CIFAR-100 test set (1000 images, 100 classes):

| Metric | Score |
|---|---|
| Recall@1 | 0.44 |
| Recall@5 | 0.70 |
| Recall@10 | 0.80 |
| Recall@20 | 0.88 |

CIFAR-100 images are 32px upscaled to 224 before they reach CLIP, so treat these as a floor rather than a ceiling.
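For readers unfamiliar with the metric, here is a minimal sketch of Recall@k as it is commonly computed for retrieval: each image is used as a query, the remaining images are ranked by cosine similarity, and the query counts as a hit if any of the top k shares its class. This is an assumed protocol; the scripts in benchmarks/ are the authority on what was actually measured. The 2-D vectors below are toy data, not CLIP embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def recall_at_k(embeddings, classes, k):
    """Fraction of queries with at least one same-class item in their top-k neighbors."""
    hits = 0
    for q, q_emb in enumerate(embeddings):
        ranked = sorted(
            (i for i in range(len(embeddings)) if i != q),  # never retrieve the query itself
            key=lambda i: cosine(q_emb, embeddings[i]),
            reverse=True,
        )
        if any(classes[i] == classes[q] for i in ranked[:k]):
            hits += 1
    return hits / len(embeddings)


# Toy embeddings: two tight clusters plus one "dog" that sits closer to the cats.
embeddings = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.1, 0.9), (0.8, 0.6)]
classes = ["cat", "cat", "dog", "dog", "dog"]

print(recall_at_k(embeddings, classes, 1))  # 0.8
print(recall_at_k(embeddings, classes, 3))  # 1.0
```

The last item misses at k=1 because its two nearest neighbors are cats, and it becomes a hit once k is large enough to reach another dog.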

Face, LFW verification over 6000 pairs: 0.963 AUC, 0.909 accuracy, 0.726 TAR at FAR=1%. Well-aligned ArcFace reaches roughly 0.998 accuracy; the gap comes from the plain crop-and-resize alignment described below.
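"TAR at FAR=1%" is the fraction of same-person pairs accepted once the threshold is set so that at most 1% of different-person pairs are accepted. A minimal sketch of that calculation follows; `tar_at_far` is a hypothetical helper, the scores are made up, and the benchmark script may pick the threshold slightly differently (for example by interpolating on the ROC curve).

```python
def tar_at_far(scores, same_person, target_far=0.01):
    """True-accept rate at the strictest threshold whose false-accept rate <= target_far.

    A pair is accepted when its score is strictly greater than the threshold.
    Returns (tar, threshold).
    """
    impostor = sorted((s for s, y in zip(scores, same_person) if not y), reverse=True)
    genuine = [s for s, y in zip(scores, same_person) if y]
    allowed = int(target_far * len(impostor))  # impostor pairs we may accept
    # Cutting at the (allowed + 1)-th highest impostor score accepts at most `allowed` of them.
    threshold = impostor[allowed] if allowed < len(impostor) else float("-inf")
    tar = sum(s > threshold for s in genuine) / len(genuine)
    return tar, threshold


# Made-up scores: 10 different-person pairs followed by 5 same-person pairs.
impostor_scores = [0.05, 0.10, 0.12, 0.18, 0.20, 0.22, 0.25, 0.28, 0.30, 0.41]
genuine_scores = [0.29, 0.35, 0.52, 0.60, 0.71]
scores = impostor_scores + genuine_scores
same_person = [False] * len(impostor_scores) + [True] * len(genuine_scores)

# With only 10 impostor pairs, a 10% FAR allows exactly one false accept.
tar, threshold = tar_at_far(scores, same_person, target_far=0.10)
print(tar, threshold)  # 0.8 0.3
```

A low FAR target pushes the threshold well above the accuracy-optimal one, which is why TAR at FAR=1% can be much lower than overall accuracy on the same pairs.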

Instance and style are not benchmarked yet.

Limitations

  • Dedup is greedy and brute-force. It embeds each image once, but the comparison step has no approximate index, so a large set of mostly-distinct images scales quadratically. There is no corpus-scale similarity search yet; a FAISS-backed index is the planned next step.
  • Hashing handles JPEG, resize, blur, noise, and brightness almost perfectly, but a 50% center crop drops it to about 0.40 AUC and a horizontal flip to about 0.59.
  • The style dimension uses MobileNetV2 features rather than gram matrices on intermediate layers, so it is a rough signal and is not benchmarked yet.
  • Profile weights are sensible defaults, not values tuned on data.
  • Face alignment is a plain crop and resize with no landmark step, which puts LFW accuracy near 91% against roughly 99.8% for well-aligned ArcFace. It works, but it is not state of the art.
  • A single ImagePrism instance is not thread-safe; the embedding cache is unsynchronized. Use one instance per thread.
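One way to follow the one-instance-per-thread advice is to keep the instance in thread-local storage. The sketch below uses a stand-in class so it runs without imageprism installed; `per_thread` and `FakePrism` are hypothetical names, and with the real library the factory would be something like `lambda: ImagePrism(dimensions=["hash"])`.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

_local = threading.local()


def per_thread(factory):
    """Return the calling thread's own instance, creating it on first use."""
    instance = getattr(_local, "instance", None)
    if instance is None:
        instance = _local.instance = factory()
    return instance


class FakePrism:
    """Stand-in for ImagePrism that records which thread created it."""

    def __init__(self):
        self.owner = threading.get_ident()

    def compare(self, a, b):
        # Fails loudly if an instance is ever used from a thread that did not create it.
        assert self.owner == threading.get_ident(), "instance shared across threads"
        return 1.0 if a == b else 0.0


def work(pair):
    return per_thread(FakePrism).compare(*pair)


pairs = [("a.jpg", "a.jpg"), ("a.jpg", "b.jpg")] * 20
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(work, pairs))
```

The trade-off is that each thread has its own embedding cache, and would load its own copy of any model-backed dimension, so memory use scales with the number of worker threads.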

When to use something else

If you need only one kind of similarity, reach for the specialized tool: imagehash for perceptual hashing, CLIP directly for semantic search, insightface for faces. imageprism is worth it when you need two or more of these behind one interface. It saves the integration work rather than trying to beat any of those libraries at their single job.

License

MIT, see LICENSE. Model weights download from their original sources under their own licenses.
