Multi-dimensional image similarity comparison. Just as a prism decomposes light, imageprism decomposes similarity into independent dimensions.
imageprism
Compare two images across several kinds of similarity in one call. Runs on CPU, no PyTorch, no GPU, no API keys.
"Similar" is ambiguous. Two images can be the same file re-saved, the same kind of scene, the same specific object, or the same person, and those are different questions with different answers. imageprism scores each one as its own dimension and hands back the numbers together, so you choose the dimensions your problem actually needs. Everything runs on CPU through ONNX Runtime, NumPy, and Pillow.
from imageprism import ImagePrism, Dimension
prism = ImagePrism(dimensions=[Dimension.HASH, Dimension.SEMANTIC])
result = prism.compare("a.jpg", "b.jpg")
result.scores # {"hash": 0.12, "semantic": 0.82}
That is the core of the public surface: one class and its compare method. The embed and dedup methods described further down build on the same instance.
Doing this by hand usually means installing imagehash, a CLIP wrapper, and a face library, then reconciling three preprocessing pipelines and three output formats. imageprism puts them behind one API.
Install
pip install imageprism
ImagePrism() with no arguments uses hashing only, which needs no downloads and runs immediately. Adding a model-backed dimension downloads its model once, caches it locally, and works offline after that.
Dimensions
| Dimension | Answers | Technique | Model |
|---|---|---|---|
| hash | Pixel-level duplicate? | pHash + dHash + aHash | none (pure algorithm) |
| semantic | Same concept or category? | CLIP cosine similarity | CLIP ViT-B/32 quantized, ~89MB |
| instance | Same specific object? | DINOv2 cosine similarity | DINOv2-small, ~87MB |
| style | Similar visual style? | MobileNetV2 feature similarity | MobileNetV2, ~14MB |
| face | Same person? | Face detection + embedding | UltraFace ~1.2MB + ArcFace ~137MB (swappable, see below) |
Dimensions can be passed as enum members or plain strings: dimensions=["hash", "semantic"] works.
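To make the hash row of the table concrete, here is a minimal sketch of an average-hash style comparison. It is not imageprism's implementation: the function names `average_hash_bits` and `bit_similarity` are illustrative, the grids are hypothetical pre-downscaled grayscale values, and the way the library combines pHash, dHash, and aHash may differ.

```python
from typing import List, Sequence

def average_hash_bits(gray: Sequence[Sequence[float]]) -> List[int]:
    """Return one bit per pixel: 1 if the pixel is brighter than the grid mean."""
    pixels = [p for row in gray for p in row]
    mean = sum(pixels) / len(pixels)
    return [1 if p > mean else 0 for p in pixels]

def bit_similarity(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of positions where two equal-length bit strings agree."""
    if len(a) != len(b):
        raise ValueError("hashes must have the same length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)

# Hypothetical 4x4 grayscale grids (a real hash would first downscale the image).
image = [[10, 20, 200, 210], [15, 25, 205, 215], [12, 22, 202, 212], [11, 21, 201, 211]]
brighter = [[p + 30 for p in row] for row in image]   # uniform brightness shift
inverted = [[255 - p for p in row] for row in image]  # every bit flips

h = average_hash_bits(image)
print(bit_similarity(h, average_hash_bits(brighter)))  # 1.0
print(bit_similarity(h, average_hash_bits(inverted)))  # 0.0
```

Two independent, unbiased bit strings agree on about half their positions by chance, which is why a bit-fraction score for unrelated images sits near 0.5 rather than 0.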
Reading the scores
Each score is a float, but the scales differ per dimension - 0.5 does not mean "50% similar". Rough calibration, from the benchmarks and spot checks below:
- hash: fraction of matching hash bits. Above ~0.9 is a near-duplicate. Unrelated images land around 0.5, not 0.
- semantic: CLIP cosine similarity, which lives in a compressed range. Unrelated images score around 0.5; above ~0.75 usually means the same concept.
- instance: DINOv2 cosine similarity. The same object re-photographed scores high (0.7+); unrelated images fall near 0.
- face: ArcFace cosine similarity. On LFW the optimal same-person threshold is about 0.32. The score is None when no face is detected in either image, which is different from 0.0 (faces found, but different people).
- style: MobileNetV2 feature cosine. Treat as a rough signal; it is not benchmarked yet.
Thresholds always depend on your data, so validate on a sample before hard-coding one.
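One way to apply the calibration notes above is a small helper that keeps "not scored" separate from "no match". This is a sketch, not part of imageprism: `describe` and `ROUGH_THRESHOLDS` are hypothetical names, the cut-offs are the approximate values quoted above, and the input is assumed to be a plain dict shaped like result.scores.

```python
from typing import Dict, Optional

# Rough cut-offs quoted in the calibration notes above; validate on your own data.
ROUGH_THRESHOLDS = {"hash": 0.9, "semantic": 0.75, "instance": 0.7, "face": 0.32}

def describe(scores: Dict[str, Optional[float]]) -> Dict[str, str]:
    """Label each dimension 'match', 'no match', 'not scored', or 'no threshold'."""
    labels = {}
    for name, score in scores.items():
        if score is None:
            # None means the dimension could not score the pair (e.g. no face found).
            labels[name] = "not scored"
        elif name not in ROUGH_THRESHOLDS:
            # e.g. style, which has no benchmarked cut-off yet
            labels[name] = "no threshold"
        elif score >= ROUGH_THRESHOLDS[name]:
            labels[name] = "match"
        else:
            labels[name] = "no match"
    return labels

print(describe({"hash": 0.51, "semantic": 0.82, "face": None, "style": 0.6}))
# {'hash': 'no match', 'semantic': 'match', 'face': 'not scored', 'style': 'no threshold'}
```

Checking for None before comparing avoids both a TypeError and the mistake of reading a missing face as a confirmed mismatch.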
Profiles
A profile picks a set of dimensions and blends them into one weighted score, keeping the per-dimension breakdown alongside.
from imageprism import ImagePrism, Profile
prism = ImagePrism(profile=Profile.COPYRIGHT)
result = prism.compare("original.jpg", "suspect.jpg")
result.weighted_score # 0.58
result.scores # {"hash": 0.51, "instance": 0.34, "semantic": 0.82}
There are six: ecommerce, copyright, dedup, visual_search, identity, forgery. The last two use the face dimension, so read the licensing note below before relying on them.
Custom weights and per-dimension config
from imageprism import ImagePrism, Dimension, HashConfig
prism = ImagePrism(
    weights={Dimension.HASH: 0.6, Dimension.SEMANTIC: 0.4},
    config={Dimension.HASH: HashConfig(algorithms=("phash",), hash_size=16)},
)
Weights are normalized to sum to 1, so relative values are all that matter. A dimension that cannot score a pair (face with no face detected) contributes 0 to the weighted score.
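The blending rule described above can be sketched in a few lines. This is an illustration of the stated behavior (normalize the weights, count an unscored dimension as 0), not the library's code; `weighted_score` is a hypothetical name and the dicts use plain string keys for brevity.

```python
from typing import Dict, Optional

def weighted_score(scores: Dict[str, Optional[float]], weights: Dict[str, float]) -> float:
    """Blend per-dimension scores using weights normalized to sum to 1."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    blended = 0.0
    for name, weight in weights.items():
        score = scores.get(name)
        # An unscored dimension (None) contributes 0, as described above.
        blended += (weight / total) * (score if score is not None else 0.0)
    return blended

a = weighted_score({"hash": 0.5, "semantic": 0.8}, {"hash": 0.6, "semantic": 0.4})
b = weighted_score({"hash": 0.5, "semantic": 0.8}, {"hash": 3, "semantic": 2})
print(round(a, 3), round(b, 3))  # 0.62 0.62
```

Because an unscored dimension counts as 0, a pair with no detectable face is pulled down exactly as if the faces had not matched, so it is worth checking the per-dimension breakdown before acting on the blended number.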
Embeddings and caching
You can pull embeddings out to store in your own index. Repeated comparisons reuse them: the cache is keyed on pixel content, so comparing one image against many others embeds it only once.
emb = prism.embed("a.jpg") # {"hash": np.array([...]), "semantic": np.array([...])}
prism.compare("a.jpg", "b.jpg") # a.jpg is embedded here
prism.compare("a.jpg", "c.jpg") # a.jpg comes from the cache
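The idea of a cache keyed on pixel content can be sketched as follows. The source only says the key is derived from pixel content; the SHA-256 digest, the `ContentKeyedCache` class, and the `toy_embed` stand-in are all assumptions made for illustration.

```python
import hashlib
from typing import Callable, Dict, List

class ContentKeyedCache:
    """Memoize an embedding function on a digest of the decoded pixel bytes."""

    def __init__(self, embed_fn: Callable[[bytes], List[float]]) -> None:
        self._embed_fn = embed_fn
        self._store: Dict[str, List[float]] = {}
        self.misses = 0  # how many times the embedding function actually ran

    def get(self, pixels: bytes) -> List[float]:
        key = hashlib.sha256(pixels).hexdigest()
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._embed_fn(pixels)
        return self._store[key]

def toy_embed(pixels: bytes) -> List[float]:
    # Stand-in for a model: mean byte value as a one-element "embedding".
    return [sum(pixels) / len(pixels)]

cache = ContentKeyedCache(toy_embed)
cache.get(bytes([10, 20, 30]))  # computed
cache.get(bytes([10, 20, 30]))  # same content, different object: cache hit
cache.get(bytes([1, 2, 3]))     # different content: computed
print(cache.misses)  # 2
```

Keying on content rather than on the file path means two paths holding identical pixels share one embedding, while a file overwritten in place gets a fresh one.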
Batch dedup
dedup embeds each image once and groups near-duplicates, keeping one representative per group. A typical use is trimming a video down to its distinct frames before running something expensive on each one.
from imageprism import ImagePrism, Dimension
# frames pulled from a video, in order
frames = ["frame_0001.jpg", "frame_0002.jpg", "frame_0003.jpg"]
prism = ImagePrism(dimensions=[Dimension.HASH])
result = prism.dedup(frames, threshold=0.9)
result.unique # indices of the distinct frames
result.labels # for each frame, the representative it was grouped under
distinct = [frames[i] for i in result.unique]
Each image is embedded once, then compared against the representatives kept so far, so the model work stays linear in the number of images. There is no approximate index yet, so a large set of mostly-distinct images grows quadratically in the comparison step.
The right threshold depends on the dimension: around 0.9 on hashing catches re-encodes and small edits, while a lower value on semantic groups by content. Configure a profile or weights instead of a single dimension to dedup on a blended score.
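The greedy grouping described above can be sketched without any model. This is an assumption-laden illustration: `greedy_dedup`, `DedupResult`, and `toy_similarity` are hypothetical names, items are assigned to the first representative that clears the threshold, and labels are stored as representative indices, which may not match how imageprism encodes result.labels.

```python
from typing import Callable, List, NamedTuple, Sequence, TypeVar

T = TypeVar("T")

class DedupResult(NamedTuple):
    unique: List[int]  # indices of the kept representatives
    labels: List[int]  # for each item, the index of its representative

def greedy_dedup(
    items: Sequence[T],
    similarity: Callable[[T, T], float],
    threshold: float,
) -> DedupResult:
    unique: List[int] = []
    labels: List[int] = []
    for i, item in enumerate(items):
        # Compare only against representatives kept so far.
        for rep in unique:
            if similarity(items[rep], item) >= threshold:
                labels.append(rep)
                break
        else:
            # No representative was close enough: this item starts a new group.
            unique.append(i)
            labels.append(i)
    return DedupResult(unique, labels)

def toy_similarity(a: float, b: float) -> float:
    # Stand-in for an embedding comparison: 1.0 for identical values.
    return 1.0 - abs(a - b)

frames = [0.10, 0.12, 0.50, 0.11, 0.52]
result = greedy_dedup(frames, toy_similarity, threshold=0.9)
print(result.unique)  # [0, 2]
print(result.labels)  # [0, 0, 2, 0, 2]
```

The inner loop runs once per kept representative, so a set where almost every image is distinct approaches one comparison per pair, which is the quadratic case mentioned above.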
Face and model licensing
Face works out of the box, with one caveat. It detects the largest face with UltraFace (MIT) and embeds it with ArcFace by default. Those default ArcFace weights have no clear commercial license, because like most high-accuracy face models they trace back to research-only datasets. The first time you run the face dimension, imageprism prints a warning.
For commercial use, bring your own embedding model:
from imageprism import ImagePrism, Dimension, FaceConfig
prism = ImagePrism(
    dimensions=[Dimension.FACE],
    config={Dimension.FACE: FaceConfig(embed_repo="your-org/your-model", embed_file="model.onnx")},
)
The model needs to accept a 112x112 RGB face crop. Common choices are FaceX (Apache-2.0), InsightFace buffalo_l (MIT code, but the weights need a commercial license), or one you train yourself. The imageprism package itself bundles no face weights (the defaults are downloaded on first use), so picking weights you have the rights to use is up to you.
Benchmarks
The numbers below reproduce with the scripts in benchmarks/.
Hashing, on 200 LFW images under 15 transforms (JPEG, resize, crop, rotation, blur, noise, flip, brightness, contrast):
| Config | AUC | Accuracy |
|---|---|---|
| default (pHash + dHash + aHash, mean) | 0.919 | 0.885 |
| aHash only | 0.937 | 0.889 |
| dHash only | 0.900 | 0.870 |
| pHash only | 0.875 | 0.863 |
JPEG, resize, blur, noise, brightness, and contrast all sit near 1.0 AUC. The weak points are a 50% center crop (about 0.40) and a horizontal flip (about 0.59).
Semantic, retrieval on the CIFAR-100 test set (1000 images, 100 classes):
| Metric | Score |
|---|---|
| Recall@1 | 0.44 |
| Recall@5 | 0.70 |
| Recall@10 | 0.80 |
| Recall@20 | 0.88 |
CIFAR-100 images are 32px upscaled to 224 before they reach CLIP, so treat these as a floor rather than a ceiling.
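For readers unfamiliar with the metric, Recall@k can be computed from a similarity matrix as in the sketch below. It uses one common definition (a query counts as a hit if any of its top-k neighbours, excluding itself, shares its class); the script in benchmarks/ may define it differently, and the matrix and labels here are made up.

```python
from typing import Sequence

def recall_at_k(similarity: Sequence[Sequence[float]], labels: Sequence[int], k: int) -> float:
    """Fraction of queries whose top-k neighbours (self excluded) include a same-class item."""
    n = len(labels)
    hits = 0
    for q in range(n):
        others = [j for j in range(n) if j != q]
        ranked = sorted(others, key=lambda j: similarity[q][j], reverse=True)
        if any(labels[j] == labels[q] for j in ranked[:k]):
            hits += 1
    return hits / n

# Hypothetical symmetric similarity matrix for 4 images in two classes.
sim = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.8],
    [0.2, 0.3, 1.0, 0.4],
    [0.1, 0.8, 0.4, 1.0],
]
labels = [0, 0, 1, 1]
print(recall_at_k(sim, labels, k=1))  # 0.75
print(recall_at_k(sim, labels, k=2))  # 1.0
```

Image 3 is the miss at k=1: its nearest neighbour is image 1 from the other class, and its same-class neighbour only appears at rank 2.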
Face, LFW verification over 6000 pairs: 0.963 AUC, 0.909 accuracy, 0.726 TAR at FAR=1%. Well-aligned ArcFace reaches roughly 0.998 accuracy; the gap comes from the plain crop-and-resize alignment described below.
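The "TAR at FAR" figure is the share of same-person pairs accepted at a threshold strict enough to keep false accepts at or below the target rate. A minimal sketch of that calculation follows; `tar_at_far` is a hypothetical helper, the score lists are invented, and the benchmark script may choose its threshold differently (for example by interpolating a ROC curve).

```python
import math
from typing import Sequence

def tar_at_far(genuine: Sequence[float], impostor: Sequence[float], far: float) -> float:
    """True accept rate at a threshold chosen so the false accept rate is at most `far`."""
    ranked = sorted(impostor, reverse=True)
    # Number of impostor pairs we may wrongly accept (small epsilon guards float error).
    allowed = math.floor(far * len(ranked) + 1e-9)
    # Accept only scores strictly above the threshold.
    threshold = ranked[allowed] if allowed < len(ranked) else float("-inf")
    accepted = sum(1 for s in genuine if s > threshold)
    return accepted / len(genuine)

# Invented scores: 10 different-person pairs and 4 same-person pairs.
impostor = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.50]
genuine = [0.30, 0.46, 0.60, 0.70]

print(tar_at_far(genuine, impostor, far=0.1))  # 0.75
print(tar_at_far(genuine, impostor, far=0.0))  # 0.5
```

Tightening the allowed false accept rate raises the threshold and lowers the true accept rate, which is why TAR at FAR=1% can sit well below plain accuracy.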
Instance and style are not benchmarked yet.
Limitations
- Dedup is greedy and brute-force. It embeds each image once, but the comparison step has no approximate index, so a large set of mostly-distinct images scales quadratically. There is no corpus-scale similarity search yet; a FAISS-backed index is the planned next step.
- Hashing handles JPEG, resize, blur, noise, and brightness almost perfectly, but a 50% center crop drops it to about 0.40 AUC and a horizontal flip to about 0.59.
- The style dimension uses MobileNetV2 features rather than gram matrices on intermediate layers, so it is a rough signal and is not benchmarked yet.
- Profile weights are sensible defaults, not values tuned on data.
- Face alignment is a plain crop and resize with no landmark step, which puts LFW accuracy near 91% against roughly 99.8% for well-aligned ArcFace. It works, but it is not state of the art.
- A single ImagePrism instance is not thread-safe; the embedding cache is unsynchronized. Use one instance per thread.
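One way to follow the one-instance-per-thread advice is thread-local storage from the standard library. The sketch below uses a stand-in class so it runs without imageprism; `per_thread` and `FakePrism` are hypothetical names, and in real use the factory would construct an ImagePrism instead.

```python
import threading
from typing import Callable, List, TypeVar

T = TypeVar("T")

def per_thread(factory: Callable[[], T]) -> Callable[[], T]:
    """Return a getter that builds one object per thread and reuses it."""
    local = threading.local()

    def get() -> T:
        if not hasattr(local, "instance"):
            local.instance = factory()
        return local.instance

    return get

class FakePrism:
    """Stand-in for ImagePrism so the sketch runs without the library."""

    def __init__(self) -> None:
        self.cache: dict = {}

# In real use (assumption): per_thread(lambda: ImagePrism(dimensions=[...]))
get_prism = per_thread(FakePrism)

main_instance = get_prism()
seen: List[FakePrism] = []
worker = threading.Thread(target=lambda: seen.append(get_prism()))
worker.start()
worker.join()

print(get_prism() is main_instance)  # True: same thread reuses its instance
print(seen[0] is main_instance)      # False: the worker built its own
```

The trade-off is that each thread keeps its own instance and its own embedding cache, so memory use grows with the number of threads.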
When to use something else
If you need only one kind of similarity, reach for the specialized tool: imagehash for perceptual hashing, CLIP directly for semantic search, insightface for faces. imageprism is worth it when you need two or more of these behind one interface. It saves the integration work rather than trying to beat any of those libraries at their single job.
License
MIT, see LICENSE. Model weights download from their original sources under their own licenses.