evalmedia
Open-source framework for evaluating AI-generated media quality.
Think "DeepEval, but for generative media": structured, actionable quality assessments for AI-generated images, designed for AI agents rather than dashboards.
Install
pip install evalmedia
With judge backends:
pip install evalmedia[claude] # Anthropic Claude
pip install evalmedia[openai] # OpenAI GPT-4.1
pip install evalmedia[all] # Everything
Quick Start
Single image evaluation
from evalmedia import ImageEval
from evalmedia.checks.image import FaceArtifacts, PromptAdherence, TextLegibility

result = ImageEval.run(
    image="output.png",
    prompt="a woman holding a coffee cup in a cafe",
    checks=[FaceArtifacts(), PromptAdherence(), TextLegibility()],
)
print(result.passed) # False
print(result.summary()) # "FAIL — 2/3 checks passed (score: 0.65). Failed: face_artifacts."
print(result.to_dict()) # structured JSON for agents
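The structured output is what lets an agent act on a result rather than just display it. The sketch below shows one way an agent loop might turn such a dictionary into a next step. The dictionary shape (`passed`, `score`, and a `checks` list with `name` and `passed`) and the `next_action` helper are assumptions made for illustration; the actual `to_dict()` schema is not shown here and may differ.

```python
from typing import Any

# Hypothetical result shape; the real to_dict() schema may differ.
example_result: dict[str, Any] = {
    "passed": False,
    "score": 0.65,
    "checks": [
        {"name": "face_artifacts", "passed": False},
        {"name": "prompt_adherence", "passed": True},
        {"name": "text_legibility", "passed": True},
    ],
}


def next_action(result: dict[str, Any]) -> str:
    """Return 'accept', or a regenerate instruction naming the failed checks."""
    if result["passed"]:
        return "accept"
    failed = [c["name"] for c in result["checks"] if not c["passed"]]
    return "regenerate: fix " + ", ".join(failed)


print(next_action(example_result))
```

Keeping the decision in a small pure function like this makes it easy to unit-test the agent's retry policy without calling a judge backend.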
Rubric-based evaluation
from evalmedia import ImageEval
from evalmedia.rubrics import Portrait

result = ImageEval.run(
    image="output.png",
    prompt="professional headshot of a young man",
    rubric=Portrait(),
)
Built-in rubrics: GeneralQuality, Portrait, MarketingAsset.
Async support
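from evalmedia import ImageEval
from evalmedia.checks.image import FaceArtifacts, PromptAdherence

# Run inside a coroutine; image_bytes and prompt are supplied by the caller.
result = await ImageEval.arun(
    image=image_bytes,
    prompt=prompt,
    checks=[FaceArtifacts(), PromptAdherence()],
)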
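An async entry point is mainly useful for evaluating several images concurrently. The following is a minimal sketch of that pattern using `asyncio.gather`. To stay runnable without a judge backend, it uses `fake_arun`, a hypothetical stand-in coroutine, instead of `ImageEval.arun`; `evaluate_batch` is likewise an illustrative name, not part of evalmedia.

```python
import asyncio


async def fake_arun(image: bytes, prompt: str) -> dict:
    """Stand-in for an async evaluation call (hypothetical, no network)."""
    await asyncio.sleep(0)  # yield control, as a real I/O-bound call would
    return {"prompt": prompt, "size": len(image)}


async def evaluate_batch(images: list[bytes], prompt: str) -> list[dict]:
    """Run one evaluation per image concurrently, preserving input order."""
    tasks = [fake_arun(img, prompt) for img in images]
    return await asyncio.gather(*tasks)


results = asyncio.run(evaluate_batch([b"a", b"bcd"], "a sunset"))
print(results)
```

`asyncio.gather` returns results in the same order as the inputs, so each result can be matched back to its image by index.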
Compare multiple images
from evalmedia import compare
from evalmedia.rubrics import GeneralQuality

# compare() is awaited, so call it from inside a coroutine.
results = await compare(
    images=["modelA.png", "modelB.png", "modelC.png"],
    prompt="a sunset over mountains",
    rubric=GeneralQuality(),
)
best_label, best_result = results.best()
Checks
| Check | Type | What it evaluates |
|---|---|---|
| PromptAdherence | VLM | Does the image match what was asked for? |
| FaceArtifacts | VLM | Distorted faces, wrong eye count, melted features |
| HandArtifacts | VLM | Extra/missing fingers, distorted hands |
| TextLegibility | VLM | Is text in the image spelled correctly and readable? |
| AestheticQuality | VLM | Composition, lighting, color harmony |
| StyleConsistency | VLM | Does it match a style reference image? |
| CLIPSimilarity | Classical | CLIP cosine similarity between prompt and image |
| ResolutionAdequacy | Classical | Is the resolution sufficient? |
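For the `CLIPSimilarity` row, the underlying measure is cosine similarity between a prompt embedding and an image embedding. The sketch below shows only that arithmetic in plain Python, with toy 3-dimensional vectors standing in for real CLIP embeddings. It is not evalmedia's implementation, and `cosine_similarity` is an illustrative helper name.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("vectors must be non-zero")
    return dot / (norm_a * norm_b)


# Toy vectors standing in for prompt and image embeddings.
text_embedding = [1.0, 0.0, 1.0]
image_embedding = [1.0, 1.0, 0.0]
print(round(cosine_similarity(text_embedding, image_embedding), 3))
```

Values near 1 mean the two embeddings point in nearly the same direction; values near 0 mean they are close to orthogonal.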
Configuration
import evalmedia
# Set global default judge
evalmedia.set_judge("claude", api_key="sk-...")
# Or via environment variables
# EVALMEDIA_DEFAULT_JUDGE=claude
# EVALMEDIA_ANTHROPIC_API_KEY=sk-...
# EVALMEDIA_OPENAI_API_KEY=sk-...
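To make the environment-variable route concrete, here is a rough sketch of how such settings could be resolved. The variable names are the ones listed above. The lookup logic, the `resolve_judge` helper, and the use of `"openai"` as a judge name are assumptions for illustration, not a description of how evalmedia reads its configuration.

```python
import os
from typing import Mapping, Optional

# Variable names come from the configuration section; the mapping is an assumption.
KEY_VARS = {
    "claude": "EVALMEDIA_ANTHROPIC_API_KEY",
    "openai": "EVALMEDIA_OPENAI_API_KEY",
}


def resolve_judge(
    env: Mapping[str, str] = os.environ,
) -> tuple[Optional[str], Optional[str]]:
    """Return (judge_name, api_key) read from environment-style settings."""
    judge = env.get("EVALMEDIA_DEFAULT_JUDGE")
    key_var = KEY_VARS.get(judge) if judge else None
    api_key = env.get(key_var) if key_var else None
    return judge, api_key


# Pass a plain dict instead of os.environ to keep the example deterministic.
demo_env = {
    "EVALMEDIA_DEFAULT_JUDGE": "claude",
    "EVALMEDIA_ANTHROPIC_API_KEY": "sk-test",
}
print(resolve_judge(demo_env))
```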
CLI
# Evaluate an image
evalmedia check output.png --prompt "a woman in a cafe" --checks face_artifacts,prompt_adherence
# Use a rubric
evalmedia check output.png --prompt "headshot" --rubric portrait --format json
# Compare images
evalmedia compare outputs/ --prompt "sunset" --rubric general_quality
# List available checks and rubrics
evalmedia list-checks
evalmedia list-rubrics
Agent Integration
Use evalmedia as a tool in AI agent workflows:
from evalmedia.integrations import openai_tool_schema, anthropic_tool_schema

# OpenAI function calling
openai_tools = [openai_tool_schema()]

# Anthropic tool_use
anthropic_tools = [anthropic_tool_schema()]
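For readers unfamiliar with the two tool formats, the sketch below shows the general layout each provider expects for a tool definition. The tool name `evaluate_image`, its description, and the `PARAMETERS` schema are hypothetical; the schemas actually returned by `openai_tool_schema()` and `anthropic_tool_schema()` are not shown in this document and may differ.

```python
import json

# Hypothetical JSON Schema for the tool arguments (not evalmedia's actual schema).
PARAMETERS = {
    "type": "object",
    "properties": {
        "image": {"type": "string", "description": "Path or URL of the image"},
        "prompt": {"type": "string", "description": "Prompt used to generate it"},
    },
    "required": ["image", "prompt"],
}


def as_openai_tool(name: str, description: str) -> dict:
    """Wrap the parameters in the OpenAI function-calling tool layout."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": PARAMETERS,
        },
    }


def as_anthropic_tool(name: str, description: str) -> dict:
    """Wrap the parameters in the Anthropic tool-use layout."""
    return {"name": name, "description": description, "input_schema": PARAMETERS}


openai_tool = as_openai_tool("evaluate_image", "Evaluate a generated image")
anthropic_tool = as_anthropic_tool("evaluate_image", "Evaluate a generated image")
print(json.dumps(openai_tool, indent=2))
```

The two layouts carry the same JSON Schema; only the wrapper keys differ, which is why a library can expose one helper per provider.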
Custom Rubrics
from evalmedia.rubrics import Rubric, WeightedCheck
from evalmedia.checks.image import PromptAdherence, TextLegibility, AestheticQuality

rubric = Rubric(
    name="my_rubric",
    checks=[
        WeightedCheck(check=PromptAdherence(), weight=0.4),
        WeightedCheck(check=TextLegibility(), weight=0.3),
        WeightedCheck(check=AestheticQuality(), weight=0.3),
    ],
    pass_threshold=0.75,
)
Or via YAML:
name: my_rubric
pass_threshold: 0.75
checks:
  - check: prompt_adherence
    weight: 0.4
  - check: text_legibility
    weight: 0.3
  - check: aesthetic_quality
    weight: 0.3
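One plausible reading of weights and `pass_threshold` is a weighted mean of per-check scores compared against the threshold. The sketch below works through that arithmetic with the weights from `my_rubric`. The aggregation rule, the `weighted_score` helper, and the per-check scores are assumptions for illustration; evalmedia's actual scoring is not specified here.

```python
def weighted_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted mean of per-check scores, normalized by the total weight."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(scores[name] * w for name, w in weights.items()) / total_weight


# Weights from the rubric above; the scores are made up for the example.
weights = {"prompt_adherence": 0.4, "text_legibility": 0.3, "aesthetic_quality": 0.3}
scores = {"prompt_adherence": 0.9, "text_legibility": 0.6, "aesthetic_quality": 0.8}

overall = weighted_score(scores, weights)
passed = overall >= 0.75  # pass_threshold from the rubric
print(round(overall, 2), passed)
```

Under this reading, a weak score on a low-weight check can be offset by a strong score on a high-weight one, which is the main reason to tune weights per use case.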
License
Apache 2.0
File details
Details for the file evalmedia-0.4.0.tar.gz.
File metadata
- Download URL: evalmedia-0.4.0.tar.gz
- Upload date:
- Size: 163.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d024e054155cca515a95b2f0fb13855733f581b359ce1ef23bc49592c69fff80 |
| MD5 | 6088714ed786181023fe93b9ba0eb7a4 |
| BLAKE2b-256 | 70d904b2ec45093c5bdb451ebec3311cf8240a2e30611c3dfd1c057fa2f5e31b |
Provenance
The following attestation bundles were made for evalmedia-0.4.0.tar.gz:
Publisher: release.yml on saidkaban/evalmedia
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: evalmedia-0.4.0.tar.gz
  - Subject digest: d024e054155cca515a95b2f0fb13855733f581b359ce1ef23bc49592c69fff80
  - Sigstore transparency entry: 1181350725
  - Sigstore integration time:
- Permalink: saidkaban/evalmedia@812d7e1d5388ac851156670da4d8986a30f007f4
- Branch / Tag: refs/heads/main
- Owner: https://github.com/saidkaban
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@812d7e1d5388ac851156670da4d8986a30f007f4
- Trigger Event: workflow_dispatch
File details
Details for the file evalmedia-0.4.0-py3-none-any.whl.
File metadata
- Download URL: evalmedia-0.4.0-py3-none-any.whl
- Upload date:
- Size: 48.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 60da0c2fa3c22c901503ac940b9e38b61f592dd2682910836d53bdf8086ec8a6 |
| MD5 | 9fe332f4e96ecfcc52d2c40980c12303 |
| BLAKE2b-256 | 8f05f56a1e35213fb957ecac6a0e8613feefa0c98f1c385c509a12ab52a6efbe |
Provenance
The following attestation bundles were made for evalmedia-0.4.0-py3-none-any.whl:
Publisher: release.yml on saidkaban/evalmedia
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: evalmedia-0.4.0-py3-none-any.whl
  - Subject digest: 60da0c2fa3c22c901503ac940b9e38b61f592dd2682910836d53bdf8086ec8a6
  - Sigstore transparency entry: 1181350728
  - Sigstore integration time:
- Permalink: saidkaban/evalmedia@812d7e1d5388ac851156670da4d8986a30f007f4
- Branch / Tag: refs/heads/main
- Owner: https://github.com/saidkaban
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@812d7e1d5388ac851156670da4d8986a30f007f4
- Trigger Event: workflow_dispatch