Open-source framework for evaluating AI-generated media quality

evalmedia

Think of it as DeepEval for generative media: it produces structured, actionable quality assessments of AI-generated images, designed to be consumed by AI agents rather than displayed in dashboards.

Website | PyPI | GitHub

Install

pip install evalmedia

With judge backends (quote the extras so shells such as zsh don't treat the brackets as a glob):

pip install "evalmedia[claude]"    # Anthropic Claude
pip install "evalmedia[openai]"    # OpenAI GPT-4.1
pip install "evalmedia[all]"       # Everything

Quick Start

Single image evaluation

from evalmedia import ImageEval
from evalmedia.checks.image import FaceArtifacts, PromptAdherence, TextLegibility

result = ImageEval.run(
    image="output.png",
    prompt="a woman holding a coffee cup in a cafe",
    checks=[FaceArtifacts(), PromptAdherence(), TextLegibility()],
)

print(result.passed)        # False
print(result.summary())     # "FAIL — 2/3 checks passed (score: 0.65). Failed: face_artifacts."
print(result.to_dict())     # structured JSON for agents
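The summary line above condenses per-check verdicts into a single string. The sketch below shows one way such a line could be derived; it is not evalmedia's code. CheckOutcome and summarize are hypothetical names, and the sketch assumes an image passes only when every check passes and that the overall score is the unweighted mean of the check scores (the library may aggregate differently, for example using rubric weights, and its real output uses its own separator).

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class CheckOutcome:
    """Hypothetical per-check result: check name, score in [0, 1], pass flag."""
    name: str
    score: float
    passed: bool


def summarize(outcomes: List[CheckOutcome]) -> Tuple[bool, str]:
    """Assumes all checks must pass and the overall score is their unweighted mean."""
    if not outcomes:
        raise ValueError("no check outcomes to summarize")
    n_passed = sum(o.passed for o in outcomes)  # True counts as 1
    score = sum(o.score for o in outcomes) / len(outcomes)
    passed = n_passed == len(outcomes)
    verdict = "PASS" if passed else "FAIL"
    text = f"{verdict}: {n_passed}/{len(outcomes)} checks passed (score: {score:.2f})."
    failed = [o.name for o in outcomes if not o.passed]
    if failed:
        text += " Failed: " + ", ".join(failed) + "."
    return passed, text
```

With face_artifacts at 0.2 (fail), prompt_adherence at 0.9 and text_legibility at 0.85, the mean is 0.65 and the verdict is FAIL because one check failed.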

Rubric-based evaluation

from evalmedia import ImageEval
from evalmedia.rubrics import Portrait

result = ImageEval.run(
    image="output.png",
    prompt="professional headshot of a young man",
    rubric=Portrait(),
)

Built-in rubrics: GeneralQuality, Portrait, MarketingAsset.

Async support

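from evalmedia import ImageEval
from evalmedia.checks.image import FaceArtifacts, PromptAdherence

async def evaluate(image_bytes: bytes, prompt: str):
    # ImageEval.arun must be awaited, so call it from inside a coroutine
    return await ImageEval.arun(
        image=image_bytes,
        prompt=prompt,
        checks=[FaceArtifacts(), PromptAdherence()],
    )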
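Because arun is awaitable, several images can be scored concurrently. The sketch below uses asyncio.gather with a semaphore to cap the number of in-flight judge calls. fake_arun is a hypothetical stand-in for ImageEval.arun so the example runs without API keys, and the default limit of 4 concurrent calls is an arbitrary choice.

```python
import asyncio
from typing import Dict, List


async def fake_arun(image: str, prompt: str) -> Dict[str, object]:
    """Hypothetical stand-in for ImageEval.arun: waits briefly, as a judge API call would."""
    await asyncio.sleep(0.01)
    return {"image": image, "prompt": prompt, "passed": True}


async def evaluate_many(
    images: List[str], prompt: str, max_concurrency: int = 4
) -> List[Dict[str, object]]:
    # At most max_concurrency evaluations run at once, which helps respect API rate limits
    sem = asyncio.Semaphore(max_concurrency)

    async def evaluate_one(image: str) -> Dict[str, object]:
        async with sem:
            return await fake_arun(image=image, prompt=prompt)

    # gather returns results in the same order as the inputs
    return list(await asyncio.gather(*(evaluate_one(img) for img in images)))


results = asyncio.run(evaluate_many(["a.png", "b.png", "c.png"], "a sunset over mountains"))
```

To use it for real, replace the fake_arun call with ImageEval.arun(image=..., prompt=..., checks=[...]) as shown above.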

Compare multiple images

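import asyncio

from evalmedia import compare
from evalmedia.rubrics import GeneralQuality


async def main():
    # compare() is awaited, so it must run inside a coroutine
    results = await compare(
        images=["modelA.png", "modelB.png", "modelC.png"],
        prompt="a sunset over mountains",
        rubric=GeneralQuality(),
    )
    best_label, best_result = results.best()
    print(best_label)


asyncio.run(main())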
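results.best() returns a label together with its result. Assuming "best" means the highest aggregate score (the README does not say how ties are broken), the selection amounts to an argmax over labelled scores. best and rank below are hypothetical helpers that operate on plain floats, not on evalmedia's result objects.

```python
from typing import Dict, List, Tuple


def best(scores: Dict[str, float]) -> Tuple[str, float]:
    """Highest-scoring (label, score); ties go to the label inserted first."""
    if not scores:
        raise ValueError("nothing to compare")
    label = max(scores, key=scores.__getitem__)
    return label, scores[label]


def rank(scores: Dict[str, float]) -> List[Tuple[str, float]]:
    """All (label, score) pairs, best first; sorted() is stable, so ties keep input order."""
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```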

Checks

Check               Type       What it evaluates
PromptAdherence     VLM        Does the image match what was asked for?
FaceArtifacts       VLM        Distorted faces, wrong eye count, melted features
HandArtifacts       VLM        Extra/missing fingers, distorted hands
TextLegibility      VLM        Is text in the image spelled correctly and readable?
AestheticQuality    VLM        Composition, lighting, color harmony
StyleConsistency    VLM        Does it match a style reference image?
CLIPSimilarity      Classical  CLIP cosine similarity between prompt and image
ResolutionAdequacy  Classical  Is the resolution sufficient?
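The two Classical checks are simple enough to sketch directly. CLIP similarity is the cosine of the angle between an image embedding and a text embedding; in practice both vectors come from a CLIP model, whereas the toy vectors in the usage below are placeholders. resolution_adequate is a hypothetical rule with made-up thresholds (512 px on the shortest side and 0.25 megapixels); evalmedia's actual criteria are not documented here.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """cos(a, b) = (a . b) / (|a| |b|), a value in [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("zero-length embedding")
    return dot / (norm_a * norm_b)


def resolution_adequate(
    width: int, height: int, min_side: int = 512, min_megapixels: float = 0.25
) -> bool:
    """Hypothetical rule: the shortest side and the total pixel count must both meet a floor."""
    return min(width, height) >= min_side and (width * height) / 1e6 >= min_megapixels
```

Parallel vectors score 1.0 and orthogonal ones 0.0, so a higher value means the image embedding points in nearly the same direction as the prompt embedding.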

Configuration

import evalmedia

# Set global default judge
evalmedia.set_judge("claude", api_key="sk-...")

# Or via environment variables
# EVALMEDIA_DEFAULT_JUDGE=claude
# EVALMEDIA_ANTHROPIC_API_KEY=sk-...
# EVALMEDIA_OPENAI_API_KEY=sk-...
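The README lists two ways to configure the judge but not how they interact. The sketch below assumes explicit arguments take precedence over environment variables. resolve_judge and its error handling are hypothetical; only the variable names come from the list above, and pairing the judge name "openai" with EVALMEDIA_OPENAI_API_KEY is itself an assumption.

```python
import os
from typing import Mapping, Optional, Tuple

# Judge name -> env var holding its API key (names from the README; pairing assumed)
_KEY_VARS = {
    "claude": "EVALMEDIA_ANTHROPIC_API_KEY",
    "openai": "EVALMEDIA_OPENAI_API_KEY",
}


def resolve_judge(
    judge: Optional[str] = None,
    api_key: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Tuple[str, Optional[str]]:
    """Pick a judge and key: explicit arguments first, then environment variables."""
    env = os.environ if env is None else env
    name = judge or env.get("EVALMEDIA_DEFAULT_JUDGE")
    if not name:
        raise ValueError("no judge configured")
    if name not in _KEY_VARS:
        raise ValueError(f"unknown judge: {name!r}")
    key = api_key or env.get(_KEY_VARS[name])
    return name, key
```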

CLI

# Evaluate an image
evalmedia check output.png --prompt "a woman in a cafe" --checks face_artifacts,prompt_adherence

# Use a rubric
evalmedia check output.png --prompt "headshot" --rubric portrait --format json

# Compare images
evalmedia compare outputs/ --prompt "sunset" --rubric general_quality

# List available checks and rubrics
evalmedia list-checks
evalmedia list-rubrics
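When driving the CLI from another program, building the argument list explicitly avoids shell-quoting problems with prompts that contain spaces. The sketch below uses only the flags shown above. build_check_command and run_check_json are hypothetical helpers; the structure of the JSON printed by --format json is not specified here, so it is returned as parsed by json.loads without further interpretation, and the exit status is not inspected because its meaning for a failing image is not documented.

```python
import json
import subprocess
from typing import Any, List, Optional, Sequence


def build_check_command(
    image: str,
    prompt: str,
    checks: Optional[Sequence[str]] = None,
    rubric: Optional[str] = None,
    fmt: Optional[str] = None,
) -> List[str]:
    """Assemble argv for `evalmedia check` using only the documented flags."""
    cmd = ["evalmedia", "check", image, "--prompt", prompt]
    if checks:
        # The CLI takes a single comma-separated list of check names
        cmd += ["--checks", ",".join(checks)]
    if rubric:
        cmd += ["--rubric", rubric]
    if fmt:
        cmd += ["--format", fmt]
    return cmd


def run_check_json(image: str, prompt: str, rubric: str) -> Any:
    """Run the CLI (requires evalmedia on PATH) and parse its stdout as JSON."""
    cmd = build_check_command(image, prompt, rubric=rubric, fmt="json")
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return json.loads(proc.stdout)
```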

Agent Integration

Use evalmedia as a tool in AI agent workflows:

from evalmedia.integrations import openai_tool_schema, anthropic_tool_schema

# OpenAI function calling
tools = [openai_tool_schema()]

# Anthropic tool_use
tools = [anthropic_tool_schema()]
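Registering the schema is half of the integration; the agent loop still has to route the model's tool call to Python. The sketch below is a generic dispatcher. The tool name evaluate_image, its argument names, and the evaluate_image handler are hypothetical placeholders, so read the real names from the schema returned by openai_tool_schema() or anthropic_tool_schema(). OpenAI delivers function-call arguments as a JSON string while Anthropic tool_use blocks carry an already-parsed object, so both forms are accepted.

```python
import json
from typing import Any, Callable, Dict, List, Mapping, Union


def evaluate_image(image: str, prompt: str, checks: List[str]) -> Dict[str, Any]:
    """Hypothetical handler; a real one might run an evalmedia evaluation and return result.to_dict()."""
    return {"image": image, "prompt": prompt, "checks": checks, "passed": True}


# Tool name -> Python callable (the name "evaluate_image" is a placeholder)
HANDLERS: Dict[str, Callable[..., Dict[str, Any]]] = {"evaluate_image": evaluate_image}


def handle_tool_call(name: str, arguments: Union[str, Mapping[str, Any]]) -> str:
    """Route a model tool call to its handler and return the result as a JSON string."""
    handler = HANDLERS.get(name)
    if handler is None:
        return json.dumps({"error": f"unknown tool: {name}"})
    # OpenAI sends a JSON string; Anthropic's tool_use input is already a dict
    kwargs = json.loads(arguments) if isinstance(arguments, str) else dict(arguments)
    return json.dumps(handler(**kwargs))
```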

Custom Rubrics

from evalmedia.rubrics import Rubric, WeightedCheck
from evalmedia.checks.image import PromptAdherence, TextLegibility, AestheticQuality

rubric = Rubric(
    name="my_rubric",
    checks=[
        WeightedCheck(check=PromptAdherence(), weight=0.4),
        WeightedCheck(check=TextLegibility(), weight=0.3),
        WeightedCheck(check=AestheticQuality(), weight=0.3),
    ],
    pass_threshold=0.75,
)
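A natural reading of weight and pass_threshold is a weighted mean of check scores compared against the threshold. The sketch below implements that reading. It assumes each check yields a score in [0, 1], normalizes by the sum of the weights, and treats reaching the threshold exactly as a pass; these are guesses about how evalmedia aggregates (the example weights already sum to 1, so normalization changes nothing there).

```python
from typing import Mapping


def rubric_score(check_scores: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Weighted mean of check scores: sum(w_i * s_i) / sum(w_i)."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(w * check_scores[name] for name, w in weights.items()) / total_weight


def rubric_passes(
    check_scores: Mapping[str, float], weights: Mapping[str, float], pass_threshold: float
) -> bool:
    """Pass when the weighted mean reaches the threshold (>= is an assumption)."""
    return rubric_score(check_scores, weights) >= pass_threshold


WEIGHTS = {"prompt_adherence": 0.4, "text_legibility": 0.3, "aesthetic_quality": 0.3}
```

With scores of 1.0, 0.5 and 0.8 the weighted mean is 0.4 + 0.15 + 0.24 = 0.79, which clears the 0.75 threshold; with 0.9, 0.5 and 0.5 it is 0.66, which does not.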

Or define the same rubric in YAML, referring to checks by their snake_case names:

name: my_rubric
pass_threshold: 0.75
checks:
  - check: prompt_adherence
    weight: 0.4
  - check: text_legibility
    weight: 0.3
  - check: aesthetic_quality
    weight: 0.3
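Before handing a YAML rubric to evalmedia it can be worth sanity-checking it. The dict below is what yaml.safe_load (PyYAML) returns for the YAML above. validate_rubric is a hypothetical helper, and its requirement that weights sum to 1 is this sketch's own convention rather than a documented rule.

```python
import math
from typing import Any, Dict, List

# The YAML rubric above, as yaml.safe_load would return it
RUBRIC: Dict[str, Any] = {
    "name": "my_rubric",
    "pass_threshold": 0.75,
    "checks": [
        {"check": "prompt_adherence", "weight": 0.4},
        {"check": "text_legibility", "weight": 0.3},
        {"check": "aesthetic_quality", "weight": 0.3},
    ],
}


def validate_rubric(spec: Dict[str, Any]) -> List[str]:
    """Return human-readable problems; an empty list means the spec looks sane."""
    problems: List[str] = []
    name = spec.get("name")
    if not isinstance(name, str) or not name:
        problems.append("name must be a non-empty string")
    threshold = spec.get("pass_threshold")
    if not isinstance(threshold, (int, float)) or not 0 <= threshold <= 1:
        problems.append("pass_threshold must be a number in [0, 1]")
    seen = set()
    total = 0.0
    for entry in spec.get("checks") or []:
        check = entry.get("check")
        weight = entry.get("weight")
        if not isinstance(check, str):
            problems.append(f"check entry {entry!r} has no check name")
        elif check in seen:
            problems.append(f"duplicate check: {check}")
        else:
            seen.add(check)
        if not isinstance(weight, (int, float)) or weight <= 0:
            problems.append(f"weight for {check!r} must be a positive number")
        else:
            total += weight
    if not seen:
        problems.append("at least one check is required")
    elif not math.isclose(total, 1.0):
        problems.append(f"weights sum to {total:g}, expected 1")
    return problems
```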

License

Apache 2.0

Download files

Download the file for your platform.

Source Distribution

evalmedia-0.4.0.tar.gz (163.9 kB)

Uploaded Source

Built Distribution

evalmedia-0.4.0-py3-none-any.whl (48.0 kB)

Uploaded Python 3

File details

Details for the file evalmedia-0.4.0.tar.gz.

File metadata

  • Download URL: evalmedia-0.4.0.tar.gz
  • Size: 163.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for evalmedia-0.4.0.tar.gz
Algorithm Hash digest
SHA256 d024e054155cca515a95b2f0fb13855733f581b359ce1ef23bc49592c69fff80
MD5 6088714ed786181023fe93b9ba0eb7a4
BLAKE2b-256 70d904b2ec45093c5bdb451ebec3311cf8240a2e30611c3dfd1c057fa2f5e31b

Provenance

The following attestation bundles were made for evalmedia-0.4.0.tar.gz:

Publisher: release.yml on saidkaban/evalmedia

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file evalmedia-0.4.0-py3-none-any.whl.

File metadata

  • Download URL: evalmedia-0.4.0-py3-none-any.whl
  • Size: 48.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for evalmedia-0.4.0-py3-none-any.whl
Algorithm Hash digest
SHA256 60da0c2fa3c22c901503ac940b9e38b61f592dd2682910836d53bdf8086ec8a6
MD5 9fe332f4e96ecfcc52d2c40980c12303
BLAKE2b-256 8f05f56a1e35213fb957ecac6a0e8613feefa0c98f1c385c509a12ab52a6efbe

Provenance

The following attestation bundles were made for evalmedia-0.4.0-py3-none-any.whl:

Publisher: release.yml on saidkaban/evalmedia

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.