evalmedia

Open-source framework for evaluating AI-generated media quality.

Think "DeepEval but for generative media." Structured, actionable quality assessments for AI-generated images — designed for AI agents, not dashboards.

Website | PyPI | GitHub

Install

pip install evalmedia

With judge backends:

pip install "evalmedia[claude]"    # Anthropic Claude
pip install "evalmedia[openai]"    # OpenAI GPT-4.1
pip install "evalmedia[all]"       # Everything

Quick Start

Single image evaluation

from evalmedia import ImageEval
from evalmedia.checks.image import FaceArtifacts, PromptAdherence, TextLegibility

result = ImageEval.run(
    image="output.png",
    prompt="a woman holding a coffee cup in a cafe",
    checks=[FaceArtifacts(), PromptAdherence(), TextLegibility()],
)

print(result.passed)        # False
print(result.summary())     # "FAIL — 2/3 checks passed (score: 0.65). Failed: face_artifacts."
print(result.to_dict())     # structured JSON for agents

Rubric-based evaluation

from evalmedia import ImageEval
from evalmedia.rubrics import Portrait

result = ImageEval.run(
    image="output.png",
    prompt="professional headshot of a young man",
    rubric=Portrait(),
)

Built-in rubrics: GeneralQuality, Portrait, MarketingAsset.

Async support

# inside an async function — arun is the awaitable counterpart of run
result = await ImageEval.arun(
    image=image_bytes,
    prompt=prompt,
    checks=[FaceArtifacts(), PromptAdherence()],
)

Compare multiple images

from evalmedia import compare
from evalmedia.rubrics import GeneralQuality

results = await compare(
    images=["modelA.png", "modelB.png", "modelC.png"],
    prompt="a sunset over mountains",
    rubric=GeneralQuality(),
)

best_label, best_result = results.best()
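Presumably, best() picks the highest-scoring candidate. A minimal pure-Python sketch of that selection step (the label-to-score mapping here is illustrative, not the library's internal representation):

```python
# Illustrative: pick the best candidate from a label -> score mapping,
# as results.best() presumably does under the hood.
scores = {"modelA.png": 0.72, "modelB.png": 0.91, "modelC.png": 0.64}

best_label = max(scores, key=scores.get)
best_score = scores[best_label]

print(best_label, best_score)  # modelB.png 0.91
```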

Checks

| Check | Type | What it evaluates |
| --- | --- | --- |
| PromptAdherence | VLM | Does the image match what was asked for? |
| FaceArtifacts | VLM | Distorted faces, wrong eye count, melted features |
| HandArtifacts | VLM | Extra/missing fingers, distorted hands |
| TextLegibility | VLM | Is text in the image spelled correctly and readable? |
| AestheticQuality | VLM | Composition, lighting, color harmony |
| StyleConsistency | VLM | Does it match a style reference image? |
| CLIPSimilarity | Classical | CLIP cosine similarity between prompt and image |
| ResolutionAdequacy | Classical | Is the resolution sufficient? |
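CLIPSimilarity scores prompt/image agreement as the cosine similarity between their CLIP embeddings. A sketch of the underlying metric (the embedding vectors below are hypothetical toy values; the real check obtains them from a CLIP model):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical prompt/image embeddings (real CLIP vectors have 512+ dims).
prompt_emb = [0.2, 0.7, 0.1]
image_emb = [0.25, 0.65, 0.15]
print(round(cosine_similarity(prompt_emb, image_emb), 3))  # 0.993
```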

Configuration

import evalmedia

# Set global default judge
evalmedia.set_judge("claude", api_key="sk-...")

# Or via environment variables
# EVALMEDIA_DEFAULT_JUDGE=claude
# EVALMEDIA_ANTHROPIC_API_KEY=sk-...
# EVALMEDIA_OPENAI_API_KEY=sk-...
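These two mechanisms suggest a precedence order: an explicit set_judge() call wins over the environment, which wins over a built-in default. A sketch of that resolution logic (illustrative only — whether the library resolves in exactly this order is an assumption):

```python
import os

def resolve_judge(explicit=None, fallback="claude"):
    """Resolve the judge name: explicit argument > env var > fallback."""
    if explicit:
        return explicit
    return os.environ.get("EVALMEDIA_DEFAULT_JUDGE", fallback)

os.environ["EVALMEDIA_DEFAULT_JUDGE"] = "openai"
print(resolve_judge())           # openai (from the environment)
print(resolve_judge("claude"))   # claude (explicit setting wins)
```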

CLI

# Evaluate an image
evalmedia check output.png --prompt "a woman in a cafe" --checks face_artifacts,prompt_adherence

# Use a rubric
evalmedia check output.png --prompt "headshot" --rubric portrait --format json

# Compare images
evalmedia compare outputs/ --prompt "sunset" --rubric general_quality

# List available checks and rubrics
evalmedia list-checks
evalmedia list-rubrics

Agent Integration

Use evalmedia as a tool in AI agent workflows:

from evalmedia.integrations import openai_tool_schema, anthropic_tool_schema

# OpenAI function calling
tools = [openai_tool_schema()]

# Anthropic tool_use
tools = [anthropic_tool_schema()]
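After registering the schema, the agent loop still has to route the model's tool call back to an evaluation function. A generic dispatch sketch (the handler name, its arguments, and the stub result are illustrative, not evalmedia's API):

```python
import json

def evaluate_image(image: str, prompt: str) -> dict:
    # Stand-in for a real evalmedia evaluation; returns a structured verdict.
    return {"image": image, "prompt": prompt, "passed": True}

# Registry mapping tool names (as declared in the schema) to handlers.
HANDLERS = {"evaluate_image": evaluate_image}

def dispatch(tool_name: str, arguments_json: str) -> str:
    """Route a model-issued tool call to its handler; return JSON for the model."""
    result = HANDLERS[tool_name](**json.loads(arguments_json))
    return json.dumps(result)

print(dispatch("evaluate_image", '{"image": "output.png", "prompt": "a cafe"}'))
```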

Custom Rubrics

from evalmedia.rubrics import Rubric, WeightedCheck
from evalmedia.checks.image import PromptAdherence, TextLegibility, AestheticQuality

rubric = Rubric(
    name="my_rubric",
    checks=[
        WeightedCheck(check=PromptAdherence(), weight=0.4),
        WeightedCheck(check=TextLegibility(), weight=0.3),
        WeightedCheck(check=AestheticQuality(), weight=0.3),
    ],
    pass_threshold=0.75,
)
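The rubric presumably aggregates per-check scores into a weighted average and compares it to pass_threshold. A pure-Python sketch of that assumed semantics (the per-check scores are made up; this is not taken from the library's source):

```python
def rubric_score(check_scores, pass_threshold=0.75):
    """Weighted average over (score, weight) pairs, compared to a threshold."""
    total_weight = sum(w for _, w in check_scores)
    score = sum(s * w for s, w in check_scores) / total_weight
    return score, score >= pass_threshold

# Hypothetical per-check scores for the rubric above.
checks = [
    (0.9, 0.4),  # prompt_adherence
    (0.5, 0.3),  # text_legibility
    (0.8, 0.3),  # aesthetic_quality
]
score, passed = rubric_score(checks)
print(round(score, 2), passed)  # 0.75 True
```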

Or via YAML:

name: my_rubric
pass_threshold: 0.75
checks:
  - check: prompt_adherence
    weight: 0.4
  - check: text_legibility
    weight: 0.3
  - check: aesthetic_quality
    weight: 0.3
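Before handing a YAML rubric to the framework, it is worth sanity-checking the config. A sketch using a dict that mirrors the YAML above (this validation is an illustrative pre-flight check, not something evalmedia is documented to enforce):

```python
# Dict mirroring the YAML rubric above.
rubric_cfg = {
    "name": "my_rubric",
    "pass_threshold": 0.75,
    "checks": [
        {"check": "prompt_adherence", "weight": 0.4},
        {"check": "text_legibility", "weight": 0.3},
        {"check": "aesthetic_quality", "weight": 0.3},
    ],
}

# Weights should sum to 1.0 and the threshold should be a valid score.
total = sum(c["weight"] for c in rubric_cfg["checks"])
assert abs(total - 1.0) < 1e-9, "check weights should sum to 1.0"
assert 0.0 <= rubric_cfg["pass_threshold"] <= 1.0
print("rubric config OK")
```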

License

Apache 2.0

