
evalmedia

Open-source framework for evaluating AI-generated media quality.

Think "DeepEval but for generative media." Structured, actionable quality assessments for AI-generated images — designed for AI agents, not dashboards.

Website | PyPI | GitHub

Install

pip install evalmedia

With judge backends:

pip install evalmedia[claude]    # Anthropic Claude
pip install evalmedia[openai]    # OpenAI GPT-4.1
pip install evalmedia[all]       # Everything

Quick Start

Single image evaluation

from evalmedia import ImageEval
from evalmedia.checks.image import FaceArtifacts, PromptAdherence, TextLegibility

result = ImageEval.run(
    image="output.png",
    prompt="a woman holding a coffee cup in a cafe",
    checks=[FaceArtifacts(), PromptAdherence(), TextLegibility()],
)

print(result.passed)        # False
print(result.summary())     # "FAIL — 2/3 checks passed (score: 0.65). Failed: face_artifacts."
print(result.to_dict())     # structured JSON for agents

Rubric-based evaluation

from evalmedia import ImageEval
from evalmedia.rubrics import Portrait

result = ImageEval.run(
    image="output.png",
    prompt="professional headshot of a young man",
    rubric=Portrait(),
)

Built-in rubrics: GeneralQuality, Portrait, MarketingAsset.

Async support

from evalmedia import ImageEval
from evalmedia.checks.image import FaceArtifacts, PromptAdherence

result = await ImageEval.arun(
    image=image_bytes,
    prompt=prompt,
    checks=[FaceArtifacts(), PromptAdherence()],
)

Compare multiple images

from evalmedia import compare
from evalmedia.rubrics import GeneralQuality

results = await compare(
    images=["modelA.png", "modelB.png", "modelC.png"],
    prompt="a sunset over mountains",
    rubric=GeneralQuality(),
)

best_label, best_result = results.best()
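`results.best()` presumably returns the top-scoring candidate. As an illustration only (not the library's actual internals), selecting the best of several scored results can be sketched with a hypothetical result type:

```python
from dataclasses import dataclass

@dataclass
class EvalResult:
    # Hypothetical stand-in for evalmedia's result object;
    # assumes each result exposes a numeric overall score.
    score: float
    passed: bool

def best(results: dict[str, EvalResult]) -> tuple[str, EvalResult]:
    """Return the (label, result) pair with the highest score."""
    return max(results.items(), key=lambda item: item[1].score)

scored = {
    "modelA.png": EvalResult(score=0.72, passed=True),
    "modelB.png": EvalResult(score=0.81, passed=True),
    "modelC.png": EvalResult(score=0.55, passed=False),
}
label, result = best(scored)  # picks "modelB.png"
```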

Checks

| Check | Type | What it evaluates |
| --- | --- | --- |
| PromptAdherence | VLM | Does the image match what was asked for? |
| FaceArtifacts | VLM | Distorted faces, wrong eye count, melted features |
| HandArtifacts | VLM | Extra/missing fingers, distorted hands |
| TextLegibility | VLM | Is text in the image spelled correctly and readable? |
| AestheticQuality | VLM | Composition, lighting, color harmony |
| StyleConsistency | VLM | Does it match a style reference image? |
| CLIPSimilarity | Classical | CLIP cosine similarity between prompt and image |
| ResolutionAdequacy | Classical | Is the resolution sufficient? |
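CLIPSimilarity scores prompt–image agreement as the cosine similarity between CLIP embeddings of the prompt and the image. The similarity itself is just a normalized dot product; a minimal sketch with toy vectors (real embeddings would come from a CLIP model, which is assumed here):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors, in [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for CLIP text/image embeddings.
text_emb = [0.2, 0.8, 0.1]
image_emb = [0.25, 0.75, 0.05]
score = cosine_similarity(text_emb, image_emb)  # near 1.0 → strong match
```

A score near 1.0 indicates the image embedding points in nearly the same direction as the prompt embedding.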

Configuration

import evalmedia

# Set global default judge
evalmedia.set_judge("claude", api_key="sk-...")

# Or via environment variables
# EVALMEDIA_DEFAULT_JUDGE=claude
# EVALMEDIA_ANTHROPIC_API_KEY=sk-...
# EVALMEDIA_OPENAI_API_KEY=sk-...

CLI

# Evaluate an image
evalmedia check output.png --prompt "a woman in a cafe" --checks face_artifacts,prompt_adherence

# Use a rubric
evalmedia check output.png --prompt "headshot" --rubric portrait --format json

# Compare images
evalmedia compare outputs/ --prompt "sunset" --rubric general_quality

# List available checks and rubrics
evalmedia list-checks
evalmedia list-rubrics

Agent Integration

Use evalmedia as a tool in AI agent workflows:

from evalmedia.integrations import openai_tool_schema, anthropic_tool_schema

# OpenAI function calling
tools = [openai_tool_schema()]

# Anthropic tool_use
tools = [anthropic_tool_schema()]
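These helpers follow each provider's function-calling format. For OpenAI, a tool definition is a JSON-schema-shaped dict; the sketch below shows what `openai_tool_schema()` might plausibly return — the function name and parameter fields here are assumptions for illustration, not the library's actual schema:

```python
def openai_tool_schema_sketch() -> dict:
    """Hypothetical OpenAI function-calling schema for an image-eval tool."""
    return {
        "type": "function",
        "function": {
            "name": "evaluate_image",
            "description": "Evaluate an AI-generated image against a prompt.",
            "parameters": {
                "type": "object",
                "properties": {
                    "image": {"type": "string", "description": "Path or URL of the image."},
                    "prompt": {"type": "string", "description": "The generation prompt."},
                    "checks": {
                        "type": "array",
                        "items": {"type": "string"},
                        "description": "Check names, e.g. face_artifacts.",
                    },
                },
                "required": ["image", "prompt"],
            },
        },
    }

schema = openai_tool_schema_sketch()
```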

Custom Rubrics

from evalmedia.rubrics import Rubric, WeightedCheck
from evalmedia.checks.image import PromptAdherence, TextLegibility, AestheticQuality

rubric = Rubric(
    name="my_rubric",
    checks=[
        WeightedCheck(check=PromptAdherence(), weight=0.4),
        WeightedCheck(check=TextLegibility(), weight=0.3),
        WeightedCheck(check=AestheticQuality(), weight=0.3),
    ],
    pass_threshold=0.75,
)

Or via YAML:

name: my_rubric
pass_threshold: 0.75
checks:
  - check: prompt_adherence
    weight: 0.4
  - check: text_legibility
    weight: 0.3
  - check: aesthetic_quality
    weight: 0.3
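Under a rubric like this, the overall score is presumably a weight-normalized average of per-check scores compared against `pass_threshold`. An illustrative computation (assumed semantics, not the library's source):

```python
def rubric_score(check_scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of per-check scores (each assumed in [0, 1])."""
    total_weight = sum(weights.values())
    return sum(check_scores[name] * w for name, w in weights.items()) / total_weight

weights = {"prompt_adherence": 0.4, "text_legibility": 0.3, "aesthetic_quality": 0.3}
scores = {"prompt_adherence": 0.9, "text_legibility": 0.6, "aesthetic_quality": 0.8}

overall = rubric_score(scores, weights)  # 0.9*0.4 + 0.6*0.3 + 0.8*0.3 = 0.78
passed = overall >= 0.75                 # meets the 0.75 threshold
```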

License

Apache 2.0
