
evalmedia

Open-source framework for evaluating AI-generated media quality.

Think "DeepEval but for generative media." Structured, actionable quality assessments for AI-generated images — designed for AI agents, not dashboards.

Website | PyPI | GitHub

Install

pip install evalmedia

With judge backends:

pip install evalmedia[claude]    # Anthropic Claude
pip install evalmedia[openai]    # OpenAI GPT-4.1
pip install evalmedia[all]       # Everything

Quick Start

Single image evaluation

from evalmedia import ImageEval
from evalmedia.checks.image import FaceArtifacts, PromptAdherence, TextLegibility

result = ImageEval.run(
    image="output.png",
    prompt="a woman holding a coffee cup in a cafe",
    checks=[FaceArtifacts(), PromptAdherence(), TextLegibility()],
)

print(result.passed)        # False
print(result.summary())     # "FAIL — 2/3 checks passed (score: 0.65). Failed: face_artifacts."
print(result.to_dict())     # structured JSON for agents
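The exact schema of `result.to_dict()` isn't documented here, but the point is that it's machine-readable. As an illustration only (field names below are assumptions, not the real schema), an agent loop might consume the payload like this:

```python
# Hypothetical result payload; the actual to_dict() schema may differ.
result = {
    "passed": False,
    "score": 0.65,
    "checks": [
        {"name": "face_artifacts", "passed": False, "score": 0.30},
        {"name": "prompt_adherence", "passed": True, "score": 0.90},
        {"name": "text_legibility", "passed": True, "score": 0.80},
    ],
}

# Collect the names of failed checks and turn them into retry feedback.
failed = [c["name"] for c in result["checks"] if not c["passed"]]
if not result["passed"]:
    feedback = f"Regenerate the image; address: {', '.join(failed)}"
```

This is the pattern the "designed for AI agents" framing suggests: the structured output feeds directly back into the generation loop.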

Rubric-based evaluation

from evalmedia import ImageEval
from evalmedia.rubrics import Portrait

result = ImageEval.run(
    image="output.png",
    prompt="professional headshot of a young man",
    rubric=Portrait(),
)

Built-in rubrics: GeneralQuality, Portrait, MarketingAsset.

Async support

result = await ImageEval.arun(
    image=image_bytes,
    prompt=prompt,
    checks=[FaceArtifacts(), PromptAdherence()],
)

Compare multiple images

from evalmedia import compare
from evalmedia.rubrics import GeneralQuality

results = await compare(
    images=["modelA.png", "modelB.png", "modelC.png"],
    prompt="a sunset over mountains",
    rubric=GeneralQuality(),
)

best_label, best_result = results.best()
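Assuming `results.best()` simply returns the entry with the highest overall rubric score, its behavior can be sketched with plain Python (the scores below are illustrative, not real evalmedia output):

```python
# Stand-in for compare(...).best(): pick the highest-scoring image.
# Illustrative scores keyed by image label.
scores = {"modelA.png": 0.72, "modelB.png": 0.88, "modelC.png": 0.61}

best_label = max(scores, key=scores.get)
best_score = scores[best_label]
```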

Checks

Check               Type       What it evaluates
PromptAdherence     VLM        Does the image match what was asked for?
FaceArtifacts       VLM        Distorted faces, wrong eye count, melted features
HandArtifacts       VLM        Extra/missing fingers, distorted hands
TextLegibility      VLM        Is text in the image spelled correctly and readable?
AestheticQuality    VLM        Composition, lighting, color harmony
StyleConsistency    VLM        Does it match a style reference image?
CLIPSimilarity      Classical  CLIP cosine similarity between prompt and image
ResolutionAdequacy  Classical  Is the resolution sufficient?
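The "Classical" checks need no VLM judge. CLIPSimilarity, per the table, embeds the prompt and the image with CLIP and scores their cosine similarity; the metric itself reduces to a dot product over normalized vectors. A minimal sketch (the toy 3-d vectors stand in for real CLIP embeddings, which are 512+ dimensional):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" for a prompt and an image.
text_emb = [0.20, 0.90, 0.10]
image_emb = [0.25, 0.85, 0.20]
sim = cosine_similarity(text_emb, image_emb)  # close to 1.0 = good match
```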

Configuration

import evalmedia

# Set global default judge
evalmedia.set_judge("claude", api_key="sk-...")

# Or via environment variables
# EVALMEDIA_DEFAULT_JUDGE=claude
# EVALMEDIA_ANTHROPIC_API_KEY=sk-...
# EVALMEDIA_OPENAI_API_KEY=sk-...

CLI

# Evaluate an image
evalmedia check output.png --prompt "a woman in a cafe" --checks face_artifacts,prompt_adherence

# Use a rubric
evalmedia check output.png --prompt "headshot" --rubric portrait --format json

# Compare images
evalmedia compare outputs/ --prompt "sunset" --rubric general_quality

# List available checks and rubrics
evalmedia list-checks
evalmedia list-rubrics

Agent Integration

Use evalmedia as a tool in AI agent workflows:

from evalmedia.integrations import openai_tool_schema, anthropic_tool_schema

# OpenAI function calling
tools = [openai_tool_schema()]

# Anthropic tool_use
tools = [anthropic_tool_schema()]

Custom Rubrics

from evalmedia.rubrics import Rubric, WeightedCheck
from evalmedia.checks.image import PromptAdherence, TextLegibility, AestheticQuality

rubric = Rubric(
    name="my_rubric",
    checks=[
        WeightedCheck(check=PromptAdherence(), weight=0.4),
        WeightedCheck(check=TextLegibility(), weight=0.3),
        WeightedCheck(check=AestheticQuality(), weight=0.3),
    ],
    pass_threshold=0.75,
)
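The `WeightedCheck` weights and `pass_threshold` imply a simple aggregation: the overall score is presumably the weight-weighted sum of per-check scores, compared against the threshold. A minimal sketch of that arithmetic, with illustrative per-check scores:

```python
# Sketch of weighted-rubric scoring implied by WeightedCheck/pass_threshold.
# (name, weight, illustrative per-check score)
weighted_checks = [
    ("prompt_adherence", 0.4, 0.9),
    ("text_legibility", 0.3, 0.6),
    ("aesthetic_quality", 0.3, 0.8),
]
pass_threshold = 0.75

overall = sum(weight * score for _, weight, score in weighted_checks)
passed = overall >= pass_threshold  # 0.78 >= 0.75
```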

Or via YAML:

name: my_rubric
pass_threshold: 0.75
checks:
  - check: prompt_adherence
    weight: 0.4
  - check: text_legibility
    weight: 0.3
  - check: aesthetic_quality
    weight: 0.3

License

Apache 2.0
