Visual Generation Quality Evaluation SDK
Evalytic
Evals for visual AI. Automated quality evaluation for AI-generated images and video.
Know if your AI-generated visuals are good — before your users tell you they're not.
pip install evalytic
evaly bench \
-m flux-schnell -m flux-dev -m flux-pro \
-p "A photorealistic cat on a windowsill" \
-o report.html --yes
What It Does
Evalytic benchmarks AI image generation models by generating images, scoring them with VLM judges (Gemini, GPT, Claude, Ollama), and producing rich reports — all in one command.
- Model Selection — Compare Flux Schnell vs Dev vs Pro with real prompts
- Prompt Optimization — Measure how well models follow your prompts
- Regression Detection — Catch quality drops when models update
- CI/CD Quality Gate — Block deploys when image quality falls below a threshold
- 7 Semantic Dimensions — visual_quality, prompt_adherence, text_rendering, input_fidelity, transformation_quality, artifact_detection, identity_preservation
- Consensus Judging — Multi-judge scoring with automatic agreement analysis
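To make the regression-detection idea concrete, here is a minimal sketch that compares per-dimension mean scores between two benchmark runs. It is not Evalytic's implementation: the `find_regressions` helper, the `margin` parameter, and the shape of the score dictionaries are all assumptions made for illustration.

```python
from statistics import mean


def find_regressions(baseline, candidate, margin=0.25):
    """Return dimensions whose mean score dropped by more than `margin`.

    Both arguments map a dimension name to a list of scores. This data
    shape is hypothetical, not Evalytic's report format.
    """
    regressions = {}
    for dimension, old_scores in baseline.items():
        new_scores = candidate.get(dimension)
        if not new_scores:
            continue  # dimension was not scored in the new run
        drop = mean(old_scores) - mean(new_scores)
        if drop > margin:
            regressions[dimension] = round(drop, 3)
    return regressions


# Illustrative scores only, not real benchmark results.
baseline = {
    "visual_quality": [4.5, 4.0, 4.5],
    "prompt_adherence": [4.0, 4.0, 3.5],
}
candidate = {
    "visual_quality": [4.5, 4.0, 4.0],
    "prompt_adherence": [3.0, 3.5, 3.0],
}
print(find_regressions(baseline, candidate))
```

With these made-up numbers only `prompt_adherence` is flagged, because the drop in `visual_quality` stays within the margin.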
Quickstart
1. Install
pip install evalytic
2. Set API Keys
export FAL_KEY=your_fal_key # fal.ai for image generation
export GEMINI_API_KEY=your_gemini_key # Default judge
3. Run
# Single model benchmark
evaly bench -m flux-schnell -p "A cat sitting on a windowsill" --yes
# Compare models with HTML report
evaly bench -m flux-schnell -m flux-dev -m flux-pro \
-p prompts.json -o report.html --review
# img2img benchmark
evaly bench -m flux-kontext -m seedream-edit -m reve-edit \
-p prompts.json --input product.jpg --yes
# Score an existing image
evaly eval --image output.png --prompt "A sunset over mountains"
# CI/CD quality gate
evaly gate --report report.json --threshold 3.5
CLI Commands
| Command | Description |
|---|---|
| evaly bench | Generate, score, and report in one command |
| evaly eval | Score a single image without generation |
| evaly gate | CI/CD quality gate with pass/fail exit codes |
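The pass/fail behaviour of a quality gate can be sketched in a few lines of Python. This is an illustration of the idea rather than what `evaly gate` does internally: the report layout (`items` and `scores` keys) and the `gate_exit_code` helper are hypothetical, and the README does not document which aggregate the real command compares against the threshold.

```python
import json
from statistics import mean


def gate_exit_code(report, threshold):
    """Return 0 (pass) if the mean of all scores meets the threshold, else 1.

    Assumes a hypothetical report layout: {"items": [{"scores": {...}}, ...]}.
    """
    scores = [
        score
        for item in report["items"]
        for score in item["scores"].values()
    ]
    return 0 if mean(scores) >= threshold else 1


# Hypothetical report content, for illustration only.
report = json.loads("""
{
  "items": [
    {"scores": {"visual_quality": 4.0, "prompt_adherence": 3.5}},
    {"scores": {"visual_quality": 3.0, "prompt_adherence": 3.0}}
  ]
}
""")

# A CI script would pass this value to sys.exit(); here it is only printed.
print(gate_exit_code(report, threshold=3.5))
```

A non-zero exit code is what lets a CI system stop the pipeline, which is why the sketch returns an integer rather than a boolean.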
Judges
Any VLM that can analyze images works as a judge:
evaly bench -m flux-schnell -p "A cat" -j gemini-2.5-flash # Default
evaly bench -m flux-schnell -p "A cat" -j gemini-2.5-pro # Gemini Pro
evaly bench -m flux-schnell -p "A cat" -j openai/gpt-5.2 # OpenAI
evaly bench -m flux-schnell -p "A cat" -j anthropic/claude-sonnet-4-6 # Anthropic
evaly bench -m flux-schnell -p "A cat" -j ollama/qwen2.5-vl:7b # Local
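The judge names above appear to follow a `provider/model` pattern, with unprefixed names referring to Gemini models. A small sketch of how such a string could be split is shown below. The `parse_judge` helper and the default-provider rule are assumptions inferred from the examples, not Evalytic's documented behaviour.

```python
def parse_judge(spec, default_provider="gemini"):
    """Split a judge string into (provider, model).

    Assumption: a name without a "/" belongs to the default provider.
    """
    provider, separator, model = spec.partition("/")
    if not separator:
        return default_provider, spec
    return provider, model


for spec in ["gemini-2.5-flash", "openai/gpt-5.2", "ollama/qwen2.5-vl:7b"]:
    print(spec, "->", parse_judge(spec))
```

Splitting on the first `/` only keeps model tags such as `qwen2.5-vl:7b` intact.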
Consensus Mode
Use multiple judges for more reliable scores:
evaly bench -m flux-schnell -p "A cat" \
--judges "gemini-2.5-flash,openai/gpt-5.2"
Two judges score in parallel. If they disagree, a third breaks the tie.
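One way to picture the tie-break logic is the sketch below. It is not Evalytic's algorithm: the README does not say how disagreement is measured or how the third score is combined, so the `tolerance` value, the averaging rule, and the use of the median are all assumptions, and `consensus_score` is a hypothetical helper.

```python
from statistics import median


def consensus_score(score_a, score_b, tiebreaker, tolerance=1.0):
    """Combine two judge scores, consulting a third judge only on disagreement.

    `tiebreaker` is a zero-argument callable so the third judge is queried
    lazily. Returns (score, tiebreak_used).
    """
    if abs(score_a - score_b) <= tolerance:
        # The judges agree closely enough: average their scores.
        return (score_a + score_b) / 2, False
    # The judges disagree: ask the third judge and take the median of all three.
    score_c = tiebreaker()
    return median([score_a, score_b, score_c]), True


# Illustrative scores; the lambdas stand in for a real third-judge call.
print(consensus_score(4.0, 4.5, tiebreaker=lambda: 1.0))
print(consensus_score(2.0, 5.0, tiebreaker=lambda: 4.0))
```

Taking the median means the third judge effectively sides with whichever of the first two it is closer to, rather than pulling the result toward an outlier.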
Optional Extras
pip install "evalytic[metrics]" # CLIP Score + LPIPS + ArcFace (~2GB)
pip install "evalytic[all]" # Everything
Configuration
Create evalytic.toml in your project root:
[keys]
fal = "your_fal_key"
gemini = "your_gemini_key"
[bench]
judge = "gemini-2.5-flash"
dimensions = ["visual_quality", "prompt_adherence"]
concurrency = 4
Documentation
Full docs at docs.evalytic.ai
License
MIT
File details
Details for the file evalytic-0.3.3.tar.gz.
File metadata
- Download URL: evalytic-0.3.3.tar.gz
- Upload date:
- Size: 1.1 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 73243a515de9552bbd540be88707a583184c02665990a14e7137f7ba9e549819 |
| MD5 | 611087705ac8b2159be71e4395b76b1a |
| BLAKE2b-256 | 933344693d22cbf58d30137ab3f5107bcf994e9f9454ca394abba9a4a0b56920 |
File details
Details for the file evalytic-0.3.3-py3-none-any.whl.
File metadata
- Download URL: evalytic-0.3.3-py3-none-any.whl
- Upload date:
- Size: 94.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 39a288e996bd5a0ac9489c3d8b388756f4744e3d6c6ddcb6fe3f79d437367a96 |
| MD5 | b2ad1f0b4214e95e5193018070fbe82e |
| BLAKE2b-256 | 0ebbe9a929cbae5e1b84f31bc38101bcd44d24298082d68508acbdbae10df7a4 |