# Eval Banana

Lightweight aspect-based evaluation framework: deterministic checks + harness judges. Score anything (agentic outputs, workflows, banana!) with simple YAML check definitions.
## What it does

Eval Banana discovers YAML check definitions from `eval_checks/` directories, runs them, and produces a report. Every check scores 0 or 1 with equal weight.
Two check types:

| Type | Purpose | How it works |
|---|---|---|
| `deterministic` | Objective assertions (file existence, content, structure) | Runs a Python script via subprocess; exit 0 = pass |
| `harness_judge` | LLM-as-a-judge (coherence, accuracy, tone) | Invokes the configured AI agent to score target files; expects `{"score": 0\|1}` |

The harness judge uses one of the following agent CLIs: `codex`, `gemini`, `claude`, `openhands`, `opencode`, `pi`.
## Writing checks

Create a directory called `eval_checks/` anywhere in your project. Add YAML files, one per check.
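Discovery can be pictured as a recursive scan for YAML files that live directly inside an `eval_checks/` directory. The sketch below is an illustration of that idea, not Eval Banana's actual implementation (whether `.yml` is also accepted is not stated here):

```python
from pathlib import Path
import tempfile

# Illustrative sketch (not Eval Banana's actual code): treat every
# *.yaml file whose parent directory is named eval_checks/ as a check.
def discover_checks(root: Path) -> list[Path]:
    return sorted(p for p in root.rglob("*.yaml") if p.parent.name == "eval_checks")

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "eval_checks").mkdir()
    (root / "eval_checks" / "a.yaml").write_text("id: a\n")
    (root / "sub" / "eval_checks").mkdir(parents=True)
    (root / "sub" / "eval_checks" / "b.yaml").write_text("id: b\n")
    print([p.name for p in discover_checks(root)])  # ['a.yaml', 'b.yaml']
```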
### Deterministic check

```yaml
schema_version: 1
id: output_file_exists
type: deterministic
description: Verify that output.json was generated.
target_paths:
  - output.json
script: |
  import json, sys
  from pathlib import Path

  ctx = json.loads(Path(sys.argv[1]).read_text())
  target = ctx["targets"][0]
  assert target["exists"], f"{target['path']} not found"
```
### Harness judge check

```yaml
schema_version: 1
id: summary_is_accurate
type: harness_judge
description: The generated summary accurately reflects source data.
target_paths:
  - summary.txt
  - source_data.json
instructions: |
  Compare the summary against the source data.
  Score 1 if accurate, 0 if it contains fabricated claims.
```
Requires a configured harness agent: set the `agent` key under `[harness]` in the config, or pass `--harness-agent`.
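The judge is expected to reply with `{"score": 0|1}`. How Eval Banana parses that reply is internal; if you build your own tooling around judge output, a tolerant parser might scan for the last well-formed verdict in case the agent wraps it in prose:

```python
import json, re

# Hypothetical helper (not Eval Banana's API): extract a binary score
# from a judge reply that may contain surrounding prose.
def parse_judge_score(raw: str) -> int:
    score = None
    for match in re.finditer(r"\{[^{}]*\}", raw):
        try:
            obj = json.loads(match.group())
        except json.JSONDecodeError:
            continue
        if obj.get("score") in (0, 1):
            score = obj["score"]  # keep the last valid verdict
    if score is None:
        raise ValueError('no {"score": 0|1} object found in judge output')
    return score

print(parse_judge_score('Looks accurate. {"score": 1}'))  # 1
```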
## Quick start

```shell
# Install
uv sync

# Initialize project config and example check
eb init

# Run all discovered checks
eb run

# List discovered checks without running
eb list

# Validate YAML definitions without running
eb validate
```
## Installation

```shell
# Using uv (recommended)
uv add eval-banana

# Using pip
pip install eval-banana

# From source (development)
git clone https://github.com/writeitai/eval-banana.git
cd eval-banana
uv sync --extra dev
```

After installation the CLI is available as `eb`.
## Harness configuration

`harness_judge` checks require a configured harness agent. Configure it via TOML or CLI flags.

### TOML

```toml
# .eval-banana/config.toml
[harness]
agent = "codex"
model = "gpt-5.4"
# reasoning_effort = "high"
```
### Custom agent templates

Add `[agents.<name>]` sections to override built-in templates or define new ones:

```toml
[agents.myagent]
command = ["my-cli", "run"]
shared_flags = ["--headless"]
prompt_flag = "--prompt"
model_flag = "--model"
```
## Skills

Install bundled skills into a target project's native agent directories:

```shell
eb install
eb install --target-agents codex
eb install --skills gemini_media_use --dry-run
```
| Agent | Destination |
|---|---|
| `claude` | `.claude/skills/` |
| `codex` | `.codex/skills/` |
| `openhands` | `.agents/skills/` |
| `opencode` | `.agents/skills/` |
| `gemini` | `.gemini/skills/` |
See `docs/configuration.md` for details on bundled skills, authentication, and the deprecated `distribute-skills` command.
## Configuration

Eval Banana uses a single project-level TOML config at `.eval-banana/config.toml`. Create it with `eb init`.
### Config precedence (highest to lowest)

1. CLI arguments (`--output-dir`, `--harness-model`, etc.)
2. Environment variables (`EVAL_BANANA_*`)
3. Project config (`.eval-banana/config.toml`)
4. Built-in defaults
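The precedence rule can be sketched as a simple resolver. This is an illustration of the documented ordering, not Eval Banana's actual code:

```python
import os

# Illustrative sketch of the documented precedence: CLI flag beats
# EVAL_BANANA_* env var, which beats the project TOML, which beats
# the built-in default.
def resolve(name: str, cli: dict, toml_cfg: dict, default):
    if cli.get(name) is not None:
        return cli[name]
    env = os.environ.get(f"EVAL_BANANA_{name.upper()}")
    if env is not None:
        return env
    if name in toml_cfg:
        return toml_cfg[name]
    return default

# No CLI flag or env var set, so the project config wins:
os.environ.pop("EVAL_BANANA_OUTPUT_DIR", None)
print(resolve("output_dir", {}, {"output_dir": "results/"},
              ".eval-banana/results"))  # results/
```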
### Key settings

| Setting | Default | Env var |
|---|---|---|
| `output_dir` | `.eval-banana/results` | `EVAL_BANANA_OUTPUT_DIR` |
| `pass_threshold` | `1.0` | `EVAL_BANANA_PASS_THRESHOLD` |
| `llm_max_input_chars` | `0` | `EVAL_BANANA_LLM_MAX_INPUT_CHARS` |
| `harness.agent` | unset | `EVAL_BANANA_HARNESS_AGENT` |
| `harness.model` | unset | `EVAL_BANANA_HARNESS_MODEL` |
## CLI reference

```text
eb init [--force]       Create config + example check
eb run [OPTIONS]        Run all discovered checks
eb list [OPTIONS]       List discovered checks
eb validate [OPTIONS]   Validate YAML without running
eb install [OPTIONS]    Install bundled skills into agent dirs
```

Options for `run`/`list`/`validate`:

```text
--check-dir PATH         Scan only this directory
--check-id TEXT          Run only this check ID
--output-dir TEXT        Override output directory
--pass-threshold FLOAT   Minimum pass ratio (0.0-1.0)
--verbose                Enable debug logging
--cwd TEXT               Working directory
```

Harness options (`run` only):

```text
--harness-agent TEXT              Agent CLI used by harness_judge checks
--harness-model TEXT              Model override for the agent
--harness-reasoning-effort TEXT   Reasoning effort level
```
## Output

Each run creates a timestamped directory under the configured `output_dir`:

```text
.eval-banana/results/<run_id>/
  report.json                # Machine-readable full report
  report.md                  # Human-readable Markdown report
  checks/
    <check_id>.json          # Per-check result
    <check_id>.stdout.txt    # Captured stdout (if any)
    <check_id>.stderr.txt    # Captured stderr (if any)
```
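A typical use of `report.json` is gating a CI step on the pass ratio. The field names below (`results`, `passed`) are assumptions for illustration, not the documented report schema; adapt them to what your `report.json` actually contains:

```python
import json, tempfile
from pathlib import Path

# Hypothetical sketch: compute the pass ratio from a run's report.json.
# The "results"/"passed" field names are assumed, not documented here.
def pass_ratio(report_path: Path) -> float:
    report = json.loads(report_path.read_text())
    results = report["results"]
    return sum(1 for r in results if r["passed"]) / len(results)

with tempfile.TemporaryDirectory() as tmp:
    p = Path(tmp, "report.json")
    p.write_text(json.dumps({"results": [
        {"id": "output_file_exists", "passed": True},
        {"id": "summary_is_accurate", "passed": False},
    ]}))
    print(pass_ratio(p))  # 0.5
```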
## Development

```shell
uv sync --extra dev
make test        # Run tests
make fix         # Auto-fix lint + format
make pyright     # Type check
make all-check   # Lint + format + types + tests (matches CI)
```
## Inspiration

Eval Banana's binary 0/1 scoring philosophy draws directly on two earlier bodies of work:

- Hamel Husain's *Creating LLM-as-a-Judge that drives business results* — argues that binary pass/fail judgments produce more reliable, actionable evals than Likert-style 1-5 scales.
- RAGAS's *Aspect Critic* metric — evaluates outputs against a natural-language aspect definition and returns a binary verdict.

The `harness_judge` check type is essentially an Aspect Critic: you describe what "good" looks like in plain language, and the judge returns `{"score": 0|1}`.
## Contributing

Issues and pull requests are welcome. Please run `make all-check` before opening a PR.

## Changelog

See `CHANGELOG.md` for release notes.

## License

Apache License 2.0 — see `LICENSE` for details.

Copyright 2026 WriteIt.ai s.r.o.
## File details

### eval_banana-0.0.6.tar.gz

- Size: 2.2 MB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12

| Algorithm | Hash digest |
|---|---|
| SHA256 | `f4932180c25faf3e8f7e047e564dab5b298005f61849303dffc34d77889cd2bc` |
| MD5 | `2c218caedf5714a8831b66b2e77b3f24` |
| BLAKE2b-256 | `e4820442fd625b218447867c2965d6bf5813c667606e11029570a04ce5afa4f1` |
### Provenance

The following attestation bundles were made for eval_banana-0.0.6.tar.gz:

Publisher: release.yml on writeitai/eval-banana

- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: eval_banana-0.0.6.tar.gz
- Subject digest: `f4932180c25faf3e8f7e047e564dab5b298005f61849303dffc34d77889cd2bc`
- Sigstore transparency entry: 1342543452
- Permalink: writeitai/eval-banana@100615fd3ad605a253e1b3dd6fb4c5b618d21153
- Branch / Tag: refs/tags/v0.0.6
- Owner: https://github.com/writeitai
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@100615fd3ad605a253e1b3dd6fb4c5b618d21153
- Trigger Event: push
### eval_banana-0.0.6-py3-none-any.whl

- Size: 56.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12

| Algorithm | Hash digest |
|---|---|
| SHA256 | `a2114b5f57823760cdf6be477af45279c746d6f49e24fa86ccab88981b05ce91` |
| MD5 | `cb213bcfe1c19fbd23f2eabe0ad7ef26` |
| BLAKE2b-256 | `4cfa742ad37bf693ecdfc3c0dd4e1f5b14bbff91f79b92588bf54c3d6c332544` |
### Provenance

The following attestation bundles were made for eval_banana-0.0.6-py3-none-any.whl:

Publisher: release.yml on writeitai/eval-banana

- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: eval_banana-0.0.6-py3-none-any.whl
- Subject digest: `a2114b5f57823760cdf6be477af45279c746d6f49e24fa86ccab88981b05ce91`
- Sigstore transparency entry: 1342543455
- Permalink: writeitai/eval-banana@100615fd3ad605a253e1b3dd6fb4c5b618d21153
- Branch / Tag: refs/tags/v0.0.6
- Owner: https://github.com/writeitai
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@100615fd3ad605a253e1b3dd6fb4c5b618d21153
- Trigger Event: push