openclaw-edd
Evaluation-Driven Development toolkit for OpenClaw agents. Zero-friction quality gates, with log files as the single source of truth.
Features
- ✅ Zero Configuration - `pip install openclaw-edd && openclaw-edd watch` and go
- ✅ Zero Intrusion - No need to modify OpenClaw config or restart Gateway
- ✅ Zero Dependencies - Core features work without any external libraries (PyYAML optional)
- ✅ Complete Loop - watch → run → suggest → apply → diff → mine → export
Quick Start
pip install openclaw-edd
# Step 0: See what tools your agent actually uses
openclaw-edd watch
# Step 1: Run built-in evaluation (6 test cases)
openclaw-edd run --quickstart --agent main --summary-line
# Step 2: Full EDD loop
openclaw-edd run --quickstart --agent main --output-json round1.json
openclaw-edd edd suggest --report round1.json
# ... fix your agent ...
openclaw-edd run --quickstart --agent main --output-json round2.json
openclaw-edd edd diff --before round1.json --after round2.json
📖 Complete User Guide → (7-step walkthrough from install to CI integration).
Chinese version: 用户路径指南 (User Path Guide) →.
Commands
Core Commands
- watch - Real-time log monitoring, print tool event stream
- trace - Replay historical event chain
- state - View/modify session state
- artifacts - Manage tool output files
- sessions - List/view historical sessions
- run - Run evaluation test cases
- gen-cases - Generate test case templates
EDD Loop Commands
- `openclaw-edd edd suggest` - Generate improvement suggestions from failed cases
- `openclaw-edd edd apply` - Apply suggestions to workspace
- `openclaw-edd edd diff` - Compare changes between two runs
- `openclaw-edd edd mine` - Mine golden cases from historical logs
- `openclaw-edd edd judge` - LLM-based scoring for tool selection and output quality
- `openclaw-edd edd export` - Export golden dataset (JSONL/CSV)
Test Case Format
cases:
  - id: mysql_slow_query
    message: "Any slow queries in MySQL recently"
    eval_type: regression  # "regression" (prevent regression) | "capability" (capability climb), default regression
    expect_tools:
      - exec
    expect_commands:
      - "check_health"
      - "prod-01"
    expect_commands_ordered:
      - "check_health"
      - "query_metrics"
    forbidden_commands:
      - "rm -rf"
    expect_tools_ordered:
      - exec
    expect_output_contains:
      - "slow query"
    forbidden_tools:
      - exec
    expect_tool_args:  # Tool argument assertions (white-box evaluation)
      exec:
        command: "check_health"  # Substring match for string values
    agent: openclaw_agent
    timeout_s: 30
    tags: [mysql, sre]
    description: "MySQL slow query troubleshooting basic verification"
Notes:
- `expect_commands`, `expect_commands_ordered`, and `forbidden_commands` do case-insensitive substring matching on the `exec` tool's `input.command`.
- `expect_output_contains` is case-insensitive substring matching.
- For `expect_tool_args`, string values use case-insensitive substring matching; non-strings use exact match.
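The matching rules in the notes above can be sketched in a few lines of Python. This is only an illustration of the described semantics, not the toolkit's actual code: the function names are hypothetical, and the ordered check assumes that each expected fragment must be found in a later command than the previous one.

```python
from typing import Any, Mapping, Sequence


def contains_ci(haystack: str, needle: str) -> bool:
    """Case-insensitive substring test."""
    return needle.lower() in haystack.lower()


def commands_match(commands: Sequence[str], expected: Sequence[str]) -> bool:
    """Every expected fragment appears in at least one exec command."""
    return all(any(contains_ci(cmd, frag) for cmd in commands) for frag in expected)


def commands_match_ordered(commands: Sequence[str], expected: Sequence[str]) -> bool:
    """Expected fragments appear in order (assumed: one fragment per command)."""
    idx = 0
    for cmd in commands:
        if idx < len(expected) and contains_ci(cmd, expected[idx]):
            idx += 1
    return idx == len(expected)


def has_forbidden(commands: Sequence[str], forbidden: Sequence[str]) -> bool:
    """True if any forbidden fragment appears in any exec command."""
    return any(contains_ci(cmd, frag) for cmd in commands for frag in forbidden)


def tool_args_match(actual: Mapping[str, Any], expected: Mapping[str, Any]) -> bool:
    """Strings: case-insensitive substring match. Non-strings: exact match."""
    for key, want in expected.items():
        if key not in actual:
            return False
        got = actual[key]
        if isinstance(want, str):
            if not isinstance(got, str) or not contains_ci(got, want):
                return False
        elif got != want:
            return False
    return True
```

For example, `commands_match(["CHECK_HEALTH --host prod-01"], ["check_health", "prod-01"])` is true under these rules, because both fragments occur in the command regardless of case.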
Eval Type Explanation
- regression: Guards against regressions. The pass rate should start near 100%, and any drop is an alert signal.
- capability: Tracks capability climb. The pass rate starts low because these cases test what the agent can't do yet.
Run reports are grouped by eval_type:
📊 Regression Eval (Prevent Regression)
Passed: 8/10 (80%) ← Below 100% needs attention
FAIL: mysql_slow_query, mysql_alert_check
📈 Capability Eval (Capability Climb)
Passed: 3/8 (37.5%) ← Normal, this is a climb metric
PASS: mysql_basic_query ...
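To make the grouping concrete, here is a minimal sketch of how per-type pass rates like the ones above could be computed. The record fields (`id`, `eval_type`, `passed`) are assumptions for illustration; the actual report schema written by `--output-json` is not documented here.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping


def summarize_by_eval_type(results: Iterable[Mapping]) -> Dict[str, dict]:
    """Group case results by eval_type and compute pass counts and rates."""
    groups: Dict[str, List[Mapping]] = defaultdict(list)
    for result in results:
        # eval_type defaults to "regression" when a case does not set it.
        groups[result.get("eval_type", "regression")].append(result)

    summary: Dict[str, dict] = {}
    for eval_type, items in groups.items():
        passed = sum(1 for r in items if r["passed"])
        summary[eval_type] = {
            "passed": passed,
            "total": len(items),
            "rate": round(100 * passed / len(items), 1),
            "failed_ids": [r["id"] for r in items if not r["passed"]],
        }
    return summary
```

A regression group below 100% would then be the signal to investigate, while a low capability rate is expected.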
Golden Dataset Format
{
  "id": "50a359b5_1",
  "description": "Extracted from session 50a359b5, 2026-02-28",
  "source": "mined",
  "tags": ["mined"],
  "conversation": [
    {
      "turn": 1,
      "user": "Any slow queries in MySQL recently",
      "golden_tool_sequence": [
        {
          "name": "query_metrics",
          "args": {"metric": "p99_latency", "time_range": "1h"},
          "output_summary": "P99 latency 120ms, exceeds threshold"
        }
      ],
      "golden_output": "Detected MySQL slow query, P99 latency 120ms",
      "assert": [
        {"type": "tool_called", "value": "query_metrics"},
        {"type": "tool_args", "tool": "query_metrics", "args": {"metric": "p99_latency", "time_range": "1h"}},
        {"type": "contains", "value": "slow query"}
      ]
    }
  ],
  "metadata": {
    "session_id": "50a359b5-184f-4c73-913d-3b53ebbdf109",
    "agent": "openclaw_agent",
    "extracted_at": "2026-02-28T16:00:00",
    "skill_triggered": "skills/mysql_sre.md"
  }
}
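As a rough illustration of how the `assert` entries in a golden record might be checked against a replayed turn, consider the sketch below. The function name is hypothetical, and the semantics are assumptions: `tool_args` is compared exactly here, and `contains` is treated as case-insensitive by analogy with `expect_output_contains`. The toolkit itself may evaluate these differently.

```python
from typing import Any, List, Mapping, Sequence


def failed_assertions(
    assertions: Sequence[Mapping[str, Any]],
    tool_calls: Sequence[Mapping[str, Any]],
    output: str,
) -> List[Mapping[str, Any]]:
    """Return the assertions that do not hold for one conversation turn."""

    def holds(assertion: Mapping[str, Any]) -> bool:
        kind = assertion["type"]
        if kind == "tool_called":
            return any(call["name"] == assertion["value"] for call in tool_calls)
        if kind == "tool_args":
            # Assumed: every listed argument must equal the recorded value.
            return any(
                call["name"] == assertion["tool"]
                and all(
                    call.get("args", {}).get(key) == value
                    for key, value in assertion["args"].items()
                )
                for call in tool_calls
            )
        if kind == "contains":
            return assertion["value"].lower() in output.lower()
        raise ValueError(f"unknown assertion type: {kind}")

    return [a for a in assertions if not holds(a)]
```

An empty return value means the turn satisfies every assertion in the record.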
Data Sources
- Log Location: `/tmp/openclaw/openclaw-YYYY-MM-DD.log`
- Format: JSON Lines
- State: `~/.openclaw_eval/state/<session_id>.json`
- Artifacts: `~/.openclaw_eval/artifacts/<session_id>/`
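Because the logs are plain JSON Lines files named by date, they can be read with the standard library alone. The sketch below shows one way to do that; the helper names are hypothetical, and no assumption is made about which fields each event contains.

```python
import json
from datetime import date
from pathlib import Path
from typing import Iterator


def log_path_for(day: date, log_dir: Path = Path("/tmp/openclaw")) -> Path:
    """Build the log file path for a given day (openclaw-YYYY-MM-DD.log)."""
    return log_dir / f"openclaw-{day.isoformat()}.log"


def iter_log_events(path: Path) -> Iterator[dict]:
    """Yield one dict per valid JSON object line, skipping anything else."""
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            try:
                event = json.loads(line)
            except json.JSONDecodeError:
                continue  # e.g. a partially written or non-JSON line
            if isinstance(event, dict):
                yield event
```

Skipping malformed lines rather than raising is a design choice for this sketch: a log that is still being written may end with an incomplete line.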
Workspace Path Resolution
Priority:
1. `--workspace` parameter
2. `~/.openclaw/openclaw.json` → `agents.defaults.workspace`
3. Fallback: `~/.openclaw/workspace`
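The priority order above can be expressed as a short function. This is a sketch of the described lookup, not the toolkit's implementation: the function name and the `home` parameter (added so the lookup can be exercised against a temporary directory) are hypothetical, and it assumes an unreadable or malformed config simply falls through to the default.

```python
import json
from pathlib import Path
from typing import Optional


def resolve_workspace(
    cli_workspace: Optional[str] = None,
    home: Optional[Path] = None,
) -> Path:
    """Resolve the workspace: CLI flag, then config file, then fallback."""
    home = home or Path.home()

    # 1. An explicit --workspace value always wins.
    if cli_workspace:
        return Path(cli_workspace).expanduser()

    # 2. agents.defaults.workspace in ~/.openclaw/openclaw.json
    config_path = home / ".openclaw" / "openclaw.json"
    try:
        config = json.loads(config_path.read_text(encoding="utf-8"))
        configured = config["agents"]["defaults"]["workspace"]
        if isinstance(configured, str) and configured:
            return Path(configured).expanduser()
    except (OSError, ValueError, KeyError, TypeError):
        pass  # missing file, invalid JSON, or missing keys

    # 3. Fallback location.
    return home / ".openclaw" / "workspace"
```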
Dependencies
- Zero mandatory dependencies - Core features work without any external libraries
- Optional dependencies:
  - PyYAML (only needed when using `--cases`)
  - anthropic (only needed when using `edd judge`)
pip install openclaw-edd[yaml] # Install with YAML support
pip install openclaw-edd # Includes anthropic SDK
Platform Support
- Linux/macOS - Full support (including daemon mode)
- Windows - All features supported except daemon mode
CI Integration
# Run evaluation
openclaw-edd run --cases cases.yaml --output-json report.json
# Check exit code
if [ $? -ne 0 ]; then
  echo "Evaluation failed"
  exit 1
fi
License
MIT
File details
Details for the file openclaw_edd-0.3.3.tar.gz.
File metadata
- Download URL: openclaw_edd-0.3.3.tar.gz
- Upload date:
- Size: 51.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2feffdfca14f4e2992e738651032edf574a76f29b35163b369912740f9934624 |
| MD5 | f40acee86c062c49638ada67de53e46a |
| BLAKE2b-256 | 8df9814d88545113acedfa74f7735210870589bb6445c052f946204453cb2cac |
Provenance
The following attestation bundles were made for openclaw_edd-0.3.3.tar.gz:
Publisher: release.yml on Belyenochi/openclaw-edd
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: openclaw_edd-0.3.3.tar.gz
- Subject digest: 2feffdfca14f4e2992e738651032edf574a76f29b35163b369912740f9934624
- Sigstore transparency entry: 1016811572
- Sigstore integration time:
- Permalink: Belyenochi/openclaw-edd@fa507b01fc586428592ceb6f23b8dde66c42efb5
- Branch / Tag: refs/tags/v0.3.3
- Owner: https://github.com/Belyenochi
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@fa507b01fc586428592ceb6f23b8dde66c42efb5
- Trigger Event: push
File details
Details for the file openclaw_edd-0.3.3-py3-none-any.whl.
File metadata
- Download URL: openclaw_edd-0.3.3-py3-none-any.whl
- Upload date:
- Size: 48.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | be3371e189870beeb553a19d0f0fef2316f1ba69a93c9049860126718069b078 |
| MD5 | 05aed356d9b1fdd6923bc662d6533473 |
| BLAKE2b-256 | 30630c9455a28f3ded809bab8b467ffca7b2409c30acdfc8f0e848f56e594a03 |
Provenance
The following attestation bundles were made for openclaw_edd-0.3.3-py3-none-any.whl:
Publisher: release.yml on Belyenochi/openclaw-edd
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: openclaw_edd-0.3.3-py3-none-any.whl
- Subject digest: be3371e189870beeb553a19d0f0fef2316f1ba69a93c9049860126718069b078
- Sigstore transparency entry: 1016811606
- Sigstore integration time:
- Permalink: Belyenochi/openclaw-edd@fa507b01fc586428592ceb6f23b8dde66c42efb5
- Branch / Tag: refs/tags/v0.3.3
- Owner: https://github.com/Belyenochi
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@fa507b01fc586428592ceb6f23b8dde66c42efb5
- Trigger Event: push