Evaluate and compare AI agent setups through experiments, inspections, and rubric scoring.
setup-eval
Evaluate AI agent setups for best practices, redundancy, security, and cross-component issues.
Works with Claude Code and Cursor. Auto-detects which tool(s) a project uses.
Install
pip install setup-eval
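For YARA malware signature scanning, install the yara extra: pip install "setup-eval[yara]" (the quotes stop shells such as zsh from treating the brackets as a glob pattern).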
What it does
Most tools test whether a skill produces correct output. This tool checks the setup itself: CLAUDE.md, skills, commands, hooks, MCP configs, agents, .cursor/rules/*.mdc, .cursorrules.
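As a rough illustration of what auto-detection could look like, the sketch below checks a project root for the marker files named above. The function name `detect_tools` and the returned labels are hypothetical, and the package's real detection logic may differ.

```python
from pathlib import Path


def detect_tools(project_root):
    """Return the set of tools whose config files appear in project_root.

    Hypothetical sketch: only the marker files mentioned in this README
    are checked (CLAUDE.md, .cursorrules, .cursor/rules/*.mdc).
    """
    root = Path(project_root)
    found = set()

    # Claude Code: a CLAUDE.md at the project root.
    if (root / "CLAUDE.md").is_file():
        found.add("claude-code")

    # Cursor: a legacy .cursorrules file or any .mdc rule file.
    rules_dir = root / ".cursor" / "rules"
    has_mdc = rules_dir.is_dir() and any(rules_dir.glob("*.mdc"))
    if (root / ".cursorrules").is_file() or has_mdc:
        found.add("cursor")

    return found
```

Calling `detect_tools(".")` in a project that contains both CLAUDE.md and .cursor/rules/style.mdc would return both labels.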
Four commands, same engine:
| Command | What it does | Needs LLM? |
|---|---|---|
| setup-eval-lint | 43 deterministic rules + system analysis (token budget, trigger overlaps, dependencies). Fast, CI-suitable. | No |
| setup-eval-review | Per-component rubric review, 21 cross-type checks. KEEP/REVIEW/REMOVE verdicts. | Yes |
| setup-eval-security | All security rules + YARA + CVE lookups + 4-check semantic review. SAFE/CAUTION/UNSAFE. | Optional |
| eval-skill | Deep-evaluate one skill individually and in context of the full setup. | Optional |
How to use it
CLI
setup-eval setup-eval-lint .
setup-eval setup-eval-lint . --preset strict --format json --fail-on-error
setup-eval setup-eval-review . --provider gemini
setup-eval setup-eval-security . --review
setup-eval eval-skill ./skills/my-skill --context . --rubric
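Set GEMINI_API_KEY or ANTHROPIC_API_KEY for any command that calls an LLM: setup-eval-review always, setup-eval-security and eval-skill only when their optional LLM steps are used.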
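For CI scripts written in Python, the lint invocation above can be wrapped in a small helper. This is a minimal sketch: the helper names `build_lint_command` and `run_lint` are hypothetical, only the flags shown in the examples above are used, and the assumption that a non-zero exit code signals failure under --fail-on-error is not confirmed by this README.

```python
import subprocess


def build_lint_command(path=".", preset=None, output_format=None, fail_on_error=False):
    """Build the argv list for the lint command shown in the CLI examples."""
    cmd = ["setup-eval", "setup-eval-lint", path]
    if preset:
        cmd += ["--preset", preset]
    if output_format:
        cmd += ["--format", output_format]
    if fail_on_error:
        cmd.append("--fail-on-error")
    return cmd


def run_lint(path=".", **options):
    """Run the linter and return (exit_code, stdout).

    Assumes the setup-eval executable is on PATH; the meaning of the
    exit code is an assumption, not documented behaviour.
    """
    result = subprocess.run(
        build_lint_command(path, **options),
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stdout
```

A CI step could call `run_lint(".", preset="strict", output_format="json", fail_on_error=True)` and pass the returned exit code to `sys.exit`.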
Claude Code (plugin)
/plugin marketplace add redhat-community-ai-tools/harness-eval-lab
/plugin install setup-eval@setup-eval
/reload-plugins
After installing, use from the / menu: /setup-eval:setup-eval-lint, /setup-eval:setup-eval-review, /setup-eval:setup-eval-security, /setup-eval:eval-skill. No API key needed; Claude evaluates in-session.
Updating: Re-run the install command to get the latest rules.
Cursor
pip install setup-eval
Copy .cursor/commands/ from this repo into your project. The 4 commands appear in Cursor's command palette: /setup-eval-lint, /setup-eval-review, /setup-eval-security, /eval-skill. No API key needed; Cursor evaluates in-session.
Inspection Rules (43)
| Category | Rules | What they check |
|---|---|---|
| Structural | 1 | SKILL.md exists |
| Frontmatter | 3 | Description required/quality, format valid |
| Content | 4 | Duplicate detection (TF-IDF), broken references, circular references, token budget |
| Security | 9 | Credential access, prompt injection (17 patterns), data exfiltration, obfuscation, reverse shells, AST analysis, taint tracking, MCP least-privilege, tool poisoning |
| Security (opt-in) | 2 | YARA signatures, CVE lookups via OSV.dev |
| Commands | 8 | Description, script exists, duplicates, credentials, injection, skill overlap, shadows built-in, references nonexistent skill |
| CLAUDE.md | 3 | Exists, skill duplication, generic advice detection |
| Hooks | 1 | Structure validation, dangerous patterns, network access |
| Agents | 9 | Description, skills exist, tool format, constraint matching, credentials, injection, exfiltration, obfuscation, reverse shells |
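To make the TF-IDF duplicate detection in the Content row more concrete, here is a minimal pure-Python sketch of the general technique: weight each term by TF-IDF, then flag pairs whose cosine similarity exceeds a threshold. The function names, the smoothed IDF formula, and the 0.8 threshold are illustrative assumptions, not the package's actual implementation.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, using a smoothed IDF."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))  # count each term once per document
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        total = len(tokens) or 1
        vectors.append({
            term: (count / total) * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def find_duplicates(docs, threshold=0.8):
    """docs maps a name to its text; return (name_a, name_b, score) pairs."""
    names = list(docs)
    vectors = tfidf_vectors([docs[name] for name in names])
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            score = cosine(vectors[i], vectors[j])
            if score >= threshold:
                pairs.append((names[i], names[j], score))
    return pairs
```

Two skills with identical descriptions score close to 1.0 and are reported, while descriptions with no shared vocabulary score 0.0.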
Four presets: recommended (default), strict, security, pre-workflow.
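The circular-reference check listed under Content rules can be pictured as cycle detection on a graph of which component references which. The sketch below uses a depth-first search with three node states; the function name `find_cycle` and the input shape (a dict from component name to referenced names) are hypothetical, and the real rule may work differently.

```python
def find_cycle(references):
    """Return one reference cycle as a list of names, or None if acyclic.

    references maps each component name to the names it refers to.
    """
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    state = {}
    path = []

    def visit(node):
        state[node] = GRAY
        path.append(node)
        for target in references.get(node, ()):
            target_state = state.get(target, WHITE)
            if target_state == GRAY:
                # target is already on the current path: a cycle closes here.
                return path[path.index(target):] + [target]
            if target_state == WHITE:
                cycle = visit(target)
                if cycle:
                    return cycle
        path.pop()
        state[node] = BLACK
        return None

    for node in references:
        if state.get(node, WHITE) == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None
```

For example, if skill a references b, b references c, and c references a, the function returns the path a, b, c, a.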
Contributing
See CONTRIBUTING.md for adding rules and submitting PRs.
Changelog
See CHANGELOG.md for release history.
Future Plans
See future-plans/ for planned improvements (SARIF output, security benchmarks, runner abstraction, dynamic workflows, impact measurement).