Evaluate and compare AI agent setups through experiments, inspections, and rubric scoring.

setup-eval

Evaluate AI agent setups for best practices, redundancy, security, and cross-component issues.

What it does

Most agent evaluation tools test whether a skill completes a task correctly. This tool evaluates the entire setup that surrounds the agent: CLAUDE.md, skills, commands, hooks, MCP configs, and sub-agents.

It checks whether each component follows best practices, whether components work well together, and whether anything is redundant, conflicting, or insecure.

Supported tools: Claude Code and Cursor. The tool auto-detects which tool(s) a project uses and evaluates all discovered components.
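As a rough illustration of what auto-detection could look like, the sketch below checks a project directory for marker paths. The markers (`CLAUDE.md`, `.claude/`, `.cursor/`) and the function name `detect_tools` are assumptions made for this example; setup-eval's actual detection logic may differ.

```python
from pathlib import Path

# Hypothetical marker paths; the real detection rules are not documented here.
MARKERS = {
    "claude-code": ["CLAUDE.md", ".claude"],
    "cursor": [".cursor"],
}


def detect_tools(project_dir):
    """Return the sorted names of tools whose marker paths exist in project_dir."""
    root = Path(project_dir)
    found = [
        tool
        for tool, markers in MARKERS.items()
        if any((root / marker).exists() for marker in markers)
    ]
    return sorted(found)
```

A project that contains both kinds of markers is reported under both names, which mirrors the "tool(s)" wording above.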

Overview

 ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐
 │ setup-eval-lint │  │ setup-eval-     │  │ setup-eval-     │  │ setup-eval-     │
 │                 │  │ review          │  │ security        │  │ skill           │
 │ 43 rules        │  │ per-component   │  │ all security    │  │ deep-dive on    │
 │ system analysis │  │ rubrics         │  │ rules           │  │ one skill       │
 │ token budget    │  │ 21 cross-type   │  │ AST + taint     │  │ lint + rubric   │
 │ trigger overlap │  │ checks          │  │ YARA + CVE      │  │ + contextual    │
 │ dependencies    │  │ instruction     │  │ 4-check         │  │ analysis        │
 │ context util    │  │ clarity         │  │ semantic review │  │                 │
 │                 │  │ KEEP / REVIEW   │  │ SAFE / CAUTION  │  │ KEEP / REVIEW   │
 │ no LLM, fast    │  │ / REMOVE        │  │ / UNSAFE        │  │ / REMOVE        │
 └─────────────────┘  └─────────────────┘  └─────────────────┘  └─────────────────┘
   "does it pass?"      "is it effective?"     "is it safe?"       "how is this skill?"

Install

From PyPI

pip install setup-eval

From source

git clone https://github.com/redhat-community-ai-tools/harness-eval-lab.git
cd harness-eval-lab
uv sync

Optional extras:

uv sync --extra llm       # LLM support (for review CLI and eval-skill --rubric)
uv sync --extra security  # YARA signature scanning (for security)

As a Claude Code plugin

Install directly from within Claude Code:

/plugin marketplace add redhat-community-ai-tools/harness-eval-lab
/plugin install setup-eval@setup-eval
/reload-plugins

Updating: Re-run the install command periodically to get the latest rules and improvements. Follow the repository for release announcements.

For Cursor users

Install the CLI:

pip install setup-eval
setup-eval setup-eval-lint /path/to/your/project

To use the commands inside Cursor, copy the .cursor/commands/ directory from this repo into your project's .cursor/commands/. The 4 eval commands will appear in Cursor's command palette:

setup-eval-lint - fast static analysis (no LLM)
setup-eval-review - full LLM review
setup-eval-security - deep security audit
setup-eval-skill - deep-evaluate one skill
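The copy step for Cursor described above can be scripted. This is a minimal sketch using only the standard library; `install_cursor_commands` and both path arguments are hypothetical names, and it assumes a local checkout of this repo that contains `.cursor/commands/`.

```python
import shutil
from pathlib import Path


def install_cursor_commands(repo_dir, project_dir):
    """Copy <repo_dir>/.cursor/commands into <project_dir>/.cursor/commands.

    Files with the same name are overwritten; other files already in the
    destination are left untouched. Returns the destination path.
    """
    src = Path(repo_dir) / ".cursor" / "commands"
    dst = Path(project_dir) / ".cursor" / "commands"
    if not src.is_dir():
        raise FileNotFoundError(f"no commands directory at {src}")
    # dirs_exist_ok (Python 3.8+) merges into an existing destination.
    shutil.copytree(src, dst, dirs_exist_ok=True)
    return dst
```

Merging rather than replacing the destination keeps any commands the project already defines.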

Claude Code users can also test the plugin locally during development:

claude --plugin-dir /path/to/setup-eval

After installing the Claude Code plugin, these commands become available in / autocomplete:

/setup-eval:setup-eval-lint - fast static analysis, no LLM, CI-suitable
/setup-eval:setup-eval-review - full qualitative review with KEEP/REVIEW/REMOVE verdicts
/setup-eval:setup-eval-security - deep security audit with deterministic scan + semantic review
/setup-eval:eval-skill <skill-name> - deep-evaluate one skill in context

Usage

CLI

setup-eval setup-eval-lint /path/to/project
setup-eval setup-eval-lint /path/to/project --preset strict --format json
setup-eval setup-eval-lint /path/to/project --fail-on-error

export GEMINI_API_KEY=your-key  # or ANTHROPIC_API_KEY
setup-eval setup-eval-review /path/to/project
setup-eval setup-eval-review /path/to/project --provider anthropic --model claude-sonnet-4-20250514

setup-eval setup-eval-security /path/to/project
setup-eval setup-eval-security /path/to/project --review --provider gemini

setup-eval eval-skill /path/to/skills/my-skill --context /path/to/project
setup-eval eval-skill /path/to/skills/my-skill --context /path/to/project --rubric
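For CI scripts it can help to assemble the lint invocation programmatically. The sketch below builds the argument list using only the flags shown above (`--preset`, `--format`, `--fail-on-error`). The helper names `build_lint_command` and `run_lint` are hypothetical, and the assumption that `--fail-on-error` produces a non-zero exit code on errors is not confirmed here.

```python
import subprocess


def build_lint_command(project, preset=None, fmt=None, fail_on_error=False):
    """Build the argv list for `setup-eval setup-eval-lint` from documented flags."""
    cmd = ["setup-eval", "setup-eval-lint", str(project)]
    if preset:
        cmd += ["--preset", preset]
    if fmt:
        cmd += ["--format", fmt]
    if fail_on_error:
        cmd.append("--fail-on-error")
    return cmd


def run_lint(project, **options):
    """Run the lint command and return its exit code (needs setup-eval installed)."""
    return subprocess.run(build_lint_command(project, **options)).returncode
```

Passing the arguments as a list avoids shell quoting problems when the project path contains spaces.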

Note on /setup-eval-security: The YARA signature scanning check requires yara-python. If yara-python is not installed, the YARA check is skipped automatically and the report says so. All other security checks run without extra dependencies. To enable YARA scanning:

pip install yara-python
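To find out in advance whether the YARA check will run, you can test for the module yourself. The yara-python package installs a module named `yara`; the helper name `yara_available` is hypothetical, and this is not necessarily how setup-eval performs its own check.

```python
import importlib.util


def yara_available():
    """Return True if the `yara` module (provided by yara-python) can be imported."""
    return importlib.util.find_spec("yara") is not None


if __name__ == "__main__":
    if yara_available():
        print("yara-python found: YARA signature scanning can run")
    else:
        print("yara-python missing: the YARA check would be skipped")
```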

CLI Commands

| Command | Description | Needs LLM? |
| --- | --- | --- |
| `setup-eval-lint` | 39 deterministic rules + system analysis (budget, triggers, deps, context utilization). | No |
| `setup-eval-review` | Per-component rubric review, 21 cross-type checks, KEEP/REVIEW/REMOVE verdicts. | Yes (API key) |
| `setup-eval-security` | All security rules + YARA + CVE lookups + optional LLM semantic review. | Optional (`--review`) |
| `setup-eval-skill` | Deep-evaluate a single skill individually and in context of the setup. | Optional (`--rubric`) |

Plugin Skills

| Skill | Description | Needs LLM? |
| --- | --- | --- |
| `/setup-eval-lint` | 43 rules, system analysis. Fast, CI-suitable. | No |
| `/setup-eval-review` | Per-component rubrics, 21 cross-type checks, KEEP/REVIEW/REMOVE verdicts. | Yes (Claude in-session) |
| `/setup-eval-security` | Deterministic security scan + semantic security review with 4-check checklist. | Yes (Claude in-session) |
| `/setup-eval-skill` | Deep-evaluate one skill against rubric + contextual analysis. | Yes (Claude in-session) |

Inspection Rules (43)

| Category | Rules | What they check |
| --- | --- | --- |
| Structural | 1 | SKILL.md exists |
| Frontmatter | 3 | Description required/quality (POV, use-case, length), format valid |
| Content | 4 | Duplicate detection (TF-IDF), broken references, circular references, token budget |
| Security | 9 | Credential access, prompt injection (17 patterns), data exfiltration, obfuscation, reverse shells, AST behavioral analysis, taint tracking, MCP least-privilege, MCP tool poisoning |
| Security (opt-in) | 2 | YARA signature scanning, CVE lookups via OSV.dev (only in `setup-eval-security`) |
| Commands | 8 | Description, script exists, duplicates, credentials, injection, skill overlap, shadows built-in, references nonexistent skill |
| CLAUDE.md | 3 | Exists, skill duplication, generic advice detection |
| Hooks | 1 | Structure validation, dangerous patterns |
| Agents | 9 | Description, skills exist, tool format, constraint matching, credentials, injection, exfiltration, obfuscation, reverse shells |
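The duplicate-detection rule in the Content category is described as TF-IDF based. The sketch below shows the general technique in pure Python: build TF-IDF vectors, then compare them with cosine similarity. The tokenizer, the smoothed IDF formula, and the function names are illustrative assumptions, not the tool's actual implementation or threshold.

```python
import math
import re
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document using smoothed TF-IDF."""
    tokenized = [re.findall(r"[a-z0-9]+", doc.lower()) for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            # Smoothed IDF keeps weights positive even for terms in every doc.
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

Two components whose similarity exceeds some chosen cutoff would be candidates for a duplicate warning; where to set that cutoff is a tuning decision this sketch does not make.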

Four presets: recommended (default), strict, security, pre-workflow.
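Among the Content rules listed above, the circular-reference check is a graph problem. A minimal sketch of cycle detection with depth-first search follows. The input shape (a dict mapping each component name to the names it references) and the function name `find_cycle` are assumptions; the tool's own rule may be implemented differently.

```python
def find_cycle(references):
    """Return one cycle as a list of names (first == last), or None if acyclic."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {}
    path = []

    def visit(node):
        color[node] = GRAY
        path.append(node)
        for target in references.get(node, ()):
            state = color.get(target, WHITE)
            if state == GRAY:
                # target is already on the current path, so the path loops back
                return path[path.index(target):] + [target]
            if state == WHITE:
                cycle = visit(target)
                if cycle:
                    return cycle
        path.pop()
        color[node] = BLACK
        return None

    for start in references:
        if color.get(start, WHITE) == WHITE:
            cycle = visit(start)
            if cycle:
                return cycle
    return None
```

Returning the offending chain rather than a plain boolean makes it possible to tell the user which references to break.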

Future Plans

The future-plans/ directory contains planned improvements, each in its own subfolder. Each doc explores a problem, presents approaches with their trade-offs, and describes how the improvement could be built.

Every plan doc has a Status at the top:

| Status | Meaning |
| --- | --- |
| `future` | Idea documented, not yet planned for implementation |
| `in design` | Actively being designed, approaches being evaluated |
| `in progress` | Implementation underway |
| `built` | Implemented and merged |

| Plan | What it addresses |
| --- | --- |
| adjusting-to-dynamic-workflows | Adapting to Claude Code's dynamic workflows (pre-flight checks, workflow evaluation, quality gates) |
| test-coverage | Expanding tests to cover all rules with edge cases |
| runner-abstraction | Evaluating setups for other agent tools (Cursor, Copilot, Windsurf) |
| impact-dimension | Measuring whether a setup actually helps the agent (A/B testing) |
| scoring-calibration | Validating review accuracy against human judgment |
| sarif-output | SARIF output format for GitHub code scanning (inline PR annotations, Security tab alerts) |
| security-benchmarks | Benchmarking security rules against known-malicious and benign setups (TPR/FPR measurement) |
| setup-recommend | Recommending missing components based on project stack profiling |
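The security-benchmarks plan mentions TPR/FPR measurement. As a reminder of what those two rates mean, the sketch below computes them from labelled scan results. The input shape (pairs of `is_malicious`, `was_flagged` booleans) and the function name `tpr_fpr` are hypothetical, and no benchmark data from the project is implied.

```python
def tpr_fpr(results):
    """Compute (TPR, FPR) from an iterable of (is_malicious, was_flagged) pairs.

    TPR = TP / (TP + FN): share of malicious setups that were flagged.
    FPR = FP / (FP + TN): share of benign setups that were flagged.
    A rate is None when its denominator is zero.
    """
    tp = fp = tn = fn = 0
    for is_malicious, was_flagged in results:
        if is_malicious and was_flagged:
            tp += 1
        elif is_malicious:
            fn += 1
        elif was_flagged:
            fp += 1
        else:
            tn += 1
    tpr = tp / (tp + fn) if tp + fn else None
    fpr = fp / (fp + tn) if fp + tn else None
    return tpr, fpr
```

Reporting both rates matters because a rule set can reach a high TPR simply by flagging everything, which the FPR would expose.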

Contributing

See how-to-contribute.md for guidelines on adding rules, future plans, and submitting PRs.

Changelog

See CHANGELOG.md for release history and notable changes.

Release history

3.4.0: Jun 18, 2026
3.1.2: Jun 17, 2026
3.1.0: Jun 17, 2026 (this version)
