Evaluate and compare AI agent setups through experiments, inspections, and rubric scoring.

setup-eval

Requires Python 3.11+. License: Apache 2.0.

Evaluate AI agent setups for best practices, redundancy, security, and cross-component issues.

Works with Claude Code and Cursor. Auto-detects which tool(s) a project uses.

Install

pip install setup-eval

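For YARA malware signature scanning, install the optional extra: pip install "setup-eval[yara]" (the quotes stop shells such as zsh from treating the brackets as a glob pattern).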

What it does

Most tools test whether a skill produces correct output. This tool checks the setup itself: CLAUDE.md, skills, commands, hooks, MCP configs, agents, .cursor/rules/*.mdc, .cursorrules.

Four commands, same engine:

| Command | What it does | Needs LLM? |
| --- | --- | --- |
| setup-eval-lint | 43 deterministic rules + system analysis (token budget, trigger overlaps, dependencies). Fast, CI-suitable. | No |
| setup-eval-review | Per-component rubric review, 21 cross-type checks. KEEP/REVIEW/REMOVE verdicts. | Yes |
| setup-eval-security | All security rules + YARA + CVE lookups + 4-check semantic review. SAFE/CAUTION/UNSAFE. | Optional |
| eval-skill | Deep-evaluate one skill individually and in context of the full setup. | Optional |

How to use it

CLI

setup-eval setup-eval-lint .
setup-eval setup-eval-lint . --preset strict --format json --fail-on-error

setup-eval setup-eval-review . --provider gemini
setup-eval setup-eval-security . --review
setup-eval eval-skill ./skills/my-skill --context . --rubric

Set GEMINI_API_KEY or ANTHROPIC_API_KEY before running setup-eval-review, or the LLM-backed modes of setup-eval-security and eval-skill. setup-eval-lint needs no key.
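
For CI, the commands above can be wrapped in a small script. The following is a minimal sketch, not part of setup-eval: the helper names (lint_command, available_llm_keys, main) are hypothetical, and it assumes that --fail-on-error makes the lint command exit with a non-zero status, which the flag name suggests but this README does not state. Skipping the review when no key is set is this script's own policy, not documented tool behavior.

```python
import os
import subprocess
import sys

# Environment variables named in this README for the LLM-backed commands.
LLM_KEY_VARS = ("GEMINI_API_KEY", "ANTHROPIC_API_KEY")


def lint_command(path=".", preset=None, json_output=False, fail_on_error=False):
    """Build the argv list for the lint invocation shown in the CLI section."""
    argv = ["setup-eval", "setup-eval-lint", path]
    if preset:
        argv += ["--preset", preset]
    if json_output:
        argv += ["--format", "json"]
    if fail_on_error:
        argv.append("--fail-on-error")
    return argv


def available_llm_keys(env=None):
    """Return the names of the LLM key variables that are set and non-empty."""
    env = os.environ if env is None else env
    return [name for name in LLM_KEY_VARS if env.get(name)]


def main():
    # Lint needs no LLM, so it always runs first.
    result = subprocess.run(lint_command(preset="strict", fail_on_error=True))
    if result.returncode != 0:
        return result.returncode
    # Assumption: skipping the review without a key is this script's choice.
    if not available_llm_keys():
        print("No LLM API key set; skipping setup-eval-review.", file=sys.stderr)
        return 0
    return subprocess.run(["setup-eval", "setup-eval-review", "."]).returncode
```

Calling sys.exit(main()) from a CI step would then propagate the exit status of whichever command ran last.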

Claude Code (plugin)

/plugin marketplace add redhat-community-ai-tools/harness-eval-lab
/plugin install setup-eval@setup-eval
/reload-plugins

After installing, run the commands from the / menu: /setup-eval:setup-eval-lint, /setup-eval:setup-eval-review, /setup-eval:setup-eval-security, /setup-eval:eval-skill. No API key is needed; Claude evaluates in-session.

Updating: Re-run the install command to get the latest rules.

Cursor

pip install setup-eval

Copy .cursor/commands/ from this repo into your project. The 4 commands appear in Cursor's command palette: /setup-eval-lint, /setup-eval-review, /setup-eval-security, /eval-skill. No API key needed; Cursor evaluates in-session.
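
The copy step can be scripted. This is a minimal sketch using only the Python standard library; the function name install_cursor_commands and both path arguments are hypothetical, and it assumes you already have a local checkout of this repo.

```python
import shutil
from pathlib import Path


def install_cursor_commands(repo_checkout, project_root):
    """Copy .cursor/commands/ from a checkout of this repo into a project."""
    source = Path(repo_checkout) / ".cursor" / "commands"
    if not source.is_dir():
        raise FileNotFoundError(f"{source} not found; is this a checkout of the repo?")
    target = Path(project_root) / ".cursor" / "commands"
    # dirs_exist_ok=True merges into an existing directory and overwrites
    # files that share a name, so re-running the copy picks up updates.
    shutil.copytree(source, target, dirs_exist_ok=True)
    return sorted(p.name for p in target.iterdir())
```

The return value lists everything now in the project's .cursor/commands/ directory, including any command files that were there before the copy.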

Inspection Rules (43)

| Category | Rules | What they check |
| --- | --- | --- |
| Structural | 1 | SKILL.md exists |
| Frontmatter | 3 | Description required/quality, format valid |
| Content | 4 | Duplicate detection (TF-IDF), broken references, circular references, token budget |
| Security | 9 | Credential access, prompt injection (17 patterns), data exfiltration, obfuscation, reverse shells, AST analysis, taint tracking, MCP least-privilege, tool poisoning |
| Security (opt-in) | 2 | YARA signatures, CVE lookups via OSV.dev |
| Commands | 8 | Description, script exists, duplicates, credentials, injection, skill overlap, shadows built-in, references nonexistent skill |
| CLAUDE.md | 3 | Exists, skill duplication, generic advice detection |
| Hooks | 1 | Structure validation, dangerous patterns, network access |
| Agents | 9 | Description, skills exist, tool format, constraint matching, credentials, injection, exfiltration, obfuscation, reverse shells |
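
To illustrate what TF-IDF duplicate detection means in general, here is a minimal standard-library sketch. It is not setup-eval's implementation: the tokenizer, the smoothed IDF formula, the 0.8 threshold, and every function name are assumptions chosen for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        total = len(tokens) or 1
        # Smoothed IDF keeps terms shared by every document at a small weight.
        vectors.append(
            {t: (c / total) * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def near_duplicates(docs, threshold=0.8):
    """docs maps a name to its text; return (name, name, score) pairs at or above threshold."""
    names = list(docs)
    vectors = tfidf_vectors([docs[name] for name in names])
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            score = cosine(vectors[i], vectors[j])
            if score >= threshold:
                pairs.append((names[i], names[j], round(score, 3)))
    return pairs
```

Passing a dict of skill names to skill text returns the pairs whose wording overlaps enough to be worth a manual look; the threshold trades false positives against missed duplicates.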

Four presets: recommended (default), strict, security, pre-workflow.
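
The opt-in CVE lookup rule in the table above uses OSV.dev. As a sketch of what such a lookup involves, the snippet below queries the public OSV query endpoint for one package version. It is not setup-eval's code: the function names are hypothetical, and how the tool decides which packages and versions to check is not described in this README.

```python
import json
import urllib.request

# Public OSV query endpoint.
OSV_QUERY_URL = "https://api.osv.dev/v1/query"


def osv_query_payload(name, version, ecosystem="PyPI"):
    """Build the JSON body for a single package-version query."""
    return {"version": version, "package": {"name": name, "ecosystem": ecosystem}}


def known_vulnerability_ids(name, version, ecosystem="PyPI", timeout=10):
    """Return the IDs of advisories OSV reports for this package version."""
    body = json.dumps(osv_query_payload(name, version, ecosystem)).encode("utf-8")
    request = urllib.request.Request(
        OSV_QUERY_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        result = json.load(response)
    # OSV omits the "vulns" key when nothing matches.
    return [vuln["id"] for vuln in result.get("vulns", [])]
```

known_vulnerability_ids needs network access, so in CI it is worth catching urllib.error.URLError and deciding explicitly whether a failed lookup should fail the build.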

Contributing

See CONTRIBUTING.md for adding rules and submitting PRs.

Changelog

See CHANGELOG.md for release history.

Future Plans

See future-plans/ for planned improvements (SARIF output, security benchmarks, runner abstraction, dynamic workflows, impact measurement).
