Last released Jun 27, 2026
Evaluate LLMs against behavioral specifications (AGENTS.md, Claude.md, custom rules)