Last released May 29, 2026
CLI benchmark suite for evaluating AI agents across providers.
Supported by