Last released Dec 26, 2025
CI for tool-using agents: deterministic eval suites with record/replay, hard assertions, budgets, and PR regression gates.
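
To make the record/replay idea concrete, here is a minimal sketch of how a deterministic eval with hard assertions and a call budget could look. It is not this project's API: `ReplayTools`, `record`, `toy_agent`, `run_suite`, and the `get_weather` tool are all hypothetical names invented for illustration, and the cassette is a plain in-memory dict rather than any real recording format.

```python
import json
from dataclasses import dataclass, field


@dataclass
class ReplayTools:
    """Hypothetical replay layer: serves recorded tool results, never calls live tools."""
    cassette: dict                # canonical call key -> recorded result
    max_calls: int = 5            # budget: maximum tool calls allowed per run
    calls: list = field(default_factory=list)

    def call(self, name, **args):
        # Enforce the budget before doing any work.
        if len(self.calls) >= self.max_calls:
            raise RuntimeError("tool-call budget exceeded")
        key = json.dumps([name, args], sort_keys=True)
        # An unrecorded call is an error, which keeps the run deterministic.
        if key not in self.cassette:
            raise KeyError(f"no recording for {key}")
        self.calls.append(name)
        return self.cassette[key]


def record(name, result, **args):
    """Build one cassette entry using the same key format as ReplayTools.call."""
    return json.dumps([name, args], sort_keys=True), result


def toy_agent(tools):
    """Stand-in for a tool-using agent (hypothetical)."""
    weather = tools.call("get_weather", city="Oslo")
    return f"Oslo is {weather['temp_c']} C"


def run_suite():
    cassette = dict([record("get_weather", {"temp_c": 3}, city="Oslo")])
    tools = ReplayTools(cassette)
    answer = toy_agent(tools)
    # Hard assertions: exact output and exact tool-call sequence.
    assert answer == "Oslo is 3 C"
    assert tools.calls == ["get_weather"]
    return answer
```

Under these assumptions, a PR regression gate amounts to running such a suite in CI and failing the build when any assertion or budget check raises.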