Last released Jun 6, 2026
A red-team benchmark for AI agents in financial workflows: measures regulatory-control bypass (sanctions evasion, threshold structuring, dual-approval defeat) and the efficacy of guardrails that try to stop it.
Supported by