Last released May 10, 2026
Benchmarking framework for evaluating AI web agents on real-world online tasks
Last released May 9, 2026
Supported by