12 projects
waa-cube
WindowsAgentArena benchmark ported to the CUBE protocol
terminalbench2-cube
Terminal-Bench 2 — real-world terminal tasks with pytest-based validation
cube-harness
Open-source framework for agentic evaluation and data generation
browsercomp-cube
BrowseComp benchmark — web browsing information retrieval
swebench-verified-cube
SWE-bench Verified — 500 real-world GitHub issues with test-based validation
swebench-live-cube
SWE-bench Live — continuously updated, contamination-resistant GitHub issue resolution
cube-infra-toolkit
EAI Toolkit InfraConfig for CUBE resource lifecycle — per-task job sandboxes
cube-infra-modal
Modal InfraConfig for CUBE resource lifecycle — per-task Modal sandboxes
cube-infra-daytona
Daytona InfraConfig for CUBE resource lifecycle — per-task sandbox containers
arithmetic-cube
Simple arithmetic benchmark — submit the correct answer to math problems
cube-infra-azure
Azure InfraConfig for CUBE resource lifecycle
cube-infra-aws
AWS InfraConfig for CUBE resource lifecycle