34 projects
clawbench-eval
Benchmarking framework for evaluating AI web agents on real-world online tasks
autoresearch-gym
AutoResearchGym: Can AI Agents Automate AI Research? — placeholder; code release in progress.
autoresearch-2
E = AutoResearch²: Scaling the Research Process — placeholder; code release in progress.
harness-bench
HarnessBench: compare agentic harnesses on everyday online tasks (sister project to ClawBench).
scaling-law
ClawBench: Can AI Agents Complete Everyday Online Tasks? (alias of claw-bench)
r2-harness
ClawBench: Can AI Agents Complete Everyday Online Tasks? (alias of claw-bench)
harness-hub
ClawBench: Can AI Agents Complete Everyday Online Tasks? (alias of claw-bench)
nail-group
ClawBench: Can AI Agents Complete Everyday Online Tasks? (alias of claw-bench)
nail-eval
ClawBench: Can AI Agents Complete Everyday Online Tasks? (alias of claw-bench)
nail-agent
ClawBench: Can AI Agents Complete Everyday Online Tasks? (alias of claw-bench)
nail-bench
ClawBench: Can AI Agents Complete Everyday Online Tasks? (alias of claw-bench)
video-judge
ClawBench: Can AI Agents Complete Everyday Online Tasks? (alias of claw-bench)
vlm-judge
ClawBench: Can AI Agents Complete Everyday Online Tasks? (alias of claw-bench)
mcq-bench
ClawBench: Can AI Agents Complete Everyday Online Tasks? (alias of claw-bench)
video-mcq
ClawBench: Can AI Agents Complete Everyday Online Tasks? (alias of claw-bench)
task-harness
ClawBench: Can AI Agents Complete Everyday Online Tasks? (alias of claw-bench)
web-harness
ClawBench: Can AI Agents Complete Everyday Online Tasks? (alias of claw-bench)
realtask-bench
ClawBench: Can AI Agents Complete Everyday Online Tasks? (alias of claw-bench)
life-bench
ClawBench: Can AI Agents Complete Everyday Online Tasks? (alias of claw-bench)
everyday-agent
ClawBench: Can AI Agents Complete Everyday Online Tasks? (alias of claw-bench)
everyday-bench
ClawBench: Can AI Agents Complete Everyday Online Tasks? (alias of claw-bench)
claw-eval
ClawBench: Can AI Agents Complete Everyday Online Tasks? (alias of claw-bench)
claw-agent
ClawBench: Can AI Agents Complete Everyday Online Tasks? (alias of claw-bench)
claw-ai
ClawBench: Can AI Agents Complete Everyday Online Tasks? (alias of claw-bench)
r2agent
ClawBench: Can AI Agents Complete Everyday Online Tasks? (alias of claw-bench)
harnessos
ClawBench: Can AI Agents Complete Everyday Online Tasks? (alias of claw-bench)
nail-clawbench
ClawBench: Can AI Agents Complete Everyday Online Tasks? (alias of claw-bench)
claw-harness
ClawBench: Can AI Agents Complete Everyday Online Tasks? (alias of claw-bench)
clawbench-harness
ClawBench: Can AI Agents Complete Everyday Online Tasks? (alias of claw-bench)
openclawbench
ClawBench: Can AI Agents Complete Everyday Online Tasks? (alias of claw-bench)
clawbench-cli
ClawBench: Can AI Agents Complete Everyday Online Tasks? (alias of claw-bench)
gpuwatch
Lightweight NVIDIA GPU monitor — 20 notification channels (Slack, Discord, Telegram, ntfy, Teams, PagerDuty, Zulip, OpenClaw, and more), Prometheus/InfluxDB/Datadog metrics, crash/ECC detection, Kubernetes, GitHub Pages dashboard
gpu-watchdog
Lightweight NVIDIA GPU monitor — 20 notification channels (Slack, Discord, Telegram, ntfy, Teams, PagerDuty, Zulip, OpenClaw, and more), Prometheus/InfluxDB/Datadog metrics, crash/ECC detection, Kubernetes, GitHub Pages dashboard
nvidia-gpu-monitor
Lightweight NVIDIA GPU monitor — 20 notification channels (Slack, Discord, Telegram, ntfy, Teams, PagerDuty, Zulip, OpenClaw, and more), Prometheus/InfluxDB/Datadog metrics, crash/ECC detection, Kubernetes, GitHub Pages dashboard