Xiangyi Li

Username: xdotli

21 projects

gicsbench

GICS benchmarking framework for AI agents

govbench

Governance benchmarking framework for AI agents

benchflow

Multi-turn agent benchmarking with ACP — run any agent, any model, any provider.

benchskills

Agent skills benchmarking framework

neoswe

NeoSWE — next-generation software engineering benchmark

clawsbench

ClawsBench: Evaluating Capability and Safety of LLM Productivity Agents in Simulated Workspaces

onlylabs

OnlyLabs

srbench

SRBench — self-rewarding benchmark

selfrewardbench

SelfRewardBench — benchmark for self-rewarding agents

selfreward

SelfReward — self-rewarding agents

autoreward

AutoReward — automated reward modeling

rsibench

RSI Bench — agent benchmark suite

clawverse

ClawVerse — composable agent task universes

clawuniverse

ClawUniverse — a universe of agent environments

smolclaws

Mock environments for AI agent testing. https://smolclaw.com

skillsbench

Skillsbench - A placeholder package

computer-use-core

A placeholder package for computer-use-core

pokemon-gym

A placeholder package for pokemon-gym

comp-use

A placeholder package for comp-use

computer-gym

A placeholder package for computer-gym

benchmarkthing

Evals as an API - The easiest way to evaluate and benchmark AI models and systems
