Profile of alexgshaw

Last released Jul 7, 2026

A framework for evaluating and optimizing agents and models using sandboxed environments.

Last released Jul 3, 2026

Add your description here

Last released Jul 3, 2026

LangSmith plugin for Harbor jobs.

Last released Jun 27, 2026

Lightweight grading toolkit for environment-based tasks.

Last released Nov 17, 2025

Terminus CLI: An autonomous AI agent for terminal-based task execution

Last released Sep 26, 2025

Terminal-bench is a collection of tasks and an evaluation harness for measuring AI agents' ability to complete complex tasks in terminal environments.

Last released Aug 18, 2025

Add your description here

Last released Apr 11, 2025

A library for building agentic benchmarks.

Alex Shaw