GRASP — self-improvement via a regression-gated skill library learned from an agent's own failure traces

GRASP

GRASP is a self-improvement method that learns a small, regression-gated skill library from an agent's own failure traces. A proposed skill is kept only when it improves performance on a held-out probe set — so the library grows by keeping what demonstrably helps and discarding what doesn't.
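The keep-or-discard rule described above can be sketched in a few lines. This is a minimal illustration and not GRASP's actual implementation: the names `gate_skill` and `toy_score`, the representation of a skill as a plain string, and the scoring function are all assumptions made for the example.

```python
from typing import Callable, List, Sequence

Skill = str  # assumption: a skill is represented here as plain text


def gate_skill(
    library: List[Skill],
    candidate: Skill,
    probe_set: Sequence[str],
    score: Callable[[List[Skill], Sequence[str]], float],
) -> List[Skill]:
    """Return the library with `candidate` added only if the probe score improves."""
    baseline = score(library, probe_set)
    trial = library + [candidate]
    if score(trial, probe_set) > baseline:
        return trial   # strict improvement on the held-out probes: keep the skill
    return library     # no improvement (or a regression): discard it


def toy_score(library: List[Skill], probe_set: Sequence[str]) -> float:
    # Toy scorer (illustration only): a probe counts as solved
    # when at least one skill in the library mentions it.
    solved = sum(any(probe in skill for skill in library) for probe in probe_set)
    return solved / len(probe_set)


probes = ["date", "units"]
lib = gate_skill([], "normalize date formats", probes, toy_score)
lib = gate_skill(lib, "be polite", probes, toy_score)
print(lib)
```

In this toy run the first candidate raises the probe score and is kept, while the second leaves it unchanged and is dropped. In a real setting the scorer would run the agent on the probe set with the trial library.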

This repository is two things:

A reusable method + framework (grasp/) — apply GRASP to your own agent and tasks, and benchmark your own self-improvement method against GRASP and five baselines through a small plug-in interface.
The full paper artifact — four benchmark families (benchmarks/) and all released results behind the paper (results/).

Install

pip install -e .          # core depends only on PyYAML

Quickstart (no Docker, no server)

Watch GRASP learn skills on a laptop in minutes, on a self-contained slice of MedAgentBench's read-only FHIR lookup tasks served by an in-process mock:

# point the 'local' backend at any OpenAI-compatible endpoint
export OPENAI_BASE_URL="http://localhost:8000/v1"
export OPENAI_API_KEY="EMPTY"
export GRASP_MODEL="your-model-name"

python -m examples.quickstart.run --agent local

It writes a val-accuracy learning curve and the learned skill library under examples/quickstart/runs/. See examples/quickstart/.

Use GRASP on your own agent

Implement a Task (how to sample, run, and score your environment) and run GRASP on it:

from grasp import run_grasp
run_grasp(MyTask(), "config.yaml", agent="local")

Task — samples(), rollout(sample, agent), evaluate(sample, output), plus optional failure_tags / protocol_hook / updater_* hooks.
Method — GRASP is the reference Method; subclass it to benchmark your own self-improvement method on the same tasks.
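As a rough sketch of the Task interface listed above, the toy class below provides the three required methods and exercises them with a stub agent. Only the method names come from this README. The sample format, the return types, the `ArithmeticTask` and `stub_agent` names, and the idea that an agent is a callable from prompt to reply are assumptions for illustration; whether a Task must subclass a base class from `grasp` is covered in docs/add_a_task.md, not here.

```python
from typing import Any, Callable, Dict, List

# Assumption for this sketch: an agent is a callable from prompt text to reply text.
Agent = Callable[[str], str]


class ArithmeticTask:
    """Toy task with the three methods named in the README."""

    def samples(self) -> List[Dict[str, Any]]:
        # Each sample carries the input and the reference answer.
        return [
            {"question": "2 + 3", "answer": "5"},
            {"question": "7 * 6", "answer": "42"},
        ]

    def rollout(self, sample: Dict[str, Any], agent: Agent) -> str:
        # Run the agent on one sample and return its raw output.
        return agent(sample["question"]).strip()

    def evaluate(self, sample: Dict[str, Any], output: str) -> bool:
        # Score one output against the reference answer.
        return output == sample["answer"]


def stub_agent(prompt: str) -> str:
    # Stand-in for a model call: evaluates simple "a + b" or "a * b" prompts.
    left, op, right = prompt.split()
    a, b = int(left), int(right)
    return str(a + b if op == "+" else a * b)


task = ArithmeticTask()
results = [task.evaluate(s, task.rollout(s, stub_agent)) for s in task.samples()]
accuracy = sum(results) / len(results)
print(accuracy)
```

The loop at the end mirrors the sample, rollout, evaluate cycle without importing `grasp`, so it runs on its own.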

Read this	For
`docs/method.md`	how GRASP works — the loop and the regression gate
`docs/add_a_task.md`	plug in your own environment
`docs/add_a_method.md`	benchmark your own method vs. GRASP + 5 baselines

Benchmarks (the paper artifact)

Each benchmark is self-contained under benchmarks/, with its own README for environment setup (conda, Docker, data) and a run_all.sh <backend> [run_name] helper.

Directory	Benchmark	Role in paper	Setup
`benchmarks/MedAgentBench/`	FHIR reads/writes against a live FHIR server	primary (clinical)	Docker
`benchmarks/MedAgentBench-v2/`	Harder FHIR tasks: multi-step decisions, coordinated writes	primary (clinical)	Docker
`benchmarks/FHIR-AgentBench/`	Structured clinical QA / tool use on an independent FHIR store	supporting (clinical)	GCP Healthcare API
`benchmarks/AgentBench/`	Four non-clinical environments: OS, DBBench, WebShop, ALFWorld	supporting (generality)	Docker

The paper compares GRASP against a no-skills baseline and five other self-improvement methods. GRASP and the five baselines are implemented in each benchmark directory: grasp (GRASP, ours), memory_cycle (Sequential memory), batch_memory_cycle (Batch memory), expel_cycle (ExpeL), evo_memory_cycle (Evo-MedAgent), skillx_cycle (SkillX).

The executing agent and skill-writer use the same model. Five named presets are selectable at run time (gptoss, deepseek, gemini, gpt5, gpt4), plus a generic local OpenAI-compatible endpoint. No secrets are stored in the repository: presets read endpoints and keys from environment variables. See each benchmark's configs/agents/README.md.
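To illustrate the environment-variable pattern, here is a minimal sketch of how a backend's settings could be resolved without storing secrets in the repository. The variable names are the ones used in the Quickstart for the local backend; the `BackendSettings` and `load_local_backend` names are hypothetical, and the benchmark presets may read different variables (see configs/agents/README.md).

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Variable names taken from the Quickstart; other presets may use different ones.
REQUIRED_VARS = ("OPENAI_BASE_URL", "OPENAI_API_KEY", "GRASP_MODEL")


@dataclass(frozen=True)
class BackendSettings:
    base_url: str
    api_key: str
    model: str


def load_local_backend(env: Mapping[str, str] = os.environ) -> BackendSettings:
    """Build settings from environment variables, failing loudly if any is unset."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return BackendSettings(
        base_url=env["OPENAI_BASE_URL"],
        api_key=env["OPENAI_API_KEY"],
        model=env["GRASP_MODEL"],
    )


# Demo with an explicit mapping so the example does not depend on the real environment.
demo_env = {
    "OPENAI_BASE_URL": "http://localhost:8000/v1",
    "OPENAI_API_KEY": "EMPTY",
    "GRASP_MODEL": "your-model-name",
}
settings = load_local_backend(demo_env)
print(settings.base_url, settings.model)
```

Failing early on a missing variable gives a clearer error than a request that fails later with an authentication or connection problem.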

Released results

All numbers behind the paper live under results/ — per-seed validation, test, and OOD accuracies for every cell of Tables 1–5, the learned skill libraries, the frozen transfer libraries, and the run configurations. Reproduce the headline tables directly:

python results/reproduce_tables.py                 # Table 1 (all models) + Table 5
python results/reproduce_tables.py gpt-oss-120b     # one model

See results/README.md for the full directory↔cell map.

License

MIT (see LICENSE) for the GRASP core, examples, and docs. Vendored benchmark code under benchmarks/AgentBench/ and benchmarks/FHIR-AgentBench/ retains its own upstream license.

Citation

See CITATION.cff.

Release history

This version

0.1.0

May 27, 2026

Download files

Source Distribution

grasp_skills-0.1.0.tar.gz (42.4 kB view details)

Uploaded May 27, 2026 Source

Built Distribution

grasp_skills-0.1.0-py3-none-any.whl (44.7 kB view details)

Uploaded May 27, 2026 Python 3

File details

Details for the file grasp_skills-0.1.0.tar.gz.

File metadata

Download URL: grasp_skills-0.1.0.tar.gz
Upload date: May 27, 2026
Size: 42.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for grasp_skills-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`76789ff783ff6067364b015ff0c1c2284ed5803b79fb6c453e34fdfeb5cda538`
MD5	`29ec7706e778d89598b113040c310dbf`
BLAKE2b-256	`34feb4578612c170972fb6304fafdce856d3df7ef0b3410556752af645ed81dc`

File details

Details for the file grasp_skills-0.1.0-py3-none-any.whl.

File metadata

Download URL: grasp_skills-0.1.0-py3-none-any.whl
Upload date: May 27, 2026
Size: 44.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for grasp_skills-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`7a4a02600df39a1b4b2a0181322a3f8687be00d7cc2d5bcba9ba7fcb86c6c810`
MD5	`f621e7810353d7914aeddb434b4e3609`
BLAKE2b-256	`3704abaa3f637947858a93bd548f60a75413971f7bceb4053d9350f98eaa5979`

grasp-skills 0.1.0