SDK for the HUD platform.
HUD is a platform for building RL environments for AI agents, across coding, browser, computer-use, and robotics. Define an environment, write tasks, and run them as evals and training across any model, at any scale.
To learn more, see the documentation and environment reference.
Install
```bash
# Install the CLI (recommended)
uv tool install hud-python --python 3.12
# …or as a library
pip install hud-python
```
Get your API key at hud.ai/project/api-keys and set it:
```bash
hud set HUD_API_KEY=your-key-here
# or: export HUD_API_KEY=your-key-here
```
Then scaffold your first environment:
```bash
hud init my-env
```
The protocol
HUD is protocol-first. An agent and an environment exchange only three things: a manifest (the environment's capabilities and tasks); tasks.start, which returns the prompt; and tasks.grade, which returns the reward. In between, the agent does the work itself, driving the capabilities directly. HUD owns only that thin envelope, so any model or harness can plug into any environment.
```mermaid
sequenceDiagram
    participant Agent
    participant Env as Environment
    participant Caps as Capabilities (ssh · mcp · cdp · rfb · robot)
    Note over Env,Caps: environment holds & serves these
    Agent->>Env: hello
    Env-->>Agent: manifest (capabilities)
    Agent->>Env: tasks.start
    Env-->>Agent: prompt
    rect rgb(238,238,238)
        Note over Agent,Caps: the agent works, driving capabilities directly
        Agent->>Caps: shell · browser · GUI · tools · robot
        Caps-->>Agent: observations
    end
    Agent->>Env: tasks.grade
    Env-->>Agent: reward
```
Because the protocol only exposes capabilities (never a fixed agent), an environment outlives any single harness: new harnesses and models keep running against the same environments, benchmarks, and tasks.
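To make the three-message envelope concrete, here is a minimal in-process sketch. The class and method names (`ToyEnvironment`, `hello`, `tasks_start`, `tasks_grade`) and the manifest's dict shape are hypothetical stand-ins, not HUD's wire format or SDK API; the sketch only mirrors the sequence in the diagram: manifest, then prompt, then reward.

```python
from dataclasses import dataclass, field


@dataclass
class ToyEnvironment:
    """Hypothetical environment speaking the three-message envelope in-process."""

    # task name -> (prompt, expected answer); illustrative data only
    tasks: dict[str, tuple[str, str]] = field(default_factory=dict)
    capabilities: tuple[str, ...] = ()

    def hello(self) -> dict:
        # Manifest: what the environment exposes and which tasks it can run.
        return {"capabilities": list(self.capabilities), "tasks": sorted(self.tasks)}

    def tasks_start(self, name: str) -> str:
        # tasks.start returns the prompt for the named task.
        prompt, _ = self.tasks[name]
        return prompt

    def tasks_grade(self, name: str, answer: str) -> float:
        # tasks.grade returns a scalar reward for the agent's final answer.
        _, expected = self.tasks[name]
        return 1.0 if answer.strip() == expected else 0.0


toy_env = ToyEnvironment(tasks={"add": ("What is 2 + 3?", "5")})
manifest = toy_env.hello()
prompt = toy_env.tasks_start("add")
# ... the agent would work here, driving capabilities directly ...
reward = toy_env.tasks_grade("add", " 5 ")
```

Everything between `tasks_start` and `tasks_grade` belongs to the agent, which is why the envelope itself can stay this small.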
Package & run anywhere
A built image is the end product for your tasks: one build packs every task from a single definition. The recommended path is hud deploy, which builds and registers your environment on HUD in one step; then sync a taskset and run remotely:
```bash
hud deploy
hud sync tasks my-taskset
hud eval my-taskset --remote
```
For local iteration, the same protocol works against a container on your laptop:
```bash
docker build -f Dockerfile.hud -t my-env .
docker run -d --name run1 -p 8765:8765 my-env
hud task start fix_bug --url tcp://127.0.0.1:8765
hud task grade fix_bug --url tcp://127.0.0.1:8765 --answer "..."
docker rm -f run1
```
Environments & templates
A template is an async generator registered with @env.template(): it yields a prompt, receives the agent's answer, then yields a reward. Calling the template mints a runnable Task, so one function can define a whole dataset of variants. The simplest template needs no capabilities, only a prompt and a grader:
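```python
from hud import Environment

env = Environment(name="letter-count")

@env.template()
async def count_letter(word: str = "strawberry", letter: str = "r"):
    # First yield: the prompt. The agent's answer is sent back into `answer`.
    answer = yield f"How many '{letter}'s are in '{word}'? Reply with just the number."
    # Second yield: the reward (1.0 if the correct count appears in the answer).
    yield 1.0 if answer and str(word.count(letter)) in answer else 0.0

# Calling the template mints one Task per variant.
tasks = [count_letter(word=w) for w in ("strawberry", "raspberry", "blueberry")]
```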
Run it immediately against any model:
```bash
hud eval tasks.py claude --group 3
```
Each graded evaluation is a trace (the SDK's live handle is a Run). With HUD_API_KEY set, every rollout is recorded on hud.ai. Tasks that need a shell, browser, GUI, or robot declare capabilities (below); everything else — variants, grading, batching — stays identical.
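Because a template is an ordinary async generator, its prompt/answer/reward handshake can be exercised with plain `asyncio`, with no HUD runtime involved. The sketch below copies the body of `count_letter` without the `@env.template()` decorator (so it is a bare generator, not a Task) and drives it the way a runner presumably would: advance to the first `yield` to get the prompt, then `asend` the answer to receive the reward. The driver `grade_once` is hypothetical, not part of the SDK.

```python
import asyncio


# Same body as count_letter above, left undecorated so it stays a plain async generator.
async def count_letter(word: str = "strawberry", letter: str = "r"):
    answer = yield f"How many '{letter}'s are in '{word}'? Reply with just the number."
    yield 1.0 if answer and str(word.count(letter)) in answer else 0.0


async def grade_once(gen, answer: str) -> tuple[str, float]:
    """Hypothetical driver: fetch the prompt, send back an answer, read the reward."""
    prompt = await gen.__anext__()    # runs up to the first yield
    reward = await gen.asend(answer)  # resumes with `answer`, runs to the second yield
    await gen.aclose()                # generator is parked at the reward yield; close it
    return prompt, reward


prompt, reward = asyncio.run(grade_once(count_letter(), "3"))
_, wrong = asyncio.run(grade_once(count_letter(word="blueberry"), "3"))
```

Driving the generator by hand also exposes a grading detail: the substring check rewards an answer like "13" for strawberry, because "3" appears in it. Comparing `answer.strip()` for equality is a stricter alternative if the reward should require exactly the number.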
→ Quickstart · Tasks & tasksets
Capabilities & harnesses
A capability is a connection the environment exposes; a harness attaches its own tools to it. The same environment serves a one-shot Q&A or a full computer-use rollout, depending on which capabilities the harness opens.
| Protocol | What it exposes |
|---|---|
| ssh | Shell + files in a sandboxed workspace (env.workspace(root)) |
| mcp | Tools over the Model Context Protocol |
| cdp | Browser control over the Chrome DevTools Protocol |
| rfb | Full computer-use over VNC: screen + keyboard/mouse |
| robot (beta) | Schema-driven robot observation/action loop over WebSocket |
Ships natively: harnesses for Claude, OpenAI (Responses API), OpenAI-compatible endpoints, and Gemini, created with create_agent("claude-sonnet-4-5") (or gpt-…, gemini-…). The harness wires up capability-backed tools for whichever model you choose at run time.
Bring your own: a harness attaches to a capability and defines a tool spec — wrap browser-use on cdp, a VLA policy on robot, or your own agent on ssh / mcp. No protocol work required.
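The division of labor can be sketched in a few lines: the environment advertises capabilities, and the harness opens only the ones it has tools for. The manifest shape, the `ToolSpec` dataclass, and the tool names below are hypothetical illustrations, not HUD's manifest schema or SDK types; the argument schemas follow the JSON Schema convention common to model tool-calling APIs.

```python
from dataclasses import dataclass


@dataclass
class ToolSpec:
    """Hypothetical tool description a harness hands to its model."""
    name: str
    description: str
    input_schema: dict  # JSON Schema for the tool's arguments


# Tools this (hypothetical) harness knows how to back, keyed by capability protocol.
HARNESS_TOOLS: dict[str, list[ToolSpec]] = {
    "ssh": [ToolSpec("bash", "Run a shell command in the workspace.",
                     {"type": "object", "properties": {"command": {"type": "string"}},
                      "required": ["command"]})],
    "cdp": [ToolSpec("navigate", "Open a URL in the browser.",
                     {"type": "object", "properties": {"url": {"type": "string"}},
                      "required": ["url"]})],
}


def select_tools(manifest: dict) -> list[ToolSpec]:
    """Open only capabilities that are both advertised and supported by this harness."""
    offered = manifest.get("capabilities", [])
    return [tool for cap in offered for tool in HARNESS_TOOLS.get(cap, [])]


# Environment offers ssh and rfb; this harness has no rfb tools, so only bash is exposed.
tools = select_tools({"capabilities": ["ssh", "rfb"]})
qa_tools = select_tools({"capabilities": []})  # one-shot Q&A: no tools at all
```

Keeping the capability-to-tool mapping inside the harness is what lets the same environment serve both a tool-free Q&A run and a fully tooled rollout.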
→ Capabilities · Models · Robots
Deploy on the platform
From the platform UI you can run batches, compare models on the same taskset, and inspect every trace.
Train on rewards
Every rollout returns a Run carrying a trace_id and a reward, so the tasks you evaluate are already training data. Run a group per task and pass the graded runs to TrainingClient.step():
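```python
import asyncio

from hud import TrainingClient
from hud.agents import create_agent
from hud.eval import Job

async def main() -> None:
    # Request token IDs alongside completions (used when training on the rollouts).
    agent = create_agent("arith-rl", completion_kwargs={"extra_body": {"return_token_ids": True}})
    trainer = TrainingClient("arith-rl")
    taskset, runtime = ...  # your Taskset and where rollouts run

    session = await Job.start("arith-rl", group=8)
    start = len(session.runs)  # remember where this batch of runs begins
    await taskset.run(agent, runtime=runtime, group=8, job=session)
    # Train only on the runs produced by the taskset.run call above.
    await trainer.step(session.runs[start:], learning_rate=1e-5, group_size=8)

asyncio.run(main())
```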
HUD serves as the environment and reward source for your own GRPO/PPO loop: the same environment can train any model, text or multimodal, without changes.
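Running a group per task matters because group-based methods such as GRPO score each rollout relative to its siblings on the same task. As a reference point (this is the standard GRPO normalization, not necessarily what `TrainingClient.step()` computes internally), the group-relative advantage of a rollout is its reward minus the group mean, divided by the group's standard deviation. Implementations differ on population versus sample standard deviation; this sketch uses the population form.

```python
from statistics import fmean, pstdev


def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages: standardize rewards within one task's group of rollouts."""
    mean = fmean(rewards)
    std = pstdev(rewards)  # population standard deviation over the group
    # eps keeps the division finite when every rollout got the same reward
    return [(r - mean) / (std + eps) for r in rewards]


# A group of 8 rollouts on one task, scored 0/1 by the grader.
rewards = [1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0]
advantages = group_advantages(rewards)
```

A group in which every rollout earns the same reward yields all-zero advantages, i.e. no learning signal, which is why tasks are most useful for training when some rollouts in a group succeed and others fail.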
→ Training · Designing tasks for signal
Enterprise
Building agents at scale? We work with teams on custom environments, benchmarks, and training.
📅 Book a call · 📧 founders@hud.ai
Contributing
We welcome contributions! See CONTRIBUTING.md.
Key areas: Agents · Environments · Capabilities · Eval
Citation
```bibtex
@software{hud2025agentevalplatform,
  author = {HUD and Jay Ram and Lorenss Martinsons and Parth Patel and Govind Pimpale and Dylan Bowman and Jaideep Chawla and Nguyen Nhat Minh},
  title = {HUD: An Evaluation and RL Environments Platform for Agents},
  date = {2025-04},
  url = {https://github.com/hud-evals/hud-python},
  langid = {en}
}
```
MIT License · LICENSE
Download files
File details
Details for the file hud_python-0.6.8.dev1.tar.gz.
File metadata
- Download URL: hud_python-0.6.8.dev1.tar.gz
- Upload date:
- Size: 342.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.11.24 {"installer":{"name":"uv","version":"0.11.24","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c1137d3aa2c82408d972184629c82c84e3e54866e3f094eb06b607ef2b00a396 |
| MD5 | 8c61e0e107ffc0c5ab0dd144376dac45 |
| BLAKE2b-256 | 9dbb7d569577a852fca915742a20b6b4a4a6b33b662ea417fc0a11382e7cd0af |
File details
Details for the file hud_python-0.6.8.dev1-py3-none-any.whl.
File metadata
- Download URL: hud_python-0.6.8.dev1-py3-none-any.whl
- Upload date:
- Size: 438.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.11.24 {"installer":{"name":"uv","version":"0.11.24","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6468aa16000d9a3ca1359b93e1d8cc4b37f0444d65949d32316192c2757681b8 |
| MD5 | dcb00f2fd699a5d41682ec5781d4e4cc |
| BLAKE2b-256 | fd90c81bfd4b21be8fc9050f9e4534ec8125c44cf3ca18d371afa9bb2d03f894 |