SDK for the HUD platform.

HUD

HUD is a platform for building RL environments for AI agents. Define agent-callable tools, write evaluation scenarios, run evals at scale, and train models on the results.

To learn more, check out our Documentation and API Reference.

Install

# Install CLI (recommended)
uv tool install hud-python --python 3.12

Get your API key at hud.ai/project/api-keys and set it:

export HUD_API_KEY=your-key-here

Or install as a library: pip install hud-python

Environments

An environment is the harness an agent operates in. It packages tools (functions agents can call) and scenarios (how agents are evaluated) into a single deployable unit. Each environment spins up fresh and isolated for every evaluation.

from hud import Environment

env = Environment("my-env")

@env.scenario("count")
async def count(word: str, letter: str):
    # PROMPT — send a question to the agent.
    # The agent runs its reasoning loop and returns an answer.
    answer = yield f"How many '{letter}' in '{word}'?"

    # SCORE — check the agent's answer against the correct count.
    # Return a reward: 1.0 for correct, 0.0 for wrong.
    correct = str(word.lower().count(letter.lower()))
    yield 1.0 if answer and correct in answer else 0.0

A scenario has two yields. The first sends a prompt — the agent runs between the yields, calling tools and reasoning. The second checks the result and returns a reward (0.0 to 1.0). → Core Concepts
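
The two-yield protocol is ordinary Python async-generator behaviour, so it can be tried without HUD installed. The sketch below is not HUD's runner: `run_scenario` and `fake_agent` are hypothetical stand-ins that drive a scenario-shaped generator by hand, assuming the harness takes the first yielded value as the prompt and sends the agent's answer back in to obtain the reward.

```python
import asyncio


async def count(word: str, letter: str):
    # First yield: hand the prompt out and receive the agent's answer back.
    answer = yield f"How many '{letter}' in '{word}'?"
    # Second yield: hand the reward out.
    correct = str(word.lower().count(letter.lower()))
    yield 1.0 if answer and correct in answer else 0.0


async def fake_agent(prompt: str) -> str:
    # Hypothetical stand-in for a real agent: always answers "3".
    return "3"


async def run_scenario(scenario, agent) -> float:
    # Hypothetical driver, for illustration only (not a HUD API).
    prompt = await scenario.__anext__()    # run up to the first yield
    answer = await agent(prompt)           # the agent works between the yields
    reward = await scenario.asend(answer)  # resume and run up to the second yield
    await scenario.aclose()                # finish the generator cleanly
    return reward


reward = asyncio.run(run_scenario(count("strawberry", "r"), fake_agent))
print(reward)
```

Because the answer is delivered through `asend`, the scenario body never needs to know how the agent produced it; swapping `fake_agent` for any other coroutine leaves the scoring code unchanged.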

Run an Agent

import asyncio

import hud
from hud.agents import create_agent

# `env` is the Environment defined in the previous section.
task = env("count", word="strawberry", letter="r")
agent = create_agent("claude-sonnet-4-5")

async def main():
    async with hud.eval(task) as ctx:
        result = await agent.run(ctx)
    print(f"Reward: {result.reward}")  # 1.0 if the agent's answer contains "3"

asyncio.run(main())

create_agent() picks the right agent class and native tools for each model. → Environments

Workflow

hud init my-env          # Scaffold environment
cd my-env
hud dev env:env -w env.py    # Run locally with hot-reload
hud eval tasks.py claude     # Run evals locally
hud deploy                   # Deploy to platform
hud sync tasks my-taskset    # Sync tasks to platform

Once deployed, run evals at scale from the CLI or the platform UI:

hud eval my-taskset claude --remote --full

Deploy · Testing & Evaluation

Pre-built Tools

HUD ships tools for computer control, shell execution, file editing, browser automation, and web search. Add them to any environment:

from hud.tools import AnthropicComputerTool, BashTool, EditTool

env.add_tool(AnthropicComputerTool())  # Mouse, keyboard, screenshots
env.add_tool(BashTool())               # Persistent bash shell
env.add_tool(EditTool())               # File viewing and editing

HUD adapts each tool to the model's native format — Claude gets computer_20250124, OpenAI gets computer_use_preview, Gemini gets ComputerUse. → Tools Reference
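
One way to picture this adaptation is a lookup from model family to native tool identifier. The sketch below is an illustration only, not HUD's implementation: `NATIVE_COMPUTER_TOOL`, `native_computer_tool`, and the prefix-matching rule are assumptions; only the three tool identifiers come from the text above.

```python
# Hypothetical mapping: model-name prefix -> native computer-tool identifier.
# The identifiers are the ones named in this README; the prefixes are assumed.
NATIVE_COMPUTER_TOOL = {
    "claude": "computer_20250124",
    "gpt": "computer_use_preview",
    "gemini": "ComputerUse",
}


def native_computer_tool(model: str) -> str:
    """Return the native tool identifier for a model name (illustrative only)."""
    for prefix, tool_name in NATIVE_COMPUTER_TOOL.items():
        if model.lower().startswith(prefix):
            return tool_name
    raise ValueError(f"no native computer tool known for model {model!r}")


print(native_computer_tool("claude-sonnet-4-5"))
```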

Model Gateway

Use Claude, GPT, Gemini, or Grok through one OpenAI-compatible endpoint:

import asyncio
import os

from openai import AsyncOpenAI

client = AsyncOpenAI(
    base_url="https://inference.hud.ai",
    api_key=os.environ["HUD_API_KEY"],
)

async def main():
    response = await client.chat.completions.create(
        model="claude-sonnet-4-5",  # or gpt-4o, gemini-2.5-pro (https://hud.ai/models)
        messages=[{"role": "user", "content": "Hello!"}],
    )
    print(response.choices[0].message.content)

asyncio.run(main())

Every call is traced at hud.ai. → Models

Enterprise

Building agents at scale? We work with teams on custom environments, benchmarks, and training.

📅 Book a call · 📧 founders@hud.ai

Contributing

We welcome contributions! See CONTRIBUTING.md.

Key areas: Agents · Tools · Environments

Citation

@software{hud2025agentevalplatform,
  author = {HUD and Jay Ram and Lorenss Martinsons and Parth Patel and Govind Pimpale and Dylan Bowman and Jaideep and Nguyen Nhat Minh},
  title  = {HUD: An Evaluation and RL Environments Platform for Agents},
  date   = {2025-04},
  url    = {https://github.com/hud-evals/hud-python},
  langid = {en}
}

MIT License · LICENSE


