Run OpenReward environments through the Inspect eval platform
Project description
inspect-openreward
Run OpenReward environments as Inspect evals.
Provides an Inspect-native Dataset, Scorer, and a session-lifecycle wrapper solver for any OpenReward environment, so you get Inspect's eval harness, transcript viewer, metrics, and model abstraction for free — and OpenReward's tools, tasks, and rewards surface as first-class Inspect primitives. The wrapper takes an arbitrary inner solver chain, so you can keep the default react-style loop or plug in your own scaffolding (system_message, basic_agent, react, custom @solver, …) without touching session management.
Install
uv venv
uv sync
Set OPENREWARD_API_KEY and whichever model provider keys you need (OPENAI_API_KEY, ANTHROPIC_API_KEY, etc.).
Quickstart
See ./src/example/terminal_bench_2_verified.py for a runnable version with both a default, and a customer solver chain, task.
Custom solver chains
openreward_solver is a wrapper: it owns session open/close, prompt injection, tool conversion + installation, and reward capture, and then runs whatever inner solver chain you hand it. That means any Inspect solver composition slots in — chain together system_message, prompt_template, chain_of_thought, self_critique, use_tools(..., append=True), basic_agent, react, or your own @solver — and the OpenReward session-bound tools remain available throughout:
from inspect_ai.solver import chain, generate, system_message
@task
def terminal_bench_2_verified_custom() -> Task:
env = OpenReward().environments.get(name="GeneralReasoning/terminal-bench-2-verified")
return Task(
dataset=openreward_dataset(env, split="test", limit=10),
solver=openreward_solver(
env,
chain(
system_message("Think carefully before each tool call."),
generate(tool_calls="loop"),
),
toolset="claude-code"
),
scorer=openreward_scorer(),
)
Reward / finished capture is baked into the installed tools, so it keeps working no matter how the inner chain drives tool calls (generate(...), execute_tools(...) inside basic_agent, a custom loop, …).
What each piece does
openreward_dataset(environment, split, limit=None, shuffle=False, seed=None)
Builds an Inspect Dataset from an OpenReward environment split. Each Sample carries the underlying OpenReward Task in metadata; the prompt is resolved lazily by the solver (so image prompts work, and there's no network round-trip at dataset-construction time).
openreward_solver(environment, solver=None, *, toolset=None, tool_choice="auto")
Session-lifecycle wrapper around an arbitrary inner solver chain. Per sample it:
- Opens
environment.session(task=..., toolset=toolset). - Fetches
session.get_prompt()and injects it as the user message (text and image blocks both supported). - Converts
session.list_tools()into Inspect tools, auto-detecting the provider fromstate.model.apiand sanitising the JSON schema viaopenreward.sanitize_tool_schema, and installs them onstate.tools/state.tool_choice. - Runs the inner
solverinside the open session. Ifsolver=None(the default), runsgenerate(tool_calls="loop")— the react-style loop. Pass aSolveror alist[Solver](composed viachain(...)) to customise. - Captures the terminal
reward/finishedfrom tool outputs intostate.metadatafor the scorer to read — regardless of how the inner chain invokes tools. - Closes the session on teardown.
Arguments
environment: the OpenRewardEnvironmentto open sessions against. The dataset should be built from the same environment.solver: inner solver (orlist[Solver], normalised viainspect_ai.solver.chain). Defaults togenerate(tool_calls="loop").toolset(keyword-only): optional OpenReward toolset name passed toenvironment.session(...)— e.g."claude-code"for a harness-native bash tool surface in addition to the environment's own tools.tool_choice(keyword-only): passed through to Inspect'sstate.tool_choice.
Inner chains can layer on further tools via use_tools(extra_tools, append=True) — the session-bound tools installed by the wrapper remain available.
openreward_scorer()
@scorer that reads the terminal reward captured by openreward_solver and returns an Inspect Score. Metrics: mean() and stderr(). Samples that never produced a reward (e.g. ran out of turns) score 0.0.
openreward_tool_to_inspect(tool_spec, session, provider=None)
Low-level helper for converting a single OpenReward ToolSpec into an Inspect Tool. The openreward_solver wrapper is built on top of this; use it directly if you want to bypass the wrapper and manage the session yourself.
Model provider mapping
The solver maps Inspect's model-provider identifier to OpenReward's Provider enum for schema sanitisation:
Inspect state.model.api |
OpenReward provider |
|---|---|
openai, openai-api |
openai |
anthropic |
anthropic |
google, vertex |
google |
openrouter |
openrouter |
| anything else | openai (safest superset) |
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file inspect_openreward-0.2.0.tar.gz.
File metadata
- Download URL: inspect_openreward-0.2.0.tar.gz
- Upload date:
- Size: 188.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.11.7 {"installer":{"name":"uv","version":"0.11.7","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
31f3697c45b18cd4423710a7a0677a573a3b6d590b37f1fa6a15df9aecd17902
|
|
| MD5 |
c8222ca20c5ade67fb665fba228966bf
|
|
| BLAKE2b-256 |
cada0b2a4bc60d27e67fbec957271d610ecdfbc97780a24b55f06cfe8d488b8c
|
File details
Details for the file inspect_openreward-0.2.0-py3-none-any.whl.
File metadata
- Download URL: inspect_openreward-0.2.0-py3-none-any.whl
- Upload date:
- Size: 10.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.11.7 {"installer":{"name":"uv","version":"0.11.7","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
a1bd1e81258e579a3f57e27d5aba4436cf8c2b047cc93d0e02995b0aadc1f348
|
|
| MD5 |
cdf924f524b9e8c9e7e4e60be7de8028
|
|
| BLAKE2b-256 |
4c224482aaa956654d6c4b4bc0c8bca53e62fd9dd42e3e6b77f25755dc0772ad
|