
Bridge between Inspect AI tools and the OpenReward specification.


inspect-openreward

Run OpenReward environments as Inspect AI evals.

Provides an Inspect-native Dataset, Scorer, and a session-lifecycle wrapper solver for any OpenReward environment. You get Inspect's eval harness, transcript viewer, metrics, and model abstraction for free, and OpenReward's tools, tasks, and rewards surface as first-class Inspect primitives. The wrapper takes an arbitrary inner solver chain, so you can keep the default react-style loop or plug in your own scaffolding (system_message, basic_agent, react, a custom @solver, ...) without touching session management.

Install

uv venv
uv sync

Set OPENREWARD_API_KEY and whichever model provider keys you need (OPENAI_API_KEY, ANTHROPIC_API_KEY, etc.).

Quickstart

from inspect_ai import Task, eval, task
from openreward import OpenReward

from inspect_openreward import (
    openreward_dataset,
    openreward_scorer,
    openreward_solver,
)


@task
def endless_terminals() -> Task:
    env = OpenReward().environments.get(name="kanishk/EndlessTerminals")
    return Task(
        dataset=openreward_dataset(env, split="train", limit=10),
        solver=openreward_solver(env),
        scorer=openreward_scorer(),
    )


if __name__ == "__main__":
    eval(endless_terminals, model="openai/gpt-4.1")

The zero-arg openreward_solver(env) call runs the default react-style loop (generate(tool_calls="loop")) against the environment's tools — nothing else to configure.

See ./src/example/endless_terminals.py for a runnable version with both the default and a custom-chain task.

Custom solver chains

openreward_solver is a wrapper: it owns session open/close, prompt injection, tool conversion + installation, and reward capture, and then runs whatever inner solver chain you hand it. That means any Inspect solver composition slots in — chain together system_message, prompt_template, chain_of_thought, self_critique, use_tools(..., append=True), basic_agent, react, or your own @solver — and the OpenReward session-bound tools remain available throughout:

from inspect_ai.solver import chain, generate, system_message

@task
def endless_terminals_custom() -> Task:
    env = OpenReward().environments.get(name="kanishk/EndlessTerminals")
    return Task(
        dataset=openreward_dataset(env, split="train", limit=10),
        solver=openreward_solver(
            env,
            chain(
                system_message("Think carefully before each tool call."),
                generate(tool_calls="loop"),
            ),
        ),
        scorer=openreward_scorer(),
    )

Reward / finished capture is baked into the installed tools, so it keeps working no matter how the inner chain drives tool calls (generate(...), execute_tools(...) inside basic_agent, a custom loop, …).
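To make the "baked into the installed tools" idea concrete, here is a minimal pure-Python sketch of that capture pattern. The wrapper function, the metadata keys, and the tool-output shape are illustrative assumptions, not the package's actual internals:

```python
def wrap_tool(tool_fn, metadata):
    """Wrap a session tool so terminal reward / finished signals are
    recorded as a side effect of every call (illustrative sketch)."""
    def wrapped(*args, **kwargs):
        output = tool_fn(*args, **kwargs)
        # If the environment reports a reward or episode end in the
        # tool output, stash it where the scorer can later find it.
        if isinstance(output, dict):
            if "reward" in output:
                metadata["openreward_reward"] = output["reward"]
            if output.get("finished"):
                metadata["openreward_finished"] = True
        return output
    return wrapped


metadata = {}

def fake_tool(cmd):
    # Stand-in for a session-bound environment tool.
    return {"stdout": "done", "reward": 1.0, "finished": True}

bash = wrap_tool(fake_tool, metadata)
result = bash("ls")
```

Because the capture happens inside the tool itself, any caller that invokes the tool (a generate loop, execute_tools, a hand-rolled agent) triggers it without cooperation.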

What each piece does

openreward_dataset(environment, split, limit=None, shuffle=False, seed=None)

Builds an Inspect Dataset from an OpenReward environment split. Each Sample carries the underlying OpenReward Task in metadata; the prompt is resolved lazily by the solver (so image prompts work, and there's no network round-trip at dataset-construction time).
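The lazy-prompt design can be sketched in plain Python. The field names and sample shape below are assumptions for illustration; the real code builds Inspect Sample objects:

```python
def build_samples(tasks, limit=None):
    """Sketch of the dataset conversion: each sample stores the raw
    OpenReward task in metadata, and the input is left empty so the
    solver can resolve the prompt lazily via session.get_prompt()."""
    samples = []
    for i, task in enumerate(tasks[:limit]):
        samples.append({
            "id": i,
            "input": "",  # resolved later, per session
            "metadata": {"openreward_task": task},
        })
    return samples


samples = build_samples(
    [{"name": "t0"}, {"name": "t1"}, {"name": "t2"}],
    limit=2,
)
```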

openreward_solver(environment, solver=None, *, toolset=None, tool_choice="auto")

Session-lifecycle wrapper around an arbitrary inner solver chain. Per sample it:

  1. Opens environment.session(task=..., toolset=toolset).
  2. Fetches session.get_prompt() and injects it as the user message (text and image blocks both supported).
  3. Converts session.list_tools() into Inspect tools, auto-detecting the provider from state.model.api and sanitising the JSON schema via openreward.sanitize_tool_schema, and installs them on state.tools / state.tool_choice.
  4. Runs the inner solver inside the open session. If solver=None (the default), runs generate(tool_calls="loop") — the react-style loop. Pass a Solver or a list[Solver] (composed via chain(...)) to customise.
  5. Captures the terminal reward / finished from tool outputs into state.metadata for the scorer to read — regardless of how the inner chain invokes tools.
  6. Closes the session on teardown.
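The six steps above amount to a context-managed wrapper around the inner solver. A condensed pure-Python sketch, using stub objects and assumed names rather than the real inspect_ai / openreward APIs:

```python
class StubSession:
    """Stand-in for environment.session(...) (illustrative only)."""
    def __init__(self):
        self.closed = False
    def get_prompt(self):
        return "Solve the task."
    def list_tools(self):
        return ["bash"]
    def close(self):
        self.closed = True


def run_sample(open_session, inner_solver, state):
    session = open_session()                        # 1. open session
    try:
        state["messages"] = [session.get_prompt()]  # 2. inject prompt
        state["tools"] = session.list_tools()       # 3. install tools
        inner_solver(state)                         # 4. run inner chain
        # 5. reward capture is a side effect of the installed tools
    finally:
        session.close()                             # 6. teardown
    return session


state = {}
session = run_sample(StubSession, lambda s: s.update(done=True), state)
```

The try/finally mirrors the guarantee that the session is closed even if the inner chain raises.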

Arguments

  • environment: the OpenReward Environment to open sessions against. The dataset should be built from the same environment.
  • solver: inner solver (or list[Solver], normalised via inspect_ai.solver.chain). Defaults to generate(tool_calls="loop").
  • toolset (keyword-only): optional OpenReward toolset name passed to environment.session(...) — e.g. "claude-code" for a harness-native bash tool surface instead of the environment's own tools.
  • tool_choice (keyword-only): passed through to Inspect's state.tool_choice.

Inner chains can layer on further tools via use_tools(extra_tools, append=True) — the session-bound tools installed by the wrapper remain available.

openreward_scorer()

@scorer that reads the terminal reward captured by openreward_solver and returns an Inspect Score. Metrics: mean() and stderr(). Samples that never produced a reward (e.g. ran out of turns) score 0.0.
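The scoring rule reduces to a lookup with a 0.0 fallback. A sketch (the metadata key is an assumed name, not necessarily the one the package uses):

```python
def score_sample(metadata):
    """Return the captured terminal reward, defaulting to 0.0 when no
    reward was ever produced (e.g. the model ran out of turns)."""
    return float(metadata.get("openreward_reward", 0.0))
```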

openreward_tool_to_inspect(tool_spec, session, provider=None)

Low-level helper for converting a single OpenReward ToolSpec into an Inspect Tool. The openreward_solver wrapper is built on top of this; use it directly if you want to bypass the wrapper and manage the session yourself.
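The essence of the conversion is binding a tool name to an open session, so each invocation routes through that session. A pure-Python sketch; the spec shape and session.call_tool are assumptions for illustration:

```python
def tool_to_callable(tool_spec, session):
    """Sketch: turn a tool spec into a callable bound to a session."""
    name = tool_spec["name"]
    def call(**kwargs):
        # Every invocation is routed through the open session.
        return session.call_tool(name, kwargs)
    call.__name__ = name
    return call


class StubSession:
    def call_tool(self, name, args):
        return {"tool": name, "args": args}


bash = tool_to_callable({"name": "bash", "parameters": {}}, StubSession())
```

Managing the session yourself means you also take over steps the wrapper normally handles: prompt injection, reward capture, and closing the session.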

Model provider mapping

The solver maps Inspect's model-provider identifier to OpenReward's Provider enum for schema sanitisation:

Inspect state.model.api    OpenReward provider
openai, openai-api         openai
anthropic                  anthropic
google, vertex             google
openrouter                 openrouter
anything else              openai (safest superset)
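The table is a straightforward lookup with an OpenAI fallback. A sketch of the mapping as a standalone function (the function name is illustrative):

```python
def map_provider(inspect_api: str) -> str:
    """Map an Inspect model-provider identifier to the OpenReward
    provider name used for schema sanitisation (per the table above)."""
    table = {
        "openai": "openai",
        "openai-api": "openai",
        "anthropic": "anthropic",
        "google": "google",
        "vertex": "google",
        "openrouter": "openrouter",
    }
    # Unknown providers fall back to the OpenAI schema, the safest superset.
    return table.get(inspect_api, "openai")
```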
