Skip to main content

RL environments for Strands Agents — step, observe, reward.

Project description

strands-env

CI PyPI License

RL environment abstraction for Strands Agents — step, observe, reward.

Features

This package standardizes agent environments by treating each env.step() as a full agent loop, not a single model call or tool call. Built on strands agent loop and strands-sglang for RL training.

  • Define environments easily — subclass Environment and implement tools as @tool functions
  • Capture token-level observations — token-in/token-out trajectories for on-policy RL training (SGLang backend)
  • Plug in reward functions — evaluate agent outputs with custom RewardFunction
  • Run benchmarksEvaluator with flexible environment setup, metric customization, and resume

An agent loop can be defined as (prompt → (tool_call, tool_response+)* → response)

Install

pip install strands-env

For development:

git clone https://github.com/horizon-rl/strands-env.git && cd strands-env
pip install -e ".[dev]"

Usage

Define an Environment

Subclass Environment and add tools as @tool-decorated functions:

from strands import tool
from strands_env.core import Environment

@tool
def calculator(expression: str) -> str:
    """Evaluate a math expression."""
    return str(eval(expression))

class MathEnv(Environment):
    def get_tools(self):
        return [calculator]

Run It

env = MathEnv(model_factory=factory, reward_fn=reward_fn)
result = await env.step(Action(message="What is 2^10?", task_context=TaskContext(ground_truth="1024")))

result.observation.final_response   # "1024"
result.observation.tokens           # TokenObservation (SGLang only)
result.reward.reward                # 1.0
result.termination_reason           # TerminationReason.TASK_COMPLETE

See examples/math_env.py for a complete example:

python examples/math_env.py --backend sglang --sglang-base-url http://localhost:30000

RL Training

For RL training with slime, customize the generate and reward_func methods to replace single generation with agentic rollout:

from strands_env.core import Action, TaskContext
from strands_env.core.models import sglang_model_factory
from strands_env.utils import get_cached_client_from_slime_args

async def generate(args, sample, sampling_params):
    # Build model factory with cached client
    factory = sglang_model_factory(
        model_id=args.hf_checkpoint,
        tokenizer=tokenizer,
        client=get_cached_client_from_slime_args(args),
        sampling_params=sampling_params,
    )

    # Create environment and run step
    env = YourEnv(model_factory=factory, reward_fn=None)
    action = Action(message=sample.prompt, task_context=TaskContext(ground_truth=sample.label))
    step_result = await env.step(action)

    # Extract TITO data for training
    token_obs = step_result.observation.tokens
    sample.tokens = token_obs.token_ids
    sample.loss_mask = token_obs.rollout_loss_mask
    sample.rollout_log_probs = token_obs.rollout_logprobs
    sample.response_length = len(token_obs.rollout_token_ids)

    # Attach for reward computation
    sample.action = action
    sample.step_result = step_result
    return sample

async def reward_func(args, sample, **kwargs):
    reward_fn = YourRewardFunction()
    reward_result = await reward_fn.compute(action=sample.action, step_result=sample.step_result)
    return reward_result.reward

Key points:

  • get_cached_client_from_slime_args(args) provides connection pooling across rollouts
  • TokenObservation contains token IDs and logprobs for on-policy training
  • Reward is computed separately to allow async/batched reward computation

Evaluation

The Evaluator orchestrates concurrent rollouts with checkpointing and pass@k metrics. It takes an async env_factory for flexible environment creation per sample, and subclasses implement load_dataset for different benchmarks:

...
from strands_env.eval import Evaluator

class YourEvaluator(Evaluator):
    benchmark_name = "YourBenchmark"

    def load_dataset(self) -> Iterable[Action]:
        ...

async def env_factory(action: Action) -> Environment:
    ...

evaluator = YourEvaluator(
    env_factory=env_factory,
    n_samples_per_prompt=8,
    max_concurrency=30,
    keep_tokens=False, # Set True if requiring token-level trajectories (SGLang only)
    metrics_fns=[...], # Define more metrics, pass@k has been included by default
)

actions = evaluator.load_dataset()
results = await evaluator.run(actions)
metrics = evaluator.compute_metrics(results)  # {"pass@1": 0.75, "pass@8": 0.95}

See examples/aime_eval.py for a complete example:

python examples/aime_eval.py --backend sglang --sglang-base-url http://localhost:30000

Development

# Lint
ruff check src/ && ruff format --check src/

# Unit tests
pytest tests/unit/ -v

# Integration tests (requires running SGLang server)
pytest tests/integration/ -v --sglang-base-url=http://localhost:30000

License

Apache License 2.0 — see LICENSE.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

strands_env-0.1.1.tar.gz (40.2 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

strands_env-0.1.1-py3-none-any.whl (35.7 kB view details)

Uploaded Python 3

File details

Details for the file strands_env-0.1.1.tar.gz.

File metadata

  • Download URL: strands_env-0.1.1.tar.gz
  • Upload date:
  • Size: 40.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for strands_env-0.1.1.tar.gz
Algorithm Hash digest
SHA256 84595038cd1afbd1084bb892dbfa91a94313e8febe63b509c27d23701b439903
MD5 6677878a97239415141e93b82e7203b6
BLAKE2b-256 238664caaf9794fb70399ffc64489775b319a123916ae86a69c732287eec4ec2

See more details on using hashes here.

Provenance

The following attestation bundles were made for strands_env-0.1.1.tar.gz:

Publisher: publish.yml on horizon-rl/strands-env

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file strands_env-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: strands_env-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 35.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for strands_env-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 4518e0d34527a79e286f943a628c74184f42095f8cb0256d3224caedc16b22af
MD5 ef12fd235c3c51599695322e18293bcf
BLAKE2b-256 3679520f24478713c86a8f84dcfd9f9fd05e08fd52f81bf2b3fb6d594c2fd7bc

See more details on using hashes here.

Provenance

The following attestation bundles were made for strands_env-0.1.1-py3-none-any.whl:

Publisher: publish.yml on horizon-rl/strands-env

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page