RL environments for Strands Agents — step, observe, reward.
Project description
strands-env
RL environment abstraction for Strands Agents — step, observe, reward.
Features
This package standardizes agent environments by treating each env.step() as a full agent loop, not a single model call or tool call. Built on strands agent loop and strands-sglang for RL training.
- Define environments easily — subclass
Environmentand implement tools as@toolfunctions - Capture token-level observations — token-in/token-out trajectories for on-policy RL training (SGLang backend)
- Plug in reward functions — evaluate agent outputs with custom
RewardFunction - Run benchmarks —
Evaluatorwith flexible environment setup, metric customization, and resume
An agent loop can be defined as
(prompt → (tool_call, tool_response+)* → response)
Install
pip install strands-env
For development:
git clone https://github.com/horizon-rl/strands-env.git && cd strands-env
pip install -e ".[dev]"
Usage
Define an Environment
Subclass Environment and add tools as @tool-decorated functions:
from strands import tool
from strands_env.core import Environment
@tool
def calculator(expression: str) -> str:
"""Evaluate a math expression."""
return str(eval(expression))
class MathEnv(Environment):
def get_tools(self):
return [calculator]
Run It
env = MathEnv(model_factory=factory, reward_fn=reward_fn)
result = await env.step(Action(message="What is 2^10?", task_context=TaskContext(ground_truth="1024")))
result.observation.final_response # "1024"
result.observation.tokens # TokenObservation (SGLang only)
result.reward.reward # 1.0
result.termination_reason # TerminationReason.TASK_COMPLETE
See examples/math_env.py for a complete example:
python examples/math_env.py --backend sglang --sglang-base-url http://localhost:30000
RL Training
For RL training with slime, customize the generate and reward_func methods to replace single generation with agentic rollout:
from strands_env.core import Action, TaskContext
from strands_env.core.models import sglang_model_factory
from strands_env.utils import get_cached_client_from_slime_args
async def generate(args, sample, sampling_params):
# Build model factory with cached client
factory = sglang_model_factory(
model_id=args.hf_checkpoint,
tokenizer=tokenizer,
client=get_cached_client_from_slime_args(args),
sampling_params=sampling_params,
)
# Create environment and run step
env = YourEnv(model_factory=factory, reward_fn=None)
action = Action(message=sample.prompt, task_context=TaskContext(ground_truth=sample.label))
step_result = await env.step(action)
# Extract TITO data for training
token_obs = step_result.observation.tokens
sample.tokens = token_obs.token_ids
sample.loss_mask = token_obs.rollout_loss_mask
sample.rollout_log_probs = token_obs.rollout_logprobs
sample.response_length = len(token_obs.rollout_token_ids)
# Attach for reward computation
sample.action = action
sample.step_result = step_result
return sample
async def reward_func(args, sample, **kwargs):
reward_fn = YourRewardFunction()
reward_result = await reward_fn.compute(action=sample.action, step_result=sample.step_result)
return reward_result.reward
Key points:
get_cached_client_from_slime_args(args)provides connection pooling across rolloutsTokenObservationcontains token IDs and logprobs for on-policy training- Reward is computed separately to allow async/batched reward computation
Evaluation
The Evaluator orchestrates concurrent rollouts with checkpointing and pass@k metrics. It takes an async env_factory for flexible environment creation per sample, and subclasses implement load_dataset for different benchmarks:
...
from strands_env.eval import Evaluator
class YourEvaluator(Evaluator):
benchmark_name = "YourBenchmark"
def load_dataset(self) -> Iterable[Action]:
...
async def env_factory(action: Action) -> Environment:
...
evaluator = YourEvaluator(
env_factory=env_factory,
n_samples_per_prompt=8,
max_concurrency=30,
keep_tokens=False, # Set True if requiring token-level trajectories (SGLang only)
metrics_fns=[...], # Define more metrics, pass@k has been included by default
)
actions = evaluator.load_dataset()
results = await evaluator.run(actions)
metrics = evaluator.compute_metrics(results) # {"pass@1": 0.75, "pass@8": 0.95}
See examples/aime_eval.py for a complete example:
python examples/aime_eval.py --backend sglang --sglang-base-url http://localhost:30000
Development
# Lint
ruff check src/ && ruff format --check src/
# Unit tests
pytest tests/unit/ -v
# Integration tests (requires running SGLang server)
pytest tests/integration/ -v --sglang-base-url=http://localhost:30000
License
Apache License 2.0 — see LICENSE.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file strands_env-0.1.1.tar.gz.
File metadata
- Download URL: strands_env-0.1.1.tar.gz
- Upload date:
- Size: 40.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
84595038cd1afbd1084bb892dbfa91a94313e8febe63b509c27d23701b439903
|
|
| MD5 |
6677878a97239415141e93b82e7203b6
|
|
| BLAKE2b-256 |
238664caaf9794fb70399ffc64489775b319a123916ae86a69c732287eec4ec2
|
Provenance
The following attestation bundles were made for strands_env-0.1.1.tar.gz:
Publisher:
publish.yml on horizon-rl/strands-env
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
strands_env-0.1.1.tar.gz -
Subject digest:
84595038cd1afbd1084bb892dbfa91a94313e8febe63b509c27d23701b439903 - Sigstore transparency entry: 923977948
- Sigstore integration time:
-
Permalink:
horizon-rl/strands-env@95df06c7ca614af60c0e2e9113f5f076c53150b1 -
Branch / Tag:
refs/tags/v0.1.1 - Owner: https://github.com/horizon-rl
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@95df06c7ca614af60c0e2e9113f5f076c53150b1 -
Trigger Event:
release
-
Statement type:
File details
Details for the file strands_env-0.1.1-py3-none-any.whl.
File metadata
- Download URL: strands_env-0.1.1-py3-none-any.whl
- Upload date:
- Size: 35.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
4518e0d34527a79e286f943a628c74184f42095f8cb0256d3224caedc16b22af
|
|
| MD5 |
ef12fd235c3c51599695322e18293bcf
|
|
| BLAKE2b-256 |
3679520f24478713c86a8f84dcfd9f9fd05e08fd52f81bf2b3fb6d594c2fd7bc
|
Provenance
The following attestation bundles were made for strands_env-0.1.1-py3-none-any.whl:
Publisher:
publish.yml on horizon-rl/strands-env
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
strands_env-0.1.1-py3-none-any.whl -
Subject digest:
4518e0d34527a79e286f943a628c74184f42095f8cb0256d3224caedc16b22af - Sigstore transparency entry: 923977951
- Sigstore integration time:
-
Permalink:
horizon-rl/strands-env@95df06c7ca614af60c0e2e9113f5f076c53150b1 -
Branch / Tag:
refs/tags/v0.1.1 - Owner: https://github.com/horizon-rl
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@95df06c7ca614af60c0e2e9113f5f076c53150b1 -
Trigger Event:
release
-
Statement type: