RL environments for Strands Agents — step, observe, reward.

These details have not been verified by PyPI

Development Status
- 4 - Beta
Intended Audience
License
- OSI Approved :: Apache Software License
Operating System
- OS Independent
Programming Language
- Python :: 3
Topic
- Scientific/Engineering :: Artificial Intelligence

Project description

strands-env

RL environment abstraction for Strands Agents — step, observe, reward.

Features

This package standardizes agent environments by treating each env.step() as a full agent loop rather than a single model call or tool call. It is built on the Strands agent loop and strands-sglang for RL training.

- Define environments easily: subclass Environment and implement tools as @tool functions
- Capture token-level observations: token-in/token-out (TITO) trajectories for on-policy RL training (SGLang backend)
- Plug in reward functions: evaluate agent outputs with a custom RewardFunction
- Run benchmarks: Evaluator with flexible environment setup, metric customization, and resume

An agent loop is defined as (prompt → (tool_call, tool_response+)* → response): the model may call tools any number of times before producing its final response.
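To make the notation concrete, here is a framework-free sketch of one such loop, i.e. the unit of work a single env.step() covers. The ToolCall/ModelTurn types, run_agent_loop, and the scripted model are hypothetical stand-ins for illustration, not strands-env or Strands APIs.

```python
from dataclasses import dataclass, field
from typing import Callable


# Hypothetical message shapes for this sketch only; Strands uses its own types.
@dataclass
class ToolCall:
    name: str
    argument: str


@dataclass
class ModelTurn:
    text: str
    tool_calls: list[ToolCall] = field(default_factory=list)


def run_agent_loop(
    prompt: str,
    model: Callable[[list[str]], ModelTurn],
    tools: dict[str, Callable[[str], str]],
    max_turns: int = 10,
) -> tuple[str, list[str]]:
    """Run prompt -> (tool_call, tool_response+)* -> response.

    Returns the final response and the full transcript.
    """
    transcript = [f"user: {prompt}"]
    for _ in range(max_turns):
        turn = model(transcript)
        transcript.append(f"assistant: {turn.text}")
        if not turn.tool_calls:
            # No tool call requested: this turn is the final response.
            return turn.text, transcript
        for call in turn.tool_calls:
            # Run each requested tool and append its output as a tool response.
            result = tools[call.name](call.argument)
            transcript.append(f"tool[{call.name}]: {result}")
    raise RuntimeError("max_turns reached without a final response")


def scripted_model(transcript: list[str]) -> ModelTurn:
    # First turn requests a tool; the next turn answers from the tool output.
    if not any(line.startswith("tool[") for line in transcript):
        return ModelTurn("Let me compute that.", [ToolCall("power", "2,10")])
    return ModelTurn("1024")


def power(arg: str) -> str:
    base, exp = (int(x) for x in arg.split(","))
    return str(base ** exp)


final, transcript = run_agent_loop("What is 2^10?", scripted_model, {"power": power})
print(final)
```

Treating the whole loop as one step means the reward is assigned to the complete trajectory, not to individual model calls.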

Install

pip install strands-env

For development:

git clone https://github.com/horizon-rl/strands-env.git && cd strands-env
pip install -e ".[dev]"

Usage

Define an Environment

Subclass Environment and add tools as @tool-decorated functions:

from strands import tool
from strands_env.core import Environment

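@tool
def calculator(expression: str) -> str:
    """Evaluate a math expression."""
    # eval() runs arbitrary Python: acceptable for a trusted demo, unsafe for untrusted input.
    # Note that `^` is bitwise XOR in Python, so powers must be written as `**`.
    return str(eval(expression))

class MathEnv(Environment):
    def get_tools(self):
        # Tools exposed to the agent during each env.step() rollout.
        return [calculator]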
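Because eval() executes arbitrary Python, an environment that feeds it model-generated input may prefer a restricted evaluator. A minimal sketch using the standard-library ast module follows; safe_eval is a hypothetical helper that a calculator tool could call in place of eval(). It rejects `^` (XOR) outright rather than silently returning eval("2^10") == 8.

```python
import ast
import operator
from typing import Union

# Operators the evaluator accepts; any other syntax raises ValueError.
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.FloorDiv: operator.floordiv,
    ast.Mod: operator.mod,
    ast.Pow: operator.pow,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def safe_eval(expression: str) -> Union[int, float]:
    """Evaluate a purely arithmetic expression without eval().

    Note: exponent size is not bounded, so inputs like 9**9**9 can still be slow.
    """

    def _eval(node: ast.AST) -> Union[int, float]:
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if (
            isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)
        ):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        raise ValueError(f"unsupported expression: {ast.dump(node)}")

    return _eval(ast.parse(expression, mode="eval"))
```

A ValueError raised here could be caught inside the tool and returned to the model as an error message, so the agent can retry with `**`.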

Run It

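from strands_env.core import Action, TaskContext

# `factory` (a model factory) and `reward_fn` (a RewardFunction) are assumed to be defined,
# and `await` must run inside an async function (e.g. one driven by asyncio.run).
env = MathEnv(model_factory=factory, reward_fn=reward_fn)
result = await env.step(Action(message="What is 2^10?", task_context=TaskContext(ground_truth="1024")))

result.observation.final_response   # "1024"
result.observation.tokens           # TokenObservation (SGLang only)
result.reward.reward                # 1.0
result.termination_reason           # TerminationReason.TASK_COMPLETE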
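The reward of 1.0 above comes from the reward_fn passed to the environment. This README does not spell out the RewardFunction interface beyond compute(action=..., step_result=...) returning an object with a .reward field, so the sketch below shows only scoring logic such a function might wrap: a hypothetical exact-match check on the last number in the final response. extract_last_number and exact_match_reward are illustrative names, not strands-env APIs.

```python
import re
from typing import Optional

# Integers or decimals, optionally negative.
_NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def extract_last_number(text: str) -> Optional[str]:
    """Return the last number in text, ignoring thousands separators."""
    matches = _NUMBER.findall(text.replace(",", ""))
    return matches[-1] if matches else None


def exact_match_reward(final_response: str, ground_truth: str) -> float:
    """1.0 if the last number in the response equals the ground truth, else 0.0."""
    predicted = extract_last_number(final_response)
    if predicted is None:
        return 0.0
    try:
        return 1.0 if float(predicted) == float(ground_truth) else 0.0
    except ValueError:
        # Non-numeric ground truth: this sketch simply scores it as a miss.
        return 0.0
```

Taking the last number is a common heuristic for math answers, since intermediate steps often mention other numbers (here 2 and 10) first.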

See examples/math_env.py for a complete example:

python examples/math_env.py --backend sglang --sglang-base-url http://localhost:30000

RL Training

For RL training with slime, customize the generate and reward_func functions so that each sample is produced by an agentic rollout instead of a single generation:

from strands_env.core import Action, TaskContext
from strands_env.core.models import sglang_model_factory
from strands_env.utils import get_cached_client_from_slime_args

async def generate(args, sample, sampling_params):
    # Build model factory with cached client.
    # `tokenizer` is assumed to be loaded once at module level (e.g. from args.hf_checkpoint).
    factory = sglang_model_factory(
        model_id=args.hf_checkpoint,
        tokenizer=tokenizer,
        client=get_cached_client_from_slime_args(args),
        sampling_params=sampling_params,
    )

    # Create environment and run step. YourEnv is a placeholder for your Environment subclass;
    # reward_fn=None because the reward is computed later in reward_func.
    env = YourEnv(model_factory=factory, reward_fn=None)
    action = Action(message=sample.prompt, task_context=TaskContext(ground_truth=sample.label))
    step_result = await env.step(action)

    # Extract TITO data for training
    token_obs = step_result.observation.tokens
    sample.tokens = token_obs.token_ids
    sample.loss_mask = token_obs.rollout_loss_mask
    sample.rollout_log_probs = token_obs.rollout_logprobs
    sample.response_length = len(token_obs.rollout_token_ids)

    # Attach for reward computation
    sample.action = action
    sample.step_result = step_result
    return sample

async def reward_func(args, sample, **kwargs):
    reward_fn = YourRewardFunction()  # placeholder for your RewardFunction subclass
    reward_result = await reward_fn.compute(action=sample.action, step_result=sample.step_result)
    return reward_result.reward

Key points:

- get_cached_client_from_slime_args(args) provides connection pooling across rollouts
- TokenObservation contains token IDs and logprobs for on-policy training
- Reward is computed separately to allow async/batched reward computation
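To illustrate why the token observation carries a loss mask alongside token IDs and logprobs, here is a framework-free sketch. It assumes rollout_loss_mask is 1 for tokens the policy generated and 0 for tokens injected by tools, and it uses a plain REINFORCE-style objective on fixed numbers; slime's actual loss (clipping, normalization, recomputed logprobs with gradients) differs. build_rollout and masked_policy_gradient_loss are hypothetical names.

```python
def build_rollout(segments: list[tuple[list[int], bool]]) -> tuple[list[int], list[int]]:
    """Concatenate (token_ids, generated_by_policy) segments into ids plus a loss mask."""
    token_ids: list[int] = []
    loss_mask: list[int] = []
    for ids, generated in segments:
        token_ids.extend(ids)
        # Only policy-generated tokens contribute to the loss.
        loss_mask.extend([1 if generated else 0] * len(ids))
    return token_ids, loss_mask


def masked_policy_gradient_loss(
    logprobs: list[float], loss_mask: list[int], advantage: float
) -> float:
    """-advantage * mean(logprob) over tokens where loss_mask == 1."""
    if len(logprobs) != len(loss_mask):
        raise ValueError("logprobs and loss_mask must have the same length")
    n_trainable = sum(loss_mask)
    if n_trainable == 0:
        return 0.0
    masked_sum = sum(lp * m for lp, m in zip(logprobs, loss_mask))
    return -advantage * masked_sum / n_trainable


# Model tokens, then a tool response, then more model tokens.
ids, mask = build_rollout([([11, 12], True), ([90, 91, 92], False), ([13], True)])
loss = masked_policy_gradient_loss([-0.5, -0.4, -1.0, -2.0, -3.0, -0.25], mask, advantage=2.0)
```

Without the mask, the tool-response tokens (which the policy never sampled) would be pushed up or down as if the model had chosen them.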

Evaluation

The Evaluator orchestrates concurrent rollouts with checkpointing and pass@k metrics. It takes an async env_factory that creates an environment for each sample, and subclasses implement load_dataset for each benchmark:

...
from strands_env.eval import Evaluator

class YourEvaluator(Evaluator):
    benchmark_name = "YourBenchmark"

    def load_dataset(self) -> Iterable[Action]:
        ...

async def env_factory(action: Action) -> Environment:
    ...

evaluator = YourEvaluator(
    env_factory=env_factory,
    n_samples_per_prompt=8,
    max_concurrency=30,
    keep_tokens=False,  # set True to keep token-level trajectories (SGLang only)
    metrics_fns=[...],  # additional metric functions; pass@k is included by default
)

actions = evaluator.load_dataset()
results = await evaluator.run(actions)
metrics = evaluator.compute_metrics(results)  # {"pass@1": 0.75, "pass@8": 0.95}
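With n_samples_per_prompt=8, pass@k for any k ≤ 8 can be estimated from how many of the 8 samples per problem are correct. One common choice is the unbiased estimator from the Codex paper (Chen et al., 2021), pass@k = 1 - C(n-c, k) / C(n, k), averaged over problems. This README does not say whether strands-env's built-in pass@k uses exactly this estimator; the sketch below is illustrative, and pass_at_k and mean_pass_at_k are hypothetical names.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples drawn, c of them correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Every size-k subset of the samples contains at least one correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(correct_counts: list[int], n: int, k: int) -> float:
    """Average pass@k over problems that each received n samples."""
    return sum(pass_at_k(n, c, k) for c in correct_counts) / len(correct_counts)
```

For k = 1 this reduces to the fraction of correct samples, c / n.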

See examples/aime_eval.py for a complete example:

python examples/aime_eval.py --backend sglang --sglang-base-url http://localhost:30000
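The Evaluator's max_concurrency cap and its resume support can be pictured with plain asyncio. This is not strands-env's implementation: run_with_resume, the JSONL checkpoint layout (one {"id", "reward"} object per line), and the rollout callable are assumptions made for illustration.

```python
import asyncio
import json
from pathlib import Path
from typing import Awaitable, Callable


async def run_with_resume(
    samples: dict[str, str],
    rollout: Callable[[str], Awaitable[float]],
    checkpoint: Path,
    max_concurrency: int,
) -> dict[str, float]:
    """Run rollout() for every sample id not already in the JSONL checkpoint,
    at most max_concurrency at a time, appending each result as it finishes."""
    done: dict[str, float] = {}
    if checkpoint.exists():
        for line in checkpoint.read_text().splitlines():
            if line.strip():
                record = json.loads(line)
                done[record["id"]] = record["reward"]

    semaphore = asyncio.Semaphore(max_concurrency)

    async def run_one(sample_id: str, prompt: str) -> None:
        async with semaphore:  # bounds the number of in-flight rollouts
            reward = await rollout(prompt)
        # No await between open and write, so lines cannot interleave on one event loop.
        with checkpoint.open("a") as f:
            f.write(json.dumps({"id": sample_id, "reward": reward}) + "\n")
        done[sample_id] = reward

    await asyncio.gather(
        *(run_one(sid, prompt) for sid, prompt in samples.items() if sid not in done)
    )
    return done
```

Appending one line per finished sample means an interrupted run loses at most the rollouts that were still in flight.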

Development

# Lint
ruff check src/ && ruff format --check src/

# Unit tests
pytest tests/unit/ -v

# Integration tests (requires running SGLang server)
pytest tests/integration/ -v --sglang-base-url=http://localhost:30000

License

Apache License 2.0 — see LICENSE.


Release history

- 0.3.2 (May 3, 2026)
- 0.3.1 (Apr 8, 2026)
- 0.3.0 (Mar 31, 2026)
- 0.2.3 (Mar 4, 2026)
- 0.2.2 (Mar 1, 2026)
- 0.2.1 (Feb 24, 2026)
- 0.2.0 (Feb 17, 2026)
- 0.1.2 (Feb 7, 2026)
- 0.1.1 (Feb 6, 2026), this version
- 0.1.0 (Feb 3, 2026)

Download files

Source distribution: strands_env-0.1.1.tar.gz (40.2 kB), uploaded Feb 6, 2026

Built distribution: strands_env-0.1.1-py3-none-any.whl (35.7 kB), uploaded Feb 6, 2026, Python 3

File details

Details for the file strands_env-0.1.1.tar.gz.

File metadata

Download URL: strands_env-0.1.1.tar.gz
Upload date: Feb 6, 2026
Size: 40.2 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for strands_env-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`84595038cd1afbd1084bb892dbfa91a94313e8febe63b509c27d23701b439903`
MD5	`6677878a97239415141e93b82e7203b6`
BLAKE2b-256	`238664caaf9794fb70399ffc64489775b319a123916ae86a69c732287eec4ec2`
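A downloaded file can be checked against the published SHA256 digest with the standard library alone. A minimal sketch; sha256_of and verify_sha256 are hypothetical helper names.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path: Path, expected_hex: str) -> bool:
    """Compare a file's SHA256 against a published digest (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For an intact download, verify_sha256(Path("strands_env-0.1.1.tar.gz"), "84595038cd1afbd1084bb892dbfa91a94313e8febe63b509c27d23701b439903") should return True. pip can also enforce digests itself in hash-checking mode, e.g. a requirements line `strands-env==0.1.1 --hash=sha256:<digest>` installed with `pip install --require-hashes -r requirements.txt` (every dependency then needs a hash as well).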


Provenance

The following attestation bundles were made for strands_env-0.1.1.tar.gz:

Publisher: publish.yml on horizon-rl/strands-env

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: strands_env-0.1.1.tar.gz
- Subject digest: 84595038cd1afbd1084bb892dbfa91a94313e8febe63b509c27d23701b439903
- Sigstore transparency entry: 923977948
- Sigstore integration time: Feb 6, 2026
Source repository:
- Permalink: horizon-rl/strands-env@95df06c7ca614af60c0e2e9113f5f076c53150b1
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/horizon-rl
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@95df06c7ca614af60c0e2e9113f5f076c53150b1
- Trigger Event: release

File details

Details for the file strands_env-0.1.1-py3-none-any.whl.

File metadata

Download URL: strands_env-0.1.1-py3-none-any.whl
Upload date: Feb 6, 2026
Size: 35.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for strands_env-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`4518e0d34527a79e286f943a628c74184f42095f8cb0256d3224caedc16b22af`
MD5	`ef12fd235c3c51599695322e18293bcf`
BLAKE2b-256	`3679520f24478713c86a8f84dcfd9f9fd05e08fd52f81bf2b3fb6d594c2fd7bc`


Provenance

The following attestation bundles were made for strands_env-0.1.1-py3-none-any.whl:

Publisher: publish.yml on horizon-rl/strands-env

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: strands_env-0.1.1-py3-none-any.whl
- Subject digest: 4518e0d34527a79e286f943a628c74184f42095f8cb0256d3224caedc16b22af
- Sigstore transparency entry: 923977951
- Sigstore integration time: Feb 6, 2026
Source repository:
- Permalink: horizon-rl/strands-env@95df06c7ca614af60c0e2e9113f5f076c53150b1
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/horizon-rl
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@95df06c7ca614af60c0e2e9113f5f076c53150b1
- Trigger Event: release
