strands-sglang

SGLang model provider for Strands Agents SDK with Token-In/Token-Out (TITO) support for agentic RL training.

Features

  • SGLang Native API: Uses SGLang's native /generate endpoint for efficient token-level generation
  • TITO Support: Tracks the complete token trajectory, with logprobs, for RL training, which avoids retokenization drift (see examples/retokenization_drift/)
  • Tool Call Parsing: Customizable tool parsing aligned with model chat templates (Hermes/Qwen format)
  • Iteration Limiting: Built-in hook to limit tool iterations with clean trajectory truncation
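
To make the retokenization-drift point concrete, here is a minimal, self-contained sketch. It uses a toy three-entry vocabulary and a greedy longest-match encoder, both invented for illustration; neither is the tokenizer nor the code used by this package. It only shows why decoding sampled tokens to text and encoding that text again can produce different token IDs than the ones the model actually sampled.

```python
# Toy illustration of retokenization drift (hypothetical vocabulary, not a real tokenizer).
VOCAB = {"a": 0, "b": 1, "ab": 2}
ID_TO_TEXT = {i: s for s, i in VOCAB.items()}


def decode(token_ids):
    """Concatenate the text of each token ID."""
    return "".join(ID_TO_TEXT[i] for i in token_ids)


def encode(text):
    """Greedy longest-match encoding: always prefer the longest vocabulary piece."""
    ids, pos = [], 0
    while pos < len(text):
        for piece in sorted(VOCAB, key=len, reverse=True):
            if text.startswith(piece, pos):
                ids.append(VOCAB[piece])
                pos += len(piece)
                break
        else:
            raise ValueError(f"cannot encode {text[pos:]!r}")
    return ids


sampled_ids = [0, 1]                       # the policy sampled "a" then "b"
retokenized = encode(decode(sampled_ids))  # the text round trip merges them into "ab"

print(sampled_ids, retokenized)
```

Both sequences decode to the same text, but logprobs recorded for the two sampled tokens no longer line up with the single retokenized token. Keeping the sampled IDs end to end sidesteps that mismatch.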

Requirements

  • Python 3.10+
  • Strands Agents SDK 1.7.0+
  • SGLang server running with your model
  • HuggingFace tokenizer for the model

Installation

pip install strands-agents strands-sglang

Or install from source with development dependencies:

git clone https://github.com/horizon-rl/strands-sglang.git
cd strands-sglang
pip install -e ".[dev]"

Quick Start

1. Start SGLang Server

python -m sglang.launch_server \
    --model-path Qwen/Qwen3-4B-Thinking-2507 \
    --port 8000 \
    --host 0.0.0.0

Tip: There is no need to enable SGLang's server-side tool parser for training. The provider calls the native /generate endpoint and parses tool calls itself.
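
For orientation, the sketch below builds the kind of JSON body that SGLang's native /generate endpoint accepts when it is given token IDs rather than text. The field names (input_ids, sampling_params, return_logprob) follow SGLang's native API. The helper name and the token IDs are placeholders, and the exact payload this package sends is not shown in this README, so treat this as an illustration only.

```python
import json


def build_generate_payload(input_ids, sampling_params=None, return_logprob=True):
    """Build a JSON-serializable body for a token-in request to /generate (hypothetical helper)."""
    return {
        "input_ids": list(input_ids),                    # prompt as token IDs, not text
        "sampling_params": dict(sampling_params or {}),  # e.g. max_new_tokens, temperature
        "return_logprob": return_logprob,                # ask the server for token logprobs
    }


# Placeholder token IDs; real IDs come from the model's tokenizer.
payload = build_generate_payload([101, 2023, 2003], {"max_new_tokens": 16, "temperature": 0.7})
print(json.dumps(payload, indent=2))
```

Such a body would be sent with an HTTP POST to the server started above, for example http://localhost:8000/generate.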

2. Basic Agent Usage

import asyncio
from transformers import AutoTokenizer
from strands import Agent
from strands_tools import calculator
from strands_sglang import SGLangModel

async def main():
    # Initialize model with tokenizer
    tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-4B-Thinking-2507")
    model = SGLangModel(
        tokenizer=tokenizer,
        base_url="http://localhost:8000",
        model_id="Qwen/Qwen3-4B-Thinking-2507",
    )

    # Create agent with tools
    agent = Agent(
        model=model,
        tools=[calculator],
        system_prompt="You are a helpful math assistant. Use the calculator for all arithmetic.",
    )

    # Run episode
    model.reset()  # Reset TITO state for new episode
    result = await agent.invoke_async("What is 25 * 17?")
    print(result)

    # Access TITO data for RL training
    print(f"Trajectory: {len(model.token_manager)} tokens")
    print(f"Output tokens: {sum(model.token_manager.loss_mask)}")

asyncio.run(main())
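
As a rough illustration of how the TITO data might feed a training objective, the sketch below sums log-probabilities over the positions selected by the loss mask. It works on plain lists with placeholder values. It assumes the mask is 1 for model-generated tokens and 0 elsewhere (consistent with the "Output tokens" count above) and that masked-out positions may carry None instead of a logprob; the function name is hypothetical.

```python
def masked_logprob_sum(logprobs, loss_mask):
    """Sum log-probabilities over positions whose mask is 1 (assumed: model-generated tokens)."""
    if len(logprobs) != len(loss_mask):
        raise ValueError("logprobs and loss_mask must have the same length")
    return sum(lp for lp, m in zip(logprobs, loss_mask) if m and lp is not None)


# Placeholder data: 3 prompt tokens (mask 0) followed by 2 generated tokens (mask 1).
token_ids = [11, 12, 13, 21, 22]
loss_mask = [0, 0, 0, 1, 1]
logprobs = [None, None, None, -0.5, -1.5]

print(masked_logprob_sum(logprobs, loss_mask))
```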

RL Training with Slime

For RL training with Slime, implement the rollout as an async generate function:

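# Import paths below are assumed from the package layouts; adjust them if they differ
from slime.utils.types import Sample
from strands import Agent
from strands_sglang import SGLangModel, ToolIterationLimiter
from strands_tools import calculator

async def generate(args, sample: Sample, sampling_params) -> Sample: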
    ...
    # The whole agent loop logic in a few lines
    # Server root only, matching base_url in Quick Start; the provider targets /generate itself
    base_url = f"http://{args.sglang_router_ip}:{args.sglang_router_port}"
    model = SGLangModel(tokenizer=tokenizer, base_url=base_url)
    limiter = ToolIterationLimiter(max_iterations=5)  # Optional: cap the number of tool iterations
    agent = Agent(model=model, tools=[calculator], hooks=[limiter], system_prompt="...")
    try:
        await agent.invoke_async(sample.prompt)
        sample.status = Sample.Status.COMPLETED
    except Exception as e:
        # Use exception to determine TRUNCATED or ABORTED
        ...
    # Use model.token_manager to fill in sample's attributes
    sample.tokens = model.token_manager.token_ids
    sample.loss_mask = model.token_manager.loss_mask
    sample.rollout_log_probs = model.token_manager.logprobs
    ...

A complete example will be added to Slime's repository later.
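
One way the exception branch in the snippet above could be filled in is sketched here. The Status enum is a local stand-in for Slime's Sample.Status, and IterationLimitReached is a hypothetical exception name used only for illustration; the exception that this package actually raises when the iteration limit is hit is not stated in this README.

```python
from enum import Enum


class Status(Enum):
    """Local stand-in for Slime's Sample.Status (only the values used above)."""
    COMPLETED = "completed"
    TRUNCATED = "truncated"
    ABORTED = "aborted"


class IterationLimitReached(Exception):
    """Hypothetical exception raised when the tool-iteration limit is hit."""


def status_from_exception(exc):
    """Map the outcome of an agent run to a rollout status."""
    if exc is None:
        return Status.COMPLETED
    if isinstance(exc, IterationLimitReached):
        return Status.TRUNCATED  # the run was cut short by the limiter
    return Status.ABORTED        # any other failure: treat the rollout as aborted


print(status_from_exception(None))
print(status_from_exception(IterationLimitReached("limit of 5 reached")))
print(status_from_exception(RuntimeError("connection lost")))
```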

Configuration

SGLangModel Options

model = SGLangModel(
    tokenizer=tokenizer,           # Required: HuggingFace tokenizer
    base_url="http://localhost:8000",  # SGLang server URL
    model_id="Qwen/Qwen3-4B-Thinking-2507",  # Optional: model identifier
    tool_call_parser=HermesToolCallParser(),  # Tool call format parser
    params={                        # Sampling parameters
        "max_new_tokens": 1024,
        "temperature": 0.7,
        "top_p": 0.9,
    },
    timeout=300.0,                  # Request timeout in seconds
    return_logprobs=True,           # Return logprobs (default: True)
)

See SGLang's documentation for the full list of sampling parameters.
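
The tool_call_parser option above determines how tool calls are extracted from generated text. In the Hermes/Qwen convention, each call is a JSON object wrapped in <tool_call> tags. The sketch below shows the general idea with a regular expression and the standard json module. It is not the package's HermesToolCallParser, and the function name parse_tool_calls is hypothetical.

```python
import json
import re

# Capture the JSON payload between <tool_call> and </tool_call>, across newlines.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)


def parse_tool_calls(text):
    """Return (name, arguments) pairs for each well-formed <tool_call> block in text."""
    calls = []
    for raw in TOOL_CALL_RE.findall(text):
        try:
            obj = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed JSON rather than raising
        if isinstance(obj, dict) and "name" in obj:
            calls.append((obj["name"], obj.get("arguments", {})))
    return calls


generated = (
    "Let me compute that.\n"
    '<tool_call>\n{"name": "calculator", "arguments": {"expression": "25 * 17"}}\n</tool_call>'
)
print(parse_tool_calls(generated))
```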

Testing

Unit Tests

pytest tests/unit/ -v

Integration Tests

Requires a running SGLang server:

# Start server first
python -m sglang.launch_server --model-path Qwen/Qwen3-4B-Thinking-2507 --port 8000

# Run tests
pytest tests/integration/ -v \
    --sglang-base-url=http://localhost:8000 \
    --sglang-model-id=Qwen/Qwen3-4B-Thinking-2507

Or configure via environment variables:

export SGLANG_BASE_URL=http://localhost:8000
export SGLANG_MODEL_ID=Qwen/Qwen3-4B-Thinking-2507
pytest tests/integration/ -v
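
The project's conftest.py is not shown in this README, so the following is only a sketch of how command-line options with environment-variable defaults could be wired up in pytest. The hook name pytest_addoption and parser.addoption are standard pytest API; the env_default helper and the fallback URL are assumptions made for illustration.

```python
import os


def env_default(name, fallback):
    """Return the environment variable's value, or fallback when it is unset or empty."""
    return os.environ.get(name) or fallback


def pytest_addoption(parser):
    """pytest hook (would live in conftest.py): CLI options whose defaults come from the environment."""
    parser.addoption(
        "--sglang-base-url",
        default=env_default("SGLANG_BASE_URL", "http://localhost:8000"),  # assumed fallback
        help="Base URL of the running SGLang server",
    )
    parser.addoption(
        "--sglang-model-id",
        default=env_default("SGLANG_MODEL_ID", None),
        help="Model identifier served by SGLang",
    )
```

With this arrangement an explicit command-line flag takes precedence, and the environment variable is used only when the flag is omitted.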

Contributing

pip install -e ".[dev]"
pre-commit install

After this, every git commit automatically runs the linting and formatting checks.

License

Apache 2.0
