strands-sglang
SGLang model provider for Strands Agents SDK with Token-In/Token-Out (TITO) support for agentic RL training.
Features
- SGLang Native API: Uses SGLang's native /generate endpoint with non-streaming POST for optimal parallelism
- TITO Support: Tracks complete token trajectories with logprobs for RL training, with no retokenization drift
- Tool Call Parsing: Customizable tool parsing aligned with model chat templates (Hermes/Qwen format)
- Iteration Limiting: Built-in hook to limit tool iterations with clean trajectory truncation
- RL Training Optimized: Connection pooling, aggressive retry (60 attempts), and non-streaming design aligned with Slime's http_utils.py
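To illustrate the tool call parsing feature: Hermes/Qwen chat templates wrap each tool call in `<tool_call>` tags around a JSON object with `name` and `arguments` keys. The following is a minimal sketch of that parsing step under those assumptions. It is not the parser shipped in strands-sglang, and `parse_tool_calls` is a hypothetical helper name.

```python
import json
import re

# Hermes/Qwen-style output: JSON wrapped in <tool_call>...</tool_call> tags.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)


def parse_tool_calls(text: str) -> list[dict]:
    """Return the well-formed tool calls found in model output (hypothetical helper)."""
    calls = []
    for match in TOOL_CALL_RE.finditer(text):
        try:
            payload = json.loads(match.group(1))
        except json.JSONDecodeError:
            continue  # skip blocks whose body is not valid JSON
        if isinstance(payload, dict) and isinstance(payload.get("name"), str):
            calls.append(
                {"name": payload["name"], "arguments": payload.get("arguments", {})}
            )
    return calls


sample_output = (
    "Let me compute that.\n"
    '<tool_call>\n{"name": "calculator", "arguments": {"expression": "25 * 17"}}\n</tool_call>'
)
print(parse_tool_calls(sample_output))
```

Malformed blocks are skipped here rather than raised; a real parser may prefer to surface them so the trajectory can be flagged.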
Requirements
- Python 3.10+
- Strands Agents SDK 1.7.0+
- SGLang server running with your model
- HuggingFace tokenizer for the model
Installation
pip install strands-agents strands-sglang strands-agents-tools
Or install from source with development dependencies:
git clone https://github.com/horizon-rl/strands-sglang.git
cd strands-sglang
pip install -e ".[dev]"
Quick Start
1. Start SGLang Server
python -m sglang.launch_server \
--model-path Qwen/Qwen3-4B-Instruct-2507 \
--port 30000 \
--host 0.0.0.0
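Once the server is running, requests go to its native /generate endpoint. The sketch below shows the rough shape of a non-streaming token-in/token-out request body and how output token IDs and logprobs might be read back from a response. The field names (`input_ids`, `sampling_params`, `return_logprob`, `meta_info.output_token_logprobs`) are assumptions based on SGLang's native API and should be checked against your SGLang version. Both helper functions are hypothetical, and no request is sent here.

```python
from typing import Any


def build_generate_payload(input_ids: list[int], max_new_tokens: int = 256) -> dict[str, Any]:
    """Build a non-streaming request body that sends token IDs instead of text."""
    return {
        "input_ids": input_ids,
        "sampling_params": {"max_new_tokens": max_new_tokens},
        "return_logprob": True,
        "stream": False,
    }


def extract_output_tokens(response: dict[str, Any]) -> tuple[list[int], list[float]]:
    """Read generated token IDs and logprobs from a response dict.

    Assumes each entry in meta_info.output_token_logprobs is laid out as
    (logprob, token_id, optional_text).
    """
    entries = response.get("meta_info", {}).get("output_token_logprobs", [])
    token_ids = [int(entry[1]) for entry in entries]
    logprobs = [float(entry[0]) for entry in entries]
    return token_ids, logprobs


# A hand-written response in the assumed shape, used instead of a live server.
fake_response = {
    "text": " 425",
    "meta_info": {"output_token_logprobs": [[-0.5, 220, " "], [-0.25, 19, "425"]]},
}
print(build_generate_payload([1, 2, 3], max_new_tokens=16))
print(extract_output_tokens(fake_response))
```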
2. Basic Agent
import asyncio

from transformers import AutoTokenizer
from strands import Agent
from strands_tools import calculator
from strands_sglang import SGLangModel


async def main():
    tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-4B-Instruct-2507")
    model = SGLangModel(tokenizer=tokenizer, base_url="http://localhost:30000")
    agent = Agent(model=model, tools=[calculator])

    model.reset()  # Reset TITO state for new episode
    result = await agent.invoke_async("What is 25 * 17?")
    print(result)

    # Access TITO data for RL training
    print(f"Tokens: {model.token_manager.token_ids}")
    print(f"Loss mask: {model.token_manager.loss_mask}")
    print(f"Logprobs: {model.token_manager.logprobs}")


asyncio.run(main())
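The three TITO lists can be consumed together when computing training signals. The sketch below sums logprobs over model-generated tokens only. It assumes the lists are parallel (one entry per token), that a loss mask value of 1 marks a model-generated token, and that prompt or tool-output positions may carry `None` logprobs. `generated_logprob_sum` is a hypothetical helper, and the data is made up for illustration.

```python
from typing import Optional


def generated_logprob_sum(
    token_ids: list[int],
    loss_mask: list[int],
    logprobs: list[Optional[float]],
) -> tuple[float, int]:
    """Return (sum of logprobs, token count) over positions where loss_mask is 1."""
    if not (len(token_ids) == len(loss_mask) == len(logprobs)):
        raise ValueError("token_ids, loss_mask and logprobs must have equal length")
    selected = [lp for mask, lp in zip(loss_mask, logprobs) if mask and lp is not None]
    return float(sum(selected)), len(selected)


# Toy trajectory: three prompt tokens followed by two generated tokens.
toy_token_ids = [101, 102, 103, 201, 202]
toy_loss_mask = [0, 0, 0, 1, 1]
toy_logprobs = [None, None, None, -0.25, -0.5]
print(generated_logprob_sum(toy_token_ids, toy_loss_mask, toy_logprobs))
```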
Slime Training
For RL training with Slime, SGLangModel with TITO eliminates the retokenization step in generate_with_strands.py (this example is not fully ready yet):
from strands import Agent
from strands_sglang import SGLangClient, SGLangModel, ToolIterationLimiter
from slime.utils.types import Sample

SYSTEM_PROMPT = "..."  # Your system prompt


async def generate(args, sample: Sample, sampling_params) -> Sample:
    ...
    state = GenerateState(args)  # Rollout state from Slime; import it from your Slime version

    # Create client and model with TITO tracking
    client = SGLangClient.from_slime_args(args)
    model = SGLangModel(
        tokenizer=state.tokenizer,
        client=client,
        # ... plus any other sampling params you need
        params={"max_new_tokens": sampling_params["max_new_tokens"]},
    )

    agent = Agent(
        model=model,
        tools=[...],  # Your tools
        hooks=[ToolIterationLimiter(max_iterations=...)],
        system_prompt=SYSTEM_PROMPT,
    )

    # Run agent
    model.reset()
    try:
        await agent.invoke_async(sample.prompt)
        sample.status = Sample.Status.COMPLETED
    except Exception as e:
        # _is_truncation_error is a helper you supply:
        # define what truncation exceptions look like
        if _is_truncation_error(e):
            sample.status = Sample.Status.TRUNCATED
        else:
            sample.status = Sample.Status.ABORTED

    # TITO: No retokenization needed - tokens tracked during generation
    prompt_len = len(model.token_manager.segments[0])  # system + user are first segment
    sample.tokens = model.token_manager.token_ids
    sample.loss_mask = model.token_manager.loss_mask[prompt_len:]
    sample.rollout_log_probs = model.token_manager.logprobs[prompt_len:]
    sample.response_length = len(sample.tokens) - prompt_len
    sample.response = model.tokenizer.decode(sample.tokens[prompt_len:], skip_special_tokens=False)
    ...
    return sample
Testing
# Unit tests
pytest tests/unit/ -v
# Integration tests (requires SGLang server)
pytest tests/integration/ -v --sglang-base-url=http://localhost:30000
Contributing
Contributions welcome! Install pre-commit hooks for code style and commit message validation:
pip install -e ".[dev]"
pre-commit install --hook-type pre-commit --hook-type commit-msg
This project uses Conventional Commits. Commit messages must follow the format:
<type>(<scope>): <description>
# Examples:
feat(client): add retry backoff configuration
fix(sglang): handle empty response from server
docs: update TITO usage examples
Allowed types: feat, fix, docs, style, refactor, perf, test, build, ci, chore, revert
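To check a commit subject locally before the hook runs, the format above can be approximated with a regular expression. This is a minimal sketch built only from the format and allowed types listed here, with the scope treated as optional as in the `docs:` example. The actual commit-msg hook may apply additional rules, and `is_conventional` is a hypothetical helper name.

```python
import re

ALLOWED_TYPES = (
    "feat", "fix", "docs", "style", "refactor",
    "perf", "test", "build", "ci", "chore", "revert",
)

# <type>(<scope>): <description>, with the scope optional.
COMMIT_RE = re.compile(
    r"(?P<type>" + "|".join(ALLOWED_TYPES) + r")"
    r"(\((?P<scope>[^()\s]+)\))?"
    r": (?P<description>\S.*)"
)


def is_conventional(subject: str) -> bool:
    """Return True if the commit subject line matches the format above."""
    return COMMIT_RE.fullmatch(subject) is not None


for subject in (
    "feat(client): add retry backoff configuration",
    "docs: update TITO usage examples",
    "update stuff",
):
    print(subject, "->", is_conventional(subject))
```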
License
Apache License 2.0 - see LICENSE.