Community vLLM provider utilities for Strands Agents (OpenAI-compatible).

Strands-vLLM

Community vLLM provider for Strands Agents SDK with Token-In/Token-Out (TITO) support and Agent Lightning integration.

Features

This package provides utilities for using vLLM with the Strands Agents SDK. It is designed for agent rollouts whose token data can be fed directly into training:

Token-In/Token-Out (TITO): capture token IDs directly from vLLM streaming responses (no retokenization drift)
Agent Lightning integration: automatic OpenTelemetry span attributes for token IDs
Tool calling support: validation hooks for vLLM's server-side tool call post-processing
OpenAI-compatible API: works with vLLM's OpenAI-compatible endpoint

Requirements

Python 3.10+
Strands Agents SDK
vLLM server running with your model

Installation

pip install strands-vllm

Or install from source with development dependencies:

git clone https://github.com/agents-community/strands-vllm.git
cd strands-vllm
pip install -e ".[dev]"

Quick Start

1. Start vLLM Server

vllm serve <MODEL_ID> \
    --port 8000 \
    --enable-auto-tool-choice \
    --tool-call-parser llama3_json
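
Before pointing an agent at the server, it can help to confirm that the OpenAI-compatible endpoint is reachable. The sketch below is not part of strands-vllm. It assumes the server started above listens on localhost:8000, and it queries the /v1/models route using only the standard library. The helper names models_url and list_served_models are hypothetical.

```python
import json
import urllib.request


def models_url(base_url: str) -> str:
    """Build the model-listing URL from an OpenAI-compatible base URL."""
    return base_url.rstrip("/") + "/models"


def list_served_models(base_url: str, timeout: float = 5.0) -> list[str]:
    """Return the model IDs reported by the server (raises on connection errors)."""
    with urllib.request.urlopen(models_url(base_url), timeout=timeout) as resp:
        payload = json.load(resp)
    # OpenAI-style list responses keep their items under the "data" key.
    return [entry["id"] for entry in payload.get("data", [])]


if __name__ == "__main__":
    try:
        # Assumption: the server from the command above is on localhost:8000.
        print(list_served_models("http://localhost:8000/v1"))
    except OSError as exc:
        # URLError, HTTPError and timeouts are all OSError subclasses.
        print(f"vLLM server not reachable: {exc}")
```

The ID printed here is the value to pass as model_id in the next step.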

2. Basic Agent

from strands import Agent
from strands_vllm import VLLMModel

model = VLLMModel(
    base_url="http://localhost:8000/v1",
    model_id="AMead10/Llama-3.2-3B-Instruct-AWQ",
    return_token_ids=True,
)

agent = Agent(model=model)
result = agent("Say hello")
print(result)

3. Token IDs for RL Training

from strands import Agent
from strands.handlers.callback_handler import CompositeCallbackHandler, PrintingCallbackHandler
from strands_vllm import VLLMModel, VLLMTokenRecorder

model = VLLMModel(
    base_url="http://localhost:8000/v1",
    model_id="AMead10/Llama-3.2-3B-Instruct-AWQ",
    return_token_ids=True,
)

recorder = VLLMTokenRecorder()
printer = PrintingCallbackHandler(verbose_tool_use=False)
callback = CompositeCallbackHandler(printer, recorder)

agent = Agent(model=model, callback_handler=callback)
agent("What is 17 * 19?")

# Access TITO data for RL training
print(f"Prompt token IDs: {recorder.prompt_token_ids}")
print(f"Response token IDs: {recorder.token_ids}")

Note: VLLMTokenRecorder automatically adds token IDs as OpenTelemetry span attributes (llm.hosted_vllm.prompt_token_ids, llm.hosted_vllm.response_token_ids) for Agent Lightning compatibility.

Slime Training

For RL training with Slime, VLLMModel with VLLMTokenRecorder eliminates the retokenization step by capturing token IDs directly from vLLM streaming responses.

Note: This requires THUDM/slime, installed from GitHub (not the slime package on PyPI):

pip install git+https://github.com/THUDM/slime.git

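import logging

from strands import Agent, tool
from strands_vllm import VLLMModel, VLLMTokenRecorder, TokenManager
from slime.utils.types import Sample

# Used in generate() below to report truncated rollouts
logger = logging.getLogger(__name__)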

SYSTEM_PROMPT = "..."
MAX_TOOL_ITERATIONS = ...  # e.g., 5

@tool
def execute_python_code(code: str):
    """Execute Python code and return the output."""
    ...

async def generate(args, sample: Sample, sampling_params) -> Sample:
    """Generate with TITO: tokens captured during generation, no retokenization."""
    assert not args.partial_rollout, "Partial rollout not supported."

    # Set up Agent with VLLMModel and VLLMTokenRecorder
    model = VLLMModel(
        base_url=args.vllm_base_url,
        model_id=args.hf_checkpoint.split("/")[-1],
        return_token_ids=True,
        params={k: sampling_params[k] for k in ["max_new_tokens", "temperature", "top_p"]},
    )
    recorder = VLLMTokenRecorder()
    agent = Agent(
        model=model,
        tools=[execute_python_code],
        callback_handler=recorder,
        system_prompt=SYSTEM_PROMPT,
    )

    # Run Agent Loop
    prompt = sample.prompt if isinstance(sample.prompt, str) else sample.prompt[0]["content"]
    try:
        await agent.invoke_async(prompt)
        sample.status = Sample.Status.COMPLETED
    except Exception as e:
        # Always use TRUNCATED instead of ABORTED because Slime doesn't properly
        # handle ABORTED samples in reward processing. See: https://github.com/THUDM/slime/issues/200
        sample.status = Sample.Status.TRUNCATED
        logger.warning(f"TRUNCATED: {type(e).__name__}: {e}")

    # TITO: extract trajectory from recorder and TokenManager
    tm = TokenManager()
    for entry in recorder.history:
        pti = entry.get("prompt_token_ids")
        ti = entry.get("token_ids")
        if pti:
            tm.add_prompt(pti)
        if ti:
            tm.add_response(ti)

    prompt_len = len(tm.segments[0])  # system + user are first segment
    sample.tokens = tm.token_ids
    sample.loss_mask = tm.loss_mask[prompt_len:]
    sample.rollout_log_probs = tm.logprobs[prompt_len:]
    sample.response_length = len(sample.tokens) - prompt_len

    # Extract response from agent messages (vLLM returns text directly, no tokenizer needed)
    response_text = ""
    for msg in reversed(agent.messages):
        if msg.get("role") == "assistant":
            content = msg.get("content", [])
            if isinstance(content, list):
                for block in content:
                    if isinstance(block, dict) and "text" in block:
                        response_text = block["text"]
                        break
            if response_text:
                break
    sample.response = response_text

    # Cleanup and return
    recorder.reset()
    agent.cleanup()
    return sample
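
The trajectory bookkeeping at the end of generate can be pictured with a small stand-in. The class below is a hypothetical sketch, not the actual TokenManager from strands-vllm. It assumes each call adds only new tokens, marks prompt tokens with 0 and response tokens with 1 in the loss mask, and leaves out log probabilities entirely.

```python
class SketchTokenManager:
    """Hypothetical stand-in for a segment-based token manager."""

    def __init__(self) -> None:
        self.segments: list[list[int]] = []  # token IDs, one list per segment
        self._masks: list[list[int]] = []    # matching loss-mask values

    def add_prompt(self, token_ids: list[int]) -> None:
        # Prompt and tool-output tokens are context only: mask value 0.
        self.segments.append(list(token_ids))
        self._masks.append([0] * len(token_ids))

    def add_response(self, token_ids: list[int]) -> None:
        # Model-generated tokens are the ones trained on: mask value 1.
        self.segments.append(list(token_ids))
        self._masks.append([1] * len(token_ids))

    @property
    def token_ids(self) -> list[int]:
        return [tok for seg in self.segments for tok in seg]

    @property
    def loss_mask(self) -> list[int]:
        return [m for seg in self._masks for m in seg]


tm = SketchTokenManager()
tm.add_prompt([101, 102, 103])  # system + user prompt
tm.add_response([7, 8])         # first model turn (for example, a tool call)
tm.add_prompt([55, 56])         # tool result fed back as context
tm.add_response([9])            # final answer

prompt_len = len(tm.segments[0])
response_mask = tm.loss_mask[prompt_len:]
print(tm.token_ids)   # [101, 102, 103, 7, 8, 55, 56, 9]
print(response_mask)  # [1, 1, 0, 0, 1]
```

Slicing at prompt_len mirrors the generate function above: the initial prompt is dropped from the mask, and tool results that arrive mid-trajectory stay in the sequence but are masked out of the loss.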

Examples

All examples can be configured with environment variables:

export VLLM_BASE_URL="http://localhost:8000/v1"
export VLLM_MODEL_ID="AMead10/Llama-3.2-3B-Instruct-AWQ"
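
A script can pick up the same two variables along these lines. This is a minimal sketch that assumes the defaults match the values exported above. The helper load_vllm_settings is hypothetical and is not taken from the example scripts.

```python
import os

# Defaults mirror the values exported above.
DEFAULT_BASE_URL = "http://localhost:8000/v1"
DEFAULT_MODEL_ID = "AMead10/Llama-3.2-3B-Instruct-AWQ"


def load_vllm_settings(env=None) -> dict[str, str]:
    """Read connection settings from the environment, falling back to defaults."""
    env = os.environ if env is None else env
    return {
        "base_url": env.get("VLLM_BASE_URL", DEFAULT_BASE_URL),
        "model_id": env.get("VLLM_MODEL_ID", DEFAULT_MODEL_ID),
    }


settings = load_vllm_settings()
print(settings)
```

The resulting dictionary uses the same keyword names as the Quick Start, so it can be unpacked as VLLMModel(**settings, return_token_ids=True).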

Math agent with tools

pip install strands-agents-tools
python examples/math_agent.py

Agent Lightning integration

Demonstrates token IDs in OpenTelemetry spans for Agent Lightning compatibility:

python examples/agent_lightning.py

Tool-call validation

vLLM tool calling can involve server-side post-processing. Use validation hooks to guard tool execution:

from strands import Agent
from strands_tools.calculator import calculator
from strands_vllm import VLLMModel, VLLMToolValidationHooks

model = VLLMModel(base_url="http://localhost:8000/v1", model_id="...", return_token_ids=True)
agent = Agent(model=model, tools=[calculator], hooks=[VLLMToolValidationHooks()])
print(agent("Compute 17 * 19 using the calculator tool."))

Retokenization drift (educational)

This demo shows why TITO matters: re-encoding decoded text does not always reproduce the original token IDs, so encode(decode(tokens)) != tokens can happen.

pip install "strands-vllm[drift]" strands-agents-tools
python examples/retokenization_drift.py
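
The effect can be reproduced without a real tokenizer. The toy below is an illustration only and is not the packaged demo. It assumes a three-entry vocabulary and a greedy longest-match encoder, which is enough to show two different token sequences decoding to the same text.

```python
# Toy vocabulary: token ID -> string. "ab" also exists as a merged token.
VOCAB = {0: "a", 1: "b", 2: "ab"}
STRING_TO_ID = {text: tid for tid, text in VOCAB.items()}


def decode(token_ids: list[int]) -> str:
    """Concatenate the string for each token ID."""
    return "".join(VOCAB[tid] for tid in token_ids)


def encode(text: str) -> list[int]:
    """Greedy longest-match tokenization over the toy vocabulary."""
    ids, pos = [], 0
    pieces = sorted(STRING_TO_ID, key=len, reverse=True)
    while pos < len(text):
        match = next(p for p in pieces if text.startswith(p, pos))
        ids.append(STRING_TO_ID[match])
        pos += len(match)
    return ids


sampled = [0, 1]  # the model emitted "a" then "b" as separate tokens
roundtrip = encode(decode(sampled))
print(sampled, roundtrip)    # [0, 1] [2]
print(roundtrip == sampled)  # False
```

Both sequences decode to "ab", but a trainer that re-encodes the text would compute gradients for token 2, which the model never sampled. Recording the IDs at generation time avoids that mismatch.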

Testing

# Unit tests
uv run pytest tests/unit/ -v

# Integration tests (requires vLLM server)
export VLLM_BASE_URL="http://localhost:8000/v1"
export VLLM_MODEL_ID="AMead10/Llama-3.2-3B-Instruct-AWQ"
uv run pytest tests/integration/ -v

Integration tests include:

test_agent_math500.py - Agent tests with real MATH-500 problems and TITO consistency checks
test_slime_integration.py - Slime training pattern using Slime's Sample type (requires pip install git+https://github.com/THUDM/slime.git)

Contributing

Contributions welcome! Install pre-commit hooks for code style and commit message validation:

pip install -e ".[dev]"
pre-commit install -t pre-commit -t commit-msg

This project uses Conventional Commits. Commit messages must follow the format:

<type>(<scope>): <description>

# Examples:
feat(recorder): add Agent Lightning span attributes
fix(vllm): handle empty response from server
docs: update TITO usage examples

Allowed types: feat, fix, docs, style, refactor, perf, test, build, ci, chore, revert
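
The format can be checked locally before committing. The regular expression below is an approximation written for illustration, not the rule set used by the project's commit-msg hook. It assumes the scope is optional, as in the docs example above.

```python
import re

ALLOWED_TYPES = (
    "feat", "fix", "docs", "style", "refactor", "perf",
    "test", "build", "ci", "chore", "revert",
)

# <type>(<scope>): <description>, with the scope treated as optional.
COMMIT_RE = re.compile(
    r"^(?P<type>" + "|".join(ALLOWED_TYPES) + r")"
    r"(\((?P<scope>[^()\s]+)\))?"
    r": (?P<description>\S.*)$"
)


def is_conventional(subject: str) -> bool:
    """Return True if the commit subject line matches the format above."""
    return COMMIT_RE.match(subject) is not None


for subject in [
    "feat(recorder): add Agent Lightning span attributes",
    "docs: update TITO usage examples",
    "updated stuff",
]:
    print(f"{is_conventional(subject)!s:5}  {subject}")
```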

Related Projects

strands-sglang - SGLang provider for Strands Agents SDK

License

Apache License 2.0 - see LICENSE.

Release history

0.0.6 - Jan 23, 2026 (this version)
0.0.5 - Jan 17, 2026
0.0.4 - Jan 11, 2026
0.0.3 - Jan 9, 2026
0.0.2 - Jan 8, 2026
0.0.1.post1 - Jan 8, 2026
0.0.1 - Jan 8, 2026
0.0.1.dev0 - Jan 8, 2026 (pre-release)

Download files

Source Distribution

strands_vllm-0.0.6.tar.gz (211.6 kB) - uploaded Jan 23, 2026, Source

Built Distribution

strands_vllm-0.0.6-py3-none-any.whl (15.4 kB) - uploaded Jan 23, 2026, Python 3

File details

Details for the file strands_vllm-0.0.6.tar.gz.

File metadata

Download URL: strands_vllm-0.0.6.tar.gz
Upload date: Jan 23, 2026
Size: 211.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for strands_vllm-0.0.6.tar.gz
Algorithm	Hash digest
SHA256	`d3c8c267524aad4d5a8b9607882589e149df8097c091c3f90f87ac0ed522c070`
MD5	`7c75e09d254e5af5c1536767f144e2e9`
BLAKE2b-256	`5c84280f11dfaceab5be4f8517af35412e9f07837ff8479131bc65db552aeb3d`

File details

Details for the file strands_vllm-0.0.6-py3-none-any.whl.

File metadata

Download URL: strands_vllm-0.0.6-py3-none-any.whl
Upload date: Jan 23, 2026
Size: 15.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for strands_vllm-0.0.6-py3-none-any.whl
Algorithm	Hash digest
SHA256	`d2d82a3de0e81a3270be0e40627387debec7ef876f435b7869184d80fbd9dce9`
MD5	`8c6401af60aa69f11b42cc6cb397c379`
BLAKE2b-256	`6a0d9116d3fb7ae5350072a84adafc53b13638b67daa12210a5a3b19a6d488ee`
