Community vLLM provider utilities for Strands Agents (OpenAI-compatible).

strands-vllm

Community vLLM utilities for the Strands Agents SDK.

vLLM serves an OpenAI-compatible API, so most users can simply use OpenAIModel with base_url set to the vLLM server. This package adds small convenience helpers and, optionally, defaults suited to token IDs and token-in/token-out (TITO) workflows.

Credit / reference

This community package is inspired by the structure and example style of horizon-rl/strands-sglang.

Install

pip install strands-vllm

vLLM server notes (tools + token IDs)

Tools: to use tool calling, start your vLLM server with tool calling enabled and a chat template that matches your model (e.g., the Llama 3.2 tool template).
Token IDs (TITO): setting return_token_ids=True asks vLLM to return token IDs. When the server supports it, responses include prompt_token_ids and streamed token_ids.
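To make the token-ID note concrete, here is a minimal sketch of collecting IDs from streaming chunks that have already been parsed into dictionaries. The chunk layout (a top-level prompt_token_ids on one chunk and a token_ids list on each choice) is an assumption inferred from the field names above, not a verified schema. collect_token_ids is a hypothetical helper, not part of strands-vllm.

```python
from typing import Any, Iterable


def collect_token_ids(chunks: Iterable[dict[str, Any]]) -> tuple[list[int], list[int]]:
    """Gather prompt and completion token IDs from parsed streaming chunks.

    Assumed layout (hypothetical): a chunk may carry "prompt_token_ids" at the
    top level, and each entry in "choices" may carry a "token_ids" list.
    """
    prompt_ids: list[int] = []
    completion_ids: list[int] = []
    for chunk in chunks:
        # Keep the first non-empty prompt ID list we see.
        if not prompt_ids and chunk.get("prompt_token_ids"):
            prompt_ids = list(chunk["prompt_token_ids"])
        for choice in chunk.get("choices", []):
            # A server without token-ID support would simply omit the field.
            completion_ids.extend(choice.get("token_ids") or [])
    return prompt_ids, completion_ids


# Hand-written chunks standing in for a real stream.
fake_stream = [
    {"prompt_token_ids": [1, 15, 27], "choices": [{"token_ids": [42]}]},
    {"choices": [{"token_ids": [43, 44]}]},
    {"choices": [{}]},  # chunk without token IDs
]
prompt_ids, completion_ids = collect_token_ids(fake_stream)
print(prompt_ids, completion_ids)
```

Treating a missing field as "no IDs" rather than an error keeps the same loop usable against servers that do not return token IDs.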

Usage

Minimal: OpenAIModel pointed at vLLM

from strands import Agent
from strands.models.openai import OpenAIModel

model = OpenAIModel(
    client_args={"api_key": "EMPTY", "base_url": "http://localhost:8000/v1"},
    model_id="AMead10/Llama-3.2-3B-Instruct-AWQ",
)

agent = Agent(model=model)
print(agent("Hi"))

Convenience: `VLLMModel`

from strands import Agent
from strands_vllm import VLLMModel

model = VLLMModel(
    base_url="http://localhost:8000/v1",
    model_id="AMead10/Llama-3.2-3B-Instruct-AWQ",
    return_token_ids=True,
)

agent = Agent(model=model)
print(agent("Say hello"))

Tip: to print only the final result, without streamed output appearing along the way, pass callback_handler=None:

agent = Agent(model=model, callback_handler=None)
print(agent("Say hello"))

Examples

All examples can be pointed at your server with:

export VLLM_BASE_URL="http://localhost:8000/v1"
export VLLM_MODEL_ID="AMead10/Llama-3.2-3B-Instruct-AWQ"
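If you write your own script and want it to honor the same variables, one possible pattern is sketched below. How the bundled examples read these variables is not shown here, so load_vllm_settings and its fallback defaults are assumptions for illustration only.

```python
import os


def load_vllm_settings() -> tuple[str, str]:
    """Return (base_url, model_id) from the environment.

    The fallback values mirror the exports shown above; they are
    illustrative defaults, not something the package guarantees.
    """
    base_url = os.environ.get("VLLM_BASE_URL", "http://localhost:8000/v1")
    model_id = os.environ.get("VLLM_MODEL_ID", "AMead10/Llama-3.2-3B-Instruct-AWQ")
    return base_url, model_id


base_url, model_id = load_vllm_settings()
print(f"Using {model_id} at {base_url}")
```

The returned pair can then be passed as base_url and model_id when constructing the model.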

Tools (strands-agents-tools)

Install the optional tools package and run the example:

pip install strands-agents-tools
python examples/math_agent.py

Tool-call validation example

pip install strands-agents-tools
python examples/tool_validation_agent.py

RL rollout (TITO + loss_mask + retokenization check)

pip install "strands-vllm[drift]" strands-agents-tools
python examples/rl_rollout_tito.py
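For readers unfamiliar with the loss_mask idea in the heading, here is a small sketch of how such a mask is commonly built: tokens the model generated get 1, while prompt and tool-result tokens get 0 so they are excluded from the training loss. This is a generic illustration with made-up token IDs. build_loss_mask is a hypothetical helper and is not taken from examples/rl_rollout_tito.py.

```python
def build_loss_mask(
    segments: list[tuple[list[int], bool]],
) -> tuple[list[int], list[int]]:
    """Flatten (token_ids, is_model_output) segments into tokens and a 0/1 mask."""
    tokens: list[int] = []
    mask: list[int] = []
    for ids, is_model_output in segments:
        tokens.extend(ids)
        # 1 = train on this token, 0 = context only.
        mask.extend([1 if is_model_output else 0] * len(ids))
    return tokens, mask


# A made-up two-turn rollout: prompt, tool call, tool result, final answer.
rollout = [
    ([1, 15, 27], False),   # prompt tokens
    ([42, 43], True),       # model output (e.g. a tool call)
    ([90, 91, 92], False),  # tool result fed back to the model
    ([44], True),           # model output (final answer)
]
tokens, loss_mask = build_loss_mask(rollout)
print(tokens)
print(loss_mask)
```

Because the mask is aligned position by position with the token IDs, it only stays valid if the exact sampled IDs are kept, which is the motivation for TITO.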

Tool-call validation (recommended with vLLM tool parsers)

vLLM tool calling can involve server-side post-processing, so it can be useful to guard tool execution:

from strands import Agent
from strands_tools.calculator import calculator
from strands_vllm import VLLMModel, VLLMToolValidationHooks

model = VLLMModel(base_url="http://localhost:8000/v1", model_id="...", return_token_ids=True)
agent = Agent(model=model, tools=[calculator], hooks=[VLLMToolValidationHooks()])
print(agent("Compute 17 * 19 using the calculator tool."))
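To illustrate what guarding tool execution can mean, the sketch below checks a parsed tool call before it runs: the tool name must be known, the arguments must be a JSON object, and required keys must be present. This is a generic illustration, not the implementation of VLLMToolValidationHooks. validate_tool_call and the "expression" argument name are assumptions.

```python
import json


def validate_tool_call(
    name: str,
    raw_arguments: str,
    required_args: dict[str, set[str]],
) -> list[str]:
    """Return a list of problems with a tool call; an empty list means it looks OK."""
    if name not in required_args:
        return [f"unknown tool: {name}"]
    try:
        arguments = json.loads(raw_arguments)
    except json.JSONDecodeError as exc:
        return [f"arguments are not valid JSON: {exc.msg}"]
    if not isinstance(arguments, dict):
        return ["arguments must be a JSON object"]
    missing = sorted(required_args[name] - set(arguments))
    return [f"missing required argument: {key}" for key in missing]


# Hypothetical registry: tool name -> required argument names.
tools = {"calculator": {"expression"}}

print(validate_tool_call("calculator", '{"expression": "17 * 19"}', tools))
print(validate_tool_call("calculater", "{}", tools))
print(validate_tool_call("calculator", "{}", tools))
```

Returning a list of problems instead of raising lets the caller decide whether to skip the tool, report the error back to the model, or abort.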

Retokenization drift (educational)

This demo mirrors the idea from strands-sglang and shows why TITO matters: decoding sampled tokens to text and then re-encoding that text can yield a different token sequence, i.e. encode(decode(tokens)) != tokens.

pip install "strands-vllm[drift]" strands-agents-tools
python examples/retokenization_drift.py
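The effect can be reproduced without a model. The sketch below uses a toy three-entry vocabulary and a greedy longest-match encoder, both invented for illustration and unrelated to any real tokenizer or to examples/retokenization_drift.py.

```python
# Toy vocabulary: "ab" exists as a single token alongside "a" and "b".
VOCAB = {"a": 0, "b": 1, "ab": 2}
ID_TO_TEXT = {token_id: text for text, token_id in VOCAB.items()}
MAX_PIECE_LEN = max(len(piece) for piece in VOCAB)


def decode(token_ids: list[int]) -> str:
    """Concatenate the text of each token."""
    return "".join(ID_TO_TEXT[token_id] for token_id in token_ids)


def encode(text: str) -> list[int]:
    """Greedy longest-match tokenization over the toy vocabulary."""
    token_ids: list[int] = []
    pos = 0
    while pos < len(text):
        for length in range(MAX_PIECE_LEN, 0, -1):
            piece = text[pos:pos + length]
            if piece in VOCAB:
                token_ids.append(VOCAB[piece])
                pos += len(piece)
                break
        else:
            raise ValueError(f"cannot tokenize {text[pos:]!r}")
    return token_ids


sampled = [0, 1]             # the model emitted "a" then "b" as separate tokens
text = decode(sampled)       # the text is "ab"
retokenized = encode(text)   # the encoder prefers the merged "ab" token
print(sampled, repr(text), retokenized)
```

Both sequences decode to the same text, but they differ in IDs and in length, so anything aligned to the sampled tokens (such as a loss mask) no longer lines up after retokenization.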

Token-in / token-out (TITO)

If your vLLM server includes token IDs in streaming responses, you can capture them using VLLMTokenRecorder (see examples/basic_agent.py).

Token IDs in OpenTelemetry spans (Agent Lightning)

VLLMTokenRecorder automatically adds token IDs as OpenTelemetry span attributes for Agent Lightning compatibility. When add_to_span=True (default), the following span attributes are set:

llm.token_count.prompt, llm.token_count.completion - Standard OpenTelemetry token counts
llm.hosted_vllm.prompt_token_ids, llm.hosted_vllm.response_token_ids - Token ID arrays

Reference: Agent Lightning blog post

python examples/span_token_ids.py
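As a rough picture of the attribute set listed above, the sketch below builds a plain dictionary with those four keys. Deriving the counts from the ID list lengths and storing the IDs as integer lists are assumptions. token_span_attributes is a hypothetical helper and does not reproduce how VLLMTokenRecorder writes to a span.

```python
def token_span_attributes(
    prompt_ids: list[int],
    response_ids: list[int],
) -> dict[str, object]:
    """Build a dict using the span attribute names listed above."""
    return {
        # Assumption: counts are the lengths of the ID lists.
        "llm.token_count.prompt": len(prompt_ids),
        "llm.token_count.completion": len(response_ids),
        # Assumption: IDs are stored as lists of integers.
        "llm.hosted_vllm.prompt_token_ids": list(prompt_ids),
        "llm.hosted_vllm.response_token_ids": list(response_ids),
    }


attrs = token_span_attributes([1, 15, 27], [42, 43, 44, 45])
for key, value in attrs.items():
    print(key, value)
```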

Development

Install from source:

git clone <your-fork-url>
cd strands-vllm
pip install -e ".[dev]"

License

Apache-2.0

strands-vllm 0.0.4
