# strands-vllm

Community vLLM provider utilities for the Strands Agents SDK (OpenAI-compatible).
vLLM serves an OpenAI-compatible API, so most users can simply point `OpenAIModel` at it via `base_url`. This package adds small convenience helpers and, optionally, token-ID/TITO-friendly defaults.
## Credit / reference

This community package is inspired by the structure and example style of `horizon-rl/strands-sglang`.
## Install

```bash
pip install strands-vllm
```
## vLLM server notes (tools + token IDs)

- Tools: if you want tool calling, start your vLLM server with tool calling enabled and an appropriate chat template for your model (e.g., the Llama 3.2 tool template).
- Token IDs (TITO): `return_token_ids=True` requests vLLM token IDs; vLLM will include `prompt_token_ids` and streamed `token_ids` when supported.
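As a sketch of what such a request might look like on the wire: the field name `return_token_ids` is taken from the option above, and treating it as an extra top-level JSON field (as vLLM accepts via the OpenAI client's `extra_body`) is an assumption about your server version, so verify against your deployment.

```python
# Sketch: build a chat-completions request body that asks a vLLM
# OpenAI-compatible server for token IDs. "return_token_ids" is a
# non-standard OpenAI field; whether your vLLM build honors it (and the
# exact response fields) depends on the server version.
import json


def build_chat_request(model_id: str, user_text: str, want_token_ids: bool) -> dict:
    body = {
        "model": model_id,
        "messages": [{"role": "user", "content": user_text}],
        "stream": True,
    }
    if want_token_ids:
        # Passed through to vLLM as an extra body field (assumption).
        body["return_token_ids"] = True
    return body


payload = build_chat_request("AMead10/Llama-3.2-3B-Instruct-AWQ", "Hi", True)
print(json.dumps(payload, indent=2))
```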
## Usage

### Minimal: `OpenAIModel` pointed at vLLM

```python
from strands import Agent
from strands.models.openai import OpenAIModel

model = OpenAIModel(
    client_args={"api_key": "EMPTY", "base_url": "http://localhost:8000/v1"},
    model_id="AMead10/Llama-3.2-3B-Instruct-AWQ",
)
agent = Agent(model=model)
print(agent("Hi"))
```
### Convenience: `VLLMModel`

```python
from strands import Agent
from strands_vllm import VLLMModel

model = VLLMModel(
    base_url="http://localhost:8000/v1",
    model_id="AMead10/Llama-3.2-3B-Instruct-AWQ",
    return_token_ids=True,
)
agent = Agent(model=model)
print(agent("Say hello"))
```
Tip: to print only the final result (without streaming output being printed along the way), pass `callback_handler=None`:

```python
agent = Agent(model=model, callback_handler=None)
print(agent("Say hello"))
```
## Examples

All examples can be pointed at your server with:

```bash
export VLLM_BASE_URL="http://localhost:8000/v1"
export VLLM_MODEL_ID="AMead10/Llama-3.2-3B-Instruct-AWQ"
```
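The example scripts presumably resolve these variables with local-server defaults; a minimal sketch of that pattern (the helper name `resolve_server` is hypothetical, not part of the package API):

```python
# Sketch: resolve server settings from the environment variables above,
# falling back to a local vLLM default when they are unset.
import os


def resolve_server() -> tuple[str, str]:
    base_url = os.environ.get("VLLM_BASE_URL", "http://localhost:8000/v1")
    model_id = os.environ.get("VLLM_MODEL_ID", "AMead10/Llama-3.2-3B-Instruct-AWQ")
    return base_url, model_id


base_url, model_id = resolve_server()
print(base_url, model_id)
```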
### Tools (strands-agents-tools)

Install the optional tools package and run the example:

```bash
pip install strands-agents-tools
python examples/math_agent.py
```
### Retokenization drift (educational)

This demo mirrors the idea from strands-sglang and shows why TITO matters: `encode(decode(tokens)) != tokens` can happen.
```bash
pip install "strands-vllm[drift]" strands-agents-tools
python examples/retokenization_drift.py
```
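The drift idea can be illustrated without a real model: a toy greedy longest-match tokenizer over a three-entry vocab (everything here is illustrative, not the demo's actual tokenizer) shows how decoding token IDs to text and re-encoding can merge adjacent pieces into a different ID sequence.

```python
# Toy illustration of retokenization drift: a greedy longest-match
# tokenizer over a tiny vocab. A model may emit "a" and "b" as two
# tokens, but re-encoding the decoded text greedily picks the merged
# piece "ab" instead.
VOCAB = {"a": 0, "b": 1, "ab": 2}
INV = {i: s for s, i in VOCAB.items()}


def encode(text: str) -> list[int]:
    # Greedy longest-match from the left.
    ids, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):
            if text[i:j] in VOCAB:
                ids.append(VOCAB[text[i:j]])
                i = j
                break
        else:
            raise ValueError(f"unencodable: {text[i]!r}")
    return ids


def decode(ids: list[int]) -> str:
    return "".join(INV[i] for i in ids)


tokens = [VOCAB["a"], VOCAB["b"]]   # what the model actually sampled
roundtrip = encode(decode(tokens))  # what re-tokenizing the text gives
print(tokens, roundtrip)            # [0, 1] [2]
assert roundtrip != tokens          # drift: encode(decode(tokens)) != tokens
```

With token-in/token-out, the original `[0, 1]` sequence is preserved instead of being silently rewritten to `[2]`.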
### Token-in / token-out (TITO)

If your vLLM server includes token IDs in streaming responses, you can capture them using `VLLMTokenRecorder` (see `examples/basic_agent.py`).
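Conceptually, such a recorder just accumulates the token IDs it sees in streamed callback events. A minimal sketch, assuming events carry IDs under a `token_ids` key (the class below and the event shape are illustrative assumptions, not the real `VLLMTokenRecorder` API):

```python
# Sketch: accumulate token IDs from streaming callback events.
# The real VLLMTokenRecorder may differ; "token_ids" as the event key
# is an assumption for illustration.
class TokenRecorder:
    """Collects token IDs seen across streaming events."""

    def __init__(self) -> None:
        self.token_ids: list[int] = []

    def __call__(self, **event) -> None:
        ids = event.get("token_ids")
        if ids:
            self.token_ids.extend(ids)


recorder = TokenRecorder()
recorder(token_ids=[1, 2])      # simulated stream events
recorder(data="hello")          # events without token IDs are ignored
recorder(token_ids=[3])
print(recorder.token_ids)       # [1, 2, 3]
```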
## Development

Install from source:

```bash
git clone <your-fork-url>
cd strands-vllm
pip install -e ".[dev]"
```
## License

Apache-2.0