Skip to main content

Community vLLM provider utilities for Strands Agents (OpenAI-compatible).

Project description

strands-vllm

Community vLLM utilities for the Strands Agents SDK.

vLLM serves an OpenAI-compatible API, so most users can simply use OpenAIModel with base_url. This package provides small convenience helpers and (optionally) token-id/TITO-friendly defaults.

Credit / reference

This community package is inspired by the structure and example style of horizon-rl/strands-sglang.

Install

pip install strands-vllm

vLLM server notes (tools + token IDs)

  • Tools: if you want tool calling, your vLLM server must be started with tool-calling enabled and an appropriate chat template for your model (e.g., Llama 3.2 tool template).
  • Token IDs (TITO): return_token_ids=True requests vLLM token IDs; vLLM will include prompt_token_ids and streamed token_ids when supported.

Usage

Minimal: OpenAIModel pointed at vLLM

from strands import Agent
from strands.models.openai import OpenAIModel

model = OpenAIModel(
    client_args={"api_key": "EMPTY", "base_url": "http://localhost:8000/v1"},
    model_id="AMead10/Llama-3.2-3B-Instruct-AWQ",
)

agent = Agent(model=model)
print(agent("Hi"))

Convenience: VLLMModel

from strands import Agent
from strands_vllm import VLLMModel

model = VLLMModel(
    base_url="http://localhost:8000/v1",
    model_id="AMead10/Llama-3.2-3B-Instruct-AWQ",
    return_token_ids=True,
)

agent = Agent(model=model)
print(agent("Say hello"))

Tip: if you want to print only the final result (without streaming output being printed along the way), pass callback_handler=None:

agent = Agent(model=model, callback_handler=None)
print(agent("Say hello"))

Examples

All examples can be pointed at your server with:

export VLLM_BASE_URL="http://localhost:8000/v1"
export VLLM_MODEL_ID="AMead10/Llama-3.2-3B-Instruct-AWQ"

Tools (strands-agents-tools)

Install the optional tools package and run the example:

pip install strands-agents-tools
python examples/math_agent.py

Tool-call validation example

pip install strands-agents-tools
python examples/tool_validation_agent.py

RL rollout (TITO + loss_mask + retokenization check)

pip install "strands-vllm[drift]" strands-agents-tools
python examples/rl_rollout_tito.py

Tool-call validation (recommended with vLLM tool parsers)

vLLM tool calling can involve server-side post-processing, so it can be useful to guard tool execution:

from strands import Agent
from strands_tools.calculator import calculator
from strands_vllm import VLLMModel, VLLMToolValidationHooks

model = VLLMModel(base_url="http://localhost:8000/v1", model_id="...", return_token_ids=True)
agent = Agent(model=model, tools=[calculator], hooks=[VLLMToolValidationHooks()])
print(agent("Compute 17 * 19 using the calculator tool."))

Retokenization drift (educational)

This demo mirrors the idea from strands-sglang and shows why TITO matters: encode(decode(tokens)) != tokens can happen.

pip install "strands-vllm[drift]" strands-agents-tools
python examples/retokenization_drift.py

Token-in / token-out (TITO)

If your vLLM server includes token IDs in streaming responses, you can capture them using VLLMTokenRecorder (see examples/basic_agent.py).

Development

Install from source:

git clone <your-fork-url>
cd strands-vllm
pip install -e ".[dev]"

License

Apache-2.0

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

strands_vllm-0.0.3.tar.gz (205.0 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

strands_vllm-0.0.3-py3-none-any.whl (13.2 kB view details)

Uploaded Python 3

File details

Details for the file strands_vllm-0.0.3.tar.gz.

File metadata

  • Download URL: strands_vllm-0.0.3.tar.gz
  • Upload date:
  • Size: 205.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for strands_vllm-0.0.3.tar.gz
Algorithm Hash digest
SHA256 7c2c72688f8fd3aa081270ed45f0c4a9f21e8419d14984154e2df49ac9312f7e
MD5 e49a95428141dea7ce70e05527f33ddb
BLAKE2b-256 4b508be92c8341fe8e8c364021997d83053527cc5c0ea4d18a88eb86fc7124f5

See more details on using hashes here.

File details

Details for the file strands_vllm-0.0.3-py3-none-any.whl.

File metadata

  • Download URL: strands_vllm-0.0.3-py3-none-any.whl
  • Upload date:
  • Size: 13.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for strands_vllm-0.0.3-py3-none-any.whl
Algorithm Hash digest
SHA256 6d665a4747b694c3d994fe2c5ba689430dd920775049a89e1d14c8ac9d3c7086
MD5 24ce0ca658c528094d08456dfada7cc1
BLAKE2b-256 85fa568076f9ae2bb091ef68d0b5eccc1558dbeb72b86154c7af986098d672de

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page