rllm-model-gateway
Lightweight model gateway for capturing LLM call traces during RL agent training. Sits between agents and inference servers (vLLM), transparently recording token IDs, logprobs, and conversation data — with zero modifications to agent code.
Quick Start
# Create a uv environment
uv venv --python 3.11
source .venv/bin/activate
# Install
uv pip install -e .
# Set up pre-commit hooks (one-time, from the rllm repo root)
cd .. && pre-commit install && cd rllm-model-gateway
# Start with a vLLM worker
rllm-model-gateway --port 9090 --worker http://localhost:8000/v1
# Or with a config file
rllm-model-gateway --config gateway.yaml
Agent Side (Zero rLLM Dependencies)
from openai import OpenAI

# session_id is issued by the training side (see Training Side below)
client = OpenAI(
    base_url=f"http://localhost:9090/sessions/{session_id}/v1",
    api_key="EMPTY",
)
response = client.chat.completions.create(
    model="Qwen/Qwen2.5-7B",
    messages=[{"role": "user", "content": "Hello"}],
)
Works with any OpenAI-compatible agent framework (ADK, Strands, LangChain, OpenAI Agents SDK, etc.).
Training Side
from rllm_model_gateway import GatewayClient

client = GatewayClient("http://localhost:9090")

# Create a session and get the URL to hand to the agent
session_id = client.create_session()
agent_url = client.get_session_url(session_id)
# → "http://localhost:9090/sessions/{session_id}/v1"

# After the agent runs, retrieve traces with full token data
traces = client.get_session_traces(session_id)
for trace in traces:
    print(trace.prompt_token_ids)      # From vLLM's return_token_ids
    print(trace.completion_token_ids)  # Per-token IDs, no retokenization needed
    print(trace.logprobs)              # Per-token logprobs
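As a rough illustration of how these fields might feed a trainer, the sketch below concatenates prompt and completion token IDs and builds a loss mask that covers only the completion. The `Trace` dataclass is a hypothetical stand-in carrying just the three attributes printed above; the real trace object may expose more fields, and treating `logprobs` as a flat list with one float per completion token is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Trace:
    """Hypothetical stand-in with the three attributes printed above."""
    prompt_token_ids: list[int]
    completion_token_ids: list[int]
    logprobs: list[float]  # assumed: one float per completion token


def to_training_example(trace: Trace) -> dict:
    """Build input IDs, a completion-only loss mask, and aligned logprobs."""
    n_prompt = len(trace.prompt_token_ids)
    n_completion = len(trace.completion_token_ids)
    if len(trace.logprobs) != n_completion:
        raise ValueError("expected one logprob per completion token")
    return {
        "input_ids": trace.prompt_token_ids + trace.completion_token_ids,
        # 0 for prompt positions (no loss), 1 for sampled completion tokens
        "loss_mask": [0] * n_prompt + [1] * n_completion,
        # Pad prompt positions with 0.0 so logprobs line up with input_ids
        "old_logprobs": [0.0] * n_prompt + list(trace.logprobs),
    }


example = to_training_example(
    Trace(prompt_token_ids=[1, 2, 3], completion_token_ids=[4, 5], logprobs=[-0.1, -0.7])
)
print(example["loss_mask"])  # [0, 0, 0, 1, 1]
```

Because the IDs come straight from the trace, no tokenizer is needed at this step, which is the point of the "no retokenization" comment above.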
Features
- Zero agent coupling: Agents use standard OpenAI(base_url=...), no rLLM imports
- Zero retokenization: Token IDs captured directly from vLLM responses
- Partial rollout recovery: Traces persisted per-call, survive agent crashes
- Session-sticky routing: Multi-turn sessions routed to the same worker for prefix caching
- Streaming support: SSE streaming with real-time chunk forwarding and trace assembly
- Pluggable storage: SQLite (default), in-memory (testing), extensible to DynamoDB/PostgreSQL
- Lightweight: 6 dependencies, no torch/ray/verl/transformers
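To make the session-sticky routing idea concrete, here is a minimal sketch of one way a session ID could be mapped to a fixed worker. It is not the gateway's actual routing policy, which this page does not describe; `pick_worker` is a hypothetical helper, and hashing the session ID is just one simple technique that yields stickiness.

```python
import hashlib


def pick_worker(session_id: str, workers: list[str]) -> str:
    """Map a session ID to one worker deterministically (hypothetical helper)."""
    if not workers:
        raise ValueError("no workers registered")
    # A stable hash (unlike Python's salted built-in hash()) gives the same
    # result across processes and restarts.
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(workers)
    return workers[index]


workers = ["http://vllm-0:8000/v1", "http://vllm-1:8000/v1"]
first = pick_worker("session-abc", workers)
second = pick_worker("session-abc", workers)
print(first == second)  # True: every turn of the session lands on the same worker
```

Plain modulo hashing remaps sessions whenever the worker list changes, so a design that supports adding workers at runtime would more plausibly record each session's assigned worker on first use.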
Development
uv venv --python 3.11
source .venv/bin/activate
uv pip install -e ".[dev]"
# Unit tests
python -m pytest tests/unit/ -x -q
# Integration tests (requires vLLM on localhost:4000, auto-skipped otherwise)
python -m pytest tests/integration/ -x -v
Configuration
CLI
rllm-model-gateway \
    --port 9090 \
    --db-path ./traces.db \
    --worker http://vllm-0:8000/v1 \
    --worker http://vllm-1:8000/v1
YAML (--config gateway.yaml)
host: "0.0.0.0"
port: 9090
db_path: "~/.rllm/gateway.db"
workers:
  - url: "http://vllm-0:8000/v1"
    model_name: "Qwen/Qwen2.5-7B-Instruct"
  - url: "http://vllm-1:8000/v1"
    model_name: "Qwen/Qwen2.5-7B-Instruct"
Environment Variables
RLLM_GATEWAY_HOST, RLLM_GATEWAY_PORT, RLLM_GATEWAY_DB_PATH, RLLM_GATEWAY_LOG_LEVEL, RLLM_GATEWAY_STORE
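The sketch below shows how a launcher script might read these variables before starting the gateway. Only the variable names come from the list above; the `EnvSettings` class, the fallback values (borrowed from the CLI and YAML examples), and the store names `"sqlite"` and `"memory"` are assumptions, as is any precedence between environment variables, CLI flags, and the config file.

```python
import os
from dataclasses import dataclass


@dataclass
class EnvSettings:
    """Hypothetical container for the five variables listed above."""
    host: str
    port: int
    db_path: str
    log_level: str
    store: str


def load_env_settings(env=None) -> EnvSettings:
    """Read RLLM_GATEWAY_* variables, falling back to assumed defaults."""
    env = os.environ if env is None else env
    return EnvSettings(
        host=env.get("RLLM_GATEWAY_HOST", "0.0.0.0"),
        port=int(env.get("RLLM_GATEWAY_PORT", "9090")),
        # Expand "~" so the path can be opened directly
        db_path=os.path.expanduser(env.get("RLLM_GATEWAY_DB_PATH", "~/.rllm/gateway.db")),
        log_level=env.get("RLLM_GATEWAY_LOG_LEVEL", "INFO"),  # assumed default
        store=env.get("RLLM_GATEWAY_STORE", "sqlite"),  # assumed value name
    )


# Passing a dict instead of os.environ keeps the example deterministic
settings = load_env_settings({"RLLM_GATEWAY_PORT": "9191", "RLLM_GATEWAY_STORE": "memory"})
print(settings.port, settings.store)  # 9191 memory
```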
Embedded Usage
import threading

import uvicorn
from rllm_model_gateway import create_app, GatewayConfig

config = GatewayConfig(port=9090, workers=[...])
app = create_app(config)

# Run the gateway in a background thread inside the current process
threading.Thread(target=uvicorn.run, args=(app,), kwargs={"port": 9090}, daemon=True).start()
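Since the server thread starts asynchronously, a caller may want to wait until the gateway accepts requests before creating sessions. The helper below is a sketch using only the standard library: `wait_until_healthy` is a hypothetical name, and it assumes that `GET /health` answers with HTTP 200 once the gateway is ready, which is not confirmed here.

```python
import time
import urllib.request


def wait_until_healthy(base_url: str, timeout: float = 10.0, interval: float = 0.2) -> bool:
    """Poll GET {base_url}/health until it returns 200 or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{base_url}/health", timeout=2.0) as resp:
                if resp.status == 200:  # assumed readiness signal
                    return True
        except OSError:
            # Covers connection refused, timeouts, and HTTP error statuses
            # (urllib.error.URLError is a subclass of OSError)
            pass
        time.sleep(interval)
    return False


# Usage after starting the thread above (not executed here):
# if not wait_until_healthy("http://localhost:9090"):
#     raise RuntimeError("gateway did not become healthy in time")
```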
Dynamic Worker Registration
Workers can be added at runtime via the admin API — useful for verl integration where vLLM addresses are only known after initialization:
client = GatewayClient("http://localhost:9090")
client.add_worker(url="http://vllm-worker-0:8000/v1", model_name="Qwen/Qwen2.5-7B")
API Overview
| Endpoint | Description |
|---|---|
| POST /sessions/{sid}/v1/chat/completions | Proxy (agent-facing, OpenAI-compatible) |
| POST /sessions | Create session with metadata |
| GET /sessions/{sid}/traces | Retrieve traces for a session |
| POST /admin/workers | Register a worker |
| GET /health | Gateway health check |
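For callers that talk to the gateway without `GatewayClient`, the paths in the table can be assembled with a small helper. `GatewayRoutes` below is a hypothetical class that only builds method and URL pairs; request and response bodies are not documented on this page, so none are shown.

```python
from urllib.parse import quote


class GatewayRoutes:
    """Builds (method, URL) pairs for the endpoints in the table above."""

    def __init__(self, base_url: str) -> None:
        self.base_url = base_url.rstrip("/")

    def _session(self, sid: str) -> str:
        # Percent-encode the ID so unusual characters cannot alter the path
        return f"{self.base_url}/sessions/{quote(sid, safe='')}"

    def openai_base_url(self, sid: str) -> str:
        """Value to pass as base_url to an OpenAI-compatible client."""
        return f"{self._session(sid)}/v1"

    def chat_completions(self, sid: str) -> tuple[str, str]:
        return ("POST", f"{self.openai_base_url(sid)}/chat/completions")

    def create_session(self) -> tuple[str, str]:
        return ("POST", f"{self.base_url}/sessions")

    def traces(self, sid: str) -> tuple[str, str]:
        return ("GET", f"{self._session(sid)}/traces")

    def register_worker(self) -> tuple[str, str]:
        return ("POST", f"{self.base_url}/admin/workers")

    def health(self) -> tuple[str, str]:
        return ("GET", f"{self.base_url}/health")


routes = GatewayRoutes("http://localhost:9090/")
print(routes.openai_base_url("abc"))  # http://localhost:9090/sessions/abc/v1
```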