Lightweight model gateway for capturing LLM call traces during RL agent training

rllm-model-gateway

Lightweight model gateway for capturing LLM call traces during RL agent training. Sits between agents and inference servers (vLLM), transparently recording token IDs, logprobs, and conversation data — with zero modifications to agent code.

Quick Start

# Create a uv environment
uv venv --python 3.11
source .venv/bin/activate

# Install
uv pip install -e .

# Set up pre-commit hooks (one-time, from the rllm repo root)
cd .. && pre-commit install && cd rllm-model-gateway

# Start with a vLLM worker
rllm-model-gateway --port 9090 --worker http://localhost:8000/v1

# Or with a config file
rllm-model-gateway --config gateway.yaml

Agent Side (Zero rLLM Dependencies)

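from openai import OpenAI

# Placeholder: use the session ID issued by the training side (see "Training Side").
session_id = "REPLACE_ME"

# Point the standard OpenAI client at the gateway's per-session endpoint.
client = OpenAI(
    base_url=f"http://localhost:9090/sessions/{session_id}/v1",
    api_key="EMPTY",
)
response = client.chat.completions.create(
    model="Qwen/Qwen2.5-7B",
    messages=[{"role": "user", "content": "Hello"}],
)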

Works with any OpenAI-compatible agent framework (ADK, Strands, LangChain, OpenAI Agents SDK, etc.).

Training Side

from rllm_model_gateway import GatewayClient

client = GatewayClient("http://localhost:9090")

# Create session and get URL for the agent
session_id = client.create_session()
agent_url = client.get_session_url(session_id)
# → "http://localhost:9090/sessions/{session_id}/v1"

# After agent runs, retrieve traces with full token data
traces = client.get_session_traces(session_id)
for trace in traces:
    print(trace.prompt_token_ids)       # From vLLM's return_token_ids
    print(trace.completion_token_ids)   # Per-token IDs, no retokenization needed
    print(trace.logprobs)               # Per-token logprobs
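
To illustrate how these three fields could feed a trainer, here is a minimal sketch that flattens one trace into a single token sequence with a loss mask. The `Trace` dataclass is a stand-in that mirrors only the attribute names shown above, not the gateway's actual class. The helper name `to_training_example`, the assumption that `logprobs` is a flat list with one float per completion token, and the mask convention (0 for prompt, 1 for completion) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Trace:
    """Stand-in for a gateway trace; carries only the three fields shown above."""
    prompt_token_ids: list[int]
    completion_token_ids: list[int]
    logprobs: list[float]  # assumed: one float per completion token


def to_training_example(trace: Trace) -> dict:
    """Concatenate prompt and completion tokens and mark which positions to train on."""
    if len(trace.completion_token_ids) != len(trace.logprobs):
        raise ValueError("expected one logprob per completion token")
    n_prompt = len(trace.prompt_token_ids)
    n_completion = len(trace.completion_token_ids)
    return {
        "input_ids": trace.prompt_token_ids + trace.completion_token_ids,
        # 0 = prompt token (no loss), 1 = sampled completion token
        "loss_mask": [0] * n_prompt + [1] * n_completion,
        # Pad prompt positions with 0.0 so logprobs line up with input_ids
        "old_logprobs": [0.0] * n_prompt + list(trace.logprobs),
    }


example = to_training_example(
    Trace(prompt_token_ids=[101, 102, 103], completion_token_ids=[7, 8], logprobs=[-0.1, -0.5])
)
print(example["input_ids"])  # [101, 102, 103, 7, 8]
print(example["loss_mask"])  # [0, 0, 0, 1, 1]
```

Because the token IDs come straight from the trace, nothing in this step calls a tokenizer.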

Features

Zero agent coupling — Agents use standard OpenAI(base_url=...), no rLLM imports
Zero retokenization — Token IDs captured directly from vLLM responses
Partial rollout recovery — Traces persisted per-call, survive agent crashes
Session-sticky routing — Multi-turn sessions routed to the same worker for prefix caching
Streaming support — SSE streaming with real-time chunk forwarding and trace assembly
Pluggable storage — SQLite (default), in-memory (testing), extensible to DynamoDB/PostgreSQL
Lightweight — 6 dependencies, no torch/ray/verl/transformers
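
The session-sticky routing idea can be sketched as a deterministic mapping from session ID to worker. This README does not say how the gateway actually chooses a worker, so the hash-and-modulo policy and the `pick_worker` name below are assumptions for illustration only.

```python
import hashlib


def pick_worker(session_id: str, workers: list[str]) -> str:
    """Map a session ID to one worker so repeated calls land on the same server."""
    if not workers:
        raise ValueError("no workers registered")
    # A stable hash (unlike Python's built-in hash()) gives the same result across processes.
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(workers)
    return workers[index]


workers = ["http://vllm-0:8000/v1", "http://vllm-1:8000/v1"]
first = pick_worker("session-abc", workers)
second = pick_worker("session-abc", workers)
print(first == second)  # True
```

A plain modulo remaps many sessions when the worker list changes, so a real router would more likely remember each session's assignment or use consistent hashing.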

Development

uv venv --python 3.11
source .venv/bin/activate
uv pip install -e ".[dev]"

# Unit tests
python -m pytest tests/unit/ -x -q

# Integration tests (requires vLLM on localhost:4000, auto-skipped otherwise)
python -m pytest tests/integration/ -x -v

Configuration

CLI

rllm-model-gateway \
  --port 9090 \
  --db-path ./traces.db \
  --worker http://vllm-0:8000/v1 \
  --worker http://vllm-1:8000/v1

YAML (`--config gateway.yaml`)

host: "0.0.0.0"
port: 9090
db_path: "~/.rllm/gateway.db"

workers:
  - url: "http://vllm-0:8000/v1"
    model_name: "Qwen/Qwen2.5-7B-Instruct"
  - url: "http://vllm-1:8000/v1"
    model_name: "Qwen/Qwen2.5-7B-Instruct"

Environment Variables

RLLM_GATEWAY_HOST, RLLM_GATEWAY_PORT, RLLM_GATEWAY_DB_PATH, RLLM_GATEWAY_LOG_LEVEL, RLLM_GATEWAY_STORE
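
As a rough sketch of how variables like these are typically layered over defaults, the snippet below resolves three of them from a mapping. It is not the gateway's own loader: the `settings_from_env` name is hypothetical, the default values are borrowed from the YAML example above rather than confirmed defaults, and precedence relative to the CLI and YAML is not specified in this README. `RLLM_GATEWAY_LOG_LEVEL` and `RLLM_GATEWAY_STORE` are left out because their accepted values are not documented here.

```python
import os
from collections.abc import Mapping

# Assumed defaults, copied from the YAML example above.
DEFAULTS = {"host": "0.0.0.0", "port": 9090, "db_path": "~/.rllm/gateway.db"}


def settings_from_env(env: Mapping[str, str]) -> dict:
    """Overlay RLLM_GATEWAY_* variables on top of the assumed defaults."""
    settings = dict(DEFAULTS)
    if "RLLM_GATEWAY_HOST" in env:
        settings["host"] = env["RLLM_GATEWAY_HOST"]
    if "RLLM_GATEWAY_PORT" in env:
        settings["port"] = int(env["RLLM_GATEWAY_PORT"])  # env values are strings
    if "RLLM_GATEWAY_DB_PATH" in env:
        settings["db_path"] = env["RLLM_GATEWAY_DB_PATH"]
    # Expand a leading "~" so the path can be opened directly.
    settings["db_path"] = os.path.expanduser(settings["db_path"])
    return settings


settings = settings_from_env(
    {"RLLM_GATEWAY_PORT": "9191", "RLLM_GATEWAY_DB_PATH": "./traces.db"}
)
print(settings["port"], settings["db_path"])  # 9191 ./traces.db
```

Passing `os.environ` instead of a literal dict would read the real process environment.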

Embedded Usage

import threading

import uvicorn
from rllm_model_gateway import create_app, GatewayConfig

config = GatewayConfig(port=9090, workers=[...])
app = create_app(config)

# Serve the gateway from a daemon thread so the host process keeps running.
threading.Thread(target=uvicorn.run, args=(app,), kwargs={"port": 9090}, daemon=True).start()
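
When the gateway is started in a background thread, the host process may want to wait until it is accepting requests. The following is a minimal sketch that polls the `GET /health` endpoint listed in the API overview. The `wait_until_healthy` helper is hypothetical and treats any HTTP 200 as healthy, since the response body is not described in this README.

```python
import time
import urllib.error
import urllib.request


def wait_until_healthy(base_url: str, timeout_s: float = 30.0, interval_s: float = 0.5) -> bool:
    """Poll GET {base_url}/health until it returns HTTP 200 or the timeout expires."""
    url = base_url.rstrip("/") + "/health"
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=2.0) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # server not up yet, or it answered with an error status
        time.sleep(interval_s)
    return False
```

Calling `wait_until_healthy("http://localhost:9090")` right after starting the thread returns `True` once the server responds and `False` if it never does within the timeout.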

Dynamic Worker Registration

Workers can be added at runtime via the admin API — useful for verl integration where vLLM addresses are only known after initialization:

from rllm_model_gateway import GatewayClient

client = GatewayClient("http://localhost:9090")
# Register a vLLM worker once its address is known
client.add_worker(url="http://vllm-worker-0:8000/v1", model_name="Qwen/Qwen2.5-7B")
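
For callers that cannot import `GatewayClient`, the same registration could presumably be made as a plain HTTP request to `POST /admin/workers`. The sketch below only builds the request and does not send it. The JSON field names mirror the `add_worker` keyword arguments and are an assumption, since the request schema is not documented here, and `build_add_worker_request` is a hypothetical helper.

```python
import json
import urllib.request


def build_add_worker_request(
    gateway_url: str, worker_url: str, model_name: str
) -> urllib.request.Request:
    """Build (but do not send) a POST /admin/workers request."""
    # Assumed payload shape: same names as the add_worker keyword arguments.
    body = json.dumps({"url": worker_url, "model_name": model_name}).encode("utf-8")
    return urllib.request.Request(
        gateway_url.rstrip("/") + "/admin/workers",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_add_worker_request(
    "http://localhost:9090", "http://vllm-worker-0:8000/v1", "Qwen/Qwen2.5-7B"
)
print(req.get_method(), req.full_url)  # POST http://localhost:9090/admin/workers
```

Sending it against a running gateway would be `urllib.request.urlopen(req)`.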

API Overview

Endpoint	Description
`POST /sessions/{sid}/v1/chat/completions`	Proxy (agent-facing, OpenAI-compatible)
`POST /sessions`	Create session with metadata
`GET /sessions/{sid}/traces`	Retrieve traces for a session
`POST /admin/workers`	Register a worker
`GET /health`	Gateway health check
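
The routes in the table can be collected into a small URL builder, which is handy when scripting against the gateway without the client library. The paths are taken from the table. The `gateway_endpoints` helper and the choice to percent-encode the session ID are assumptions.

```python
from urllib.parse import quote


def gateway_endpoints(base_url: str, session_id: str) -> dict[str, str]:
    """Return full URLs for the routes listed in the API overview."""
    base = base_url.rstrip("/")
    sid = quote(session_id, safe="")  # keep unusual IDs from breaking the path
    return {
        "openai_base_url": f"{base}/sessions/{sid}/v1",
        "chat_completions": f"{base}/sessions/{sid}/v1/chat/completions",
        "create_session": f"{base}/sessions",
        "traces": f"{base}/sessions/{sid}/traces",
        "add_worker": f"{base}/admin/workers",
        "health": f"{base}/health",
    }


urls = gateway_endpoints("http://localhost:9090/", "abc123")
print(urls["traces"])  # http://localhost:9090/sessions/abc123/traces
```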

Project details

Version 0.1.0, released Mar 12, 2026

Download files

Source Distribution

rllm_model_gateway-0.1.0.tar.gz (40.2 kB)

Uploaded Mar 12, 2026 Source

Built Distribution

rllm_model_gateway-0.1.0-py3-none-any.whl (27.0 kB)

Uploaded Mar 12, 2026 Python 3

File details

Details for the file rllm_model_gateway-0.1.0.tar.gz.

File metadata

Download URL: rllm_model_gateway-0.1.0.tar.gz
Upload date: Mar 12, 2026
Size: 40.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for rllm_model_gateway-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`11be2368ca9c1b81ce2639d6451ab9054e90a55f5c46e70971d4cd6d7a335612`
MD5	`46499e73681f03ceb3c112810a0dd848`
BLAKE2b-256	`5b456134fa839037a425a54c6e63abeb22afbd8902564eca3890d430fb87f87a`

File details

Details for the file rllm_model_gateway-0.1.0-py3-none-any.whl.

File metadata

Download URL: rllm_model_gateway-0.1.0-py3-none-any.whl
Upload date: Mar 12, 2026
Size: 27.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for rllm_model_gateway-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`3963661cac8f29e803a725f13abfe719f3e1b94bf2885b17ffc009eafb92e562`
MD5	`554cebc0ab6de858cc3e6e54f712763c`
BLAKE2b-256	`39734c88d3b6f369c6643d5efa6d80e3a69099d5cba00b7a82d7f0b2ed30cb78`
