Skip to main content

A wrapper for interacting with the OpenRouter API

Project description

Rollouts

rollouts is python package for conveniently interacting with the OpenRouter API. The package provides three notable features:

  • You can generate multiple LLM responses ("rollouts") concurrently for the same prompt
  • The package will automatically cache responses. The first time you call client.generate('your prompt', n_samples=2), two jsons will be saved with the model response to each. If you make the same call, those jsons will be loaded. If you set use_cache="sql", caching/loading will be instead done using a SQLite database
  • You can easily insert text into a model's reasoning. If you call client.generate('What is 5*10?\n<think>\n5*1'), this will insert \n5*1' into the model's reasoning, which will continue with "0..."

Examples are provided below, and additional examples are shown in example.py.

Paper

This code is meant to help with implementing the chain-of-thought resampling techniques described in this paper:

Bogdan, P.C.*, Macar, U.*, Nanda, N.°, & Conmy, A.° (2025). Thought Anchors: Which LLM Reasoning Steps Matter?. arXiv preprint arXiv:2506.19143. PDF

Installation

pip install rollouts

Quick Start

# Set your API key
export OPENROUTER_API_KEY="your-key-here"

Synchronous Usage

Model responses are always via the chat-completions API.

from rollouts import RolloutsClient

# Create client with default settings
client = RolloutsClient(
    model="qwen/qwen3-30b-a3b",
    temperature=0.7,
    max_tokens=1000
) 

# Generate multiple responses (one prompt sampled concurrently). This runs on seeds from 0 to n_samples (e.g., 0, 1, 2, 3, 4)
rollouts = client.generate("What is the meaning of life?", n_samples=5)

# Access responses
for response in rollouts:
    print(f"Reasoning: {response.reasoning=}") # reasoning text if reasoning model; None if non-reasoning model
    print(f"Content: {response.content=}") # post-reasoning output (or just output if not a reasoning model)
    print(f"Response: {response.full=}") # "{reasoning}</think>{content}" if reasoning exists and completed; "{reasoning}" if reasoning not completed; "{content}" if non-reasoning model or if reasoning is hidden

Asynchronous Usage

import asyncio
from rollouts import RolloutsClient

async def main():
    client = RolloutsClient(model="qwen/qwen3-30b-a3b")
    
    # Generate responses for multiple prompts concurrently
    results = await asyncio.gather(
        client.agenerate("Explain quantum computing", n_samples=3),
        client.agenerate("Write a haiku", n_samples=5, temperature=1.2)
    )
    
    for rollouts in results:
        print(f"Generated {len(rollouts)} responses")

asyncio.run(main())

Thinking Injection

For models using tags, you can insert thoughts and continue the chain-of-thought from there (this works for Deepseek, Qwen, QwQ, Anthropic, and presumably other models).

prompt = "Calculate 10*5\n<think>\nLet me calculate: 10*5="
result = client.generate(prompt, n_samples=1)
# Model continues from "=" ("50" would be the next two tokens)

I believe "<think>" is normally surrounded by "\n" for chat completions by default. You probably should do this.

Importantly, you should avoid ending inserted thoughts with a trailing space (" "). Doing so will often cause tokenization issues, as most models tokenize words with a space prefix (e.g., " Hello"). When you insert thoughts with a trailing space, a model would need to introduce a double-space typo to continue with a word. Models hate typos and will thus be strongly biased toward continuing with tokens that don't have a space prefix (e.g., "0").

Inserting thoughts does not work for:

  • Models where true thinking tokens are hidden (Gemini and OpenAI)
  • GPT-OSS-20b/120b, which use a different reasoning template; I tried to get the GPT-OSS template working, but I'm not sure it's possible with OpenRouter

Parameter Override

The default OpenRouter settings are used, but you can override these either when defining the client or when generating responses. The logprobs parameter is not supported here; from what I can tell, it is unreliable on OpenRouter

client = RolloutsClient(model="qwen/qwen3-30b-a3b", temperature=0.7)

# Override temperature for this specific generation
rollouts = client.generate(
    "Be creative!",
    n_samples=5,
    temperature=1.5,
    max_tokens=2000,
    use_cache="sql", # Default = "json"
    requests_per_minute=200 # Default = None; no limit
)

result = client.generate(prompt, top_p=0.99)

Progress Bar

A progress bar automatically appears when generating multiple responses (n_samples > 1):

client = RolloutsClient(
    model="qwen/qwen3-30b-a3b",
    progress_bar=True  # Default, can be disabled
)

# Shows a progress bar for multiple samples
rollouts = client.generate("Write a story", n_samples=5)

# No progress bar for single sample (even if enabled)
rollout = client.generate("Quick answer", n_samples=1)

# Disable progress bar for a specific request
rollouts = client.generate("Silent generation", n_samples=10, progress_bar=False)

The progress bar:

  • Only appears when n_samples > 1
  • Shows the number of responses being generated
  • Automatically disappears when complete
  • Can be disabled globally (in client init) or per-request

Caching

Responses are automatically cached to disk:

client = RolloutsClient(
    model="qwen/qwen3-30b-a3b",
    use_cache=True,  # Default
    cache_dir="my_cache"  # Custom cache directory
)

# First call: generates responses
rollouts1 = client.generate("What is 2+2?", n_samples=3)

# Second call: uses cached responses (instant)
rollouts2 = client.generate("What is 2+2?", n_samples=3)

Cache Behavior:

  • Responses are cached in a hierarchical directory structure: .rollouts/model/parameters/prompt_hash_prefix/prompt_hash/seed_00000.json
  • Each unique combination of prompt, model, and parameters gets its own cache location
  • The prompt hash is split across two directory levels (prompt_hash_prefix/prompt_hash) as this helps performance when you have responses saved for >100k prompts. prompt_hash_prefix is just the first three hex digits of the prompt hash
  • If a cached response has finish_reason="error", it will not be loaded and is instead regenerated on the next request
  • To clear the cache, simply delete the cache directory or specific subdirectories/files

API Key Configuration

There are three ways to provide API keys:

1. Environment Variable

export OPENROUTER_API_KEY="your-key-here"

2. Pass to Client (recommended for production)

client = RolloutsClient(
    model="qwen/qwen3-30b-a3b",
    api_key="your-key-here"
)

3. Pass at Generation Time (for per-request keys)

client = RolloutsClient(model="qwen/qwen3-30b-a3b")
responses = client.generate(
    "Your prompt",
    n_samples=5,
    api_key="different-key-here"  # Overrides any default
)

Additional Notes

Progress Bar

A progress bar appears when generating multiple responses (n_samples > 1). You can disable it by setting progress_bar=False either when creating the client or for individual requests.

Rate Limiting

You can limit the requests per minute when defining your client using the requests_per_minute parameter (token bucket rate limiter):

client = RolloutsClient(
    model="qwen/qwen3-30b-a3b",
    requests_per_minute=60  # Limit to 60 requests per minute
)

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

rollouts-0.1.11.tar.gz (41.2 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

rollouts-0.1.11-py3-none-any.whl (29.4 kB view details)

Uploaded Python 3

File details

Details for the file rollouts-0.1.11.tar.gz.

File metadata

  • Download URL: rollouts-0.1.11.tar.gz
  • Upload date:
  • Size: 41.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.4

File hashes

Hashes for rollouts-0.1.11.tar.gz
Algorithm Hash digest
SHA256 0dfa87eb791622d62fb7ddb0e27f9a1d6dce8a0515b301710a1b27d400712f44
MD5 220596cd0ff7a4fe3da599686701d6cd
BLAKE2b-256 a72956116e8252c0b43e87115697438a138593adb28df2bc1a2249db8eae88c9

See more details on using hashes here.

File details

Details for the file rollouts-0.1.11-py3-none-any.whl.

File metadata

  • Download URL: rollouts-0.1.11-py3-none-any.whl
  • Upload date:
  • Size: 29.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.4

File hashes

Hashes for rollouts-0.1.11-py3-none-any.whl
Algorithm Hash digest
SHA256 0678ba345b5f5eaf89094da6a03995a333ca3aad20078edaf28b2d013f706645
MD5 4155a4841c6df28fcf862820414fa816
BLAKE2b-256 fb0adf96496ebd002892a08732c1f784e45e7178c40056f58d71590b29b48cb9

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page