Skip to main content

A high-quality Python package for generating multiple LLM responses with built-in resampling, caching, and provider abstraction

Project description

Rollouts

A Python package for conveniently interacting with the OpenRouter API. The package provides two notable features:

  • The package will automatically cache responses. The first time you call client.generate('your prompt', n_samples=2), two jsons will be saved with the model response to each. If you make the same call, those jsons will be loaded.
  • You can easily insert text into model's reasoning. If you call client.generate('What is 5*10?\n<think>\n5*1') the model response will start "0".

Examples are provided below, and additional examples are shown in example.py.

Installation

pip install rollouts

Quick Start

# Set your API key
export OPENROUTER_API_KEY="your-key-here"

Synchronous Usage

from rollouts import RolloutsClient

# Create client with default settings
client = RolloutsClient(
    model="qwen/qwen3-30b-a3b",
    temperature=0.7,
    max_tokens=1000
) 

# Generate multiple responses (one prompt sampled concurrently). This runs on seeds from 0 to n_samples (e.g., 0, 1, 2, 3, 4)
rollouts = client.generate("What is the meaning of life?", n_samples=5)

# Access responses
for response in rollouts:
    print(f"Reasoning: {response.reasoning=}") # reasoning text if reasoning model; None if non-reasoning model or if reasoning is hidden
    print(f"Content: {response.content=}") # post-reasoning output (or just output if not a reasoning model)
    print(f"Response: {response.full=}") # "{reasoning}</think>{content}" if reasoning exists and completed; "{reasoning}" if reasoning not completed; "{content}" if non-reasoning model or if reasoning is hidden

Asynchronous Usage

import asyncio
from rollouts import RolloutsClient

async def main():
    client = RolloutsClient(model="qwen/qwen3-30b-a3b")
    
    # Generate responses for multiple prompts concurrently
    results = await asyncio.gather(
        client.agenerate("Explain quantum computing", n_samples=3),
        client.agenerate("Write a haiku", n_samples=5, temperature=1.2)
    )
    
    for rollouts in results:
        print(f"Generated {len(rollouts)} responses")

asyncio.run(main())

Thinking Injection

For models using tags, you can insert thoughts and continue the chain-of-thought from there (this works for Deepseek, Qwen, QwQ, Anthropic, and presumably other models).

Does not work for:

  • Models where thinking is hidden (Gemini and OpenAI)
  • GPT-OSS-20b/120b, which use a different reasoning template; I tried to get GPT-OSS working, but I'm not sure it's possible with OpenRouter.
prompt = "Calculate 10*5 <think>Let me calculate: 10*5="
result = client.generate(prompt, n_samples=1)
# Model continues from "=" ("50" would be the next two tokens)

Parameter Override

The default OpenRouter settings are used, but you can override these either when defining the client or when generating responses. The logprobs parameter is not supported here; from what I can tell, it is unreliable on OpenRouter

client = RolloutsClient(model="qwen/qwen3-30b-a3b", temperature=0.7)

# Override temperature for this specific generation
rollouts = client.generate(
    "Be creative!",
    n_samples=5,
    temperature=1.5,  # Override default
    max_tokens=2000   # Override default
)

result = client.generate(prompt, top_p=0.99)

Caching

Responses are automatically cached to disk:

client = RolloutsClient(
    model="qwen/qwen3-30b-a3b",
    use_cache=True,  # Default
    cache_dir="my_cache"  # Custom cache directory
)

# First call: generates responses
rollouts1 = client.generate("What is 2+2?", n_samples=3)

# Second call: uses cached responses (instant)
rollouts2 = client.generate("What is 2+2?", n_samples=3)

Cache Behavior:

  • Responses are cached in a hierarchical directory structure: cache_dir/model/parameters/prompt_hash_prefix/prompt_hash/seed_00000.json
  • Each unique combination of prompt, model, and parameters gets its own cache location
  • If a cached response has finish_reason="error", it will be regenerated on the next request
  • To clear the cache, simply delete the cache directory or specific subdirectories/files

API Reference

RolloutsClient

Main client class for generating responses.

Parameters:

  • model (str, required): Model identifier
  • temperature (float): Sampling temperature (0.0-2.0, default: 0.7)
  • top_p (float): Nucleus sampling parameter (0.0-1.0, default: 0.95)
  • max_tokens (int): Maximum tokens to generate (default: 4096)
  • top_k (int): Top-k sampling parameter (default: None)
  • presence_penalty (float): Presence penalty (-2.0 to 2.0, default: 0.0)
  • frequency_penalty (float): Frequency penalty (-2.0 to 2.0, default: 0.0)
  • provider (dict): Provider routing preferences (e.g., {"order": ["anthropic", "openai"]})
  • reasoning (dict): Reasoning configuration for models that support it (e.g., {"max_tokens": 2000} or {"effort": "low"})
  • include_reasoning (bool): Whether to include reasoning in response (default: None, auto-detected)
  • api_key (str): API key (uses OPENROUTER_API_KEY env variable if None)
  • max_retries (int): Maximum retry attempts for failed requests (default: 100)
  • timeout (int): Request timeout in seconds (default: 300)
  • verbose (bool): Print debug information (default: False)
  • use_cache (bool): Enable response caching (default: True)
  • cache_dir (str): Directory for cache files (default: ".rollouts")
  • requests_per_minute (int): Rate limit for API requests (default: None, no limit)
  • **kwargs: Additional OpenRouter parameters (e.g., min_p, top_a, repetition_penalty)

Rollouts

Container for multiple responses.

Attributes:

  • prompt: The input prompt
  • responses: List of Response objects
  • num_responses: Number of responses requested
  • temperature, top_p, max_tokens: Generation parameters
  • model: Model information

Methods:

  • get_texts(): Get all full response texts (includes reasoning + content)
  • get_reasonings(): Get reasoning portions only
  • get_contents(): Get content portions only (post-reasoning text)

Response

Individual response from the model.

Key Fields:

  • full: The complete response text, formatted as reasoning_text + "\n</think>\n" + content_text if a reasoning model; reasoning_text if reasoning not finished; content_text if not a reasoning model or reasoning is hidden
  • content: The post-reasoning text (what comes after </think>)
  • reasoning: The reasoning/thinking text (what comes before </think>)
  • usage: Token usage statistics
  • finish_reason: Why the response ended (e.g., "stop", "length")

OpenRouter

Low-level API provider class for direct OpenRouter API access.

Usage:

from rollouts import OpenRouter

# Initialize with API key
router = OpenRouter(api_key="your-key-here")

# Generate a single response (async)
async def generate():
    response = await router.generate_single(
        prompt="Hello, world!",
        config={"model": "qwen/qwen3-30b-a3b", "temperature": 0.7, "max_tokens": 100},
        seed=42
    )
    print(response.full)

Note: This is a lower-level interface. Most users should use RolloutsClient instead, which provides caching, retry logic, and a simpler API.

API Key Configuration

There are three ways to provide API keys:

1. Environment Variable

export OPENROUTER_API_KEY="your-key-here"

2. Pass to Client (recommended for production)

client = RolloutsClient(
    model="qwen/qwen3-30b-a3b",
    api_key="your-key-here"
)

3. Pass at Generation Time (for per-request keys)

client = RolloutsClient(model="qwen/qwen3-30b-a3b")
responses = client.generate(
    "Your prompt",
    n_samples=5,
    api_key="different-key-here"  # Overrides any default
)

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

rollouts-0.1.3.tar.gz (31.3 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

rollouts-0.1.3-py3-none-any.whl (21.4 kB view details)

Uploaded Python 3

File details

Details for the file rollouts-0.1.3.tar.gz.

File metadata

  • Download URL: rollouts-0.1.3.tar.gz
  • Upload date:
  • Size: 31.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.4

File hashes

Hashes for rollouts-0.1.3.tar.gz
Algorithm Hash digest
SHA256 900a76d859e9db0dc7f400d02e575902a165d355eae4c9e3e6a79ac7ea7117d5
MD5 1b1fa15b7a6bd7e5299a10fe7dad8183
BLAKE2b-256 ca1cfb71b1ae2fc52dcd3f9208be491315274e5b20ab2f97d2db8b29c447257f

See more details on using hashes here.

File details

Details for the file rollouts-0.1.3-py3-none-any.whl.

File metadata

  • Download URL: rollouts-0.1.3-py3-none-any.whl
  • Upload date:
  • Size: 21.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.4

File hashes

Hashes for rollouts-0.1.3-py3-none-any.whl
Algorithm Hash digest
SHA256 5501e5cf290c9ba5db19bf70c95698c6a3c24edd358223afa2558bfc1c42eb7b
MD5 dd5ca39cdd09faf6ce6a82375ea9e806
BLAKE2b-256 2adca7a4cd503b93fa33131e323521e7f9a89c6150647a2c08a91224b362daa5

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page