Thin client and batch execution helpers for OpenAI Responses API workloads.

tokenrail

tokenrail is a small Python library for running OpenAI Responses API jobs with a client.responses-style surface.

It focuses on:

  • thread-based OpenAI batch execution
  • structured output parsing for Pydantic models
  • client-side RPM / TPM submit throttling
  • per-model token / cost monitoring with ETA progress reporting
  • resumable JSONL and per-request result writing

tokenrail is fully typed (PEP 561) and supports Python 3.10+.

Installation

uv add tokenrail
# or
pip install tokenrail

To pin a specific Git tag or track an unreleased revision instead, depend on the Git repository directly:

[tool.uv.sources]
tokenrail = { git = "https://github.com/takumi0shibata/tokenrail", tag = "v1.1.0" }

Set your OpenAI API key in the consuming project before use:

export OPENAI_API_KEY=...

Quick start

from tokenrail import BatchExecutor, ResultsJsonlSink, PerRequestJsonSink, RailClient, RollingMetricsMonitor
from tokenrail.executor import batch_items_from_queries

client = RailClient.openai(max_retries=6)

queries = {
    "1": [{"role": "user", "content": "Summarize this paper in 3 bullets."}],
    "2": [{"role": "user", "content": "Extract the key assumptions."}],
}

items = batch_items_from_queries(
    queries,
    model="gpt-5.4-mini-2026-03-17",
    reasoning_effort="medium",
    verbosity="low",
)

# Write only the projected fields of every response to a single JSONL file.
result_sink = ResultsJsonlSink(
    "out/results.jsonl",
    projector=lambda response: {
        "id": response.id,
        "text": response.output_text,
        "model": response.model,
        "usage": response.usage.to_dict(),
    },
)
# Also save the raw output of each request separately under out/.
per_request_sink = PerRequestJsonSink("out/")

executor = BatchExecutor(
    client=client,
    max_workers=16,
    max_rpm=500,
    max_tpm=200_000,
    sinks=[result_sink, per_request_sink],
    monitor=RollingMetricsMonitor(),
)

stats = executor.run(items)
print(stats.to_dict())

Structured output batches

Pass a Pydantic model as text_format when building batch items. BatchExecutor will call responses.parse(...) for those items and store the validated object on response.output_parsed.

from pydantic import BaseModel

from tokenrail import BatchExecutor, RailClient, ResultsJsonlSink
from tokenrail.executor import batch_items_from_queries


class PaperSummary(BaseModel):
    title: str
    key_assumptions: list[str]


client = RailClient.openai(max_retries=6)

items = batch_items_from_queries(
    {
        "paper-1": [{"role": "user", "content": "Extract the title and assumptions from this paper: ..."}],
        "paper-2": [{"role": "user", "content": "Extract the title and assumptions from this paper: ..."}],
    },
    model="gpt-5.4-mini-2026-03-17",
    reasoning_effort="medium",
    verbosity="low",
    text_format=PaperSummary,
)

sink = ResultsJsonlSink(
    "out/structured-results.jsonl",
    projector=lambda response: {
        "id": response.id,
        "summary": response.output_parsed.model_dump(mode="json") if response.output_parsed else None,
        "refusal": response.refusal,
        "usage": response.usage.to_dict(),
    },
)

stats = BatchExecutor(client=client, sinks=[sink], max_workers=16).run(items)
print(stats.to_dict())

Use response_format={...} with client.responses.create(...) when you want to provide a raw JSON Schema yourself. Use text_format=YourModel for Pydantic parsing; response_format and text_format cannot be used together.

Configuration notes

  • max_retries configures the OpenAI Python SDK client's built-in retry behavior. tokenrail does not add its own retry loop on top.
  • max_rpm and max_tpm are optional client-side submit limits. When a limit is set, BatchExecutor delays further submissions until they fit within the configured rate, even if worker threads are idle.
  • Request failures are captured as error records (written to sinks and counted in stats) rather than raised, so one failing item does not abort the batch.
  • base_url is passed through to the OpenAI Python SDK for callers that need an SDK-level custom endpoint.
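
The submit throttling and failure capture described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not tokenrail's implementation: SlidingWindowLimiter, run_batch, the 60-second sliding window, and the per-item token estimates are all hypothetical names and choices introduced here.

```python
import time
from collections import deque
from concurrent.futures import ThreadPoolExecutor


class SlidingWindowLimiter:
    """Hypothetical RPM/TPM limiter over a sliding window (not tokenrail's code)."""

    def __init__(self, max_rpm=None, max_tpm=None, window=60.0, clock=time.monotonic):
        self.max_rpm = max_rpm
        self.max_tpm = max_tpm
        self.window = window
        self.clock = clock
        self._events = deque()  # (timestamp, estimated_tokens) per submitted request

    def wait_time(self, tokens):
        """Seconds to wait before submitting a request of `tokens` estimated tokens."""
        now = self.clock()
        # Drop submissions that have aged out of the window.
        while self._events and now - self._events[0][0] >= self.window:
            self._events.popleft()
        wait = 0.0
        if self.max_rpm is not None and len(self._events) >= self.max_rpm:
            # Wait until enough of the oldest submissions expire to free one slot.
            blocker = self._events[len(self._events) - self.max_rpm]
            wait = max(wait, blocker[0] + self.window - now)
        if self.max_tpm is not None:
            used = sum(t for _, t in self._events)
            for ts, t in self._events:  # oldest first
                if used + tokens <= self.max_tpm:
                    break
                # This submission must expire before the new request fits.
                used -= t
                wait = max(wait, ts + self.window - now)
        return wait

    def record(self, tokens):
        """Register a submission made now."""
        self._events.append((self.clock(), tokens))


def run_batch(items, call, limiter, max_workers=4, sleep=time.sleep):
    """Run call(payload) per item, throttled; failures become error records."""

    def safe_call(item_id, payload):
        try:
            return {"id": item_id, "ok": True, "result": call(payload)}
        except Exception as exc:  # captured as a record, not raised
            return {"id": item_id, "ok": False, "error": f"{type(exc).__name__}: {exc}"}

    futures = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for item_id, (payload, est_tokens) in items.items():
            # Throttle on the submitting thread, before handing work to the pool.
            delay = limiter.wait_time(est_tokens)
            while delay > 0:
                sleep(delay)
                delay = limiter.wait_time(est_tokens)
            limiter.record(est_tokens)
            futures.append(pool.submit(safe_call, item_id, payload))
        # Results come back in submission order.
        return [f.result() for f in futures]
```

Here items maps each request id to a (payload, estimated_tokens) pair. Because only the submitting thread touches the limiter in this sketch, it needs no lock; a design that throttles inside the workers would have to guard the shared window.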

Resume behavior

BatchExecutor reads completed ids from the first configured sink before it starts. Re-running the same job with the same output path skips requests whose records are already present and runs only the remaining ones.

If you use a custom projector with ResultsJsonlSink, make sure it keeps an "id" field, because resume relies on it.
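
As a rough illustration of id-based resume, the sketch below scans an existing JSONL file for "id" values and filters out the requests that are already done. The helper names completed_ids and pending are hypothetical, and the sketch assumes each record's "id" matches the key of the request that produced it; tokenrail's own matching logic may differ.

```python
import json
from pathlib import Path


def completed_ids(path):
    """Collect the "id" of every well-formed record already in a JSONL file."""
    done = set()
    p = Path(path)
    if not p.exists():
        return done  # first run: nothing to skip
    with p.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                continue  # e.g. a partial line left by an interrupted run
            if isinstance(record, dict) and "id" in record:
                done.add(str(record["id"]))
    return done


def pending(items, path):
    """Return only the items whose id is not yet recorded at `path`."""
    done = completed_ids(path)
    return {item_id: item for item_id, item in items.items() if item_id not in done}
```

A projector that drops "id" would leave completed_ids empty in this sketch, so every request would be submitted again on the next run.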

Cost tracking

  • Costs are estimated from a checked-in per-model pricing table (tokenrail.catalog). Models without a pricing entry get cost=None; prices may lag behind OpenAI's official pricing page, which is always authoritative.
  • OpenAI cost allocation is inferred from billing.payer in the response body. When payer == "openai", the nominal request cost is counted as OpenAI-covered rather than developer-billed.
  • reasoning_effort is gated to gpt-5 / o-series style models in the checked-in capability registry.
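
The cost rules above can be pictured with a small table-driven estimator. Everything in this sketch is an assumption made for illustration: the PRICING table, its per-million-token prices, the model name "example-model", and the CostTotals class are hypothetical and do not reflect tokenrail.catalog or real OpenAI pricing.

```python
from dataclasses import dataclass

# Hypothetical prices in USD per 1M tokens; NOT real OpenAI pricing.
PRICING = {"example-model": {"input": 1.00, "output": 4.00}}


def estimate_cost(model, input_tokens, output_tokens, pricing=PRICING):
    """Return the estimated USD cost, or None when the model has no pricing entry."""
    entry = pricing.get(model)
    if entry is None:
        return None
    total = input_tokens * entry["input"] + output_tokens * entry["output"]
    return total / 1_000_000


@dataclass
class CostTotals:
    """Running totals split by who pays for the request."""

    developer_billed: float = 0.0
    openai_covered: float = 0.0
    unpriced_requests: int = 0

    def add(self, cost, payer=None):
        if cost is None:
            # No pricing entry: count the request but add no cost.
            self.unpriced_requests += 1
        elif payer == "openai":
            self.openai_covered += cost
        else:
            self.developer_billed += cost
```

Returning None rather than 0.0 for unpriced models keeps "unknown cost" distinguishable from "free" in the totals.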

Development

uv sync
uv run pytest
uv run ruff check src tests

License

MIT
