Cost tracking and budget enforcement for Azure OpenAI API calls


openai-cost-guard

Track Azure OpenAI cost per call and enforce budgets - decorators, FastAPI middleware, and reporters, with streaming and async support.

What it does - Install - Quickstart - Architecture - Python API - CLI - Configuration - Default pricing - Design decisions - Known limitations - Roadmap - Benchmarks - Contributing - License

What it does

Azure OpenAI bills by token. Without instrumentation, the first sign of a cost problem is the invoice - by which point you have already spent the money. openai-cost-guard wraps your API calls and records cost per call, per endpoint, and per model, so you can see spending before it becomes a surprise - and stop it with a configured budget.

It is a small library plus CLI, not a service. You add it to an existing Python application:

Decorators (@track_cost, plus class-method, async, and streaming variants) wrap any function that returns an OpenAI-style response and record its cost.
FastAPI / Starlette middleware binds a fresh tracker to every HTTP request and surfaces per-request cost as response headers.
Reporters turn recorded usage into a console table, JSON, or live OpenTelemetry metrics for Azure Monitor / Application Insights.
A CLI (openai-cost-guard) inspects saved JSON reports offline.

The core package depends only on Pydantic. There is no Azure SDK dependency - it works with the standard openai package - and FastAPI and Azure Monitor support live behind optional extras.

Install

pip install openai-cost-guard

Requires Python 3.11+. Optional integrations:

pip install "openai-cost-guard[fastapi]"   # FastAPI / Starlette middleware
pip install "openai-cost-guard[azure]"     # Azure Monitor / OpenTelemetry reporter

Quickstart

from openai_cost_guard import CostTracker, track_cost
from openai_cost_guard.reporters import print_report

tracker = CostTracker()

@track_cost(tracker=tracker, endpoint="summarise")
def summarise(client, text):
    return client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": text}],
    )

summarise(client, "...")
print_report(tracker.report())

openai-cost-guard report
+------------------------------+---------+------------+------------+--------------+--------------+
| Model                        |   Calls |     Prompt | Completion | Total Tokens |   Cost (USD) |
+------------------------------+---------+------------+------------+--------------+--------------+
| gpt-4o                       |       1 |        512 |        128 |          640 |      $0.0026 |
+------------------------------+---------+------------+------------+--------------+--------------+
| TOTAL                        |       1 |        512 |        128 |          640 |      $0.0026 |
+------------------------------+---------+------------+------------+--------------+--------------+
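The $0.0026 in the table follows from the default gpt-4o rates listed under Default pricing ($2.50 input and $10.00 output per million tokens). A minimal sketch of that arithmetic; call_cost_usd is a hypothetical helper written for this example, not a function exported by the package:

```python
def call_cost_usd(prompt_tokens, completion_tokens, input_per_million, output_per_million):
    """Price one call from its token counts and per-million-token rates."""
    input_cost = prompt_tokens * input_per_million / 1_000_000
    output_cost = completion_tokens * output_per_million / 1_000_000
    return input_cost + output_cost


# 512 prompt tokens and 128 completion tokens at the default gpt-4o rates.
cost = call_cost_usd(512, 128, input_per_million=2.50, output_per_million=10.00)
print(f"${cost:.4f}")  # prints $0.0026
```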

Architecture

Architecture diagram

Your application records usage through one of the entry points - the @track_cost decorator family or the FastAPI middleware. Both funnel into a CostTracker, which prices each call against the pricing table (built-in defaults plus any overrides) and stores a UsageRecord. Recorded usage is then turned into output by the reporters: a console table, JSON (string or file), or live OpenTelemetry metrics for Azure Monitor. An optional on_record hook lets a reporter receive each record as it is made; the CLI reads the JSON a reporter writes.
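The record, price, store, notify flow described above can be pictured with a small stand-in. This is a sketch under assumptions, not the package's CostTracker: MiniTracker, Record, and the tuple-based price table are hypothetical names invented for the example, and the only behaviour taken from this README is that the hook runs after the internal lock is released.

```python
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    model: str
    prompt_tokens: int
    completion_tokens: int
    cost_usd: float


class MiniTracker:
    """Illustrative stand-in for the flow above; not the real CostTracker."""

    def __init__(self, prices, on_record=None):
        self._prices = prices        # model -> (input, output) USD per 1M tokens
        self._on_record = on_record  # optional sink, called once per record
        self._records = []
        self._lock = threading.Lock()

    def record(self, model, prompt_tokens, completion_tokens):
        input_rate, output_rate = self._prices[model]
        cost = (prompt_tokens * input_rate + completion_tokens * output_rate) / 1_000_000
        rec = Record(model, prompt_tokens, completion_tokens, cost)
        with self._lock:             # the lock protects shared state only
            self._records.append(rec)
        if self._on_record is not None:
            self._on_record(rec)     # runs after the lock has been released
        return rec

    def total_cost(self):
        with self._lock:
            return sum(r.cost_usd for r in self._records)


seen = []
mini = MiniTracker({"gpt-4o": (2.50, 10.00)}, on_record=seen.append)
mini.record("gpt-4o", prompt_tokens=1000, completion_tokens=500)
print(f"{mini.total_cost():.6f}")  # prints 0.007500
```

Calling the hook outside the lock means a slow sink delays only the caller that produced the record, not every other thread that is recording.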

Python API

Decorator

The fastest integration. Wrap any function that returns an OpenAI response object:

from openai_cost_guard import CostTracker, track_cost

tracker = CostTracker()

@track_cost(tracker=tracker, endpoint="chat")
def chat(client, messages):
    return client.chat.completions.create(model="gpt-4o", messages=messages)

endpoint is an optional label stored on each record - use it to distinguish between different call sites in your reports. If omitted, the wrapped function's qualified name is used.
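To make the mechanism concrete, here is a rough sketch of how a usage-recording decorator of this kind can be built with the standard library. It is not the package's implementation: track_usage and the dict-shaped records are hypothetical, and the fake response only mimics the model and usage fields of an OpenAI chat completion.

```python
import functools
from types import SimpleNamespace


def track_usage(records, endpoint=None):
    """Hypothetical decorator: append one usage entry per wrapped call."""
    def decorator(func):
        label = endpoint or func.__qualname__  # fall back to the qualified name

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            response = func(*args, **kwargs)
            usage = getattr(response, "usage", None)
            if usage is not None:              # responses without usage are skipped
                records.append({
                    "endpoint": label,
                    "model": response.model,
                    "prompt_tokens": usage.prompt_tokens,
                    "completion_tokens": usage.completion_tokens,
                })
            return response                    # the caller sees the response unchanged
        return wrapper
    return decorator


records = []


@track_usage(records)
def chat():
    # Fake response shaped like a chat completion (model + usage fields).
    return SimpleNamespace(
        model="gpt-4o",
        usage=SimpleNamespace(prompt_tokens=512, completion_tokens=128),
    )


response = chat()
print(records[0]["endpoint"], records[0]["prompt_tokens"])
```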

Class-method decorator

When the tracker lives on a service class:

from openai_cost_guard import CostTracker, track_cost_method

class SummaryService:
    def __init__(self, client):
        self.client = client                # OpenAI client used by summarise()
        self.cost_tracker = CostTracker()   # read by track_cost_method

    @track_cost_method(endpoint="summary")
    def summarise(self, text):
        return self.client.chat.completions.create(...)

track_cost_method reads the tracker from self.cost_tracker by default; pass tracker_attr="..." to use a different attribute name.

Async

track_cost_async and track_cost_method_async are the coroutine equivalents. Tracker resolution is identical (explicit > request scope > default), and they work inside the FastAPI middleware too:

from openai_cost_guard import track_cost_async

@track_cost_async(endpoint="chat")
async def call_openai(client, messages):
    return await client.chat.completions.create(model="gpt-4o", messages=messages)

Streaming responses

A streamed completion returns an iterator of chunks, not a response with a .usage field, so the plain decorators record nothing. track_cost_stream wraps the iterator and records cost from the final usage chunk as you consume it. You must ask the API for usage with stream_options={"include_usage": True}:

from openai_cost_guard import track_cost_stream

@track_cost_stream(endpoint="chat")
def stream_chat(client, messages):
    return client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        stream=True,
        stream_options={"include_usage": True},   # required for cost capture
    )

for chunk in stream_chat(client, [...]):   # cost is recorded on the final chunk
    ...

Recording is lazy - it happens as you consume the stream, not when the call returns. track_cost_stream_async is the async-iterator version, consumed with async for.
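The lazy behaviour is easiest to see with a plain generator. The sketch below is an illustration of the idea, not the package's code: capture_stream_usage and fake_stream are hypothetical, and the fake chunks only imitate the shape of a stream where the last chunk carries usage.

```python
from types import SimpleNamespace


def capture_stream_usage(chunks, on_usage):
    """Yield chunks unchanged; report usage when a chunk carries it."""
    for chunk in chunks:
        usage = getattr(chunk, "usage", None)
        if usage is not None:   # with include_usage, only the last chunk has it
            on_usage(usage)
        yield chunk


def fake_stream():
    # Two content chunks, then a final usage-only chunk.
    yield SimpleNamespace(text="Hel", usage=None)
    yield SimpleNamespace(text="lo", usage=None)
    yield SimpleNamespace(
        text="",
        usage=SimpleNamespace(prompt_tokens=12, completion_tokens=2),
    )


captured = []
stream = capture_stream_usage(fake_stream(), captured.append)
recorded_before_consuming = len(captured)       # 0: the generator has not run yet
text = "".join(chunk.text for chunk in stream)  # consuming triggers the capture
print(recorded_before_consuming, text, len(captured))  # prints 0 Hello 1
```

A consequence of this design is that a stream abandoned part-way through never reaches the usage chunk, so nothing is recorded for it.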

Direct recording

When you handle the response yourself and already have the token counts, record them directly:

tracker = CostTracker()
tracker.record(
    model="gpt-4o",
    prompt_tokens=response.usage.prompt_tokens,
    completion_tokens=response.usage.completion_tokens,
    endpoint="my-pipeline",
    metadata={"user_id": "abc123"},
)

Budget enforcement

Raise an exception when spending exceeds a configured limit:

from openai_cost_guard import CostTracker, BudgetConfig, BudgetExceededError

tracker = CostTracker(
    budget=BudgetConfig(limit_usd=5.00, warn_at_percent=80)
)

try:
    tracker.record("gpt-4o", prompt_tokens=..., completion_tokens=...)
except BudgetExceededError as e:
    print(f"Stopped: ${e.spent:.4f} exceeds ${e.limit:.2f} limit")

A warning is logged at warn_at_percent (default 80%) before the limit is hit.
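The warn-then-stop behaviour can be sketched in a few lines. This is an assumption-laden illustration, not the library's BudgetConfig or BudgetExceededError: the Budget and BudgetExceeded classes are hypothetical, and the choices to warn only once and to raise on strictly exceeding the limit are guesses made for the example.

```python
import logging

logger = logging.getLogger("budget_sketch")


class BudgetExceeded(Exception):
    def __init__(self, spent, limit):
        super().__init__(f"spent ${spent:.4f} exceeds ${limit:.2f} limit")
        self.spent = spent
        self.limit = limit


class Budget:
    """Hypothetical running-total budget with a one-time warning threshold."""

    def __init__(self, limit_usd, warn_at_percent=80):
        self.limit_usd = limit_usd
        self.warn_at_usd = limit_usd * warn_at_percent / 100
        self.spent = 0.0
        self.warned = False

    def add(self, cost_usd):
        self.spent += cost_usd
        if self.spent > self.limit_usd:
            raise BudgetExceeded(self.spent, self.limit_usd)
        if not self.warned and self.spent >= self.warn_at_usd:
            self.warned = True   # assumption: warn once, not on every call
            logger.warning("budget at %.0f%% of limit",
                           100 * self.spent / self.limit_usd)


budget = Budget(limit_usd=5.00, warn_at_percent=80)
budget.add(3.00)        # 60% of the limit: no warning
budget.add(1.50)        # 90% of the limit: warning logged
try:
    budget.add(1.00)    # 5.50 > 5.00: raises
except BudgetExceeded as exc:
    stopped_at = exc.spent
```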

Custom pricing

Override built-in prices or add pricing for deployment names not in the default table:

from openai_cost_guard import CostTracker, ModelPricing

# Override at construction
tracker = CostTracker(pricing={
    "gpt-4o": ModelPricing(model="gpt-4o", input_per_million=2.00, output_per_million=8.00),
})

# Or register a deployment name at runtime
tracker.add_pricing(
    ModelPricing(model="my-gpt4o-deployment", input_per_million=2.50, output_per_million=10.00)
)

Deployment names that start with a known model name (e.g. gpt-4o-mini-2024-07-18) are matched by prefix automatically, longest prefix first.
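Longest-prefix matching itself is a small algorithm. The function below is a stand-alone sketch of the technique, with match_pricing_key as a hypothetical name; it is not the code in the package:

```python
def match_pricing_key(deployment, known_models):
    """Return the longest known model name that prefixes `deployment`, or None."""
    candidates = [name for name in known_models if deployment.startswith(name)]
    return max(candidates, key=len, default=None)


known = ["gpt-4o", "gpt-4o-mini", "gpt-4.1", "gpt-4.1-mini"]

print(match_pricing_key("gpt-4o-mini-2024-07-18", known))  # prints gpt-4o-mini
print(match_pricing_key("gpt-4o-2024-08-06", known))       # prints gpt-4o
print(match_pricing_key("my-custom-deployment", known))    # prints None
```

Taking the longest candidate is what stops gpt-4o-mini-2024-07-18 from being priced as gpt-4o, which is also a valid prefix of that name.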

FastAPI middleware

Track cost per HTTP request automatically. The middleware binds a fresh tracker to each request, so any @track_cost-decorated call made while handling that request records into it - no need to thread a tracker through your handlers.

from fastapi import FastAPI, Request
from openai_cost_guard import track_cost, BudgetConfig
from openai_cost_guard.middleware import CostGuardMiddleware

app = FastAPI()
app.add_middleware(
    CostGuardMiddleware,
    budget_per_request=BudgetConfig(limit_usd=0.50),  # optional per-request cap
)

@track_cost(endpoint="chat")          # no explicit tracker - uses the request scope
def call_model(client, messages):
    return client.chat.completions.create(model="gpt-4o", messages=messages)

@app.post("/chat")
async def chat(request: Request):
    call_model(client, [...])
    report = request.state.cost_tracker.report()   # this request's usage
    return {"cost_usd": report.total_cost}

Every response gets cost headers:

X-OpenAI-Cost-USD: 0.007500
X-OpenAI-Total-Tokens: 1500

Requires the fastapi extra:

pip install "openai-cost-guard[fastapi]"

Options: budget_per_request (per-request BudgetConfig), add_headers (default True), strict (raise on unknown model), and on_complete(scope, report) (callback fired after each request - use it to push metrics or persist usage).
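For readers curious how per-request binding and response headers fit together, here is a dependency-free sketch of a pure-ASGI middleware. It is not CostGuardMiddleware: CostHeaderMiddleware, request_costs, and demo_app are hypothetical names, a plain list stands in for the tracker, and the demo drives the ASGI callables by hand instead of running a server.

```python
import asyncio
import contextvars

# Hypothetical request-scoped slot; a decorator could read it at call time.
request_costs = contextvars.ContextVar("request_costs", default=None)


class CostHeaderMiddleware:
    """Pure-ASGI sketch: bind a per-request list and add a cost header."""

    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":          # pass websockets/lifespan through
            await self.app(scope, receive, send)
            return
        costs = []
        token = request_costs.set(costs)

        async def send_with_header(message):
            if message["type"] == "http.response.start":
                total = f"{sum(costs):.6f}".encode()
                headers = [*message.get("headers", []), (b"x-openai-cost-usd", total)]
                message = {**message, "headers": headers}
            await send(message)

        try:
            await self.app(scope, receive, send_with_header)
        finally:
            request_costs.reset(token)       # never leak into the next request


async def demo_app(scope, receive, send):
    request_costs.get().append(0.0075)       # stands in for a tracked model call
    await send({"type": "http.response.start", "status": 200, "headers": []})
    await send({"type": "http.response.body", "body": b"ok"})


async def run_once():
    sent = []

    async def send(message):
        sent.append(message)

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    scope = {"type": "http", "method": "POST", "path": "/chat", "headers": []}
    await CostHeaderMiddleware(demo_app)(scope, receive, send)
    return sent


messages = asyncio.run(run_once())
print(messages[0]["headers"])  # prints [(b'x-openai-cost-usd', b'0.007500')]
```

Because the wrapped app is awaited directly inside the middleware, it runs in the same context and sees the value set on the contextvar.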

Reporting

from openai_cost_guard.reporters import print_report, to_json, write_json, to_summary_dict

report = tracker.report()

print_report(report)                          # logs a formatted table
print_report(report, logger_name="my.app")    # logs to your own logger

to_json(report)                                # JSON string (records + totals)
write_json(report, "runs/usage.json")          # write to file, creates parent dirs
to_summary_dict(report)                        # compact aggregate dict, grouped by model

tracker.report() returns a CostReport Pydantic model - iterate report.records or access report.total_cost directly for custom reporting.
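As an illustration of custom reporting, the sketch below groups per-call records into per-model totals. It works on plain dicts with hypothetical keys so that it runs without the package installed; the actual field names on the records in report.records are not shown in this README and may differ.

```python
from collections import defaultdict


def summarise_by_model(records):
    """Group per-call records into per-model totals (hypothetical record shape)."""
    totals = defaultdict(lambda: {
        "calls": 0, "prompt_tokens": 0, "completion_tokens": 0, "cost_usd": 0.0,
    })
    for rec in records:
        row = totals[rec["model"]]
        row["calls"] += 1
        row["prompt_tokens"] += rec["prompt_tokens"]
        row["completion_tokens"] += rec["completion_tokens"]
        row["cost_usd"] += rec["cost_usd"]
    return dict(totals)


sample = [
    {"model": "gpt-4o", "prompt_tokens": 512, "completion_tokens": 128, "cost_usd": 0.00256},
    {"model": "gpt-4o", "prompt_tokens": 1000, "completion_tokens": 500, "cost_usd": 0.0075},
    {"model": "gpt-4o-mini", "prompt_tokens": 2000, "completion_tokens": 0, "cost_usd": 0.0003},
]
summary = summarise_by_model(sample)
print(summary["gpt-4o"]["calls"], summary["gpt-4o"]["prompt_tokens"])  # prints 2 1512
```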

Azure Monitor / Application Insights

Stream cost and token metrics to Application Insights via OpenTelemetry. Configure the exporter once at startup, then wire the reporter's emit as the tracker's on_record hook so every call is reported as it happens:

from openai_cost_guard import CostTracker
from openai_cost_guard.reporters.azure_monitor import (
    AzureMonitorReporter,
    configure_azure_monitor,
)

# Reads APPLICATIONINSIGHTS_CONNECTION_STRING if no arg is passed
configure_azure_monitor()

reporter = AzureMonitorReporter()
tracker = CostTracker(on_record=reporter.emit)   # metrics emit per call

Metrics emitted (each tagged with model and endpoint dimensions):

Metric	Unit	Meaning
`openai.cost.usd`	USD	cost per call
`openai.tokens`	token	total tokens per call
`openai.calls`	call	call count

Requires the azure extra:

pip install "openai-cost-guard[azure]"

The reporter itself is vendor-neutral - it records to standard OpenTelemetry instruments. If you already run an OTel MeterProvider, pass your own Meter to AzureMonitorReporter(meter=...) and skip configure_azure_monitor.

Reset

tracker.reset()  # clears all records, keeps budget config

CLI

Inspect a saved JSON report (from write_json) without writing any code:

openai-cost-guard show runs/usage.json       # formatted per-model table
openai-cost-guard summary runs/usage.json    # aggregate totals as JSON

Command	Argument	Output
`show`	path to a JSON report	the same formatted per-model table as `print_report`
`summary`	path to a JSON report	aggregate totals as JSON (`to_summary_dict`)

Output goes to stdout. A missing file or a file that is not a valid cost report exits non-zero with a clear error.

Configuration

There are no required environment variables for the core library - configuration is in code (constructor arguments, custom ModelPricing). The table below lists the install-time, environment, and constructor settings:

Concern	How to configure
FastAPI middleware	Install the `fastapi` extra. Construct via `app.add_middleware(CostGuardMiddleware, ...)`; tune with `budget_per_request`, `add_headers`, `strict`, `on_complete`.
Azure Monitor export	Install the `azure` extra. Set `APPLICATIONINSIGHTS_CONNECTION_STRING` (or pass `connection_string=` to `configure_azure_monitor`). Tune `export_interval_millis`.
Strict unknown-model handling	`CostTracker(strict=True)` (or `strict=True` on the middleware) raises `UnknownModelError` instead of recording at `$0.00`.
Logging	The package logs through the standard `logging` module under the `openai_cost_guard` namespace. Configure handlers/levels in your application.

Default pricing

Prices are USD per 1 million tokens, Azure OpenAI Global Standard, verified against the Azure pricing page on 2026-06-08.

Model	Input	Output
gpt-4o	$2.50	$10.00
gpt-4o-mini	$0.15	$0.60
gpt-4.1	$2.00	$8.00
gpt-4.1-mini	$0.40	$1.60
gpt-4.1-nano	$0.10	$0.40
gpt-4-turbo	$11.00	$33.00
gpt-35-turbo	$0.55	$1.65
text-embedding-3-small	$0.022	-
text-embedding-3-large	$0.143	-
text-embedding-ada-002	$0.11	-

Prices drift over time, and Regional and Data Zone deployments cost roughly 10% more than Global Standard. Pass your own ModelPricing objects (via CostTracker(pricing=...) or tracker.add_pricing(...)) to stay accurate.

Design decisions

Pure-ASGI middleware, not BaseHTTPMiddleware. The middleware sets a per-request tracker in a contextvar. Starlette's BaseHTTPMiddleware runs the endpoint in a separate task where that contextvar would not propagate, so the decorator could not find the request-scoped tracker. A pure-ASGI middleware keeps the endpoint in the same context, which is what makes "no explicit tracker needed" work. See openai_cost_guard/middleware.py:9.
Contextvar tracker resolution. @track_cost resolves its tracker at call time in the order explicit argument > request-scoped tracker > module default. The same decorated function records into a per-request tracker inside a request and the global tracker otherwise, with no plumbing at the call site. See openai_cost_guard/decorators.py:50 and openai_cost_guard/context.py.
Longest-prefix pricing match. Versioned deployment names (gpt-4o-mini-2024-07-18) are matched against pricing keys by longest prefix, so a specific entry (gpt-4o-mini) wins over a shorter one that is also a prefix (gpt-4o). Otherwise a mini model could be mispriced as its more expensive parent. See openai_cost_guard/tracker.py:123.
on_record hook fired outside the lock. The tracker holds an internal lock while appending and checking the budget, but invokes on_record after releasing it, so a slow sink (for example a network metric exporter) cannot stall other recorders. See openai_cost_guard/tracker.py:90.
Optional extras keep the core thin. CostGuardMiddleware and AzureMonitorReporter are intentionally not imported in the package __init__, so the core package has no web-framework or telemetry dependency - only Pydantic. See openai_cost_guard/__init__.py:16.
Logging, never print. All output goes through the logging module, including the console reporter and CLI (which routes the package logger to stdout). This keeps the library quiet by default and lets the host application control output.

Known limitations

Pricing is curated, not discovered. openai-cost-guard always tracks token usage for every model (that comes from the API response). It can only attach a dollar cost to models in its price table, because providers do not return price-per-token through the API. Unknown models are tracked at $0.00 with a warning rather than a guessed price (or raise with strict=True). Override or extend the table with CostTracker(pricing=...) / tracker.add_pricing(...).
Static prices drift. The default table reflects Azure Global Standard pricing on the verification date above. Prices change and vary by region/SKU; verify against current Azure pricing for anything cost-critical. (The roadmap item below addresses this.)
Streaming cost needs include_usage. A streamed call records nothing unless you pass stream_options={"include_usage": True}; without it the API never sends a usage chunk.
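The unknown-model behaviour described in the first item comes down to one branch. A minimal sketch, assuming a dict-based price table; price_or_zero and UnknownModel are hypothetical stand-ins, not the package's functions or its UnknownModelError:

```python
import logging

logger = logging.getLogger("pricing_sketch")


class UnknownModel(KeyError):
    """Hypothetical error for a model with no price entry."""


def price_or_zero(model, prices, prompt_tokens, completion_tokens, strict=False):
    """Price a call, or record $0.00 (or raise) when the model is unknown."""
    rates = prices.get(model)
    if rates is None:
        if strict:
            raise UnknownModel(model)
        logger.warning("no price for %r; recording tokens at $0.00", model)
        return 0.0   # tokens are still counted by the caller; only the cost is zero
    input_rate, output_rate = rates
    return (prompt_tokens * input_rate + completion_tokens * output_rate) / 1_000_000


prices = {"gpt-4o": (2.50, 10.00)}
known_cost = price_or_zero("gpt-4o", prices, 512, 128)
unknown_cost = price_or_zero("mystery-model", prices, 512, 128)

try:
    price_or_zero("mystery-model", prices, 512, 128, strict=True)
    strict_raised = False
except UnknownModel:
    strict_raised = True
```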

Roadmap

FastAPI middleware (v0.2, shipped)
JSON export reporter (v0.2, shipped)
Azure Monitor / Application Insights reporter (v0.3, shipped)
Async support (v0.3, shipped)
Streaming response cost capture (v0.4, shipped)
CLI for offline report inspection (v0.4, shipped)
Live pricing via the Azure Retail Prices API (v0.5, planned) - fetch current Azure OpenAI prices from prices.azure.com (public, no auth) so the static table becomes a fallback, not the source of truth. This removes the staleness risk for Azure models. The fiddly part is mapping each model to its Azure meter name. The implementation must cache results and fall back to the static table on any uncertainty, so that it never reports a wrong price.
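One possible shape for the cache-and-fall-back logic of the live pricing item, sketched without any network access. Everything here is hypothetical: PriceResolver is not part of the package, fetch_live stands in for whatever would query the Retail Prices API, and the "live" price in the demo is made up.

```python
import time


class PriceResolver:
    """Hypothetical resolver: cached live price, else the static table."""

    def __init__(self, static_prices, fetch_live, ttl_seconds=3600, clock=time.monotonic):
        self._static = static_prices
        self._fetch_live = fetch_live   # callable(model) -> (input, output) or None
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache = {}                # model -> (fetched_at, price)

    def get(self, model):
        now = self._clock()
        cached = self._cache.get(model)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]
        try:
            live = self._fetch_live(model)
        except Exception:
            live = None                 # any failure counts as uncertainty
        if live is None:
            return self._static[model]  # fall back rather than guess
        self._cache[model] = (now, live)
        return live


calls = []


def flaky_fetch(model):
    calls.append(model)
    if model == "gpt-4o":
        return (2.40, 9.60)             # made-up live price for the demo
    raise TimeoutError("price service unavailable")


static = {"gpt-4o": (2.50, 10.00), "gpt-4o-mini": (0.15, 0.60)}
resolver = PriceResolver(static, flaky_fetch)
live = resolver.get("gpt-4o")
live_again = resolver.get("gpt-4o")     # served from the cache, no second fetch
fallback = resolver.get("gpt-4o-mini")  # fetch fails, static price is used
```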

Benchmarks

Hot-path throughput, measured with scripts/benchmark.py: 100,000 iterations per case against fake response objects, with no network and no live Azure calls. The figures below come from a single development machine, so treat them as relative, not absolute. Run python scripts/benchmark.py to measure your own.

Operation	Iterations	Calls/sec	us/call
CostTracker.record()	100,000	56,663	17.65
CostTracker.record() + budget check	100,000	57,678	17.34
@track_cost-wrapped call (fake response)	100,000	52,739	18.96
record() via prefix-match pricing	100,000	46,001	21.74
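The two rate columns are reciprocals of each other: us/call = 1,000,000 / calls per second. The sketch below shows one way to take such a measurement with time.perf_counter; it is not scripts/benchmark.py, and measure is a hypothetical helper timing a trivial stand-in operation.

```python
import time


def measure(func, iterations=100_000):
    """Time `iterations` calls of func; return (calls/sec, microseconds/call)."""
    start = time.perf_counter()
    for _ in range(iterations):
        func()
    elapsed = time.perf_counter() - start
    calls_per_sec = iterations / elapsed
    us_per_call = elapsed / iterations * 1_000_000
    return calls_per_sec, us_per_call


sink = []
calls_per_sec, us_per_call = measure(lambda: sink.append(1), iterations=10_000)
print(f"{calls_per_sec:,.0f} calls/sec, {us_per_call:.2f} us/call")

# Cross-check of the first table row: 56,663 calls/sec.
print(round(1_000_000 / 56_663, 2))  # prints 17.65
```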

Contributing

git clone https://github.com/TemidireAdesiji/openai-cost-guard
cd openai-cost-guard
pip install -e ".[dev]"
pytest

Lint and type-check before opening a PR:

ruff check .
mypy .
pytest --cov

PRs welcome. Open an issue before starting work on a new feature.

License

MIT


Version 0.4.0, released Jun 8, 2026

