AI cost calculator and usage tracker. Calculate LLM API costs for 400+ models (OpenAI, Anthropic, Google). No API key required for cost lookups.

Project description

ai-cost-calc (Python)

AI cost calculator and usage tracker for LLM apps.

  • Built for production-grade cost tracking, with pricing verification and continuous updates as model prices change.
  • Privacy-first: your app still talks directly to AI providers, so prompts and responses stay in your stack.
  • Tracking is optional and sends only usage plus event metadata (customer ID, event type, revenue if provided).

Use it in two ways:

  • Free cost calculator (cost) for 400+ models (no API key required):
    • exact mode with token counts (input_tokens, output_tokens)
    • estimate mode with prompt/response text (input_text, output_text)
    • live pricing with 24h cache per AiCostCalc instance
  • Usage tracking (add_usage + track) with an API key

Pricing Data

The model passed to cost(...) must match a slug from: https://margindash.com/api/v1/models

  • The SDK reads pricing from:
    • models[].pricing.input_per_1m_usd
    • models[].pricing.output_per_1m_usd
  • The API also returns benchmark variants at models[].benchmarks.variants (not required for cost())
  • Pricing data is cached per AiCostCalc instance for 24 hours
  • Cache refresh happens automatically when the cache is stale
  • If a refresh fails after a successful fetch, the SDK reuses last-known pricing and retries after backoff
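
The per-1M pricing fields map to a cost with straightforward arithmetic; a minimal sketch of that calculation (the prices below are illustrative, not current API values):

```python
def cost_usd(input_tokens: int, output_tokens: int,
             input_per_1m_usd: float, output_per_1m_usd: float) -> float:
    """Cost in USD from token counts and per-1M-token prices."""
    return (input_tokens / 1_000_000) * input_per_1m_usd + \
           (output_tokens / 1_000_000) * output_per_1m_usd

# Illustrative prices only -- real values come from the pricing API.
total = cost_usd(1000, 500, input_per_1m_usd=2.50, output_per_1m_usd=10.00)
print(round(total, 6))  # 0.0075
```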

Caching Behavior

  • Cache scope: per AiCostCalc instance
  • Cache TTL: 24 hours
  • Refresh failures: last-known pricing is reused, then retried after backoff
  • Force refresh now: create a new AiCostCalc instance
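
The cache semantics above amount to a TTL cache with a last-known-good fallback; a sketch with an injectable clock and fetcher (not the SDK's real internals, and the SDK additionally applies backoff before retrying):

```python
import time

class PricingCache:
    """24h TTL; on refresh failure, reuse the last successful snapshot."""

    TTL_SECONDS = 24 * 60 * 60

    def __init__(self, fetch, now=time.monotonic):
        self._fetch = fetch        # callable returning pricing data
        self._now = now            # injectable clock for testing
        self._pricing = None
        self._fetched_at = None

    def get(self):
        stale = (self._fetched_at is None
                 or self._now() - self._fetched_at >= self.TTL_SECONDS)
        if stale:
            try:
                self._pricing = self._fetch()
                self._fetched_at = self._now()
            except Exception:
                pass  # keep last-known pricing; real SDK retries after backoff
        return self._pricing
```

A failed refresh returns the previous snapshot instead of raising, which matches the "last-known pricing is reused" behavior listed above.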

Requirements

  • Python 3.10+

Installation

pip install ai-cost-calc

For the tracking quickstart (OpenAI example):

pip install openai

For text-based estimation with tiktoken:

pip install "ai-cost-calc[estimate]"

Quickstart (Cost Calculator)

from ai_cost_calc import AiCostCalc

md = AiCostCalc()

# Exact cost from token counts
result = md.cost("openai/gpt-4o", input_tokens=1000, output_tokens=500)

# Estimate from input + output text
result2 = md.cost("openai/gpt-4o", input_text="Write a release note for this PR.", output_text="Here is the release note for v1.3.7.")

# Estimate from input text only (output defaults to 0 tokens)
result3 = md.cost("openai/gpt-4o", input_text="Write a release note for this PR.")

Quickstart (Usage Tracking)

Use an API key from your MarginDash dashboard.

from openai import OpenAI
from ai_cost_calc import AiCostCalc

openai = OpenAI(api_key="YOUR_OPENAI_KEY")
md = AiCostCalc(api_key="YOUR_API_KEY")

response = openai.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
)

md.add_usage(
    model=response.model,
    input_tokens=(response.usage.prompt_tokens if response.usage else 0),
    output_tokens=(response.usage.completion_tokens if response.usage else 0),
)

md.track(
    customer_id="cust_123",
    event_type="chat",
    revenue_amount_in_cents=250,
)

guarded = md.guarded_call(
    customer_id="cust_123",
    event_type="chat",
    call=lambda: openai.chat.completions.create(
        model="openai/gpt-4o",
        messages=[{"role": "user", "content": "Can I run?"}],
    ),
)
print(guarded.id)

md.shutdown()

guarded_call is synchronous and may perform blocking HTTP I/O while refreshing budget state. In async frameworks (for example FastAPI), prefer async_guarded_call; otherwise run guarded_call in a thread executor.

import asyncio
from ai_cost_calc import AiCostCalc

md = AiCostCalc(api_key="YOUR_API_KEY")

async def main():
    result = await md.async_guarded_call(
        customer_id="cust_123",
        event_type="chat",
        call=lambda: {"ok": True},
    )
    print(result)

asyncio.run(main())

When to Use Which Mode

  • Quick cost checks with no account setup: cost() only
  • Exact costs from provider token usage: cost(model, input_tokens, output_tokens)
  • Early estimation from prompt/response text: cost(model, input_text, output_text)
  • MarginDash customer/revenue tracking: add_usage() + track() with api_key
  • SDK-side budget blocking: guarded_call() with api_key

Return Values and Failure Modes

  • cost(): returns None
  • add_usage() / track() without api_key: no-op; reports via on_error once
  • guarded_call(): raises when blocked by budget; defaults to fail-open on blocklist fetch failure
  • async_guarded_call(): same blocking semantics as guarded_call(), without blocking the event loop for budget refresh
  • flush() / shutdown(): do not raise for request failures; report via on_error

Common Integration Patterns

OpenAI (chat.completions):

md.add_usage(
    model=response.model,
    input_tokens=(response.usage.prompt_tokens if response.usage else 0),
    output_tokens=(response.usage.completion_tokens if response.usage else 0),
)

Anthropic (messages):

md.add_usage(
    model=response.model,
    input_tokens=(response.usage.input_tokens if response.usage else 0),
    output_tokens=(response.usage.output_tokens if response.usage else 0),
)

Google Gemini:

usage = response.get("usageMetadata", {})
md.add_usage(
    model=response.get("modelVersion", "google/gemini-2.5-flash"),
    input_tokens=usage.get("promptTokenCount", 0),
    output_tokens=usage.get("candidatesTokenCount", 0),
)

End-to-End Async Example (FastAPI + OpenAI)

import os
from fastapi import FastAPI
from pydantic import BaseModel
from openai import AsyncOpenAI
from ai_cost_calc import AiCostCalc

app = FastAPI()
openai = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])
md = AiCostCalc(api_key=os.environ["AI_COST_CALC_API_KEY"])


class ChatRequest(BaseModel):
    customer_id: str
    message: str


@app.post("/chat")
async def chat(req: ChatRequest):
    response = await md.async_guarded_call(
        customer_id=req.customer_id,
        event_type="chat",
        call=lambda: openai.chat.completions.create(
            model="openai/gpt-4o",
            messages=[{"role": "user", "content": req.message}],
        ),
    )

    usage = response.usage
    md.add_usage(
        model=response.model,
        input_tokens=(usage.prompt_tokens if usage else 0),
        output_tokens=(usage.completion_tokens if usage else 0),
    )
    md.track(customer_id=req.customer_id, event_type="chat")

    return {"text": response.choices[0].message.content}


@app.on_event("shutdown")
def _shutdown() -> None:
    md.shutdown()

Environment Variables

Recommended pattern:

import os
from ai_cost_calc import AiCostCalc

md = AiCostCalc(api_key=os.getenv("AI_COST_CALC_API_KEY"))

Common env vars:

  • AI_COST_CALC_API_KEY: required only for tracking (from your MarginDash dashboard)
  • OPENAI_API_KEY: only needed if you use OpenAI SDK in your app

API Reference

cost(model, *, input_tokens, output_tokens)

Exact cost mode.

  • model: model slug (example: openai/gpt-4o, anthropic/claude-sonnet-4)
  • input_tokens: non-negative integer
  • output_tokens: non-negative integer

cost(model, *, input_text, output_text=None)

Estimated cost mode using tiktoken.

  • input_text: prompt text
  • output_text: optional response text (defaults to 0 output tokens)

Returns CostResult | None.

None means one of:

  • unknown model
  • pricing fetch unavailable
  • invalid arguments
  • tokenizer unavailable/failure in estimate mode

CostResult fields:

  • model
  • input_cost
  • output_cost
  • total_cost
  • input_tokens
  • output_tokens
  • estimated

add_usage(*, model, input_tokens, output_tokens)

Buffers usage from one AI call. Requires api_key in constructor.

track(*, customer_id, revenue_amount_in_cents=None, event_type=None, unique_request_token=None, occurred_at=None)

Creates an event from all currently buffered usage entries and enqueues it for delivery. Requires api_key.

guarded_call(*, customer_id, call, event_type=None)

Runs call only when current cached budget state allows it.

  • customer_id: required
  • event_type: optional
  • call: provider call callback

Behavior:

  • Polls GET /api/v1/budgets/blocklist using TTL/version caching
  • Triggers immediate refresh when /events response returns a newer budget_state_version
  • Raises when blocked by organization/event/customer budget
  • Fail-open by default when blocklist fetch fails (set budget_fail_closed=True to invert)
  • Synchronous method; can block up to the blocklist timeout during refresh

async_guarded_call(*, customer_id, call, event_type=None)

Async variant of guarded_call for asyncio applications.

  • Runs budget refresh/check in a thread to avoid blocking the event loop
  • Accepts sync or async call callbacks
  • Uses the same budget blocking semantics as guarded_call

flush()

Immediately sends queued events.

shutdown()

Stops background flushing thread and sends remaining events. Call this before application exit.

Configuration

from ai_cost_calc import AiCostCalc

md = AiCostCalc(
    api_key="md_live_...",                     # optional for cost(); required for tracking
    base_url="https://margindash.com/api/v1",
    flush_interval=5.0,
    max_retries=3,
    default_event_type="ai_request",
    budget_fail_closed=False,
    on_error=lambda err: print(err.message),
)

Options:

  • api_key (optional)
  • base_url (default https://margindash.com/api/v1)
  • flush_interval seconds (default 5.0, must be a finite number > 0 when api_key is set)
  • max_retries (default 3, must be a non-negative integer)
  • default_event_type (default ai_request)
  • budget_fail_closed (default False; when True, blocks guarded_call if budget state cannot be refreshed)
  • on_error (optional callback)

Error Handling

The SDK avoids raising for typical operational failures in cost/tracking flows. Use on_error for observability.

from ai_cost_calc import AiCostCalc

md = AiCostCalc(api_key="md_live_...", on_error=lambda err: print(err.message))

Delivery Semantics

Tracking behavior:

  • in-memory queue size limit: 1000 events (oldest dropped when full)
  • pending usage limit before track: 1000 items (oldest dropped when full)
  • batch size: 50 events/request
  • retries on connection/timeouts, HTTP 429, and 5xx with exponential backoff
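
The retry policy above can be sketched as follows; the delays are illustrative and the SDK's exact backoff schedule may differ:

```python
import time

RETRYABLE_STATUS = {429} | set(range(500, 600))

def send_with_retries(send, max_retries=3, base_delay=0.5, sleep=time.sleep):
    """`send` returns an HTTP status code or raises on connection/timeout errors."""
    status = None
    for attempt in range(max_retries + 1):
        try:
            status = send()
        except (ConnectionError, TimeoutError):
            status = None  # treat transport errors as retryable
        if status is not None and status not in RETRYABLE_STATUS:
            return status  # success or non-retryable error
        if attempt < max_retries:
            sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, ...
    return status  # exhausted retries; may be a retryable status or None
```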

Idempotency:

  • unique_request_token is the idempotency key for an event
  • if omitted, SDK auto-generates a UUID
  • for retry-safe exactly-once behavior across your own retries, provide your own stable token
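
One way to get a stable token across your own retries is to derive it deterministically from an ID you already have. The namespace UUID below is an arbitrary app-level constant, not anything mandated by the SDK:

```python
import uuid

# Arbitrary fixed namespace for this app; any stable UUID works.
TOKEN_NAMESPACE = uuid.UUID("12345678-1234-5678-1234-567812345678")

def request_token(request_id: str) -> str:
    """Same request_id -> same token, so retried track() calls dedupe."""
    return str(uuid.uuid5(TOKEN_NAMESPACE, request_id))

# md.track(customer_id="cust_123", unique_request_token=request_token("req_42"))
```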

Privacy

Free cost mode only fetches pricing data. If tracking is enabled, the SDK sends event metadata (for example: customer ID, event type, revenue), plus model and token counts. Request/response content is not sent.

Troubleshooting

  • cost() returns None:
    • verify model slug
    • check network access to the pricing API
    • wire on_error callback for details
  • numbers look outdated:
    • pricing cache TTL is 24 hours per AiCostCalc instance
    • create a new AiCostCalc instance for an immediate refresh if needed
  • text estimation fails:
    • install extras: pip install "ai-cost-calc[estimate]"
  • tracking methods appear to do nothing:
    • confirm api_key is set in constructor
  • async app stalls on guarded_call:
    • use async_guarded_call, or run guarded_call in a thread executor
  • events missing on shutdown:
    • call md.shutdown() before app exits

Versioning and Releases

This SDK follows semantic versioning.

  • PyPI package: ai-cost-calc
  • changelog: CHANGELOG.md
  • check release history on PyPI/GitHub before major upgrades

License

MIT

Project details


Download files

Download the file for your platform.

Source Distribution

ai_cost_calc-1.3.12.tar.gz (19.3 kB)

Uploaded Source

Built Distribution


ai_cost_calc-1.3.12-py3-none-any.whl (13.2 kB)

Uploaded Python 3

File details

Details for the file ai_cost_calc-1.3.12.tar.gz.

File metadata

  • Download URL: ai_cost_calc-1.3.12.tar.gz
  • Upload date:
  • Size: 19.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.9.6

File hashes

Hashes for ai_cost_calc-1.3.12.tar.gz:

  • SHA256: 6e6409305195f899d00d7981dcb65365bbf413cf284bcd3fdda949b21e11e8e6
  • MD5: e529bea8014610a15e6417c1c1da1326
  • BLAKE2b-256: f012dffedfffb31ac73df8e6ec243cc52a604b8222b65a726f3f7c91f8f73343


File details

Details for the file ai_cost_calc-1.3.12-py3-none-any.whl.

File metadata

  • Download URL: ai_cost_calc-1.3.12-py3-none-any.whl
  • Upload date:
  • Size: 13.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.9.6

File hashes

Hashes for ai_cost_calc-1.3.12-py3-none-any.whl:

  • SHA256: 77de4e7ba1f606add718c5183fe88c67c23cdf78d4f5f2a75f6f780e644bcb29
  • MD5: 2efe29343de036087cf79e6935e307e3
  • BLAKE2b-256: 9c2c9830cf911246ab0a0393beeb56902a620f1b29bbf3099041ff62eba599fc

