
One-line HTTP-level auto-instrumentation for AI provider cost tracking. Catches every SDK, framework, and custom wrapper via httpx/requests interception. Supports OpenAI, Anthropic, Gemini, Cohere, Mistral, and 8 OpenAI-compatible providers (Groq, xAI, Together, Fireworks, Perplexity, DeepSeek, OpenRouter, Vercel AI Gateway).


aicostguard

One line. Zero secrets. Real-time AI cost tracking for OpenAI, Anthropic, and Gemini.

Drop-in observability for AI provider costs. No proxy, no shared keys, no per-call code.


Status: Beta. v0.1 is feature-complete for the supported configurations listed below. Not yet recommended for production-critical workloads without your own validation.

Install

pip install aicostguard-dev

Activate

Add one line at the top of your application entry file (e.g. app.py, main.py, manage.py):

import aicostguard.auto  # done.

Then export your AI Cost Guard ingestion key and backend URL:

export AICG_KEY=aicg_xxxxxxxxxxxxxxxxxxxx
export AICG_URL=https://your-aicg-instance.example.com   # or our hosted URL

That's the entire integration. Every OpenAI / Anthropic / Gemini call your application makes is now automatically tracked.

What gets sent

Only this, per AI call:

{
  "provider": "openai",
  "model": "gpt-4o",
  "input_tokens": 1240,
  "output_tokens": 312,
  "latency_ms": 842,
  "feature": "generate_rag_answer"
}

Never sent:

  • Your AI provider API keys
  • Your prompts
  • The AI's responses
  • Any user data
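
The payload contract above can be read as a key allowlist. The following is a minimal sketch of such a check, not the package's own code; `ALLOWED_KEYS`, `REQUIRED_KEYS`, and `validate_receipt` are hypothetical names introduced for illustration.

```python
# Hypothetical sketch: enforce the receipt contract with a key allowlist.
ALLOWED_KEYS = {
    "provider", "model", "input_tokens",
    "output_tokens", "latency_ms", "feature",
}
# "feature" is treated as optional here, matching the "feature?" notation
# used in the trust contract.
REQUIRED_KEYS = ALLOWED_KEYS - {"feature"}


def validate_receipt(receipt: dict) -> dict:
    """Return the receipt unchanged, or raise ValueError if its keys break the contract."""
    keys = set(receipt)
    extra = keys - ALLOWED_KEYS
    missing = REQUIRED_KEYS - keys
    if extra:
        raise ValueError(f"unexpected keys: {sorted(extra)}")
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return receipt
```

A receipt carrying a `prompt` or `response` key would be rejected before it could leave the process.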

The feature field is inferred automatically from the calling function name. You can override it explicitly:

import aicostguard as aicg

with aicg.feature("doc-parse"):
    completion = openai_client.chat.completions.create(...)
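
One way such tagging could work is sketched below: an explicit override stored in a context variable, with a fallback to the calling function's name. This is an illustration under assumptions, not the package's implementation; `_feature_override` and `current_feature` are hypothetical names, and `sys._getframe` is a CPython implementation detail.

```python
import contextvars
import sys
from contextlib import contextmanager

# Hypothetical stand-in for the package's internal state.
_feature_override = contextvars.ContextVar("feature_override", default=None)


@contextmanager
def feature(name: str):
    """Tag every call made inside the block with an explicit feature name."""
    token = _feature_override.set(name)
    try:
        yield
    finally:
        _feature_override.reset(token)  # restore the previous tag, even on error


def current_feature(depth: int = 1) -> str:
    """Return the explicit override if set, else the name of the calling function."""
    override = _feature_override.get()
    if override is not None:
        return override
    return sys._getframe(depth).f_code.co_name


def generate_rag_answer() -> str:
    # With no override active, the tag is this function's own name.
    return current_feature()
```

A context variable (rather than a module-level global) keeps the tag correct when several threads or asyncio tasks use different features at the same time.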

Supported configurations (v0.1)

Provider SDK Sync Async Non-streaming Streaming (with usage opt-in)
OpenAI openai ≥1.30, <2.0
Anthropic anthropic ≥0.40, <1.0
Google Gemini google-generativeai ≥0.8

Not in v0.1 (planned for v0.2):

  • LangChain / LlamaIndex auto-tagging
  • Azure OpenAI, AWS Bedrock SDK shapes
  • Cohere, Mistral SDKs
  • Streaming WITHOUT usage opt-in (we warn loudly today; tiktoken fallback in v0.2)

If your stack isn't listed yet, use Manual POST — fully supported and language-agnostic.

Runtime support

Runtime | Supported | Sender mode | Notes
CPython 3.9+ long-running server (Flask, FastAPI, Django, Gunicorn, Celery, containers, local dev) | | background | Daemon thread + bounded queue + atexit flush (same behaviour as today)
Vercel Python functions | | inline | Each receipt POSTs synchronously before the handler returns (Python has no waitUntil equivalent we can use globally). Adds ~50–200 ms to AI-route response.
AWS Lambda Python | | inline | Same as Vercel: synchronous send before handler return
GCP Cloud Functions Python | | inline | Detected via K_SERVICE / FUNCTION_NAME
Azure Functions Python | | inline | Detected via AZURE_FUNCTIONS_ENVIRONMENT
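
The background mode in the table (daemon thread, bounded queue, atexit flush) can be sketched with the standard library alone. This is a simplified illustration, not the package's sender; `BackgroundSender` and its `send` callable are hypothetical, and a production version would bound how long `flush()` may wait.

```python
import atexit
import queue
import threading


class BackgroundSender:
    """Hypothetical sketch: a daemon thread draining a bounded queue."""

    def __init__(self, send, maxsize: int = 1000):
        self._send = send  # callable that delivers one receipt
        self._queue = queue.Queue(maxsize=maxsize)
        self._thread = threading.Thread(target=self._drain, daemon=True)
        self._thread.start()
        atexit.register(self.flush)  # best-effort flush at interpreter exit

    def submit(self, receipt: dict) -> bool:
        """Enqueue without blocking; drop the receipt if the queue is full."""
        try:
            self._queue.put_nowait(receipt)
            return True
        except queue.Full:
            return False

    def _drain(self) -> None:
        while True:
            receipt = self._queue.get()
            try:
                self._send(receipt)
            except Exception:
                pass  # delivery errors never reach the application
            finally:
                self._queue.task_done()

    def flush(self) -> None:
        """Block until every queued receipt has been handed to send()."""
        self._queue.join()
```

The bounded queue caps memory use when the backend is slow, at the cost of dropping receipts once it fills.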

Why the modes exist. On serverless platforms, the host freezes the container the moment the function returns. A long-running background drain thread does not survive that freeze — receipts queued after the response is sent are silently dropped. The package detects serverless and switches to inline mode: every submit() POSTs synchronously before returning, so by the time the host function returns, the receipt has hit the wire. There is no Python equivalent of Vercel's waitUntil() we can call from a module-scoped sender, so synchronous send is the only safe strategy.
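
The mode switch described above could be driven by environment variables, roughly as follows. This is a sketch under assumptions: `detect_sender_mode` is a hypothetical name, `K_SERVICE`, `FUNCTION_NAME`, `AZURE_FUNCTIONS_ENVIRONMENT`, and `VERCEL` are the variables this page mentions, and `AWS_LAMBDA_FUNCTION_NAME` is the variable the Lambda runtime sets, though this page does not say which one the package checks.

```python
import os

# Variables that suggest a serverless host. AWS_LAMBDA_FUNCTION_NAME is an
# assumption; the others are named elsewhere on this page.
SERVERLESS_ENV_VARS = (
    "VERCEL",
    "AWS_LAMBDA_FUNCTION_NAME",
    "K_SERVICE",
    "FUNCTION_NAME",
    "AZURE_FUNCTIONS_ENVIRONMENT",
)


def detect_sender_mode(environ=None) -> str:
    """Return "inline" on a detected serverless host, else "background"."""
    env = os.environ if environ is None else environ
    if any(env.get(name) for name in SERVERLESS_ENV_VARS):
        return "inline"
    return "background"
```

Passing the environment as a parameter keeps the check testable without mutating `os.environ`.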

Async-context caveat. Inline mode uses urllib.request.urlopen (synchronous). Inside an asyncio event loop (FastAPI, async Flask) this briefly blocks the loop while the receipt POSTs. A native async sender that uses httpx.AsyncClient is planned for a future release. Tracking correctness is unaffected.
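
For context, the general pattern for keeping a blocking call off the event loop is a thread offload via `asyncio.to_thread` (Python 3.9+). The sketch below is not how the package sends receipts and is not the planned httpx-based sender; `blocking_post` is a hypothetical stand-in that sleeps instead of touching the network.

```python
import asyncio
import time


def blocking_post(receipt: dict) -> int:
    """Stand-in for a synchronous HTTP POST; sleeps instead of using the network."""
    time.sleep(0.05)
    return 200


async def handler() -> tuple:
    ticks = 0

    async def heartbeat():
        # Counts how often the loop gets to run while the POST is in flight.
        nonlocal ticks
        while True:
            await asyncio.sleep(0.005)
            ticks += 1

    beat = asyncio.create_task(heartbeat())
    # The POST runs in a worker thread, so the loop keeps servicing other tasks.
    status = await asyncio.to_thread(blocking_post, {"provider": "openai"})
    beat.cancel()
    return status, ticks
```

Because the handler still awaits the POST before returning, the receipt is sent before a serverless host could freeze the container.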

Trust contract (CI-enforced)

These properties are runtime-asserted in CI. No release ships without them passing. The relevant test files are linked.

  1. Cannot break your AI calls. test_safety_cannot_break_calls.py — observer exceptions are caught and swallowed; the SDK's original return value is always passed through.
  2. Cannot send prompts or responses. test_safety_payload_keys_only.py — receipts only contain {provider, model, input_tokens, output_tokens, latency_ms, feature?}. Any other key fails the build.
  3. Cannot leak your AI provider API key. test_safety_no_api_key_leak.py — receipt payloads are scanned for the literal API key values during every test run.
  4. Cannot block your call thread. test_safety_overhead_under_2ms.py — observer overhead is asserted <2ms p99 across 1,000 iterations.
  5. No silent failure modes. test_safety_warnings_fire.py — known issues (unsupported SDK version, streaming-without-usage, wrapper-before-import) each emit a clear warning.
  6. No silent receipt loss on serverless. test_safety_serverless_flush.py — with VERCEL=1 (or other serverless env), submit() POSTs synchronously before returning; no background thread is started; errors during the inline POST do not propagate.
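
The first property (observer errors are swallowed, the original return value is passed through) can be illustrated with a small decorator. This is a sketch of the pattern, not the package's wrapper; `observed`, `broken_observer`, and `create_completion` are hypothetical names.

```python
import functools


def observed(observer):
    """Wrap a function so observer(result) runs after it but can never break it."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Exceptions from the real call propagate exactly as before.
            result = func(*args, **kwargs)
            try:
                observer(result)
            except Exception:
                pass  # observer failures are swallowed
            return result
        return wrapper
    return decorate


def broken_observer(result):
    raise RuntimeError("observer bug")


@observed(broken_observer)
def create_completion(prompt: str) -> dict:
    # Stand-in for an SDK call.
    return {"text": prompt.upper()}
```

Only the observer sits inside the `try` block, so errors raised by the wrapped call itself are never masked.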

This is the entire trust story. Read the source. Run the tests.

Diagnostics

Check what's instrumented and confirm receipts are flowing:

aicg-diagnose
# or:
python -m aicostguard.diagnostics

Output:

AI Cost Guard auto-instrumentation v0.1.0b1
─────────────────────────────────────────────
Instruments:
  ✅ openai 1.50.0          patched
  ✅ anthropic 0.42.0       patched
  ❌ google.generativeai    not installed
─────────────────────────────────────────────
Config:
  AICG_URL  https://your-aicg.example.com (reachable, 142 ms)
  AICG_KEY  aicg_xxxx••••••••••••••••  (valid format)
─────────────────────────────────────────────
Last receipt: 14:32:01 UTC  (200 OK)

How it works (in one paragraph)

When you import aicostguard.auto, the package scans sys.modules for known AI provider SDKs and monkey-patches their response-returning methods. Each patched method delegates to the original SDK call (so your code's behaviour is unchanged), then reads the usage field from the response object and submits a fire-and-forget receipt to AI Cost Guard. Everything is wrapped in try/except at multiple layers so observer errors can never propagate to your application. The technique is identical to the one Sentry, Datadog APM, and OpenTelemetry use for application monitoring — it's been running in production at hyperscaler scale for over a decade.
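
The paragraph above can be sketched against a stand-in SDK. Nothing here is the package's code or a real provider API: `fake_sdk`, `Completions`, `patch_if_loaded`, and the `receipts` list (standing in for the fire-and-forget sender) are all hypothetical.

```python
import sys
import time
import types

# Stand-in for a provider SDK; a real SDK would already be in sys.modules.
fake_sdk = types.ModuleType("fake_sdk")


class Completions:
    def create(self, model: str, **kwargs):
        usage = types.SimpleNamespace(prompt_tokens=12, completion_tokens=5)
        return types.SimpleNamespace(model=model, usage=usage)


fake_sdk.Completions = Completions
sys.modules["fake_sdk"] = fake_sdk

receipts = []  # stand-in for the fire-and-forget sender


def patch_if_loaded(module_name: str) -> bool:
    """Patch Completions.create if the module is loaded; report whether it was."""
    module = sys.modules.get(module_name)
    if module is None:
        return False
    original = module.Completions.create

    def patched(self, *args, **kwargs):
        start = time.perf_counter()
        response = original(self, *args, **kwargs)  # unchanged SDK behaviour
        try:
            receipts.append({
                "provider": module_name,
                "model": response.model,
                "input_tokens": response.usage.prompt_tokens,
                "output_tokens": response.usage.completion_tokens,
                "latency_ms": int((time.perf_counter() - start) * 1000),
            })
        except Exception:
            pass  # a malformed response must not affect the caller
        return response

    module.Completions.create = patched
    return True
```

Because the patch replaces the method on the class, every existing and future instance is covered without changes at the call sites.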

Configuration reference

Env var | Default | Description
AICG_KEY | (none; package is no-op) | Your ingestion key from the AI Cost Guard dashboard.
AICG_URL | (none; package is no-op) | Base URL of your AI Cost Guard backend.
AICG_FEATURE_DEFAULT | inferred from caller frame | Fallback feature tag when neither inference nor aicg.feature(...) applies.
AICG_DISABLED | unset | Set to 1 to short-circuit all tracking without removing the package.
AICG_DEBUG | unset | Set to 1 to emit verbose debug logs to stderr (do not use in production).
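
The rules in this table can be summarised in code. The sketch below only mirrors the documented behaviour (no-op without key or URL, `AICG_DISABLED=1` as a kill switch); `Config`, `load_config`, and the trailing-slash normalisation are hypothetical and not taken from the package.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Config:
    key: Optional[str]
    url: Optional[str]
    feature_default: Optional[str]
    disabled: bool
    debug: bool

    @property
    def active(self) -> bool:
        """Tracking runs only with both key and URL set and no kill switch."""
        return bool(self.key and self.url) and not self.disabled


def load_config(environ=None) -> Config:
    env = os.environ if environ is None else environ
    return Config(
        key=env.get("AICG_KEY") or None,
        # Assumption: a trailing slash on the base URL is stripped.
        url=(env.get("AICG_URL") or "").rstrip("/") or None,
        feature_default=env.get("AICG_FEATURE_DEFAULT") or None,
        disabled=env.get("AICG_DISABLED") == "1",
        debug=env.get("AICG_DEBUG") == "1",
    )
```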

License

Apache 2.0. See LICENSE.

