One-line HTTP-level auto-instrumentation for AI provider cost tracking. Catches every SDK, framework, and custom wrapper via httpx/requests interception. Supports OpenAI, Anthropic, Gemini, Cohere, Mistral, and 8 OpenAI-compatible providers (Groq, xAI, Together, Fireworks, Perplexity, DeepSeek, OpenRouter, Vercel AI Gateway).
aicostguard
One line. Zero secrets. Real-time AI cost tracking for OpenAI, Anthropic, and Gemini.
Drop-in observability for AI provider costs. No proxy, no shared keys, no per-call code.
Status: Beta. v0.1 is feature-complete for the supported configurations listed below. Not yet recommended for production-critical workloads without your own validation.
Install
pip install aicostguard-dev
Activate
Add one line at the top of your application entry file (e.g. app.py, main.py, manage.py):
import aicostguard.auto # done.
Then export your AI Cost Guard ingestion key:
export AICG_KEY=aicg_xxxxxxxxxxxxxxxxxxxx
export AICG_URL=https://your-aicg-instance.example.com # or our hosted URL
That's the entire integration. Every OpenAI / Anthropic / Gemini call your application makes is now automatically tracked.
What gets sent
Only this, per AI call:
{
"provider": "openai",
"model": "gpt-4o",
"input_tokens": 1240,
"output_tokens": 312,
"latency_ms": 842,
"feature": "generate_rag_answer"
}
Never sent:
- Your AI provider API keys
- Your prompts
- The AI's responses
- Any user data
The feature field is inferred automatically from the calling function name. You can override it explicitly:
import aicostguard as aicg

with aicg.feature("doc-parse"):
    completion = openai_client.chat.completions.create(...)
Supported configurations (v0.1)
| Provider | SDK | Sync | Async | Non-streaming | Streaming (with usage opt-in) |
|---|---|---|---|---|---|
| OpenAI | openai ≥1.30, <2.0 | ✅ | ✅ | ✅ | ✅ |
| Anthropic | anthropic ≥0.40, <1.0 | ✅ | ✅ | ✅ | ✅ |
| Google Gemini | google-generativeai ≥0.8 | ✅ | ✅ | ✅ | ✅ |
Not in v0.1 (planned for v0.2):
- LangChain / LlamaIndex auto-tagging
- Azure OpenAI, AWS Bedrock SDK shapes
- Cohere, Mistral SDKs
- Streaming WITHOUT usage opt-in (we warn loudly today; tiktoken fallback in v0.2)
If your stack isn't listed yet, use Manual POST — fully supported and language-agnostic.
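As a rough illustration of what a manual POST could look like, the sketch below builds a receipt request with the standard library. The endpoint path `/v1/receipts` and the `Authorization: Bearer` header are assumptions made for this example only; they are not taken from the AI Cost Guard documentation, so check the real ingestion API before relying on them. The payload fields are the ones listed under "What gets sent".

```python
import json
import urllib.request


def build_receipt_request(base_url, key, receipt):
    """Build (but do not send) a POST request carrying one receipt."""
    # ASSUMPTION: "/v1/receipts" and the Bearer header are placeholders,
    # not the documented AI Cost Guard endpoint or auth scheme.
    body = json.dumps(receipt).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/receipts",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + key,
        },
    )


receipt = {
    "provider": "openai",
    "model": "gpt-4o",
    "input_tokens": 1240,
    "output_tokens": 312,
    "latency_ms": 842,
    "feature": "generate_rag_answer",
}
request = build_receipt_request(
    "https://your-aicg-instance.example.com",
    "aicg_xxxxxxxxxxxxxxxxxxxx",
    receipt,
)
# To actually send it: urllib.request.urlopen(request, timeout=5)
```

Because the request is plain JSON over HTTP, the same shape can be reproduced from any language once the real endpoint and auth header are known.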
Runtime support
| Runtime | Supported | Sender mode | Notes |
|---|---|---|---|
| CPython 3.9+ long-running server (Flask, FastAPI, Django, Gunicorn, Celery, containers, local dev) | ✅ | background | Daemon thread + bounded queue + atexit flush; same behaviour as today |
| Vercel Python functions | ✅ | inline | Each receipt POSTs synchronously before the handler returns (Python has no waitUntil equivalent we can use globally). Adds ~50–200 ms to AI-route response. |
| AWS Lambda Python | ✅ | inline | Same as Vercel: synchronous send before handler return |
| GCP Cloud Functions Python | ✅ | inline | Detected via K_SERVICE / FUNCTION_NAME |
| Azure Functions Python | ✅ | inline | Detected via AZURE_FUNCTIONS_ENVIRONMENT |
Why the modes exist. On serverless platforms, the host freezes the container the moment the function returns. A long-running background drain thread does not survive that freeze — receipts queued after the response is sent are silently dropped. The package detects serverless and switches to inline mode: every submit() POSTs synchronously before returning, so by the time the host function returns, the receipt has hit the wire. There is no Python equivalent of Vercel's waitUntil() we can call from a module-scoped sender, so synchronous send is the only safe strategy.
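The mode switch described above can be sketched as a simple environment check. This is an illustration under stated assumptions, not the package's detection logic: `detect_sender_mode` and `SERVERLESS_ENV_VARS` are hypothetical names, and `AWS_LAMBDA_FUNCTION_NAME` is an assumption, since the README does not say which variable is used to recognise Lambda.

```python
import os

# VERCEL, K_SERVICE, FUNCTION_NAME and AZURE_FUNCTIONS_ENVIRONMENT are named
# in this README. AWS_LAMBDA_FUNCTION_NAME is an assumption for this sketch.
SERVERLESS_ENV_VARS = (
    "VERCEL",
    "AWS_LAMBDA_FUNCTION_NAME",
    "K_SERVICE",
    "FUNCTION_NAME",
    "AZURE_FUNCTIONS_ENVIRONMENT",
)


def detect_sender_mode(environ=None):
    """Return "inline" on a recognised serverless host, else "background"."""
    env = os.environ if environ is None else environ
    # Any non-empty marker variable means the host may freeze the process
    # right after the handler returns, so receipts must be sent inline.
    if any(env.get(name) for name in SERVERLESS_ENV_VARS):
        return "inline"
    return "background"
```

Passing a mapping instead of reading `os.environ` directly keeps the check easy to test, for example `detect_sender_mode({"VERCEL": "1"})`.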
Async-context caveat. Inline mode uses urllib.request.urlopen (synchronous). Inside an asyncio event loop (FastAPI, async Flask) this briefly blocks the loop while the receipt POSTs. A native async sender that uses httpx.AsyncClient is planned for a future release. Tracking correctness is unaffected.
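To illustrate the caveat, the sketch below shows the general asyncio pattern for keeping the event loop responsive while a blocking call runs. It is not something the package does today, and `blocking_send` is a hypothetical stand-in that sleeps instead of touching the network. It uses `asyncio.to_thread`, which exists in Python 3.9+.

```python
import asyncio
import time


def blocking_send(receipt):
    """Stand-in for a synchronous POST; sleeps instead of using the network."""
    time.sleep(0.05)
    return receipt["model"]


async def send_without_blocking_loop(receipt):
    # The blocking call runs in a worker thread. Awaiting it still means the
    # send has finished before the coroutine returns.
    return await asyncio.to_thread(blocking_send, receipt)


async def main():
    ticks = 0

    async def ticker():
        # Counts how often the loop gets to run while the send is in flight.
        nonlocal ticks
        while True:
            ticks += 1
            await asyncio.sleep(0.005)

    task = asyncio.create_task(ticker())
    result = await send_without_blocking_loop({"model": "gpt-4o"})
    task.cancel()
    return result, ticks


result, ticks = asyncio.run(main())
```

The ticker keeps advancing during the simulated send, which is the behaviour a direct synchronous call inside a coroutine would prevent.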
Trust contract (CI-enforced)
These properties are runtime-asserted in CI. No release ships without them passing. The relevant test files are linked.
- Cannot break your AI calls. test_safety_cannot_break_calls.py: observer exceptions are caught and swallowed; the SDK's original return value is always passed through.
- Cannot send prompts or responses. test_safety_payload_keys_only.py: receipts only contain {provider, model, input_tokens, output_tokens, latency_ms, feature?}. Any other key fails the build.
- Cannot leak your AI provider API key. test_safety_no_api_key_leak.py: receipt payloads are scanned for the literal API key values during every test run.
- Cannot block your call thread. test_safety_overhead_under_2ms.py: observer overhead is asserted <2ms p99 across 1,000 iterations.
- No silent failure modes. test_safety_warnings_fire.py: known issues (unsupported SDK version, streaming-without-usage, wrapper-before-import) each emit a clear warning.
- No silent receipt loss on serverless. test_safety_serverless_flush.py: with VERCEL=1 (or other serverless env), submit() POSTs synchronously before returning; no background thread is started; errors during the inline POST do not propagate.
This is the entire trust story. Read the source. Run the tests.
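If you want to verify the payload guarantees yourself, a check along the following lines may help. It is a sketch of the idea behind the payload-keys and key-leak properties, not a copy of the linked test files; `check_receipt`, `ALLOWED_KEYS`, and `REQUIRED_KEYS` are hypothetical names.

```python
import json

# Key set taken from the receipt shape documented in this README.
ALLOWED_KEYS = {
    "provider", "model", "input_tokens",
    "output_tokens", "latency_ms", "feature",
}
REQUIRED_KEYS = ALLOWED_KEYS - {"feature"}  # feature is optional


def check_receipt(receipt, secrets=()):
    """Raise ValueError on extra keys, missing keys, or a leaked secret."""
    keys = set(receipt)
    extra = keys - ALLOWED_KEYS
    missing = REQUIRED_KEYS - keys
    if extra:
        raise ValueError(f"unexpected keys: {sorted(extra)}")
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    # Scan the serialized payload for literal secret values.
    serialized = json.dumps(receipt)
    for secret in secrets:
        if secret and secret in serialized:
            raise ValueError("secret value found in receipt payload")
    return True
```

Calling `check_receipt(payload, secrets=[your_provider_key])` on captured receipts in your own test suite gives an independent confirmation of the allowlist.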
Diagnostics
Check what's instrumented and confirm receipts are flowing:
aicg-diagnose
# or:
python -m aicostguard.diagnostics
Output:
AI Cost Guard auto-instrumentation v0.1.0b1
─────────────────────────────────────────────
Instruments:
✅ openai 1.50.0 patched
✅ anthropic 0.42.0 patched
❌ google.generativeai not installed
─────────────────────────────────────────────
Config:
AICG_URL https://your-aicg.example.com (reachable, 142 ms)
AICG_KEY aicg_xxxx•••••••••••••••• (valid format)
─────────────────────────────────────────────
Last receipt: 14:32:01 UTC (200 OK)
How it works (in one paragraph)
When you import aicostguard.auto, the package scans sys.modules for known AI provider SDKs and monkey-patches their response-returning methods. Each patched method delegates to the original SDK call (so your code's behaviour is unchanged), then reads the usage field from the response object and submits a fire-and-forget receipt to AI Cost Guard. Everything is wrapped in try/except at multiple layers so observer errors can never propagate to your application. The technique is identical to the one Sentry, Datadog APM, and OpenTelemetry use for application monitoring — it's been running in production at hyperscaler scale for over a decade.
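The paragraph above can be illustrated with a small, self-contained sketch of the wrapping technique. It patches a stand-in class rather than a real provider SDK, and every name in it (`FakeCompletions`, `patch_method`, `submit`, `receipts`) is hypothetical. The usage attribute names `prompt_tokens` and `completion_tokens` are chosen for the stand-in only.

```python
import functools
import time
from types import SimpleNamespace


class FakeCompletions:
    """Stand-in for an SDK resource class; not a real provider SDK."""

    def create(self, model, **kwargs):
        usage = SimpleNamespace(prompt_tokens=12, completion_tokens=5)
        return SimpleNamespace(model=model, usage=usage)


receipts = []


def submit(receipt):
    """Hypothetical sink; a real sender would queue or POST the receipt."""
    receipts.append(receipt)


def patch_method(cls, name, provider):
    """Replace cls.name with a wrapper that records usage after each call."""
    original = getattr(cls, name)

    @functools.wraps(original)
    def wrapper(self, *args, **kwargs):
        start = time.perf_counter()
        # Delegate first; errors raised by the SDK itself propagate unchanged.
        response = original(self, *args, **kwargs)
        try:
            usage = response.usage
            submit({
                "provider": provider,
                "model": response.model,
                "input_tokens": usage.prompt_tokens,
                "output_tokens": usage.completion_tokens,
                "latency_ms": int((time.perf_counter() - start) * 1000),
            })
        except Exception:
            pass  # observer failures must never reach the caller
        return response

    setattr(cls, name, wrapper)


patch_method(FakeCompletions, "create", provider="openai")
response = FakeCompletions().create(model="gpt-4o")
```

The caller receives the same response object it would have received without the patch, and one receipt is recorded as a side effect. If reading the usage fails, the exception is swallowed and the response is still returned.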
Configuration reference
| Env var | Default | Description |
|---|---|---|
| AICG_KEY | (none; package is a no-op) | Your ingestion key from the AI Cost Guard dashboard. |
| AICG_URL | (none; package is a no-op) | Base URL of your AI Cost Guard backend. |
| AICG_FEATURE_DEFAULT | inferred from caller frame | Fallback feature tag when neither inference nor aicg.feature(...) applies. |
| AICG_DISABLED | unset | Set to 1 to short-circuit all tracking without removing the package. |
| AICG_DEBUG | unset | Set to 1 to emit verbose debug logs to stderr (do not use in production). |
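The rules in the table can be summarised in a short sketch. It shows one plausible way to read these variables, assuming that tracking is active only when both AICG_KEY and AICG_URL are set and AICG_DISABLED is not 1. `Config` and `load_config` are hypothetical names, not part of the package's API.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Config:
    key: Optional[str]
    url: Optional[str]
    feature_default: Optional[str]
    disabled: bool
    debug: bool

    @property
    def active(self):
        """True only when key and URL are both set and tracking is not disabled."""
        return bool(self.key and self.url) and not self.disabled


def load_config(environ=None):
    """Read the AICG_* variables from a mapping (os.environ by default)."""
    env = os.environ if environ is None else environ
    return Config(
        key=env.get("AICG_KEY") or None,
        url=env.get("AICG_URL") or None,
        feature_default=env.get("AICG_FEATURE_DEFAULT") or None,
        disabled=env.get("AICG_DISABLED") == "1",
        debug=env.get("AICG_DEBUG") == "1",
    )
```

With an empty environment `load_config({}).active` is false, which matches the "package is a no-op" default in the table.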
License
Apache 2.0. See LICENSE.
Download files
File details
Details for the file aicostguard_dev-0.3.0b2.tar.gz.
File metadata
- Download URL: aicostguard_dev-0.3.0b2.tar.gz
- Upload date:
- Size: 49.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0f9e95b06a4a57da4277498a0b21cd960f6cd2f969ea0f1ee17dd2548ab65769 |
| MD5 | 0dec0931e0d1b95376e24613b652163d |
| BLAKE2b-256 | c4455742ac797c57c9fa07cb099ffff28e9c5f9c9b2488663062a99a5a9b1d57 |
File details
Details for the file aicostguard_dev-0.3.0b2-py3-none-any.whl.
File metadata
- Download URL: aicostguard_dev-0.3.0b2-py3-none-any.whl
- Upload date:
- Size: 53.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e9493b17ea73c700d9c5598d1bd8ba2ceeac7eca48fde4691bb1980fc0e927ea |
| MD5 | 9ea9047a27fb601220d57533cda8951a |
| BLAKE2b-256 | e79ec17502f24dc8310877c2ffdfa5759a777f28ff5b022ed8fbc4b1e524b044 |