Auto-track LLM cost, latency, and usage. Two lines of code, every provider.
LLM Tracer — Python SDK
Track cost, latency, and token usage across OpenAI, Anthropic, and Google Gemini in two lines of code.
Install
pip install llmtracer-sdk
Quick Start
import llmtracer
llmtracer.init(api_key="lt_...")
# That's it. All OpenAI, Anthropic, and Google Gemini calls are now tracked automatically.
No wrappers, no callbacks, no code changes. The SDK auto-patches your provider clients at import time.
View your dashboard at llmtracer.dev.
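As a rough illustration of what auto-patching means, the sketch below wraps a method on a stand-in client class so that every call is timed and recorded while the caller still receives the original response. `FakeClient`, `patch_method`, and the `events` list are hypothetical names used only for this example; they are not part of llmtracer, and the SDK's actual internals may differ.

```python
import functools
import time

events = []  # hypothetical in-memory sink, standing in for the SDK's event queue


class FakeClient:
    """Stand-in for a provider client; not a real SDK class."""

    def create(self, prompt):
        return {"text": prompt.upper(), "usage": {"input_tokens": 3, "output_tokens": 5}}


def patch_method(cls, name):
    """Replace cls.<name> with a wrapper that records latency and token usage."""
    original = getattr(cls, name)

    @functools.wraps(original)
    def wrapper(self, *args, **kwargs):
        start = time.perf_counter()
        result = original(self, *args, **kwargs)
        events.append({
            "method": name,
            "latency_s": time.perf_counter() - start,
            "usage": result.get("usage", {}),
        })
        return result  # the caller sees the unmodified response

    setattr(cls, name, wrapper)


patch_method(FakeClient, "create")
response = FakeClient().create("hi")
```

Because the wrapper is installed on the class rather than on an instance, code that builds its own client objects is covered without changes, which is the general idea behind "no wrappers, no callbacks".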
What Gets Captured
Every LLM call is automatically tracked with:
- Provider, model, tokens (input + output), latency, cost
- Google Gemini: thinking tokens (2.5 models), tool tokens, cached tokens
- Anthropic: cache creation + read tokens
- OpenAI: reasoning tokens (o1/o3/o4), cached tokens
- Caller file, function, and line number
- Auto-flush on process exit (no manual flush needed)
Environment Variable Pattern
import os
import llmtracer
llmtracer.init(
    api_key=os.environ["LLMTRACER_API_KEY"],
    debug=True,  # prints token counts to console
)
Multi-App Tracking
If you have multiple services sharing an API key, set app_name to filter by application in the dashboard:
llmtracer.init(api_key="lt_...", app_name="billing-service")
Or via environment variable:
export LLMTRACER_APP_NAME=billing-service
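The fallback between the app_name argument and the environment variable can be sketched as follows. `resolve_app_name` is a hypothetical helper written for this example, not an SDK function, and it assumes that an explicit argument takes precedence over `LLMTRACER_APP_NAME`.

```python
import os


def resolve_app_name(app_name=None, environ=None):
    """Return the explicit app_name if given, else LLMTRACER_APP_NAME, else None."""
    environ = os.environ if environ is None else environ
    if app_name:
        return app_name
    # An unset or empty variable is treated as "no app name" (an assumption).
    return environ.get("LLMTRACER_APP_NAME") or None


explicit = resolve_app_name("billing-service", {"LLMTRACER_APP_NAME": "other"})
from_env = resolve_app_name(None, {"LLMTRACER_APP_NAME": "billing-service"})
```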
Trace Context and Tags
with llmtracer.trace(tags={"feature": "chat", "user_id": "u_sarah"}):
    response = client.chat.completions.create(...)
Tags appear in the dashboard's Breakdown page and Top Tags card. Use them to answer questions like "which user costs the most?" or "which feature should I optimize?"
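One common way to build a tag scope like this in Python is a context manager backed by `contextvars`. The sketch below is a minimal stand-in under that assumption: the `trace` and `current_tags` functions here are local to the example and are not the SDK's implementation, and the merging of nested tags is assumed behaviour that this README does not confirm.

```python
import contextlib
import contextvars

# Hypothetical context variable holding the tags active for the current task or thread.
_active_tags = contextvars.ContextVar("active_tags", default={})


@contextlib.contextmanager
def trace(tags=None):
    """Merge tags into the active set for the duration of the with-block."""
    merged = {**_active_tags.get(), **(tags or {})}
    token = _active_tags.set(merged)
    try:
        yield merged
    finally:
        _active_tags.reset(token)  # restore the outer tags on exit


def current_tags():
    """Return a copy of the tags that would be attached to a call made right now."""
    return dict(_active_tags.get())


with trace(tags={"feature": "chat"}):
    with trace(tags={"user_id": "u_sarah"}):
        inner = current_tags()
    outer = current_tags()
after = current_tags()
```

Using a context variable rather than a module-level dict keeps tags from leaking between threads or asyncio tasks that run concurrently.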
Tagging Patterns
| Pattern | Tag | Example |
|---|---|---|
| Track cost by feature | feature | "chat", "search", "summarize" |
| Track cost by user | user_id | "u_sarah", "u_mike" |
| Track cost by customer (B2B) | customer | "acme-corp", "initech" |
| Track cost by conversation | conversation_id | "conv_abc123" |
| Track environment | env | "production", "staging" |
Supported Providers
| Provider | Package | Auto-patched |
|---|---|---|
| OpenAI | openai | Yes |
| Anthropic | anthropic | Yes |
| Google Gemini | google-genai | Yes |
LangChain Support
If you use LangChain with ChatOpenAI, ChatAnthropic, or ChatGoogleGenerativeAI, the underlying SDK calls are auto-captured. No callback handler needed — just llmtracer.init() and you're done.
Configuration
| Option | Type | Default | Range | Description |
|---|---|---|---|---|
| api_key | str | required | n/a | Your LLM Tracer API key (starts with lt_) |
| app_name | str | None | n/a | Application name for multi-app filtering. Falls back to LLMTRACER_APP_NAME env var |
| endpoint | str | Production URL | n/a | Ingestion endpoint URL |
| skip_exit_handlers | bool | False | n/a | Skip atexit handler registration (for serverless environments) |
| max_batch_size | int | 50 | 1–500 | Max events per HTTP request |
| flush_interval_s | float | 5.0 | 1.0–60.0 | Auto-flush interval in seconds |
| max_queue_size | int | 1000 | 100–10000 | Max events in queue before dropping oldest |
| max_retries | int | 3 | 0–10 | Max retry attempts for failed flushes |
| sample_rate | float | 1.0 | 0.0–1.0 | Sampling rate. 0.5 captures ~50% of events |
| debug | bool | False | n/a | Enable debug logging to console |
All numeric options are validated on init(). Out-of-range values are replaced with the default, and a warning is logged when debug=True.
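The validation rule can be pictured with a small sketch. The defaults and ranges below are copied from the Configuration table; `validate_options`, `NUMERIC_OPTIONS`, and the logger name are hypothetical and only illustrate the "replace with default, warn in debug mode" behaviour, not the SDK's own code.

```python
import logging

logger = logging.getLogger("llmtracer.sketch")  # hypothetical logger name

# (default, minimum, maximum) taken from the Configuration table.
NUMERIC_OPTIONS = {
    "max_batch_size": (50, 1, 500),
    "flush_interval_s": (5.0, 1.0, 60.0),
    "max_queue_size": (1000, 100, 10000),
    "max_retries": (3, 0, 10),
    "sample_rate": (1.0, 0.0, 1.0),
}


def validate_options(options, debug=False):
    """Return a copy of options with out-of-range numeric values reset to defaults."""
    validated = dict(options)
    for name, (default, low, high) in NUMERIC_OPTIONS.items():
        value = validated.get(name, default)
        if not (low <= value <= high):
            if debug:
                logger.warning(
                    "%s=%r is outside [%s, %s]; using default %r",
                    name, value, low, high, default,
                )
            value = default
        validated[name] = value
    return validated


config = validate_options({"max_batch_size": 9999, "sample_rate": 0.25})
```

Here max_batch_size falls back to 50 because 9999 is above the allowed maximum, while sample_rate keeps the caller's 0.25.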
Debug Mode
Enable debug=True to print token counts to the console:
llmtracer.init(api_key="lt_...", debug=True)
[llmtracer] openai gpt-4o | 1,247 in -> 384 out | $0.0094 | 1.2s
[llmtracer] anthropic claude-sonnet-4-5 | 2,100 in -> 512 out (cache_read: 1,800) | $0.0031 | 0.8s
[llmtracer] google gemini-2.5-pro | 900 in -> 280 out (thinking: 1,420) | $0.0067 | 2.1s
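For readers who want to parse or reproduce these lines, the sketch below rebuilds the same layout from its parts. `format_debug_line` is a hypothetical helper inferred from the sample output above; the SDK's real formatting code is not shown in this README.

```python
def format_debug_line(provider, model, tokens_in, tokens_out, cost_usd, latency_s, extra=None):
    """Build a line in the layout of the sample debug output."""
    detail = ""
    if extra:
        # Provider-specific counters such as cache_read or thinking tokens.
        detail = " (" + ", ".join(f"{key}: {count:,}" for key, count in extra.items()) + ")"
    return (
        f"[llmtracer] {provider} {model} | {tokens_in:,} in -> {tokens_out:,} out{detail}"
        f" | ${cost_usd:.4f} | {latency_s:.1f}s"
    )


plain = format_debug_line("openai", "gpt-4o", 1247, 384, 0.0094, 1.2)
cached = format_debug_line(
    "anthropic", "claude-sonnet-4-5", 2100, 512, 0.0031, 0.8, extra={"cache_read": 1800}
)
```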
Reliability
The SDK is designed to never interfere with your application:
- Never throws: all internal errors are swallowed silently (enable debug=True for visibility)
- Batching: events are queued and sent in batches of max_batch_size
- Retry with backoff: failed flushes are retried up to max_retries times with exponential backoff (min(1.0 * 2^attempt, 30.0)) plus random jitter (0–1.0s)
- Drop after retries: after max_retries consecutive failures, the batch is dropped to prevent unbounded memory growth
- Queue overflow: drops oldest events when the queue exceeds max_queue_size
- Sampling: set sample_rate below 1.0 to reduce volume in high-throughput environments
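The backoff formula, the drop-oldest queue, and sampling can be sketched together with the standard library. `backoff_delay` and `EventQueue` are hypothetical names for this example; treating `attempt` as 0-based and applying sampling at enqueue time are assumptions, since the README does not say how the SDK does either.

```python
import collections
import random


def backoff_delay(attempt, rng=random):
    """Delay in seconds before a retry: min(1.0 * 2^attempt, 30.0) plus 0-1.0s of jitter."""
    return min(1.0 * 2 ** attempt, 30.0) + rng.uniform(0.0, 1.0)


class EventQueue:
    """Bounded queue that drops the oldest event on overflow and samples on enqueue."""

    def __init__(self, max_queue_size=1000, max_batch_size=50, sample_rate=1.0, rng=random):
        # A deque with maxlen discards items from the left (oldest) when full.
        self._events = collections.deque(maxlen=max_queue_size)
        self._max_batch_size = max_batch_size
        self._sample_rate = sample_rate
        self._rng = rng

    def put(self, event):
        # Keep the event with probability sample_rate.
        if self._rng.random() < self._sample_rate:
            self._events.append(event)

    def next_batch(self):
        """Remove and return up to max_batch_size of the oldest queued events."""
        size = min(self._max_batch_size, len(self._events))
        return [self._events.popleft() for _ in range(size)]


queue = EventQueue(max_queue_size=3, max_batch_size=2)
for i in range(5):
    queue.put(i)  # events 0 and 1 are dropped once the queue is full
batch = queue.next_batch()
```

With a queue size of 3, only the three newest events survive, and the first batch carries the two oldest of those.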
Requirements
- Python 3.8+
- Works with any version of the openai, anthropic, or google-genai SDKs
Zero Dependencies
The core SDK uses only Python stdlib (urllib.request, threading, hashlib).
License
MIT