Automatic token, latency & cost tracking for every AI call — OpenAI, Anthropic, Gemini, Ollama
tokendetective
Automatic token, latency & cost tracking for every AI call — zero code changes.
tokendetective wraps your existing OpenAI, Anthropic, Gemini, or Ollama client and silently logs every request to the TokenLens dashboard. Token counts, latency, cost in USD and INR, and the full conversation trace all appear in real time — without touching your application logic.
```bash
pip install tokendetective
```
Table of Contents
- What It Does
- Installation
- Quick Start
- Supported Providers
- Usage Modes
- Constructor Reference
- Method Reference
- Async Support
- Manual Logging
- Local Cost Utilities
- Supported Models & Pricing
- Error Handling
- REST API (non-Python)
What It Does
- Tracks tokens in, tokens out, latency, and cost for every LLM call
- Logs everything to your TokenLens backend — visible in the Agent Runs dashboard
- Works transparently — your existing calls are unchanged
- Fires logs in a background thread so your app latency is unaffected
- Supports 20+ models with built-in USD → INR pricing
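The background logging described above can be illustrated with the standard library's `threading` module. This is a minimal sketch, not the SDK's implementation: `fire_and_forget` is a hypothetical helper, and a plain callable stands in for the HTTP request the SDK would send to the TokenLens backend.

```python
import threading
from typing import Any, Callable


def fire_and_forget(send: Callable[[dict], Any], payload: dict) -> threading.Thread:
    """Run send(payload) on a daemon thread and return without waiting.

    Hypothetical helper: illustrates non-blocking logging, not the SDK's code.
    """
    def _worker() -> None:
        try:
            send(payload)
        except Exception:
            # A logging failure must never propagate into the application.
            pass

    thread = threading.Thread(target=_worker, daemon=True)
    thread.start()
    return thread


# The caller continues immediately; the "network call" happens on another thread.
received: list[dict] = []
t = fire_and_forget(received.append, {"model": "gpt-4o-mini", "tokens_in": 12})
t.join(timeout=1.0)  # joined only so this demo can inspect the result
print(received)
```

A daemon thread does not keep the interpreter alive at exit, which is why a real implementation may also need a flush step if every log must be delivered.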
Installation
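```bash
pip install tokendetective
```
Requires Python 3.9+. The openai, anthropic, and python-dotenv packages are installed as dependencies. Note that the package is installed as tokendetective but imported as tokenlens.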
Quick Start
```python
from tokenlens import TokenLens

tl = TokenLens(
    api_key="tl-your-api-key",  # from TokenLens dashboard → Settings → API Keys
    agent_name="my-agent",
)

# Wrap once, use forever (the OpenAI client reads OPENAI_API_KEY from the environment)
client = tl.openai()

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
print(response.choices[0].message.content)
# → Paris is the capital of France.
# Token counts, latency, and cost are now in your TokenLens dashboard.
```
That's it. No middleware, no decorators, no extra API calls in your code.
Supported Providers
| Provider | Method | Notes |
|---|---|---|
| OpenAI | `tl.openai()` / `tl.async_openai()` | Requires `OPENAI_API_KEY` |
| Anthropic | `tl.anthropic()` / `tl.async_anthropic()` | Requires `ANTHROPIC_API_KEY` |
| Ollama (local) | `tl.ollama()` | Uses the OpenAI-compatible layer |
| Any client | `tl.wrap(client)` | Custom base URLs, Azure, proxies |
Usage Modes
Mode 1 — Wrap a provider client
The most common mode. Your API key goes directly to the provider; tokendetective only intercepts the response to log token counts.
```python
# OpenAI
client = tl.openai()
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello!"}],
)

# Anthropic
claude = tl.anthropic()
msg = claude.messages.create(
    model="claude-3-5-haiku-20241022",
    max_tokens=256,
    messages=[{"role": "user", "content": "Hello!"}],
)

# Ollama (local)
client = tl.ollama()
response = client.chat.completions.create(
    model="llama3.2",
    messages=[{"role": "user", "content": "Hello!"}],
)
```
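Mode 1 works because both providers report token usage on the response object: OpenAI chat completions (and OpenAI-compatible servers such as Ollama) expose `usage.prompt_tokens` and `usage.completion_tokens`, while Anthropic messages expose `usage.input_tokens` and `usage.output_tokens`. The sketch below shows how a wrapper might normalize the two shapes. `extract_usage` is a hypothetical helper, not the SDK's code, and `SimpleNamespace` objects stand in for real SDK responses so the example runs without API keys.

```python
from types import SimpleNamespace
from typing import Any, Tuple


def extract_usage(response: Any) -> Tuple[int, int]:
    """Return (tokens_in, tokens_out) from an OpenAI- or Anthropic-style response.

    Hypothetical helper; the SDK's own extraction logic may differ.
    """
    usage = getattr(response, "usage", None)
    if usage is None:
        return 0, 0
    if hasattr(usage, "prompt_tokens"):
        # OpenAI and OpenAI-compatible chat completions
        return usage.prompt_tokens, usage.completion_tokens
    # Anthropic messages
    return usage.input_tokens, usage.output_tokens


openai_like = SimpleNamespace(usage=SimpleNamespace(prompt_tokens=14, completion_tokens=8))
anthropic_like = SimpleNamespace(usage=SimpleNamespace(input_tokens=10, output_tokens=25))

print(extract_usage(openai_like))     # (14, 8)
print(extract_usage(anthropic_like))  # (10, 25)
```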
Mode 2 — Bring your own client
Already have a configured client? Wrap it directly:
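```python
from openai import OpenAI

my_client = OpenAI(
    api_key="sk-...",
    # Azure OpenAI's v1 API accepts the standard OpenAI client at <resource>/openai/v1/
    base_url="https://your-azure-endpoint.openai.azure.com/openai/v1/",
)
tracked = tl.wrap(my_client)
```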
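A wrapped client is used exactly like the original. One common way to build such a wrapper is attribute delegation: forward every attribute to the wrapped object and intercept only the call you care about. The sketch below assumes that design; `Proxy`, `wrap`, `on_log`, and `FakeCompletions` are hypothetical names, this is not how `tl.wrap()` is necessarily implemented, and a fake client replaces a real `openai.OpenAI` instance so the code runs offline.

```python
import time
from types import SimpleNamespace
from typing import Any, Callable


class Proxy:
    """Forwards attribute access to `inner`; keyword overrides take precedence."""

    def __init__(self, inner: Any, **overrides: Any) -> None:
        self._inner = inner
        self.__dict__.update(overrides)

    def __getattr__(self, name: str) -> Any:
        # Only called when normal lookup fails, i.e. for non-overridden attributes.
        return getattr(self._inner, name)


def wrap(client: Any, on_log: Callable[[dict], None]) -> Any:
    """Hypothetical wrap(): intercept chat.completions.create, forward everything else."""
    inner_create = client.chat.completions.create

    def create(**kwargs: Any) -> Any:
        start = time.perf_counter()
        response = inner_create(**kwargs)
        on_log({
            "model": kwargs.get("model"),
            "tokens_in": response.usage.prompt_tokens,
            "tokens_out": response.usage.completion_tokens,
            "latency_ms": (time.perf_counter() - start) * 1000.0,
        })
        return response  # the caller receives the unmodified response

    completions = Proxy(client.chat.completions, create=create)
    return Proxy(client, chat=Proxy(client.chat, completions=completions))


class FakeCompletions:
    def create(self, **kwargs: Any) -> Any:
        return SimpleNamespace(usage=SimpleNamespace(prompt_tokens=9, completion_tokens=3))


fake = SimpleNamespace(api_key="sk-test", chat=SimpleNamespace(completions=FakeCompletions()))
logs: list = []
tracked = wrap(fake, logs.append)
tracked.chat.completions.create(model="gpt-4o-mini", messages=[])
print(logs[0]["model"], logs[0]["tokens_in"], logs[0]["tokens_out"])  # gpt-4o-mini 9 3
print(tracked.api_key)  # sk-test (forwarded to the original client)
```

An async client would need an `async def create` that awaits the inner call; the delegation part stays the same.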
Mode 3 — Manual logging
Have token counts from LangChain, LlamaIndex, or a custom HTTP client? Log manually:
```python
tl.log(
    model="gpt-4o-mini",
    tokens_in=512,
    tokens_out=128,
    latency_ms=340.5,
    query_text="Summarise this document…",
)
```
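When you log manually, you measure `latency_ms` yourself. A monotonic clock such as `time.perf_counter()` is the usual choice because it is unaffected by system clock adjustments. The helper `timed` below is hypothetical, and `fake_llm_call` stands in for a LangChain, LlamaIndex, or custom HTTP call; the helper returns the result together with the elapsed milliseconds, ready to pass to `tl.log(latency_ms=...)`.

```python
import time
from typing import Any, Callable, Tuple


def timed(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Call fn(*args, **kwargs) and return (result, elapsed milliseconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    latency_ms = (time.perf_counter() - start) * 1000.0
    return result, latency_ms


def fake_llm_call(prompt: str) -> str:
    time.sleep(0.05)  # stand-in for the real model call
    return "summary"


text, latency_ms = timed(fake_llm_call, "Summarise this document")
print(text, f"{latency_ms:.1f} ms")
# Then, for example: tl.log(model="gpt-4o-mini", tokens_in=512, tokens_out=128, latency_ms=latency_ms)
```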
Constructor Reference
```python
TokenLens(
    api_key: str,
    base_url: Optional[str] = None,  # reads TOKENLENS_URL env var, falls back to https://13.126.130.56.nip.io
    application: str = "tokenlens-sdk",
    agent_name: str = "tokenlens-agent",
    background: bool = True,
    timeout: float = 10.0,
    raise_on_error: bool = False,
)
```
| Parameter | Type | Default | Description |
|---|---|---|---|
| `api_key` | `str` | required | Your `tl-` API key from the TokenLens dashboard |
| `base_url` | `str` | env / default | Backend URL. Reads `TOKENLENS_URL` from the environment (including a `.env` file) and falls back to the hosted server |
| `application` | `str` | `"tokenlens-sdk"` | App label shown in the dashboard |
| `agent_name` | `str` | `"tokenlens-agent"` | Agent label shown on the Agent Runs page |
| `background` | `bool` | `True` | Fire-and-forget (non-blocking). Set `False` to block and return the log result |
| `timeout` | `float` | `10.0` | HTTP timeout in seconds for log requests |
| `raise_on_error` | `bool` | `False` | Raise `LoggingError` on failure instead of logging a warning |
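The `base_url` default follows a common precedence order: an explicit argument wins, then the `TOKENLENS_URL` environment variable, then the hosted server. A minimal sketch of that lookup, assuming the `.env` file has already been loaded into `os.environ` (python-dotenv's `load_dotenv()` does this); `resolve_base_url` is a hypothetical name, not part of the SDK.

```python
import os
from typing import Optional

# Hosted fallback named in the constructor signature above
DEFAULT_URL = "https://13.126.130.56.nip.io"


def resolve_base_url(base_url: Optional[str] = None) -> str:
    """Explicit argument > TOKENLENS_URL env var > hosted default."""
    return base_url or os.environ.get("TOKENLENS_URL") or DEFAULT_URL


os.environ.pop("TOKENLENS_URL", None)
print(resolve_base_url())                          # https://13.126.130.56.nip.io
os.environ["TOKENLENS_URL"] = "http://localhost:8000"
print(resolve_base_url())                          # http://localhost:8000
print(resolve_base_url("https://tl.example.com"))  # https://tl.example.com
```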
Method Reference
tl.openai(**kwargs) / tl.async_openai(**kwargs)
Create a tracked OpenAI client. All kwargs are forwarded to OpenAI().
tl.anthropic(**kwargs) / tl.async_anthropic(**kwargs)
Create a tracked Anthropic client.
tl.ollama(base_url=None, **kwargs)
Create a tracked Ollama client. Defaults to OLLAMA_HOST env var or http://localhost:11434.
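Ollama serves an OpenAI-compatible API under the `/v1` path, which is what lets an OpenAI client talk to a local model. The sketch below resolves that endpoint with the precedence described above (explicit argument, then `OLLAMA_HOST`, then `http://localhost:11434`). `ollama_openai_base_url` is hypothetical; treating a scheme-less `host:port` value as plain HTTP is an assumption, and this page does not say whether `tl.ollama(base_url=...)` expects the `/v1` suffix.

```python
import os
from typing import Optional


def ollama_openai_base_url(base_url: Optional[str] = None) -> str:
    """Hypothetical helper: explicit argument > OLLAMA_HOST > http://localhost:11434."""
    host = base_url or os.environ.get("OLLAMA_HOST") or "http://localhost:11434"
    if "://" not in host:  # assumption: treat a bare "host:port" as plain HTTP
        host = "http://" + host
    host = host.rstrip("/")
    # Ollama's OpenAI-compatible API lives under /v1
    return host if host.endswith("/v1") else host + "/v1"


os.environ.pop("OLLAMA_HOST", None)
print(ollama_openai_base_url())  # http://localhost:11434/v1
# With the openai package installed, the URL can be passed to a plain client:
# OpenAI(base_url=ollama_openai_base_url(), api_key="ollama")  # Ollama ignores the key
```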
tl.wrap(client)
Wrap any existing provider client instance. Supports openai.OpenAI, openai.AsyncOpenAI, anthropic.Anthropic, anthropic.AsyncAnthropic.
tl.log(*, model, tokens_in, tokens_out, latency_ms, query_text=None, response_text=None, application=None)
Manually log one AI request. Returns None in background mode, or {"usage_id", "cost_usd", "cost_inr"} when background=False.
await tl.alog(...)
Async version of log(). Always awaits and returns the response dict.
Async Support
```python
import asyncio

from tokenlens import TokenLens

tl = TokenLens(api_key="tl-...")


async def main():
    # Async OpenAI
    client = tl.async_openai()
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Hello!"}],
    )
    print(response.choices[0].message.content)

    # Async manual log
    result = await tl.alog(
        model="gpt-4o-mini", tokens_in=150, tokens_out=42, latency_ms=500.0
    )
    print(result)  # {"usage_id": "…", "cost_usd": 0.000027, "cost_inr": 0.0023}


asyncio.run(main())
```
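With async clients, each awaited call can be timed on its own, so concurrent requests (for example via `asyncio.gather`) each get their own latency figure. A self-contained sketch of that idea; `timed_async` and `fake_completion` are hypothetical names, and `asyncio.sleep` stands in for network I/O.

```python
import asyncio
import time
from typing import Any, Awaitable, Tuple


async def timed_async(coro: Awaitable[Any]) -> Tuple[Any, float]:
    """Await coro and return (result, elapsed milliseconds)."""
    start = time.perf_counter()
    result = await coro
    return result, (time.perf_counter() - start) * 1000.0


async def fake_completion(delay_s: float) -> str:
    await asyncio.sleep(delay_s)  # stand-in for an awaited API call
    return f"done after {delay_s}s"


async def main() -> list:
    # Both calls run concurrently; each keeps its own latency measurement.
    return await asyncio.gather(
        timed_async(fake_completion(0.05)),
        timed_async(fake_completion(0.10)),
    )


results = asyncio.run(main())
for text, latency_ms in results:
    print(text, f"{latency_ms:.0f} ms")
```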
Manual Logging
Use tl.log() when you already have token counts from any source:
```python
# Fire-and-forget (default)
tl.log(
    model="my-custom-llm",
    tokens_in=1024,
    tokens_out=256,
    latency_ms=820.0,
    query_text="User query here",
)

# Blocking: get the cost back immediately
tl2 = TokenLens(api_key="tl-...", background=False)
result = tl2.log(model="gpt-4o-mini", tokens_in=150, tokens_out=42, latency_ms=500)
print(result)
# {"usage_id": "uuid…", "cost_usd": 0.00002745, "cost_inr": 0.00233325}
```
Local Cost Utilities
Calculate cost locally without any network call:
```python
from tokenlens.pricing import compute_cost, list_models

cost = compute_cost("gpt-4o-mini", tokens_in=1500, tokens_out=420)
print(cost)
# {"usd": 0.000477, "inr": 0.040545}

# Custom exchange rate
cost = compute_cost("gpt-4o-mini", tokens_in=1500, tokens_out=420, usd_to_inr=84.5)

# Custom pricing (USD per token) for unlisted models
cost = compute_cost(
    "my-model",
    tokens_in=1000,
    tokens_out=500,
    custom_pricing={"input": 0.002 / 1_000_000, "output": 0.008 / 1_000_000},
)

# All supported models
print(list_models())
```
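The local cost calculation is plain arithmetic: each token count times its per-token rate, summed, then converted to INR. The sketch below reproduces the gpt-4o-mini example above using the rates from the pricing table ($0.15 and $0.60 per 1M tokens): 1500 × 0.15/1M + 420 × 0.60/1M = $0.000477. The default of 85 INR per USD is inferred from that example (0.040545 / 0.000477 = 85) and is an assumption, as is the name `estimate_cost`; the rounding is only for display.

```python
from typing import Dict, Optional

# USD per token, taken from the pricing table for gpt-4o-mini
PRICING: Dict[str, Dict[str, float]] = {
    "gpt-4o-mini": {"input": 0.15 / 1_000_000, "output": 0.60 / 1_000_000},
}


def estimate_cost(model: str, tokens_in: int, tokens_out: int,
                  usd_to_inr: float = 85.0,
                  custom_pricing: Optional[Dict[str, float]] = None) -> Dict[str, float]:
    """Hypothetical re-implementation of the compute_cost arithmetic."""
    rate = custom_pricing or PRICING[model]
    usd = tokens_in * rate["input"] + tokens_out * rate["output"]
    return {"usd": round(usd, 8), "inr": round(usd * usd_to_inr, 8)}


print(estimate_cost("gpt-4o-mini", tokens_in=1500, tokens_out=420))
# {'usd': 0.000477, 'inr': 0.040545}
```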
Supported Models & Pricing
OpenAI
| Model | Input / 1M tokens | Output / 1M tokens |
|---|---|---|
| gpt-4o | $5.00 | $20.00 |
| gpt-4o-mini | $0.15 | $0.60 |
| gpt-4-turbo | $10.00 | $30.00 |
| gpt-4 | $30.00 | $60.00 |
| gpt-3.5-turbo | $0.50 | $1.50 |
| o1 | $15.00 | $60.00 |
| o1-mini | $3.00 | $12.00 |
| o3 | $10.00 | $40.00 |
| o3-mini | $1.10 | $4.40 |
Anthropic
| Model | Input / 1M tokens | Output / 1M tokens |
|---|---|---|
| claude-opus-4-8 | $15.00 | $75.00 |
| claude-sonnet-4-6 | $3.00 | $15.00 |
| claude-haiku-4-5-20251001 | $0.80 | $4.00 |
| claude-3-5-sonnet-20241022 | $3.00 | $15.00 |
| claude-3-5-haiku-20241022 | $0.80 | $4.00 |
| claude-3-opus-20240229 | $15.00 | $75.00 |
| claude-3-haiku-20240307 | $0.25 | $1.25 |
Google Gemini
| Model | Input / 1M tokens | Output / 1M tokens |
|---|---|---|
| gemini-1.5-pro | $1.25 | $5.00 |
| gemini-1.5-flash | $0.075 | $0.30 |
| gemini-2.0-flash | $0.10 | $0.40 |
| gemini-2.0-flash-lite | $0.075 | $0.30 |
Models not in the table fall back to a minimal default rate. Pass custom_pricing to compute_cost() for accurate local estimates.
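A table lookup with a fallback is one way to implement the behaviour described above, and it pairs naturally with the `model_found` flag that the REST API returns. In this sketch the fallback rate is a placeholder, not the SDK's actual default (which this page does not state); `rate_for` and the table excerpt are hypothetical.

```python
from typing import Dict, Tuple

# USD per 1M tokens: an excerpt of the tables above
PRICES_PER_1M: Dict[str, Tuple[float, float]] = {
    "gpt-4o-mini": (0.15, 0.60),
    "claude-3-5-haiku-20241022": (0.80, 4.00),
    "gemini-2.0-flash": (0.10, 0.40),
}
FALLBACK_PER_1M = (0.10, 0.10)  # placeholder only; the SDK's real default is not documented here


def rate_for(model: str) -> Tuple[float, float, bool]:
    """Return (input USD/token, output USD/token, model_found)."""
    found = model in PRICES_PER_1M
    per_1m_in, per_1m_out = PRICES_PER_1M.get(model, FALLBACK_PER_1M)
    return per_1m_in / 1_000_000, per_1m_out / 1_000_000, found


print(rate_for("gemini-2.0-flash")[2])  # True
print(rate_for("my-custom-llm")[2])     # False: priced at the fallback rate
```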
Error Handling
The SDK never crashes your application by default:
```python
import logging

from tokenlens import TokenLens

logging.basicConfig(level=logging.WARNING)

tl = TokenLens(api_key="tl-...", raise_on_error=False)  # default, safe
tl.log(model="gpt-4o-mini", tokens_in=100, tokens_out=50, latency_ms=500)
# If the backend is unreachable: logs a warning, returns None, and your app keeps running
```
Strict mode for tests:
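```python
from tokenlens import TokenLens, LoggingError

# background=False blocks, so a failure can surface here rather than in a background thread
tl = TokenLens(api_key="tl-...", raise_on_error=True, background=False)

try:
    tl.log(model="gpt-4o-mini", tokens_in=100, tokens_out=50, latency_ms=500)
except LoggingError as e:
    print(f"Logging failed: {e}")
```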
| Exception | When raised |
|---|---|
| `AuthError` | `api_key` is empty or does not start with `tl-` |
| `LoggingError` | Backend returned a non-200 status, or the connection timed out |
| `TokenLensError` | Base class for all SDK exceptions |
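The table implies a small hierarchy in which catching `TokenLensError` also catches the other two. The sketch below mirrors that shape with a key check matching the `AuthError` condition. These classes are local stand-ins that reuse the documented names, not imports from the SDK, and `validate_api_key` is hypothetical.

```python
class TokenLensError(Exception):
    """Base class for all SDK exceptions (stand-in)."""


class AuthError(TokenLensError):
    """api_key is empty or does not start with 'tl-'."""


class LoggingError(TokenLensError):
    """Backend returned a non-200 status, or the connection timed out."""


def validate_api_key(api_key: str) -> str:
    if not api_key or not api_key.startswith("tl-"):
        raise AuthError("api_key must be a non-empty string starting with 'tl-'")
    return api_key


try:
    validate_api_key("sk-not-a-tokenlens-key")
except TokenLensError as exc:  # the base class also catches AuthError
    print(type(exc).__name__)  # AuthError
```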
REST API (non-Python)
Use the backend directly from any language:
```bash
curl -X POST https://13.126.130.56.nip.io/v1/log \
  -H "Authorization: Bearer tl-your-key" \
  -H "Content-Type: application/json" \
  -d '{
    "application": "my-app",
    "agent_name": "summariser",
    "model_used": "gpt-4o-mini",
    "tokens_in": 512,
    "tokens_out": 128,
    "latency_ms": 340.5,
    "query_text": "Summarise this document…",
    "response_text": "Here is a summary…"
  }'
```
Response:
```json
{
  "usage_id": "b3c1a9f2-…",
  "cost_usd": 0.0000927,
  "cost_inr": 0.007880,
  "total_tokens": 640,
  "model_found": true
}
```
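From Python without the SDK, the same request can be built with the standard library. This sketch only constructs the `urllib.request.Request`, so it runs offline; `build_log_request` is a hypothetical name, and the JSON fields are copied from the curl example above. The commented lines at the end show how it could be sent.

```python
import json
import urllib.request


def build_log_request(base_url: str, api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST to {base_url}/v1/log."""
    return urllib.request.Request(
        url=f"{base_url}/v1/log",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_log_request(
    "https://13.126.130.56.nip.io",
    "tl-your-key",
    {
        "application": "my-app",
        "agent_name": "summariser",
        "model_used": "gpt-4o-mini",
        "tokens_in": 512,
        "tokens_out": 128,
        "latency_ms": 340.5,
        "query_text": "Summarise this document…",
        "response_text": "Here is a summary…",
    },
)
print(req.get_method(), req.full_url)  # POST https://13.126.130.56.nip.io/v1/log

# To send it:
# with urllib.request.urlopen(req, timeout=10) as resp:
#     print(json.load(resp))
```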
Links
- Dashboard: TokenLens
- PyPI: pypi.org/project/tokendetective
- Issues: GitHub Issues
File details
Details for the file tokendetective-0.3.3.tar.gz (source distribution).
File metadata
- Download URL: tokendetective-0.3.3.tar.gz
- Size: 22.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.5

File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | 674c8a534c6ef450c7a196539876bb7d9574938a160ae79af79ca2d36535998d |
| MD5 | c8f269eb0aeac0425b6a78acf907fbac |
| BLAKE2b-256 | 1d590fc82f4d27711d7e26d69a596c01345f83977100218312357bf17735e06d |

Details for the file tokendetective-0.3.3-py3-none-any.whl (built distribution).
File metadata
- Download URL: tokendetective-0.3.3-py3-none-any.whl
- Size: 17.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.5

File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | 75783daffb0f8838bdc88d788712c979a967a3814ba7709656ac92ad1b569457 |
| MD5 | 34500055762905a88738f4e095c91984 |
| BLAKE2b-256 | 924ff85c925cc41b040ef3c17e240b15beadc2daa36e069234702cb346fb187b |
|