
Automatic token, latency & cost tracking for every AI call — OpenAI, Anthropic, Gemini, Ollama


tokendetective

Automatic token, latency & cost tracking for every AI call, with no changes to your existing call sites.

tokendetective wraps your existing OpenAI, Anthropic, Gemini, or Ollama client and silently logs every request to the TokenLens dashboard. Token counts, latency, cost in USD and INR, and the full conversation trace all appear in real time — without touching your application logic.

pip install tokendetective

License: MIT



What It Does

  • Tracks tokens in, tokens out, latency, and cost for every LLM call
  • Logs everything to your TokenLens backend — visible in the Agent Runs dashboard
  • Works transparently — your existing calls are unchanged
  • Fires logs in a background thread so your app latency is unaffected
  • Supports 20+ models with built-in USD → INR pricing
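
The background-thread bullet above can be pictured with a minimal sketch. This is not tokendetective's actual implementation: `send_log` and `log_in_background` are hypothetical names, and a list stands in for the TokenLens backend so the example runs without a network.

```python
import threading

sent = []  # stands in for the remote backend in this sketch


def send_log(record: dict) -> None:
    """Hypothetical blocking sender; a real one would make an HTTP request."""
    sent.append(record)


def log_in_background(record: dict) -> threading.Thread:
    """Hand the record to a daemon thread so the caller returns immediately."""
    worker = threading.Thread(target=send_log, args=(record,), daemon=True)
    worker.start()
    return worker


worker = log_in_background({"model": "gpt-4o-mini", "tokens_in": 12, "tokens_out": 7})
worker.join()  # only needed here so the sketch can inspect the result
print(sent)
```

The caller pays only for starting the thread; the slow part (the network round trip in a real sender) happens off the request path.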

Installation

pip install tokendetective

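Requires Python 3.9+. The openai, anthropic, and python-dotenv packages are installed automatically as dependencies. The distribution is named tokendetective on PyPI, but the import package is tokenlens, as the examples below show.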


Quick Start

from tokenlens import TokenLens

tl = TokenLens(
    api_key    = "tl-your-api-key",   # from TokenLens dashboard → Settings → API Keys
    agent_name = "my-agent",
)

# Wrap once — use forever
client = tl.openai()

response = client.chat.completions.create(
    model    = "gpt-4o-mini",
    messages = [{"role": "user", "content": "What is the capital of France?"}],
)

print(response.choices[0].message.content)
# → Paris is the capital of France.

# Token counts, latency, and cost are now in your TokenLens dashboard.

That's it. No middleware, no decorators, no extra API calls in your code.


Supported Providers

| Provider | Method | Notes |
| --- | --- | --- |
| OpenAI | tl.openai() / tl.async_openai() | Requires OPENAI_API_KEY |
| Anthropic | tl.anthropic() / tl.async_anthropic() | Requires ANTHROPIC_API_KEY |
| Ollama (local) | tl.ollama() | Uses OpenAI-compat layer |
| Any client | tl.wrap(client) | Custom base URLs, Azure, proxies |

Usage Modes

Mode 1 — Wrap a provider client

The most common mode. Your provider API key goes directly to the provider; tokendetective intercepts each response to record token counts, latency, and the conversation trace.

# OpenAI
client   = tl.openai()
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello!"}],
)

# Anthropic
claude = tl.anthropic()
msg    = claude.messages.create(
    model     = "claude-3-5-haiku-20241022",
    max_tokens= 256,
    messages  = [{"role": "user", "content": "Hello!"}],
)

# Ollama (local)
client   = tl.ollama()
response = client.chat.completions.create(
    model="llama3.2",
    messages=[{"role": "user", "content": "Hello!"}],
)
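
To make the interception idea concrete, here is a minimal sketch of a wrapper that times a call and reads the usage fields from the response. It is an illustration under stated assumptions, not the SDK's code: `tracked` and `fake_create` are hypothetical, and `fake_create` returns a stand-in object shaped like an OpenAI chat completion (with `usage.prompt_tokens` and `usage.completion_tokens`) so the example runs offline.

```python
import time
from types import SimpleNamespace


def fake_create(**kwargs):
    """Stand-in for a provider call; returns an object shaped like a chat completion."""
    usage = SimpleNamespace(prompt_tokens=9, completion_tokens=3)
    return SimpleNamespace(usage=usage)


def tracked(create, on_log):
    """Return a function that calls `create` unchanged and reports usage afterwards."""
    def wrapper(**kwargs):
        start = time.perf_counter()
        response = create(**kwargs)  # the request goes to the provider as usual
        latency_ms = (time.perf_counter() - start) * 1000
        on_log({
            "model": kwargs.get("model"),
            "tokens_in": response.usage.prompt_tokens,
            "tokens_out": response.usage.completion_tokens,
            "latency_ms": latency_ms,
        })
        return response  # the caller sees the original response object
    return wrapper


records = []
create = tracked(fake_create, records.append)
response = create(model="gpt-4o-mini", messages=[{"role": "user", "content": "Hello!"}])
print(records[0]["tokens_in"], records[0]["tokens_out"])
```

Because the wrapper returns the provider's response untouched, the calling code does not need to know that tracking is happening.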

Mode 2 — Bring your own client

Already have a configured client? Wrap it directly:

from openai import OpenAI

my_client = OpenAI(
    api_key  = "sk-...",
    base_url = "https://your-azure-endpoint.openai.azure.com/",
)
tracked = tl.wrap(my_client)

Mode 3 — Manual logging

Have token counts from LangChain, LlamaIndex, or a custom HTTP client? Log manually:

tl.log(
    model      = "gpt-4o-mini",
    tokens_in  = 512,
    tokens_out = 128,
    latency_ms = 340.5,
    query_text = "Summarise this document…",
)
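
When the counts come from a custom HTTP client, the main work is mapping the provider's response onto the arguments tl.log() expects. The sketch below assumes an OpenAI-style JSON body with a `usage` object; `to_log_kwargs` is a hypothetical helper, and the payload is hand-written rather than a real response.

```python
import time


def to_log_kwargs(model: str, payload: dict, started: float, finished: float) -> dict:
    """Map an OpenAI-style JSON response onto the keyword arguments tl.log() takes."""
    usage = payload.get("usage", {})
    return {
        "model": model,
        "tokens_in": usage.get("prompt_tokens", 0),
        "tokens_out": usage.get("completion_tokens", 0),
        "latency_ms": (finished - started) * 1000,
    }


started = time.perf_counter()
# ... your own HTTP call would run here; this payload is a hand-written stand-in ...
payload = {"usage": {"prompt_tokens": 512, "completion_tokens": 128}}
finished = time.perf_counter()

kwargs = to_log_kwargs("gpt-4o-mini", payload, started, finished)
print(kwargs)
# With a configured TokenLens instance: tl.log(**kwargs)
```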

Constructor Reference

TokenLens(
    api_key        : str,
    base_url       : str   = None,      # reads TOKENLENS_URL env var, falls back to https://13.126.130.56.nip.io
    application    : str   = "tokenlens-sdk",
    agent_name     : str   = "tokenlens-agent",
    background     : bool  = True,
    timeout        : float = 10.0,
    raise_on_error : bool  = False,
)

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| api_key | str | required | Your tl- API key from the TokenLens dashboard |
| base_url | str | env / default | Backend URL. Reads TOKENLENS_URL from .env, falls back to the hosted server |
| application | str | "tokenlens-sdk" | App label shown in the dashboard |
| agent_name | str | "tokenlens-agent" | Agent label shown in the Agent Runs page |
| background | bool | True | Fire-and-forget (non-blocking). Set False to block and return the log result |
| timeout | float | 10.0 | HTTP timeout in seconds for log requests |
| raise_on_error | bool | False | Raise LoggingError on failure instead of silently logging a warning |
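
The base_url fallback can be sketched as a small resolver. The precedence shown (explicit argument, then TOKENLENS_URL, then the hosted default) is an assumption read from the signature comment, and `resolve_base_url` is a hypothetical name rather than part of the SDK.

```python
import os

DEFAULT_URL = "https://13.126.130.56.nip.io"  # hosted server named in the constructor signature


def resolve_base_url(base_url=None, env=None):
    """Explicit argument wins, then TOKENLENS_URL, then the hosted default."""
    env = os.environ if env is None else env
    return base_url or env.get("TOKENLENS_URL") or DEFAULT_URL


# `env` is injectable so the sketch does not depend on the real environment.
print(resolve_base_url("http://localhost:8000", env={}))
print(resolve_base_url(env={"TOKENLENS_URL": "https://logs.example.com"}))
print(resolve_base_url(env={}))
```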

Method Reference

tl.openai(**kwargs) / tl.async_openai(**kwargs)

Create a tracked OpenAI client. All kwargs are forwarded to OpenAI().

tl.anthropic(**kwargs) / tl.async_anthropic(**kwargs)

Create a tracked Anthropic client.

tl.ollama(base_url=None, **kwargs)

Create a tracked Ollama client. Defaults to OLLAMA_HOST env var or http://localhost:11434.

tl.wrap(client)

Wrap any existing provider client instance. Supports openai.OpenAI, openai.AsyncOpenAI, anthropic.Anthropic, anthropic.AsyncAnthropic.

tl.log(*, model, tokens_in, tokens_out, latency_ms, query_text=None, response_text=None, application=None)

Manually log one AI request. Returns None in background mode, or {"usage_id", "cost_usd", "cost_inr"} when background=False.

await tl.alog(...)

Async version of log(). Always awaits and returns the response dict.


Async Support

import asyncio
from tokenlens import TokenLens

tl = TokenLens(api_key="tl-...")

async def main():
    # Async OpenAI
    client   = tl.async_openai()
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Hello!"}],
    )
    print(response.choices[0].message.content)

    # Async manual log
    result = await tl.alog(
        model="gpt-4o-mini", tokens_in=150, tokens_out=42, latency_ms=500.0
    )
    print(result)  # {"usage_id": "…", "cost_usd": 0.0000477, "cost_inr": 0.0040545}

asyncio.run(main())

Manual Logging

Use tl.log() when you already have token counts from any source:

# Fire-and-forget (default)
tl.log(
    model      = "my-custom-llm",
    tokens_in  = 1024,
    tokens_out = 256,
    latency_ms = 820.0,
    query_text = "User query here",
)

# Blocking — get cost back immediately
tl2 = TokenLens(api_key="tl-...", background=False)
result = tl2.log(model="gpt-4o-mini", tokens_in=150, tokens_out=42, latency_ms=500)
print(result)
# {"usage_id": "uuid…", "cost_usd": 0.0000477, "cost_inr": 0.0040545}

Local Cost Utilities

Calculate cost locally without any network call:

from tokenlens.pricing import compute_cost, list_models

cost = compute_cost("gpt-4o-mini", tokens_in=1500, tokens_out=420)
print(cost)
# {"usd": 0.000477, "inr": 0.040545}

# Custom exchange rate
cost = compute_cost("gpt-4o-mini", tokens_in=1500, tokens_out=420, usd_to_inr=84.5)

# Custom pricing for unlisted models
cost = compute_cost(
    "my-model",
    tokens_in  = 1000,
    tokens_out = 500,
    custom_pricing = {"input": 0.002 / 1_000_000, "output": 0.008 / 1_000_000},
)

# All supported models
print(list_models())
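
The arithmetic behind the first compute_cost() result can be checked by hand. The sketch below is a standalone re-derivation, not the library's code: the per-million prices are the gpt-4o-mini rates listed under Supported Models & Pricing, and the 85.0 exchange rate is an assumption inferred from 0.040545 / 0.000477 in the example output.

```python
# Per-1M-token USD prices for gpt-4o-mini, as listed in the pricing table.
INPUT_PER_M = 0.15
OUTPUT_PER_M = 0.60
USD_TO_INR = 85.0  # assumption: inferred from the example output, not a documented default


def estimate_cost(tokens_in: int, tokens_out: int, usd_to_inr: float = USD_TO_INR) -> dict:
    """USD cost = (tokens_in * input price + tokens_out * output price) / 1,000,000."""
    usd = (tokens_in * INPUT_PER_M + tokens_out * OUTPUT_PER_M) / 1_000_000
    return {"usd": round(usd, 6), "inr": round(usd * usd_to_inr, 6)}


print(estimate_cost(1500, 420))
```

For 1,500 input and 420 output tokens this works out to 0.000225 + 0.000252 = 0.000477 USD, matching the compute_cost() output shown above.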

Supported Models & Pricing

OpenAI

| Model | Input / 1M tokens | Output / 1M tokens |
| --- | --- | --- |
| gpt-4o | $5.00 | $20.00 |
| gpt-4o-mini | $0.15 | $0.60 |
| gpt-4-turbo | $10.00 | $30.00 |
| gpt-4 | $30.00 | $60.00 |
| gpt-3.5-turbo | $0.50 | $1.50 |
| o1 | $15.00 | $60.00 |
| o1-mini | $3.00 | $12.00 |
| o3 | $10.00 | $40.00 |
| o3-mini | $1.10 | $4.40 |

Anthropic

| Model | Input / 1M tokens | Output / 1M tokens |
| --- | --- | --- |
| claude-opus-4-8 | $15.00 | $75.00 |
| claude-sonnet-4-6 | $3.00 | $15.00 |
| claude-haiku-4-5-20251001 | $0.80 | $4.00 |
| claude-3-5-sonnet-20241022 | $3.00 | $15.00 |
| claude-3-5-haiku-20241022 | $0.80 | $4.00 |
| claude-3-opus-20240229 | $15.00 | $75.00 |
| claude-3-haiku-20240307 | $0.25 | $1.25 |

Google Gemini

| Model | Input / 1M tokens | Output / 1M tokens |
| --- | --- | --- |
| gemini-1.5-pro | $1.25 | $5.00 |
| gemini-1.5-flash | $0.075 | $0.30 |
| gemini-2.0-flash | $0.10 | $0.40 |
| gemini-2.0-flash-lite | $0.075 | $0.30 |

Models not in the table fall back to a minimal default rate. Pass custom_pricing to compute_cost() for accurate local estimates.


Error Handling

By default, a failed log call never crashes your application; the SDK logs a warning and returns None:

import logging
logging.basicConfig(level=logging.WARNING)

tl = TokenLens(api_key="tl-...", raise_on_error=False)  # default — safe
tl.log(model="gpt-4o-mini", tokens_in=100, tokens_out=50, latency_ms=500)
# If backend is unreachable: logs a warning, returns None — your app keeps running

Strict mode for tests:

from tokenlens import TokenLens, LoggingError

tl = TokenLens(api_key="tl-...", raise_on_error=True)
try:
    tl.log(model="gpt-4o-mini", tokens_in=100, tokens_out=50, latency_ms=500)
except LoggingError as e:
    print(f"Logging failed: {e}")

| Exception | When raised |
| --- | --- |
| AuthError | api_key is empty or does not start with tl- |
| LoggingError | Backend returned non-200, or connection timed out |
| TokenLensError | Base class for all SDK exceptions |
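
The exception table implies a small hierarchy, which the following sketch mirrors with locally defined classes. These are stand-ins for illustration, not imports from tokenlens, and `check_api_key` is a hypothetical validator that applies the AuthError rule as the table states it.

```python
class TokenLensError(Exception):
    """Base class, mirroring the exception table."""


class AuthError(TokenLensError):
    """Raised for an empty key or one without the tl- prefix."""


class LoggingError(TokenLensError):
    """Raised when the backend rejects a log or the connection times out."""


def check_api_key(api_key: str) -> str:
    """Hypothetical validator implementing the AuthError rule from the table."""
    if not api_key or not api_key.startswith("tl-"):
        raise AuthError("api_key must be non-empty and start with 'tl-'")
    return api_key


for candidate in ("tl-abc123", "", "sk-wrong-prefix"):
    try:
        check_api_key(candidate)
        print(repr(candidate), "accepted")
    except TokenLensError as exc:  # the base class catches every SDK error
        print(repr(candidate), "rejected:", type(exc).__name__)
```

Catching the base class is a reasonable default in application code; catching AuthError or LoggingError separately is useful when configuration problems and transient backend failures need different handling.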

REST API (non-Python)

Use the backend directly from any language:

curl -X POST https://13.126.130.56.nip.io/v1/log \
  -H "Authorization: Bearer tl-your-key" \
  -H "Content-Type: application/json" \
  -d '{
    "application":   "my-app",
    "agent_name":    "summariser",
    "model_used":    "gpt-4o-mini",
    "tokens_in":     512,
    "tokens_out":    128,
    "latency_ms":    340.5,
    "query_text":    "Summarise this document…",
    "response_text": "Here is a summary…"
  }'

Response:

{
  "usage_id":     "b3c1a9f2-…",
  "cost_usd":     0.0001536,
  "cost_inr":     0.013056,
  "total_tokens": 640,
  "model_found":  true
}
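
For readers who want the same request without curl or the SDK, here is a minimal standard-library sketch. It only builds the request so it can run offline; `build_log_request` is a hypothetical helper, while the endpoint, headers, and field names are copied from the curl example above.

```python
import json
import urllib.request

BASE_URL = "https://13.126.130.56.nip.io"  # hosted server from the curl example


def build_log_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) the POST /v1/log request."""
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/log",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_log_request("tl-your-key", {
    "application": "my-app",
    "agent_name": "summariser",
    "model_used": "gpt-4o-mini",
    "tokens_in": 512,
    "tokens_out": 128,
    "latency_ms": 340.5,
})
print(request.get_method(), request.full_url)

# To send it (requires network access and a valid key):
# with urllib.request.urlopen(request, timeout=10) as resp:
#     print(json.load(resp))
```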


Download files

Source Distribution: tokendetective-0.3.3.tar.gz (22.9 kB)

Built Distribution: tokendetective-0.3.3-py3-none-any.whl (17.9 kB, Python 3)

