Automatic token, latency & cost tracking for every AI call — OpenAI, Anthropic, Gemini, Ollama

tokendetective

Automatic token, latency & cost tracking for every AI call — zero code changes.

tokendetective wraps your existing OpenAI, Anthropic, Gemini, or Ollama client and silently logs every request to the TokenLens dashboard. Token counts, latency, cost in USD and INR, and the full conversation trace all appear in real time — without touching your application logic.

pip install tokendetective

What It Does
Installation
Quick Start
Supported Providers
Usage Modes
Constructor Reference
Method Reference
Async Support
Manual Logging
Local Cost Utilities
Supported Models & Pricing
Error Handling
REST API (non-Python)

What It Does

Tracks tokens in, tokens out, latency, and cost for every LLM call
Logs everything to your TokenLens backend — visible in the Agent Runs dashboard
Works transparently — your existing calls are unchanged
Fires logs in a background thread so your app latency is unaffected
Supports 20+ models with built-in USD → INR pricing
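
The background-thread claim above can be illustrated with a minimal sketch. This is not the SDK's implementation; `BackgroundLogger`, `slow_send`, and the queue-based design are assumptions chosen only to show how a log call can return immediately while delivery happens elsewhere.

```python
import queue
import threading
import time


class BackgroundLogger:
    """Hypothetical fire-and-forget logger: enqueue now, deliver on a worker thread."""

    def __init__(self, send):
        self._send = send                      # callable that delivers one record
        self._queue = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def log(self, record):
        # Returns immediately; the caller never waits on delivery.
        self._queue.put(record)

    def _run(self):
        while True:
            record = self._queue.get()
            try:
                self._send(record)
            except Exception:
                pass                           # a failed log must not reach the caller
            finally:
                self._queue.task_done()

    def flush(self):
        # Block until every queued record has been handled (handy in tests).
        self._queue.join()


delivered = []


def slow_send(record):
    time.sleep(0.05)                           # stands in for an HTTP round trip
    delivered.append(record)


logger = BackgroundLogger(slow_send)
logger.log({"model": "gpt-4o-mini", "tokens_in": 10, "tokens_out": 5})
logger.flush()
print(delivered)
```

The worker is a daemon thread so it never keeps the process alive; the trade-off is that records still queued at interpreter exit are dropped unless something like `flush()` is called first.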

Installation

pip install tokendetective

Requires Python 3.9+. The openai, anthropic, and python-dotenv packages are installed automatically as dependencies.

Quick Start

from tokenlens import TokenLens

tl = TokenLens(
    api_key    = "tl-your-api-key",   # from TokenLens dashboard → Settings → API Keys
    agent_name = "my-agent",
)

# Wrap once — use forever
client = tl.openai()

response = client.chat.completions.create(
    model    = "gpt-4o-mini",
    messages = [{"role": "user", "content": "What is the capital of France?"}],
)

print(response.choices[0].message.content)
# → Paris is the capital of France.

# Token counts, latency, and cost are now in your TokenLens dashboard.

That's it. No middleware, no decorators, no extra API calls in your code.

Supported Providers

Provider	Method	Notes
OpenAI	`tl.openai()` / `tl.async_openai()`	Requires `OPENAI_API_KEY`
Anthropic	`tl.anthropic()` / `tl.async_anthropic()`	Requires `ANTHROPIC_API_KEY`
Ollama (local)	`tl.ollama()`	Uses OpenAI-compat layer
Any client	`tl.wrap(client)`	Custom base URLs, Azure, proxies

Usage Modes

Mode 1 — Wrap a provider client

The most common mode. Your API key goes directly to the provider; tokendetective only intercepts the response to log token counts.

# OpenAI
client   = tl.openai()
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello!"}],
)

# Anthropic
claude = tl.anthropic()
msg    = claude.messages.create(
    model     = "claude-3-5-haiku-20241022",
    max_tokens= 256,
    messages  = [{"role": "user", "content": "Hello!"}],
)

# Ollama (local)
client   = tl.ollama()
response = client.chat.completions.create(
    model="llama3.2",
    messages=[{"role": "user", "content": "Hello!"}],
)
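
How the interception might work, as a sketch only: a thin proxy forwards `create()` to the real endpoint, times the call, and reads the usage block from the response. `TrackedCompletions` and `FakeCompletions` are hypothetical names, and the `prompt_tokens` / `completion_tokens` fields follow the OpenAI chat completions response shape (Anthropic responses use `input_tokens` / `output_tokens` instead).

```python
import time
from types import SimpleNamespace


class TrackedCompletions:
    """Hypothetical proxy around any object exposing create(**kwargs)."""

    def __init__(self, inner, on_usage):
        self._inner = inner
        self._on_usage = on_usage

    def create(self, **kwargs):
        start = time.perf_counter()
        response = self._inner.create(**kwargs)        # provider call is unchanged
        latency_ms = (time.perf_counter() - start) * 1000
        usage = getattr(response, "usage", None)
        if usage is not None:
            self._on_usage({
                "model": kwargs.get("model"),
                "tokens_in": usage.prompt_tokens,
                "tokens_out": usage.completion_tokens,
                "latency_ms": latency_ms,
            })
        return response                                 # caller gets the original object


class FakeCompletions:
    """Stand-in for a provider endpoint so the sketch runs offline."""

    def create(self, **kwargs):
        return SimpleNamespace(
            choices=[SimpleNamespace(message=SimpleNamespace(content="Hi"))],
            usage=SimpleNamespace(prompt_tokens=9, completion_tokens=3),
        )


records = []
completions = TrackedCompletions(FakeCompletions(), records.append)
response = completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content, records[0]["tokens_in"])
```

Because the proxy returns the provider's response object untouched, calling code that reads `response.choices[0]` keeps working as before.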

Mode 2 — Bring your own client

Already have a configured client? Wrap it directly:

from openai import OpenAI

my_client = OpenAI(
    api_key  = "sk-...",
    base_url = "https://your-azure-endpoint.openai.azure.com/",
)
tracked = tl.wrap(my_client)

Mode 3 — Manual logging

Have token counts from LangChain, LlamaIndex, or a custom HTTP client? Log manually:

tl.log(
    model      = "gpt-4o-mini",
    tokens_in  = 512,
    tokens_out = 128,
    latency_ms = 340.5,
    query_text = "Summarise this document…",
)
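
When the call goes through a custom client, latency_ms has to be measured by hand. The sketch below shows one way to do that with time.perf_counter; `timed_call` and `fake_llm` are hypothetical helpers, not part of the SDK, and the final `tl.log(...)` call is left as a comment because it needs a configured TokenLens instance.

```python
import time


def timed_call(fn, *args, **kwargs):
    """Run fn and return (result, latency in milliseconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    latency_ms = (time.perf_counter() - start) * 1000.0
    return result, latency_ms


def fake_llm(prompt):
    # Stand-in for a custom HTTP client that reports its own token counts.
    time.sleep(0.01)
    return {"text": "summary", "tokens_in": 512, "tokens_out": 128}


prompt = "Summarise this document…"
result, latency_ms = timed_call(fake_llm, prompt)

log_kwargs = {
    "model": "gpt-4o-mini",
    "tokens_in": result["tokens_in"],
    "tokens_out": result["tokens_out"],
    "latency_ms": latency_ms,
    "query_text": prompt,
}
# tl.log(**log_kwargs)   # with a configured TokenLens instance
```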

Constructor Reference

TokenLens(
    api_key        : str,
    base_url       : str   = None,      # reads TOKENLENS_URL env var, falls back to https://13.126.130.56.nip.io
    application    : str   = "tokenlens-sdk",
    agent_name     : str   = "tokenlens-agent",
    background     : bool  = True,
    timeout        : float = 10.0,
    raise_on_error : bool  = False,
)

Parameter	Type	Default	Description
`api_key`	`str`	required	Your `tl-` API key from the TokenLens dashboard
`base_url`	`str`	env / default	Backend URL. Reads `TOKENLENS_URL` from `.env`, falls back to the hosted server
`application`	`str`	`"tokenlens-sdk"`	App label shown in the dashboard
`agent_name`	`str`	`"tokenlens-agent"`	Agent label shown in the Agent Runs page
`background`	`bool`	`True`	Fire-and-forget (non-blocking). Set `False` to block and return the log result
`timeout`	`float`	`10.0`	HTTP timeout in seconds for log requests
`raise_on_error`	`bool`	`False`	Raise `LoggingError` on failure instead of silently logging a warning
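
The base_url fallback described in the table could be resolved along these lines. This is a sketch under assumptions: `resolve_base_url` is a hypothetical helper, and the precedence (explicit argument, then TOKENLENS_URL, then the hosted default) is inferred from the parameter description rather than taken from the SDK source. Values from a .env file would need to be loaded into the environment first, for example with python-dotenv.

```python
import os

# Fallback named in the constructor comment above.
DEFAULT_URL = "https://13.126.130.56.nip.io"


def resolve_base_url(explicit=None, env=None):
    """Hypothetical helper: explicit argument > TOKENLENS_URL > hosted default."""
    env = os.environ if env is None else env
    url = explicit or env.get("TOKENLENS_URL") or DEFAULT_URL
    return url.rstrip("/")                     # avoid double slashes when joining paths


print(resolve_base_url("http://localhost:8000/", env={}))
print(resolve_base_url(env={"TOKENLENS_URL": "https://logs.example.test"}))
print(resolve_base_url(env={}))
```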

Method Reference

`tl.openai(**kwargs)` / `tl.async_openai(**kwargs)`

Create a tracked OpenAI client. All kwargs are forwarded to OpenAI().

`tl.anthropic(**kwargs)` / `tl.async_anthropic(**kwargs)`

Create a tracked Anthropic client.

`tl.ollama(base_url=None, **kwargs)`

Create a tracked Ollama client. base_url defaults to the OLLAMA_HOST environment variable, or to http://localhost:11434 if it is unset.

`tl.wrap(client)`

Wrap any existing provider client instance. Supports openai.OpenAI, openai.AsyncOpenAI, anthropic.Anthropic, anthropic.AsyncAnthropic.

`tl.log(*, model, tokens_in, tokens_out, latency_ms, query_text=None, response_text=None, application=None)`

Manually log one AI request. Returns None in background mode, or {"usage_id", "cost_usd", "cost_inr"} when background=False.

`await tl.alog(...)`

Async version of log(). Always awaits and returns the response dict.

Async Support

import asyncio
from tokenlens import TokenLens

tl = TokenLens(api_key="tl-...")

async def main():
    # Async OpenAI
    client   = tl.async_openai()
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Hello!"}],
    )
    print(response.choices[0].message.content)

    # Async manual log
    result = await tl.alog(
        model="gpt-4o-mini", tokens_in=150, tokens_out=42, latency_ms=500.0
    )
    print(result)  # {"usage_id": "…", "cost_usd": 0.000048, "cost_inr": 0.0041}

asyncio.run(main())
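
Because an awaitable logger returns its result to the caller, several log calls can be awaited together instead of one after another. The sketch below shows the pattern with asyncio.gather; `fake_alog` is a hypothetical stand-in so the example runs offline, and it is not the SDK's `alog`.

```python
import asyncio


async def fake_alog(**fields):
    """Stand-in for an awaitable logger; echoes the total token count."""
    await asyncio.sleep(0.01)                  # simulates the network round trip
    return {"total_tokens": fields["tokens_in"] + fields["tokens_out"]}


async def log_batch(calls):
    # Await all log coroutines together; results keep the input order.
    return await asyncio.gather(*(fake_alog(**call) for call in calls))


calls = [
    {"model": "gpt-4o-mini", "tokens_in": 150, "tokens_out": 42},
    {"model": "gpt-4o-mini", "tokens_in": 512, "tokens_out": 128},
]
results = asyncio.run(log_batch(calls))
print(results)
```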

Manual Logging

Use tl.log() when you already have token counts from any source:

# Fire-and-forget (default)
tl.log(
    model      = "my-custom-llm",
    tokens_in  = 1024,
    tokens_out = 256,
    latency_ms = 820.0,
    query_text = "User query here",
)

# Blocking — get cost back immediately
tl2 = TokenLens(api_key="tl-...", background=False)
result = tl2.log(model="gpt-4o-mini", tokens_in=150, tokens_out=42, latency_ms=500)
print(result)
# {"usage_id": "uuid…", "cost_usd": 0.0000477, "cost_inr": 0.0040545}

Local Cost Utilities

Calculate cost locally without any network call:

from tokenlens.pricing import compute_cost, list_models

cost = compute_cost("gpt-4o-mini", tokens_in=1500, tokens_out=420)
print(cost)
# {"usd": 0.000477, "inr": 0.040545}

# Custom exchange rate
cost = compute_cost("gpt-4o-mini", tokens_in=1500, tokens_out=420, usd_to_inr=84.5)

# Custom pricing for unlisted models
cost = compute_cost(
    "my-model",
    tokens_in  = 1000,
    tokens_out = 500,
    custom_pricing = {"input": 0.002 / 1_000_000, "output": 0.008 / 1_000_000},
)

# All supported models
print(list_models())

Supported Models & Pricing

OpenAI

Model	Input / 1M tokens	Output / 1M tokens
`gpt-4o`	$5.00	$20.00
`gpt-4o-mini`	$0.15	$0.60
`gpt-4-turbo`	$10.00	$30.00
`gpt-4`	$30.00	$60.00
`gpt-3.5-turbo`	$0.50	$1.50
`o1`	$15.00	$60.00
`o1-mini`	$3.00	$12.00
`o3`	$10.00	$40.00
`o3-mini`	$1.10	$4.40

Anthropic

Model	Input / 1M tokens	Output / 1M tokens
`claude-opus-4-8`	$15.00	$75.00
`claude-sonnet-4-6`	$3.00	$15.00
`claude-haiku-4-5-20251001`	$0.80	$4.00
`claude-3-5-sonnet-20241022`	$3.00	$15.00
`claude-3-5-haiku-20241022`	$0.80	$4.00
`claude-3-opus-20240229`	$15.00	$75.00
`claude-3-haiku-20240307`	$0.25	$1.25

Google Gemini

Model	Input / 1M tokens	Output / 1M tokens
`gemini-1.5-pro`	$1.25	$5.00
`gemini-1.5-flash`	$0.075	$0.30
`gemini-2.0-flash`	$0.10	$0.40
`gemini-2.0-flash-lite`	$0.075	$0.30

Models not in the table fall back to a minimal default rate. Pass custom_pricing to compute_cost() for accurate local estimates.
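
The arithmetic behind these tables can be sketched as follows. This is an illustrative re-implementation, not the SDK's `compute_cost`: `estimate_cost` and `PRICES_PER_MILLION` are hypothetical names, only three rows of the tables are copied in, and the default rate of 85 INR per USD is an assumption inferred from the example output in Local Cost Utilities.

```python
# USD per 1M tokens as (input, output), copied from the tables above.
PRICES_PER_MILLION = {
    "gpt-4o-mini": (0.15, 0.60),
    "claude-3-haiku-20240307": (0.25, 1.25),
    "gemini-2.0-flash": (0.10, 0.40),
}


def estimate_cost(model, tokens_in, tokens_out, usd_to_inr=85.0):
    """Hypothetical re-implementation of the cost arithmetic."""
    price_in, price_out = PRICES_PER_MILLION[model]
    # Prices are quoted per million tokens, so divide once at the end.
    usd = (tokens_in * price_in + tokens_out * price_out) / 1_000_000
    return {"usd": round(usd, 8), "inr": round(usd * usd_to_inr, 8)}


cost = estimate_cost("gpt-4o-mini", tokens_in=1500, tokens_out=420)
print(cost)
```

For 1,500 input and 420 output tokens on gpt-4o-mini this works out to (1500 × 0.15 + 420 × 0.60) / 1,000,000 = 0.000477 USD, which agrees with the compute_cost example earlier in this README.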

Error Handling

The SDK never crashes your application by default:

import logging
logging.basicConfig(level=logging.WARNING)

tl = TokenLens(api_key="tl-...", raise_on_error=False)  # default — safe
tl.log(model="gpt-4o-mini", tokens_in=100, tokens_out=50, latency_ms=500)
# If backend is unreachable: logs a warning, returns None — your app keeps running

Strict mode for tests:

from tokenlens import TokenLens, LoggingError

tl = TokenLens(api_key="tl-...", raise_on_error=True)
try:
    tl.log(model="gpt-4o-mini", tokens_in=100, tokens_out=50, latency_ms=500)
except LoggingError as e:
    print(f"Logging failed: {e}")

Exception	When raised
`AuthError`	`api_key` is empty or does not start with `tl-`
`LoggingError`	Backend returned non-200, or connection timed out
`TokenLensError`	Base class for all SDK exceptions
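
The behaviour and exception table above can be modelled with a small sketch. Every name here (`SketchTokenLensError`, `SketchAuthError`, `SketchLoggingError`, `validate_api_key`, `deliver`) is a hypothetical stand-in written for illustration; the SDK's own classes may be structured differently.

```python
import logging

log = logging.getLogger("tokenlens_sketch")


class SketchTokenLensError(Exception):
    """Stand-in for the base class TokenLensError."""


class SketchAuthError(SketchTokenLensError):
    """Stand-in for AuthError."""


class SketchLoggingError(SketchTokenLensError):
    """Stand-in for LoggingError."""


def validate_api_key(api_key):
    # Mirrors the AuthError row: empty key or missing "tl-" prefix.
    if not api_key or not api_key.startswith("tl-"):
        raise SketchAuthError("api_key must be non-empty and start with 'tl-'")
    return api_key


def deliver(send, record, raise_on_error=False):
    """Call send(record); swallow failures unless raise_on_error is set."""
    try:
        return send(record)
    except Exception as exc:
        if raise_on_error:
            raise SketchLoggingError(str(exc)) from exc
        log.warning("log delivery failed: %s", exc)
        return None


def unreachable_backend(record):
    raise ConnectionError("backend unreachable")


# Default mode: the failure becomes a warning and a None result.
quiet_result = deliver(unreachable_backend, {"model": "gpt-4o-mini"})

# Strict mode: the same failure surfaces, and the base class catches it.
try:
    deliver(unreachable_backend, {"model": "gpt-4o-mini"}, raise_on_error=True)
    strict_outcome = "no error"
except SketchTokenLensError as exc:
    strict_outcome = type(exc).__name__
```

Catching the base class is enough to handle both authentication and logging failures in one place, which is the usual reason for giving an SDK a single exception root.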

REST API (non-Python)

Use the backend directly from any language:

curl -X POST https://13.126.130.56.nip.io/v1/log \
  -H "Authorization: Bearer tl-your-key" \
  -H "Content-Type: application/json" \
  -d '{
    "application":   "my-app",
    "agent_name":    "summariser",
    "model_used":    "gpt-4o-mini",
    "tokens_in":     512,
    "tokens_out":    128,
    "latency_ms":    340.5,
    "query_text":    "Summarise this document…",
    "response_text": "Here is a summary…"
  }'

Response:

{
  "usage_id":     "b3c1a9f2-…",
  "cost_usd":     0.0001536,
  "cost_inr":     0.013056,
  "total_tokens": 640,
  "model_found":  true
}
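
For callers that want the same request without the SDK, the curl call above maps onto the Python standard library roughly as follows. `build_log_request` is a hypothetical helper; the endpoint path, headers, and field names are taken from the curl example, and the actual send is left commented out so the sketch runs offline.

```python
import json
import urllib.request


def build_log_request(base_url, api_key, payload):
    """Build (but do not send) a POST to the /v1/log endpoint."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/log",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


request = build_log_request(
    "https://13.126.130.56.nip.io",
    "tl-your-key",
    {
        "application": "my-app",
        "agent_name": "summariser",
        "model_used": "gpt-4o-mini",
        "tokens_in": 512,
        "tokens_out": 128,
        "latency_ms": 340.5,
        "query_text": "Summarise this document…",
        "response_text": "Here is a summary…",
    },
)

# To send it:
# with urllib.request.urlopen(request, timeout=10) as resp:
#     print(json.load(resp))
```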

Release history

0.3.3 (this version): Jun 13, 2026
0.3.2: Jun 13, 2026
0.3.1: Jun 13, 2026
0.3.0: Jun 12, 2026

Download files

Source Distribution

tokendetective-0.3.3.tar.gz (22.9 kB)

Uploaded Jun 13, 2026 Source

Built Distribution

tokendetective-0.3.3-py3-none-any.whl (17.9 kB)

Uploaded Jun 13, 2026 Python 3

File details

Details for the file tokendetective-0.3.3.tar.gz.

File metadata

Download URL: tokendetective-0.3.3.tar.gz
Upload date: Jun 13, 2026
Size: 22.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.5

File hashes

Hashes for tokendetective-0.3.3.tar.gz
Algorithm	Hash digest
SHA256	`674c8a534c6ef450c7a196539876bb7d9574938a160ae79af79ca2d36535998d`
MD5	`c8f269eb0aeac0425b6a78acf907fbac`
BLAKE2b-256	`1d590fc82f4d27711d7e26d69a596c01345f83977100218312357bf17735e06d`

File details

Details for the file tokendetective-0.3.3-py3-none-any.whl.

File metadata

Download URL: tokendetective-0.3.3-py3-none-any.whl
Upload date: Jun 13, 2026
Size: 17.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.5

File hashes

Hashes for tokendetective-0.3.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`75783daffb0f8838bdc88d788712c979a967a3814ba7709656ac92ad1b569457`
MD5	`34500055762905a88738f4e095c91984`
BLAKE2b-256	`924ff85c925cc41b040ef3c17e240b15beadc2daa36e069234702cb346fb187b`
