Auto-track LLM cost, latency, and usage. Two lines of code, every provider.

LLM Tracer — Python SDK

Track cost, latency, and token usage across OpenAI, Anthropic, and Google Gemini with two lines of code: one import and one init() call.

Install

pip install llmtracer-sdk

Quick Start

import llmtracer

llmtracer.init(api_key="lt_...")

# That's it. All OpenAI, Anthropic, and Google Gemini calls are now tracked automatically.

No wrappers, no callbacks, no code changes. The SDK auto-patches your provider clients at import time.
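To give a feel for what auto-patching involves, here is a minimal sketch of wrapping a client method so that every call is timed and recorded. It is not the SDK's actual implementation; `FakeClient`, `patch_method`, and `recorded` are hypothetical names used only for illustration.

```python
import functools
import time

recorded = []  # hypothetical in-memory sink standing in for the SDK's event queue


class FakeClient:
    """Stand-in for a provider client; not a real SDK class."""

    def create(self, prompt):
        return {"text": prompt.upper(), "usage": {"input": 3, "output": 5}}


def patch_method(cls, name):
    """Replace cls.<name> with a wrapper that records latency and usage."""
    original = getattr(cls, name)

    @functools.wraps(original)
    def wrapper(self, *args, **kwargs):
        start = time.perf_counter()
        result = original(self, *args, **kwargs)
        recorded.append({
            "method": f"{cls.__name__}.{name}",
            "latency_s": time.perf_counter() - start,
            "usage": result.get("usage"),
        })
        return result  # the caller sees the original return value unchanged

    setattr(cls, name, wrapper)


patch_method(FakeClient, "create")
response = FakeClient().create("hello")
```

Because the wrapper is installed on the class, existing call sites keep working without changes, which is the property the quick start relies on.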

View your dashboard at llmtracer.dev.

What Gets Captured

Every LLM call is automatically tracked with:

  • Provider, model, tokens (input + output), latency, cost
  • Google Gemini: thinking tokens (2.5 models), tool tokens, cached tokens
  • Anthropic: cache creation + read tokens
  • OpenAI: reasoning tokens (o1/o3/o4), cached tokens
  • Caller file, function, and line number
  • Auto-flush on process exit in long-running processes (serverless functions and short-lived scripts should flush manually; see Flushing Events)
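Capturing the caller's file, function, and line number can be done by walking the call stack. The sketch below shows one way to do that with the standard `inspect` module; it is an illustration, not the SDK's code, and `caller_location`, `tracked_call`, and `my_feature` are hypothetical names.

```python
import inspect


def caller_location():
    """Return (file, function, line) of whoever called the function that called us."""
    frame = inspect.currentframe()
    try:
        caller = frame.f_back.f_back  # skip caller_location and the tracked function
        return (caller.f_code.co_filename, caller.f_code.co_name, caller.f_lineno)
    finally:
        del frame  # avoid reference cycles, as the inspect docs recommend


def tracked_call():
    """Hypothetical stand-in for a patched provider method."""
    return caller_location()


def my_feature():
    return tracked_call()


filename, function, line = my_feature()
```

Here `function` is the name of the application function that triggered the call, which is the kind of attribution the dashboard needs.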

Environment Variable Pattern

import os
import llmtracer

llmtracer.init(
    api_key=os.environ["LLMTRACER_API_KEY"],
    debug=True,  # prints token counts to console
)

Multi-App Tracking

If you have multiple services sharing an API key, set app_name to filter by application in the dashboard:

llmtracer.init(api_key="lt_...", app_name="billing-service")

Or via environment variable:

export LLMTRACER_APP_NAME=billing-service
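The precedence between the argument and the environment variable can be sketched as follows. This assumes an explicit `app_name` wins and the variable is only a fallback, as the Configuration section describes; `resolve_app_name` is a hypothetical helper, not part of the SDK.

```python
import os


def resolve_app_name(app_name=None, environ=os.environ):
    """Prefer the explicit argument; otherwise fall back to the environment."""
    if app_name:
        return app_name
    return environ.get("LLMTRACER_APP_NAME")  # None when the variable is unset
```

Passing `environ` explicitly makes the helper easy to exercise in tests without touching the real process environment.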

Trace Context and Tags

from openai import OpenAI
import llmtracer

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Recommended: pass tags as keyword arguments
with llmtracer.trace(feature="chat", user_id="u_sarah"):
    response = client.chat.completions.create(...)

# Deprecated: a tags dict still works but emits a DeprecationWarning
with llmtracer.trace(tags={"feature": "chat"}):
    ...

Tags appear in the dashboard's Breakdown page and Top Tags card. Use them to answer questions like "which user costs the most?" or "which feature should I optimize?"
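A context manager like `trace()` is commonly built on `contextvars`, so that tags follow the current thread or async task. The sketch below illustrates that technique only; whether the SDK works this way, and whether nested blocks merge their tags as shown, are assumptions. `_current_tags` and `current_tags` are hypothetical names.

```python
import contextlib
import contextvars

_current_tags = contextvars.ContextVar("current_tags", default={})


@contextlib.contextmanager
def trace(**tags):
    """Merge tags into the active set for the duration of the with-block."""
    token = _current_tags.set({**_current_tags.get(), **tags})
    try:
        yield
    finally:
        _current_tags.reset(token)  # restore the outer tags on exit


def current_tags():
    """Return a copy of the tags active at this point."""
    return dict(_current_tags.get())


with trace(feature="chat"):
    with trace(user_id="u_sarah"):
        inner = current_tags()
    outer = current_tags()
after = current_tags()
```

In this sketch an event recorded inside the inner block would carry both tags, while one recorded after the blocks would carry none.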

Tagging Patterns

Pattern | Tag | Example values
Track cost by feature | feature | "chat", "search", "summarize"
Track cost by user | user_id | "u_sarah", "u_mike"
Track cost by customer (B2B) | customer | "acme-corp", "initech"
Track cost by conversation | conversation_id | "conv_abc123"
Track environment | env | "production", "staging"

Supported Providers

Provider | Package | Auto-patched
OpenAI | openai | Yes
Anthropic | anthropic | Yes
Google Gemini | google-genai | Yes

LangChain Support

If you use LangChain with ChatOpenAI, ChatAnthropic, or ChatGoogleGenerativeAI, the underlying SDK calls are auto-captured. No callback handler needed — just llmtracer.init() and you're done.

Configuration

Option | Type | Default | Range | Description
api_key | str | required | - | Your LLM Tracer API key (starts with lt_)
app_name | str | None | - | Application name for multi-app filtering. Falls back to the LLMTRACER_APP_NAME env var
endpoint | str | Production URL | - | Ingestion endpoint URL
skip_exit_handlers | bool | False | - | Skip atexit handler registration (for serverless environments)
max_batch_size | int | 50 | 1–500 | Max events per HTTP request
flush_interval_s | float | 5.0 | 1.0–60.0 | Auto-flush interval in seconds
max_queue_size | int | 1000 | 100–10000 | Max events in queue before dropping oldest
max_retries | int | 3 | 0–10 | Max retry attempts for failed flushes
sample_rate | float | 1.0 | 0.0–1.0 | Sampling rate. 0.5 captures ~50% of events
debug | bool | False | - | Enable debug logging to console

All numeric options are validated on init(). Out-of-range values are replaced with the default, and a warning is logged when debug=True.
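The validation rule can be sketched as below, using the defaults and ranges listed for the numeric options. Note that an out-of-range value falls back to the default rather than being clamped to the nearest bound. `NUMERIC_OPTIONS`, `validate_option`, and the logger name are hypothetical and not taken from the SDK.

```python
import logging

logger = logging.getLogger("llmtracer_sketch")  # hypothetical logger name

# (default, minimum, maximum) for each numeric option in the Configuration section
NUMERIC_OPTIONS = {
    "max_batch_size": (50, 1, 500),
    "flush_interval_s": (5.0, 1.0, 60.0),
    "max_queue_size": (1000, 100, 10000),
    "max_retries": (3, 0, 10),
    "sample_rate": (1.0, 0.0, 1.0),
}


def validate_option(name, value, debug=False):
    """Return value if it is within range, otherwise the option's default."""
    default, low, high = NUMERIC_OPTIONS[name]
    if low <= value <= high:
        return value
    if debug:
        logger.warning(
            "%s=%r is outside [%s, %s]; using default %s", name, value, low, high, default
        )
    return default
```

For example, `max_batch_size=0` would become 50 under this rule, not 1.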

Flushing Events

The SDK batches events and sends them in the background. In long-running processes (web servers, daemons), this is fully automatic. For short-lived scripts and serverless environments, you need to flush before the process exits.
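As a rough illustration of batching, the sketch below queues events and hands them to a sender in chunks of `max_batch_size`. It deliberately leaves out the background thread and the `flush_interval_s` timer; `Batcher` and its `send` callable are hypothetical names, not SDK internals.

```python
class Batcher:
    """Collects events and hands them to `send` in chunks of max_batch_size."""

    def __init__(self, send, max_batch_size=50):
        self.send = send  # callable that takes a list of events
        self.max_batch_size = max_batch_size
        self.pending = []

    def add(self, event):
        self.pending.append(event)

    def flush(self):
        """Send everything queued so far, one batch per call to `send`."""
        while self.pending:
            batch = self.pending[: self.max_batch_size]
            self.pending = self.pending[self.max_batch_size:]
            self.send(batch)


sent = []
batcher = Batcher(send=sent.append, max_batch_size=50)
for i in range(120):
    batcher.add({"id": i})
batcher.flush()
batch_sizes = [len(batch) for batch in sent]
```

With 120 queued events and a batch size of 50, `flush()` results in three sends of 50, 50, and 20 events.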

Auto-flush (long-running processes)

By default the SDK registers an atexit handler and flushes on process exit:

import llmtracer

llmtracer.init(api_key="lt_...")

# Events are flushed automatically when the process exits

Manual flush (serverless / short-lived)

Call llmtracer.flush() before returning from a handler or Lambda function:

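from openai import OpenAI
import llmtracer

llmtracer.init(api_key="lt_...", skip_exit_handlers=True)
client = OpenAI()  # reads OPENAI_API_KEY from the environment

def handler(event, context):
    try:
        response = client.chat.completions.create(...)
        return response
    finally:
        llmtracer.flush()  # send before the function returns, even if the call raised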

pytest fixture

Add a session-scoped autouse fixture that flushes once after the last test, so events recorded during the test run are sent:

import pytest
import llmtracer

@pytest.fixture(scope="session", autouse=True)
def flush_llmtracer():
    yield
    llmtracer.flush()

SIGTERM handler (Cloud Run / Kubernetes)

import signal
import llmtracer

def handle_sigterm(signum, frame):
    llmtracer.flush()
    raise SystemExit(0)

signal.signal(signal.SIGTERM, handle_sigterm)

Debug Mode

Pass debug=True to print a one-line summary of each call (tokens, cost, latency) to the console:

llmtracer.init(api_key="lt_...", debug=True)

Example output:

[llmtracer] openai gpt-4o | 1,247 in -> 384 out | $0.0094 | 1.2s
[llmtracer] anthropic claude-sonnet-4-5 | 2,100 in -> 512 out (cache_read: 1,800) | $0.0031 | 0.8s
[llmtracer] google gemini-2.5-pro | 900 in -> 280 out (thinking: 1,420) | $0.0067 | 2.1s
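If you want to parse or reproduce these lines in your own tooling, the layout can be approximated with ordinary format specifiers. The function below is a sketch inferred from the sample output only; `format_debug_line` and its parameters are hypothetical, and the SDK's real formatter may differ.

```python
def format_debug_line(provider, model, tokens_in, tokens_out, cost_usd, latency_s, extra=None):
    """Build one console line in the style of the sample debug output."""
    out_part = f"{tokens_out:,} out"
    if extra:  # e.g. {"cache_read": 1800} or {"thinking": 1420}
        details = ", ".join(f"{key}: {value:,}" for key, value in extra.items())
        out_part += f" ({details})"
    return (
        f"[llmtracer] {provider} {model} | {tokens_in:,} in -> {out_part} "
        f"| ${cost_usd:.4f} | {latency_s:.1f}s"
    )


line = format_debug_line("openai", "gpt-4o", 1247, 384, 0.0094, 1.2)
cached = format_debug_line(
    "anthropic", "claude-sonnet-4-5", 2100, 512, 0.0031, 0.8, extra={"cache_read": 1800}
)
```

The `:,` specifier adds thousands separators, `.4f` fixes the cost at four decimal places, and `.1f` rounds latency to a tenth of a second.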

Reliability

The SDK is designed to never interfere with your application:

  • Never throws — all internal errors are swallowed silently (enable debug=True for visibility)
  • Batching — events are queued and sent in batches of max_batch_size
  • Retry with backoff — failed flushes are retried up to max_retries times with exponential backoff (min(1.0 * 2^attempt, 30.0)) plus random jitter (0–1.0s)
  • Drop after retries — after max_retries consecutive failures, the batch is dropped to prevent unbounded memory growth
  • Queue overflow — drops oldest events when the queue exceeds max_queue_size
  • Sampling — set sample_rate below 1.0 to reduce volume in high-throughput environments
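Three of the mechanisms in this list are small enough to sketch directly: the backoff delay, the drop-oldest queue, and sampling. The code below follows the formula and behavior stated above but is not the SDK's source. The function names are hypothetical, and treating `attempt` as 0-based is an assumption.

```python
import random
from collections import deque


def backoff_delay(attempt, rng=random.random):
    """Delay before retry `attempt` (assumed 0-based): capped exponential plus jitter."""
    return min(1.0 * 2 ** attempt, 30.0) + rng() * 1.0


def should_sample(sample_rate, rng=random.random):
    """Keep an event with probability sample_rate."""
    return rng() < sample_rate


# A deque with maxlen silently discards the oldest item when a new one arrives
# and the queue is full, which matches the drop-oldest overflow policy.
queue = deque(maxlen=3)
for event_id in range(5):
    queue.append(event_id)

# Jitter is disabled here so the exponential part is visible on its own.
base_delays = [backoff_delay(n, rng=lambda: 0.0) for n in range(6)]
```

Without jitter the delays grow as 1, 2, 4, 8, and 16 seconds and then stay at the 30-second cap. The queue ends up holding only the three most recent events.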

Requirements

  • Python 3.8+
  • Works with any version of openai, anthropic, or google-genai SDKs

Zero Dependencies

The core SDK uses only Python stdlib (urllib.request, threading, hashlib).
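For readers curious how a batch can be shipped without third-party HTTP libraries, the sketch below builds a JSON POST with `urllib.request`. The payload shape, the bearer-token header, and the placeholder URL are all assumptions made for illustration; the SDK's actual wire format is not documented here.

```python
import json
import urllib.request


def build_ingest_request(endpoint, api_key, events):
    """Prepare (but do not send) a JSON POST carrying one batch of events."""
    body = json.dumps({"events": events}).encode("utf-8")  # assumed payload shape
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
        method="POST",
    )


request = build_ingest_request(
    "https://ingest.example.invalid/v1/events",  # placeholder, not the real endpoint
    "lt_...",
    [{"provider": "openai", "model": "gpt-4o"}],
)
# Sending would be: urllib.request.urlopen(request, timeout=5)
```

Building the request separately from sending it keeps the network call in one place, which makes retries and timeouts easier to reason about.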

License

MIT
