
Cross-provider LLM token tracking and cost calculation

🧮 tokenx-core

Instant cost • Instant latency • Zero code refactor

👉 Like what you see? Star the repo and follow @dvlshah for updates!


> Plug-and-play decorators for tracking **cost** & **latency** of LLM API calls.

tokenx provides a simple way to monitor the cost and performance of your LLM integrations without changing your existing code. Just add decorators to your API call functions and get detailed metrics automatically.

Decorator in → Metrics out. Monitor cost & latency of any LLM function without touching its body.

pip install "tokenx-core[openai]"        # 1️⃣ install (run in your shell)

from tokenx.metrics import measure_cost, measure_latency   # 2️⃣ decorate
from openai import OpenAI

@measure_latency
@measure_cost(provider="openai", model="gpt-4o-mini")
def ask(prompt: str):
    return OpenAI().chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}]
    )

resp, m = ask("Hello!")                                   # 3️⃣ run
print(m["cost_usd"], "USD |", m["latency_ms"], "ms")

🤔 Why tokenx?

Integrating with LLM APIs often involves hidden costs and variable performance. Manually tracking token usage and calculating costs across different models and providers is tedious and error-prone. tokenx simplifies this by:

  • Effortless Integration: Add monitoring with simple decorators, no need to refactor your API call logic.
  • Accurate Cost Tracking: Uses up-to-date, configurable pricing (including caching discounts) for precise cost analysis.
  • Performance Insights: Easily measure API call latency to identify bottlenecks.
  • Multi-Provider Ready: Designed to consistently monitor costs across different LLM vendors (OpenAI currently supported, more coming soon!).

🏗️ Architecture (1‑min overview)

flowchart LR
    subgraph user["Your code"]
        F((API call))
    end
    F -->|decorators| D[tokenx wrapper]
    D -- cost --> C[CostCalculator]
    D -- latency --> L[Latency Timer]
    C -- lookup --> Y[model_prices.yaml]
    D -->|metrics| M[Structured JSON → stdout / exporter]

No vendor lock‑in: pure‑Python wrapper emits plain dicts—pipe them to Prometheus, Datadog, or stdout.
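
Because the metrics are plain dicts, forwarding them to stdout takes only a few lines. The following is a minimal sketch, not part of tokenx: `emit_metrics` is a hypothetical helper, and the sample values are illustrative only.

```python
import json
import sys
from typing import Any, Dict, TextIO


def emit_metrics(metrics: Dict[str, Any], stream: TextIO = sys.stdout) -> str:
    """Hypothetical helper: write one metrics dict as a single JSON line."""
    line = json.dumps(metrics, sort_keys=True)
    stream.write(line + "\n")
    return line


# Illustrative values only; real metrics come from the decorators.
sample = {
    "provider": "openai",
    "model": "gpt-4o-mini",
    "cost_usd": 0.000348,
    "latency_ms": 543.21,
}
emit_metrics(sample)
```

One JSON object per line is a format most log shippers can ingest, so the same helper could feed an exporter by passing a different `stream`.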


💡 Features at a glance

  • Track & save money – live USD costing with cached‑token discounts
  • Trace latency – pinpoint slow models or network hops
  • Plug‑&‑play decorators – wrap any sync or async function
  • Provider plug‑ins – OpenAI today, Anthropic & Gemini next
  • Typed – 100 % py.typed, 95 %+ mypy coverage
  • Zero deps – slims Docker images

📦 Installation

pip install tokenx-core                 # stable
pip install "tokenx-core[openai]"       # with provider extras (quoted so shells like zsh don't expand the brackets)

🚀 Quick Start

Here's how to monitor your OpenAI API calls with just two lines of code:

from tokenx.metrics import measure_cost, measure_latency
from openai import OpenAI

@measure_latency
@measure_cost(provider="openai", model="gpt-4o-mini")  # Always specify provider and model
def call_openai():
    client = OpenAI()
    return client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Hello, world!"}]
    )

response, metrics = call_openai()

# Access your metrics
print(f"Cost: ${metrics['cost_usd']:.6f}")
print(f"Latency: {metrics['latency_ms']:.2f}ms")
print(f"Tokens: {metrics['input_tokens']} in, {metrics['output_tokens']} out")
print(f"Cached tokens: {metrics['cached_tokens']}")  # New in v0.2.0

🔍 Detailed Usage

Cost Tracking

The measure_cost decorator requires explicit provider and model specification:

@measure_cost(provider="openai", model="gpt-4o")  # Explicit specification required
def my_function(): ...

@measure_cost(provider="openai", model="gpt-4o", tier="flex")  # Optional tier
def my_function(): ...

Latency Measurement

The measure_latency decorator works with both sync and async functions:

@measure_latency
def sync_function(): ...

@measure_latency
async def async_function(): ...
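
To illustrate how a single decorator can serve both sync and async functions, here is a minimal sketch. It is not tokenx's implementation: `timed` is a hypothetical stand-in, and it assumes the wrapper returns a `(result, metrics)` tuple like the examples in this README.

```python
import asyncio
import functools
import inspect
import time
from typing import Any, Callable


def timed(func: Callable[..., Any]) -> Callable[..., Any]:
    """Hypothetical stand-in for a latency decorator: returns (result, metrics)."""
    if inspect.iscoroutinefunction(func):
        # Async path: the wrapper must itself be a coroutine function.
        @functools.wraps(func)
        async def async_wrapper(*args: Any, **kwargs: Any):
            start = time.perf_counter()
            result = await func(*args, **kwargs)
            return result, {"latency_ms": (time.perf_counter() - start) * 1000}

        return async_wrapper

    # Sync path: time the plain call.
    @functools.wraps(func)
    def sync_wrapper(*args: Any, **kwargs: Any):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        return result, {"latency_ms": (time.perf_counter() - start) * 1000}

    return sync_wrapper


@timed
def add(a: int, b: int) -> int:
    return a + b


@timed
async def add_async(a: int, b: int) -> int:
    await asyncio.sleep(0)  # yield to the event loop once
    return a + b


sync_result, sync_metrics = add(1, 2)
async_result, async_metrics = asyncio.run(add_async(1, 2))
```

The check happens once, at decoration time, so each call pays only for the timing itself.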

Combining Decorators

Decorators can be combined in any order:

@measure_latency
@measure_cost(provider="openai", model="gpt-4o")
def my_function(): ...

# Equivalent to:
@measure_cost(provider="openai", model="gpt-4o")
@measure_latency
def my_function(): ...
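
One way stacked decorators can produce the same metrics keys in either order is for each wrapper to merge into any metrics dict an inner wrapper already returned. The sketch below is an assumption about how this could work, not tokenx's actual code; `with_latency`, `with_fixed_cost`, and `_split` are hypothetical names.

```python
import functools
import time
from typing import Any, Dict, Tuple


def _split(value: Any) -> Tuple[Any, Dict[str, Any]]:
    """Separate (result, metrics) if an inner decorator already produced it."""
    if isinstance(value, tuple) and len(value) == 2 and isinstance(value[1], dict):
        return value[0], dict(value[1])
    return value, {}


def with_latency(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result, metrics = _split(func(*args, **kwargs))
        metrics["latency_ms"] = (time.perf_counter() - start) * 1000
        return result, metrics

    return wrapper


def with_fixed_cost(cost_usd: float):
    """Toy cost decorator: attaches a constant instead of computing a real cost."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            result, metrics = _split(func(*args, **kwargs))
            metrics["cost_usd"] = cost_usd
            return result, metrics

        return wrapper

    return decorator


@with_latency
@with_fixed_cost(0.001)
def first():
    return "ok"


@with_fixed_cost(0.001)
@with_latency
def second():
    return "ok"


result_a, metrics_a = first()
result_b, metrics_b = second()
```

In this sketch both orders yield the same keys, but the outer decorator also times whatever the inner one does, so the measured latency can differ slightly between the two.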

Async Usage

Both decorators work seamlessly with async functions:

import asyncio
from tokenx.metrics import measure_cost, measure_latency
from openai import AsyncOpenAI # Use Async client

@measure_latency
@measure_cost(provider="openai", model="gpt-4o-mini")
async def call_openai_async():
    client = AsyncOpenAI()
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Tell me an async joke!"}]
    )
    return response

async def main():
    response, metrics = await call_openai_async()
    print(metrics)

if __name__ == "__main__":
    asyncio.run(main())  # Run the async entry point

Direct Cost Calculation

For advanced use cases, you can calculate costs directly:

from tokenx.cost_calc import CostCalculator

# Create a calculator for a specific provider and model
calc = CostCalculator.for_provider("openai", "gpt-4o")

# Calculate cost from token counts
cost = calc.calculate_cost(
    input_tokens=100,
    output_tokens=50,
    cached_tokens=20
)

# Calculate cost from a response object
# (`response` is the object returned by an earlier OpenAI API call)
cost = calc.cost_from_response(response)

🔄 Provider Compatibility

tokenx is designed to work with multiple LLM providers. Here's the current compatibility matrix:

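| Provider  | Status | SDK Version | Response Formats | Models                            |
|-----------|--------|-------------|------------------|-----------------------------------|
| OpenAI    | ✅     | >= 1.0.0    | Dict, Pydantic   | All models (GPT-4, GPT-3.5, etc.) |
| Anthropic | 🔜     | -           | -                | Claude models (coming soon)       |
| Google    | 🔜     | -           | -                | Gemini models (coming soon)       |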

OpenAI Support Details

  • SDK Versions: Compatible with OpenAI Python SDK v1.0.0 and newer
  • Response Formats:
    • Dictionary responses from older SDK versions
    • Pydantic model responses from newer SDK versions
    • Cached token extraction from prompt_tokens_details.cached_tokens
  • API Types:
    • Chat Completions API
    • Traditional Completions API
    • Support for the newer Responses API coming soon
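
The two response formats above can be read with one code path. The sketch below is a hypothetical illustration, not tokenx's extractor: `extract_usage` and `_get` are made-up names, and mapping `prompt_tokens` and `completion_tokens` to `input_tokens` and `output_tokens` is an assumption. The usage field names are those of the OpenAI Chat Completions response.

```python
from types import SimpleNamespace
from typing import Any, Dict


def _get(obj: Any, key: str, default: Any = None) -> Any:
    """Read `key` from a dict or from an attribute-style object."""
    if obj is None:
        return default
    if isinstance(obj, dict):
        return obj.get(key, default)
    return getattr(obj, key, default)


def extract_usage(response: Any) -> Dict[str, int]:
    """Hypothetical extractor covering dict and attribute-style responses."""
    usage = _get(response, "usage")
    if usage is None:
        raise ValueError("response has no 'usage' field")
    details = _get(usage, "prompt_tokens_details")
    return {
        "input_tokens": _get(usage, "prompt_tokens", 0) or 0,
        "output_tokens": _get(usage, "completion_tokens", 0) or 0,
        # Missing details are treated as zero cached tokens.
        "cached_tokens": _get(details, "cached_tokens", 0) or 0,
    }


# A dict-style response and an attribute-style stand-in for a Pydantic model.
dict_response = {
    "usage": {
        "prompt_tokens": 100,
        "completion_tokens": 50,
        "prompt_tokens_details": {"cached_tokens": 20},
    }
}
object_response = SimpleNamespace(
    usage=SimpleNamespace(
        prompt_tokens=100, completion_tokens=50, prompt_tokens_details=None
    )
)

print(extract_usage(dict_response))
print(extract_usage(object_response))
```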

🛠️ Advanced Configuration

Custom Pricing

Prices are loaded from the model_prices.yaml file. You can update this file when new models are released or prices change:

openai:
  gpt-4o:
    sync:
      in: 2.50        # USD per million input tokens
      cached_in: 1.25 # USD per million cached tokens
      out: 10.00      # USD per million output tokens
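
As a worked example of how per-million prices turn into a dollar figure, the sketch below applies the gpt-4o prices above to the token counts from the Direct Cost Calculation example. It is an illustration, not tokenx's formula: `estimate_cost` is a hypothetical function, and it assumes cached tokens are included in `input_tokens` and billed at the `cached_in` rate instead of the `in` rate.

```python
from typing import Dict

# Mirrors the YAML structure above: provider -> model -> tier -> prices.
PRICES: Dict[str, Dict[str, Dict[str, Dict[str, float]]]] = {
    "openai": {"gpt-4o": {"sync": {"in": 2.50, "cached_in": 1.25, "out": 10.00}}}
}


def estimate_cost(
    provider: str,
    model: str,
    input_tokens: int,
    output_tokens: int,
    cached_tokens: int = 0,
    tier: str = "sync",
) -> float:
    """Hypothetical cost estimate in USD from per-million-token prices."""
    rates = PRICES[provider][model][tier]
    uncached = input_tokens - cached_tokens  # assumption: cached is a subset of input
    if uncached < 0:
        raise ValueError("cached_tokens cannot exceed input_tokens")
    total = (
        uncached * rates["in"]
        + cached_tokens * rates["cached_in"]
        + output_tokens * rates["out"]
    )
    return total / 1_000_000


cost = estimate_cost("openai", "gpt-4o", input_tokens=100, output_tokens=50, cached_tokens=20)
print(f"{cost:.6f}")  # 0.000725 = (80 * 2.50 + 20 * 1.25 + 50 * 10.00) / 1,000,000
```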

Error Handling

tokenx provides detailed error messages to help diagnose issues:

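from tokenx.cost_calc import CostCalculator
from tokenx.errors import TokenExtractionError, PricingError

# `response` is the object returned by an earlier OpenAI API call
try: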
    calculator = CostCalculator.for_provider("openai", "gpt-4o")
    cost = calculator.cost_from_response(response)
except TokenExtractionError as e:
    print(f"Token extraction failed: {e}")
except PricingError as e:
    print(f"Pricing error: {e}")

📊 Example Metrics Output

When you use the decorators, you'll get a structured metrics dictionary (the values below are illustrative):

{
    "provider": "openai",
    "model": "gpt-4o-mini",
    "tier": "sync",
    "input_tokens": 12,
    "output_tokens": 48,
    "cached_tokens": 20,        # New in v0.2.0
    "cost_usd": 0.000348,       # $0.000348 USD
    "latency_ms": 543.21        # 543.21 milliseconds
}
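
Since each call yields one such dictionary, totals across calls are simple to compute. The sketch below assumes only the `cost_usd` and `latency_ms` keys shown above; `summarize` is a hypothetical helper, and the records are made-up values.

```python
from typing import Any, Dict, Iterable


def summarize(records: Iterable[Dict[str, Any]]) -> Dict[str, float]:
    """Hypothetical helper: total cost and mean latency over metrics dicts."""
    items = list(records)
    total_cost = sum(r["cost_usd"] for r in items)
    total_latency = sum(r["latency_ms"] for r in items)
    return {
        "calls": len(items),
        "total_cost_usd": total_cost,
        "mean_latency_ms": total_latency / len(items) if items else 0.0,
    }


# Made-up records standing in for metrics collected from two calls.
records = [
    {"cost_usd": 0.0005, "latency_ms": 500.0},
    {"cost_usd": 0.0015, "latency_ms": 700.0},
]
summary = summarize(records)
print(summary)
```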

🤝 Contributing

git clone https://github.com/dvlshah/tokenx.git
cd tokenx
pre-commit install
pip install -e ".[dev]"   # or `poetry install`
pytest -q && mypy src/

See CONTRIBUTING.md for details.


📝 Changelog

See CHANGELOG.md for full history.


📜 License

MIT © 2025 Deval Shah

If tokenX saves you time or money, please consider sponsoring or giving a ⭐ – it really helps!
