TokenHelm

Lightweight, framework-agnostic token tracking and LLM cost calculation across providers.

TokenHelm gives you one normalized usage/cost event for every LLM call — OpenAI, Gemini, Anthropic, or Ollama — without locking you into any framework, patching any provider SDK, or ever touching your credentials. It simply observes the response object your own client already returns.

from tokenhelm import TokenHelm

tracker = TokenHelm()                      # zero-config
response = client.chat.completions.create(...)   # your own OpenAI call
event = tracker.track(response)            # normalized LLMEvent
print(event.to_dict())
# {'provider': 'openai', 'model': 'gpt-4o', 'input_tokens': 1000,
#  'output_tokens': 500, 'total_tokens': 1500, 'latency': 0.0,
#  'cost': '0.00750', 'timestamp': '...', 'usage_complete': True,
#  'priced': True, 'currency': 'USD'}

Installation

pip install tokenhelm                  # core (only dependency: PyYAML)
pip install "tokenhelm[openai]"        # + OpenAI extras (for your own client)
pip install "tokenhelm[anthropic]"     # + Anthropic
pip install "tokenhelm[gemini]"        # + Google Gemini
pip install "tokenhelm[ollama]"        # + Ollama
pip install "tokenhelm[all]"           # all provider extras
pip install "tokenhelm[dev]"           # test/lint toolchain

Requires Python 3.11+. Each extra installs only the provider SDK that your own code calls; TokenHelm itself never imports those SDKs to read a response.


Quick Start

from tokenhelm import TokenHelm

tracker = TokenHelm()

# 1. Manual tracking — track any completed response
event = tracker.track(response)

# 2. Scoped tracking — collect every event in a block
with tracker.trace() as scope:
    response = client.chat.completions.create(...)
    scope.track(response)
print(scope.events)           # [LLMEvent(...)]

# 3. Choose where events go (any logger, callable, or storage)
# `metrics` below stands in for your own metrics client.
from tokenhelm import ConsoleLogger
tracker = TokenHelm(logger=[ConsoleLogger(), lambda e: metrics.push(e.to_dict())])

# 4. Bring your own pricing (file, dict, or a full PricingProvider)
tracker = TokenHelm(pricing="my_rates.yaml")
tracker = TokenHelm(pricing={"openai": {"gpt-4o": {"input": 2.5, "output": 10.0}}})

# 5. Reconfigure later without rebuilding
tracker.configure(currency="EUR")

# 6. Streaming — exactly one event after the stream is exhausted
for chunk in tracker.track_stream(client.chat.completions.create(..., stream=True)):
    ...   # consume chunks as usual

# 7. Async — same API with `async with` / `async for`
async with tracker.trace() as scope:
    scope.track(await aclient.chat.completions.create(...))
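
The "exactly one event after the stream is exhausted" behaviour in step 6 can be pictured as a pass-through generator. The sketch below is not TokenHelm's implementation: `passthrough_stream`, `on_complete`, and the chunk shape (a `usage` attribute that stays `None` until the final chunk) are all assumptions made for illustration.

```python
from types import SimpleNamespace
from typing import Any, Callable, Iterable, Iterator


def passthrough_stream(
    chunks: Iterable[Any],
    on_complete: Callable[[Any], None],
) -> Iterator[Any]:
    """Yield every chunk unchanged, then report usage once at the end.

    Hypothetical helper: assumes a chunk may carry a `usage` attribute and
    that the last non-None value is the final usage for the request.
    """
    last_usage = None
    for chunk in chunks:
        usage = getattr(chunk, "usage", None)
        if usage is not None:
            last_usage = usage
        yield chunk
    # Reached only once the caller has consumed the whole stream.
    on_complete(last_usage)


# Fake stream standing in for a provider SDK iterator.
fake_stream = [
    SimpleNamespace(text="Hel", usage=None),
    SimpleNamespace(text="lo", usage=None),
    SimpleNamespace(
        text="",
        usage=SimpleNamespace(prompt_tokens=12, completion_tokens=2),
    ),
]

reported = []
texts = [c.text for c in passthrough_stream(fake_stream, reported.append)]
```

In this sketch the callback never fires if the caller abandons the loop early, because the code after the `for` loop is not reached when the generator is closed. Whether TokenHelm behaves the same way on a partially consumed stream is not stated here.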

Every tracked request yields the same normalized LLMEvent with the eight mandated fields — provider, model, input_tokens, output_tokens, total_tokens, latency, cost, timestamp — plus usage_complete / priced status flags. Consumers never see a provider-specific usage object. Costs use decimal.Decimal (no float drift). Missing usage or unknown pricing degrade gracefully via the flags — tracking never raises on missing data.
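
As a rough illustration of the cost arithmetic and the `priced` flag, the sketch below reproduces the sample figure from the top of this page using `decimal.Decimal`. It assumes rates are quoted per one million tokens, which is consistent with the sample output (1000 input and 500 output tokens on gpt-4o giving 0.00750) but is not stated explicitly here. The `price` function and the `RATES` table are hypothetical names, not part of the TokenHelm API.

```python
from decimal import Decimal
from typing import Optional

# Assumption: rates are per one million tokens (see lead-in).
PER_MILLION = Decimal(1_000_000)

# Same shape as the inline pricing dict in Quick Start step 4, with the
# rates written as strings so they convert to Decimal exactly.
RATES = {"openai": {"gpt-4o": {"input": "2.5", "output": "10.0"}}}


def price(
    provider: str, model: str, input_tokens: int, output_tokens: int
) -> tuple[Optional[Decimal], bool]:
    """Return (cost, priced); unknown models give (None, False), no exception."""
    rate = RATES.get(provider, {}).get(model)
    if rate is None:
        return None, False
    cost = (
        Decimal(input_tokens) * Decimal(rate["input"])
        + Decimal(output_tokens) * Decimal(rate["output"])
    ) / PER_MILLION
    return cost, True


cost, priced = price("openai", "gpt-4o", 1000, 500)
unknown_cost, unknown_priced = price("openai", "no-such-model", 1000, 500)
```

Here `cost` equals `Decimal("0.0075")` with `priced` set to `True`, while the unknown model returns `None` and `False` rather than raising, mirroring the graceful degradation described above.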


Architecture

TokenHelm is built around five replaceable extension points; the core depends only on their interfaces, never on a concrete implementation.

            ┌──────────────────────────────────────────────────────────┐
            │                     Your Application                       │
            └───────────────────────────┬──────────────────────────────┘
                                         │  track() / trace() / configure()
                                         ▼
                              ┌────────────────────┐
                              │      TokenHelm      │   (sdk: client + TraceScope)
                              └─────────┬──────────┘
                                        ▼
                              ┌────────────────────┐
                              │    TokenTracker     │   builds the normalized LLMEvent
                              └───┬────────────┬───┘
                  extract usage  │            │  compute cost
                                 ▼            ▼
                   ┌──────────────────┐   ┌──────────────────┐
                   │   BaseAdapter ①  │   │ CostCalculator   │
                   │ OpenAI/Gemini/   │   └────────┬─────────┘
                   │ Anthropic/Ollama │            ▼
                   └──────────────────┘   ┌──────────────────┐
                                          │ PricingProvider ② │  (YAML default)
                                          └──────────────────┘
                                        │
                                        ▼  emit (tracker is unaware of sinks)
                              ┌────────────────────┐
                              │  EventDispatcher ③ │
                              └───┬────────────┬───┘
                                  ▼            ▼
                        ┌──────────────┐  ┌──────────────────┐
                        │   Logger ④   │  │ StorageBackend ⑤ │  (optional)
                        │ Console/...  │  └──────────────────┘
                        └──────────────┘

Extension points (all public & stable — Constitution Principle VI):

#   Interface         Default                              Swap it to…
①   BaseAdapter       OpenAI, Gemini, Anthropic, Ollama    add a new provider
②   PricingProvider   YamlPricingProvider                  remote/dynamic pricing, AI FinOps
③   EventDispatcher   DefaultEventDispatcher               custom routing/batching/export
④   Logger            ConsoleLogger                        JSON/file/metrics/dashboards
⑤   StorageBackend    none (opt-in)                        in-memory/SQLite/warehouse/analytics

Dependency direction is strictly one-way (no reverse dependencies):

Application → TokenHelm → TokenTracker → EventDispatcher → Logger / StorageBackend
                              └────────→ CostCalculator → PricingProvider
                              └────────→ UsageParser → BaseAdapter

CostCalculator depends only on PricingProvider; TokenTracker emits only through EventDispatcher. Analytics, dashboards, and FinOps are downstream consumers of LLMEvent behind these interfaces — they require no change to the core.
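
To make the "emit only through the dispatcher" idea concrete, here is a minimal sketch of a fan-out dispatcher that depends on a sink interface rather than on any concrete logger. `Sink` and `FanOutDispatcher` are hypothetical names, not TokenHelm's classes, and the choice to isolate a failing sink is a design decision of this sketch, not documented TokenHelm behaviour.

```python
from dataclasses import dataclass, field
from typing import Protocol


class Sink(Protocol):
    """Hypothetical interface: anything callable with one event dict."""

    def __call__(self, event: dict) -> None: ...


@dataclass
class FanOutDispatcher:
    """Hypothetical dispatcher: forwards each event to every sink.

    A failing sink is recorded and skipped so that one bad logger does
    not stop the remaining sinks from receiving the event.
    """

    sinks: list[Sink] = field(default_factory=list)
    errors: list[Exception] = field(default_factory=list)

    def emit(self, event: dict) -> None:
        for sink in self.sinks:
            try:
                sink(event)
            except Exception as exc:  # isolate sink failures
                self.errors.append(exc)


def broken_sink(event: dict) -> None:
    raise RuntimeError("sink unavailable")


stored: list[dict] = []
dispatcher = FanOutDispatcher(sinks=[broken_sink, stored.append])
dispatcher.emit({"provider": "openai", "model": "gpt-4o", "total_tokens": 1500})
```

Because the producer only calls `emit`, a new sink (a file writer, a metrics client, a storage backend) can be added to the list without touching the code that builds events.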


Supported Providers

All four providers are supported, with streaming and async, in v0.1.0.

Provider         Status          Usage fields read
OpenAI           ✅ supported    usage.prompt_tokens / completion_tokens (Chat); input_tokens / output_tokens (Responses)
Google Gemini    ✅ supported    usage_metadata.prompt_token_count / candidates_token_count
Anthropic        ✅ supported    usage.input_tokens / output_tokens (+ cache token extras)
Ollama (local)   ✅ supported    prompt_eval_count / eval_count

All providers normalize into the same LLMEvent schema — switching providers is a configuration change, not a code change. Each adapter handles both completed responses and streaming.
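
The field names in the table above are enough to sketch how four different usage shapes can collapse into one. The code below is an illustration only: `FIELD_MAP` and `normalize` are hypothetical, responses are modelled as plain attribute objects, and for OpenAI only the Chat Completions shape is handled.

```python
from types import SimpleNamespace
from typing import Any

# provider -> (container attribute, or None for top level; input; output)
FIELD_MAP = {
    "openai": ("usage", "prompt_tokens", "completion_tokens"),
    "gemini": ("usage_metadata", "prompt_token_count", "candidates_token_count"),
    "anthropic": ("usage", "input_tokens", "output_tokens"),
    "ollama": (None, "prompt_eval_count", "eval_count"),
}


def normalize(provider: str, response: Any) -> dict:
    """Read provider-specific usage fields into one common shape."""
    container_name, in_field, out_field = FIELD_MAP[provider]
    container = (
        getattr(response, container_name, None) if container_name else response
    )
    input_tokens = getattr(container, in_field, None)
    output_tokens = getattr(container, out_field, None)
    return {
        "provider": provider,
        "input_tokens": input_tokens or 0,
        "output_tokens": output_tokens or 0,
        "total_tokens": (input_tokens or 0) + (output_tokens or 0),
        # Missing usage is flagged, never raised.
        "usage_complete": input_tokens is not None and output_tokens is not None,
    }


gemini = SimpleNamespace(
    usage_metadata=SimpleNamespace(prompt_token_count=1000, candidates_token_count=500)
)
ollama = SimpleNamespace(prompt_eval_count=1000, eval_count=500)
no_usage = SimpleNamespace()

gemini_event = normalize("gemini", gemini)
ollama_event = normalize("ollama", ollama)
empty_event = normalize("openai", no_usage)
```

Both `gemini_event` and `ollama_event` report 1500 total tokens despite reading different attributes, and the response with no usage data yields zeros with `usage_complete` set to `False`.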


Roadmap

v0.1.0 — Core SDK ✅ (current)

  • Track usage and cost across one provider (MVP): cost calculation, normalized event, scoped trace(), console logging, graceful degradation.
  • Provider parity: OpenAI, Gemini, Anthropic, Ollama adapters; identical event shape.
  • Output choice: JSONLogger, FileLogger, InMemoryStorageBackend, full configure() and multi-sink dispatch.
  • Streaming & async: track_stream() (one final event), async trace().
  • Hardening: <5 ms / <20 MB budgets, thread/async isolation suite, docs, packaging.

Beyond v0.1 — each tier is additive on the five extension points; the v0.1 core API does not change. See ROADMAP.md.

  • v0.2 — Analytics SDK (SQLiteStorageBackend + usage queries)
  • v0.3 — Prompt Intelligence (per-prompt/template attribution)
  • v0.4 — RAG Intelligence (retrieval-aware accounting)
  • v0.5 — AI FinOps (budgets, alerts, remote pricing)
  • v1.0 — Enterprise Platform (stabilize the v0.x surface; dashboard, plugins)

Design principles

Framework-agnostic · provider-independent · zero vendor lock-in · <5 ms overhead · observe-don't-patch · one standardized event · everything replaceable.

See specs/001-core-sdk/ for the constitution, spec, plan, data model, and public API contract.

Release Process

Releases follow a documented, automated procedure (Conventional Commits → release-please → Trusted Publishing on PyPI via OIDC). The canonical, end-to-end release procedure is the Go-Live & Release checklist — follow it for every release.

Supporting docs:

Contributors: see CONTRIBUTING.md for the dev workflow, versioning, and deprecation policy.

License

MIT
