
Self-hosted, cross-provider LLM token metering reconciliation — independently verify what providers bill you and find where to migrate.


TokenLedger — one 5-second AI video = 246,840 tokens, and most of your AI bill you can't check at all

TokenLedger

See how much of your AI bill is actually checkable — and how much you pay on pure trust. Every figure labelled EXACT, BOUNDED, or UNVERIFIABLE.


Website · by Nativerse


Providers self-report token usage and bill you on it. Some of that bill can be checked against what you actually received; most of it — reasoning and cache tokens you never see — cannot be checked by anyone. TokenLedger re-counts the output you received, reconciles it against the reported usage, and above all tells you how much of your bill has any ground truth at all, labelling every figure EXACT, BOUNDED, or UNVERIFIABLE. Self-hosted — no prompt or response content leaves your machine.

What this is, and is not. Re-counting the output is a consistency check that binds the bill to the artifact you received — it catches substitution and metering bugs, but it is not an independent measure of whether your true cost is fair. The genuine value is measuring the unverifiable share. See docs/known-limitations.md.
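The unverifiable share can be illustrated with a small calculation. This is a minimal sketch, not TokenLedger's own code: the function name `share_by_confidence` and the invoice figures are hypothetical, invented only to show the arithmetic.

```python
from collections import defaultdict

# Hypothetical per-bucket billed amounts (USD) for one invoice. These
# numbers are made up for illustration; they are not measured results.
bill = [
    ("output", "EXACT", 1.20),
    ("input", "BOUNDED", 0.80),
    ("reasoning", "UNVERIFIABLE", 2.50),
    ("cache", "UNVERIFIABLE", 0.50),
]

def share_by_confidence(items):
    """Return the fraction of total spend carried by each confidence label."""
    totals = defaultdict(float)
    for _bucket, label, cost in items:
        totals[label] += cost
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {label: cost / grand_total for label, cost in totals.items()}

shares = share_by_confidence(bill)
for label in ("EXACT", "BOUNDED", "UNVERIFIABLE"):
    print(f"{label:<13} {shares.get(label, 0.0):.0%}")
```

With these invented figures, 60% of the invoice sits in buckets that nothing can check, which is the number the report is meant to surface.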

See it in action

[Figure: TokenLedger reconciling two live BytePlus video bills]

Two real BytePlus (Seedance) video bills, re-derived from the delivered files. Video tokens follow (width × height / 1024) × frames, so re-deriving the count (246,840 and 108,900, gap 0) is a consistency check: it binds the bill to the file you received and would catch a 1080p-billed / 480p-delivered swap, but a matching number is what an honest provider always produces. The figure that should worry you is the unverifiable majority — reasoning and cache — that no method can check. Reproduce: examples/byteplus_validation.py.
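The formula above is easy to re-derive by hand. The sketch below is illustrative only: the resolutions and the frame count are assumptions (they are not read from the actual delivered files), picked because they are one combination that reproduces the two quoted counts. The helper names are hypothetical.

```python
def video_tokens(width: int, height: int, frames: int) -> float:
    """Token count from the quoted formula: (width * height / 1024) * frames."""
    return (width * height / 1024) * frames

def check_video_bill(billed_tokens: int, width: int, height: int, frames: int) -> dict:
    """Re-derive the count from the delivered file's properties and report the gap."""
    derived = video_tokens(width, height, frames)
    return {"billed": billed_tokens, "derived": derived, "gap": billed_tokens - derived}

# Assumed file properties: 1920x1088 and 1280x720, both at 121 frames.
print(check_video_bill(246_840, 1920, 1088, 121))  # gap 0.0
print(check_video_bill(108_900, 1280, 720, 121))   # gap 0.0

# A hypothetical swap: billed at the 1920x1088 count, delivered at 864x480.
swapped = check_video_bill(246_840, 864, 480, 121)
print(swapped["gap"])  # a large positive gap flags the mismatch
```

In practice the width, height, and frame count would come from the delivered file itself (for example via a media probe), so a zero gap ties the bill to that file and nothing more.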

What it can and cannot check

| Bucket | Confidence | How |
| --- | --- | --- |
| Output tokens | EXACT | Re-tokenised with the model's real tokenizer (OpenAI via tiktoken; DeepSeek, Qwen, Llama, Mistral, Gemma via their own), pinned by provenance. A gap is a flag to investigate, not a verdict: it can mean over-reporting, the wrong tokenizer, or non-canonical generation. |
| Input tokens | BOUNDED | Re-counts what you sent plus documented overhead; flags figures outside a tolerance band. Cannot reconstruct hidden server-side additions. |
| Reasoning tokens | UNVERIFIABLE | Billed but never returned to you. Recorded, never asserted. On reasoning models this is most of the bill. |
| Cache hit/miss | UNVERIFIABLE | Provider-internal per call. Recorded; verify behaviourally across many calls. |
| Billing period | THREE-WAY | Captured per-call usage vs the provider's billing/usage-API total. When a provider's own two numbers disagree, no tokenizer is needed. |

Every result carries its confidence label. The tool never claims proof it does not have.
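The checks in the table can be sketched as three small functions. This is an illustration under stated assumptions, not the project's API: the function names are hypothetical, the tokenizer is passed in as a plain callable (a whitespace splitter stands in for a real tokenizer such as tiktoken), and the three-way check assumes the three sources are the captured per-call sum, the usage-API total, and the billing total.

```python
from typing import Callable, Iterable, Optional

def check_output(reported: int, text: Optional[str],
                 count_tokens: Callable[[str], int]) -> dict:
    """Re-count the output you received and compare it with the reported figure."""
    if text is None:
        # No captured text: nothing to re-count, so nothing is asserted.
        return {"label": "UNVERIFIABLE", "gap": None, "flag": False}
    gap = reported - count_tokens(text)
    # EXACT is only meaningful when count_tokens is the model's real tokenizer.
    return {"label": "EXACT", "gap": gap, "flag": gap != 0}

def check_input(reported: int, recounted: int, overhead: int,
                tolerance: float) -> dict:
    """Flag a reported input count that falls outside a tolerance band."""
    expected = recounted + overhead
    low, high = expected * (1 - tolerance), expected * (1 + tolerance)
    return {"label": "BOUNDED", "band": (low, high),
            "flag": not (low <= reported <= high)}

def check_period(captured_calls: Iterable[int], usage_api_total: int,
                 billing_total: int) -> dict:
    """Compare the captured per-call sum with the provider's own two totals."""
    captured = sum(captured_calls)
    return {"label": "THREE-WAY", "captured": captured,
            "consistent": captured == usage_api_total == billing_total}

def toy_count(text: str) -> int:
    """Stand-in tokenizer for the demo only: counts whitespace-separated words."""
    return len(text.split())

print(check_output(9, "the quick brown fox jumps", toy_count))
print(check_output(9, None, toy_count))
print(check_input(reported=120, recounted=100, overhead=7, tolerance=0.05))
print(check_period([100, 200, 300], usage_api_total=600, billing_total=640))
```

A raised flag here is a prompt to investigate, in line with the table: the function reports the gap and leaves the interpretation to the reader.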

Quick start

git clone https://github.com/Neelagiri65/tokenledger
cd tokenledger
pip install -e ".[exact]"             # CLI + tiktoken + tokenizers (exact mode)

retoken demo                       # offline demo: plants discrepancies, catches them
open retoken_demo.html             # the dashboard

Installing without the [exact] extra runs in estimator mode, where exact-only buckets are labelled BOUNDED instead of EXACT. Tokenisation runs locally; only the public tokenizer file is fetched once, and you can bundle it for air-gapped sites.
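One way such a fallback could work is to check whether the exact-mode dependencies are importable and downgrade the label when they are not. The sketch below is an assumption about the mechanism, not a description of how TokenLedger detects its mode; `exact_bucket_label` is a hypothetical name.

```python
import importlib.util

def exact_bucket_label(required_modules=("tiktoken", "tokenizers")) -> str:
    """Return EXACT only if every tokenizer dependency is importable, else BOUNDED."""
    have_all = all(
        importlib.util.find_spec(name) is not None for name in required_modules
    )
    return "EXACT" if have_all else "BOUNDED"

# Prints EXACT when both packages are installed, BOUNDED otherwise.
print(exact_bucket_label())
```

The point of the design is that a missing dependency weakens the label rather than silently producing an approximate count under an EXACT heading.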

Sidecar over an existing LiteLLM gateway

LiteLLM already writes spend logs. Point TokenLedger at them and it audits the numbers from the outside — an out-of-band audit layer that does not route or proxy your traffic, so it adds no latency and no point of failure.

retoken ingest litellm_spendlogs.jsonl --format litellm --db tokenledger.db
retoken report --db tokenledger.db --html report.html --md discrepancy.md
open report.html

Set STORE_PROMPTS_IN_SPEND_LOGS=true on LiteLLM so output tokens can be re-counted exactly. Without captured text, output and input are reported as UNVERIFIABLE, never falsely flagged.

Docker

docker compose run --rm retoken demo
docker compose run --rm retoken ingest litellm_spendlogs.jsonl --format litellm
docker compose run --rm retoken report --html report.html

How it compares

  • vs LiteLLM / Helicone — those aggregate the provider's reported usage; a wrong number stays wrong on the dashboard. TokenLedger re-counts the output independently — a consistency check against the artifact you received — and measures the share that cannot be checked at all.
  • vs CoIn (arXiv:2505.13778): that approach is cooperative. It needs the provider to publish commitments and relies on a trusted auditor. TokenLedger is passive: you only need the output you already received.

What it does not claim

It does not claim any provider is overbilling — in the runs above the counts matched exactly. It does not judge whether the unit price is fair. It checks the count, so that you can too, and it is explicit about how much of a modern bill nobody can independently check.

Design principles

  1. No data egress. All counting and reconciliation are local. Content can be stored hashed only (Store(redact=True)).
  2. Honest confidence. Each bucket is EXACT, BOUNDED, or UNVERIFIABLE. No false certainty.
  3. Passive. Logging never changes or breaks the real call; a logging failure is swallowed.
  4. Multi-provider, multi-session, multi-user. Pluggable adapters, every record tagged.
  5. Vendor-neutral. Pricing and tolerances are configuration, not hard-coded assumptions.
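Principles 1 and 3 can be sketched together: content is reduced to a hash before it is stored, and a failure in the recording step never reaches the caller. This is a minimal illustration with hypothetical names (`redact`, `call_and_record`); it does not show the project's `Store` class, whose interface beyond `Store(redact=True)` is not described here.

```python
import hashlib
import logging

logger = logging.getLogger("passive_recorder")

def redact(text: str) -> str:
    """Replace content with its SHA-256 hex digest so only a hash is stored."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def call_and_record(call, record, prompt: str) -> str:
    """Run the real call, then try to record it; a recording failure is swallowed."""
    response = call(prompt)  # errors from the real call propagate unchanged
    try:
        record({"prompt_sha256": redact(prompt),
                "response_sha256": redact(response)})
    except Exception:
        # Passive by design: log quietly and return the response regardless.
        logger.debug("recording failed; ignored", exc_info=True)
    return response

# Hypothetical stand-ins for a provider call and a failing store.
def fake_call(prompt: str) -> str:
    return prompt.upper()

def broken_record(row: dict) -> None:
    raise IOError("disk full")

print(call_and_record(fake_call, broken_record, "hello"))  # HELLO
```

Note that a hash cannot be re-tokenised later, so under this sketch any token counts would have to be taken before the content is redacted.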

Status

The reconciliation engine, store, recorder, dashboard, discrepancy report, pluggable cost model, and connectors for OpenAI, Anthropic, LiteLLM, and Bedrock usage shapes are working and covered by the test suite. Two things are still being validated with design partners: demand, and the width of the tolerance band for closed models. We do not claim a result we have not measured.

Licence

Apache-2.0. See LICENSE.

