Skip to main content

Self-hosted, cross-provider LLM token metering reconciliation — independently verify what providers bill you and find where to migrate.

Project description

TokenLedger — one 5-second AI video = 246,840 tokens, and most of your AI bill you can't check at all

TokenLedger

See how much of your AI bill is actually checkable — and how much you pay on pure trust. Every figure labelled EXACT, BOUNDED, or UNVERIFIABLE.

PyPI License Python Self-hosted

Website · by Nativerse


Providers self-report token usage and bill you on it. Some of that bill can be checked against what you actually received; most of it — reasoning and cache tokens you never see — cannot be checked by anyone. TokenLedger re-counts the output you received, reconciles it against the reported usage, and above all tells you how much of your bill has any ground truth at all, labelling every figure EXACT, BOUNDED, or UNVERIFIABLE. Self-hosted — no prompt or response content leaves your machine.

What this is, and is not. Re-counting the output is a consistency check that binds the bill to the artifact you received — it catches substitution and metering bugs, but it is not an independent measure of whether your true cost is fair. The genuine value is measuring the unverifiable share. See docs/known-limitations.md.

See it in action

TokenLedger reconciling two live BytePlus video bills

Two real BytePlus (Seedance) video bills, re-derived from the delivered files. Video tokens follow (width × height / 1024) × frames, so re-deriving the count (246,840 and 108,900, gap 0) is a consistency check: it binds the bill to the file you received and would catch a 1080p-billed / 480p-delivered swap, but a matching number is what an honest provider always produces. The figure that should worry you is the unverifiable majority — reasoning and cache — that no method can check. Reproduce: examples/byteplus_validation.py.

What it can and cannot check

Bucket Confidence How
Output tokens EXACT Re-tokenised with the model's real tokenizer (OpenAI via tiktoken; DeepSeek, Qwen, Llama, Mistral, Gemma via their own), pinned by provenance. A gap is a flag to investigate, not a verdict — it can mean over-reporting, the wrong tokenizer, or non-canonical generation.
Input tokens BOUNDED Re-counts what you sent plus documented overhead; flags figures outside a tolerance band. Cannot reconstruct hidden server-side additions.
Reasoning tokens UNVERIFIABLE Billed but never returned to you. Recorded, never asserted. On reasoning models this is most of the bill.
Cache hit/miss UNVERIFIABLE Provider-internal per call. Recorded; verify behaviourally across many calls.
Billing period THREE-WAY Captured per-call usage vs the provider's billing/usage-API total. When a provider's own two numbers disagree, no tokenizer is needed.

Every result carries its confidence label. The tool never claims proof it does not have.

Quick start

pip install "retoken[exact]"          # CLI + tiktoken + tokenizers (exact mode)

retoken demo                          # offline demo: plants discrepancies, catches them
open retoken_demo.html                # the dashboard

Installing without the [exact] extra runs in estimator mode, where exact-only buckets are labelled BOUNDED instead of EXACT. Tokenisation runs locally; only the public tokenizer file is fetched once, and you can bundle it for air-gapped sites.

Sidecar over an existing LiteLLM gateway

LiteLLM already writes spend logs. Point TokenLedger at them and it audits the numbers from the outside — an out-of-band audit layer that does not route or proxy your traffic, so it adds no latency and no point of failure.

retoken ingest litellm_spendlogs.jsonl --format litellm --db tokenledger.db
retoken report --db tokenledger.db --html report.html --md discrepancy.md
open report.html

Set STORE_PROMPTS_IN_SPEND_LOGS=true on LiteLLM so output tokens can be re-counted exactly. Without captured text, output and input are reported as UNVERIFIABLE, never falsely flagged.

Docker

docker compose run --rm retoken demo
docker compose run --rm retoken ingest litellm_spendlogs.jsonl --format litellm
docker compose run --rm retoken report --html report.html

How it compares

  • vs LiteLLM / Helicone — those aggregate the provider's reported usage; a wrong number stays wrong on the dashboard. TokenLedger re-counts the output independently — a consistency check against the artifact you received — and measures the share that cannot be checked at all.
  • vs CoIn (arXiv:2505.13778) — that approach is cooperative, needing the provider to publish commitments and a trusted auditor. TokenLedger is passive: you only need the output you already received.

What it does not claim

It does not claim any provider is overbilling — in the runs above the counts matched exactly. It does not judge whether the unit price is fair. It checks the count, so that you can too, and it is explicit about how much of a modern bill nobody can independently check.

Design principles

  1. No data egress. All counting and reconciliation are local. Content can be stored hashed only (Store(redact=True)).
  2. Honest confidence. Each bucket is EXACT, BOUNDED, or UNVERIFIABLE. No false certainty.
  3. Passive. Logging never changes or breaks the real call; a logging failure is swallowed.
  4. Multi-provider, multi-session, multi-user. Pluggable adapters, every record tagged.
  5. Vendor-neutral. Pricing and tolerances are configuration, not hard-coded assumptions.

Status

The reconciliation engine, store, recorder, dashboard, discrepancy report, pluggable cost model, and connectors for OpenAI, Anthropic, LiteLLM, and Bedrock usage shapes are working and covered by the test suite. Demand and the closed-model band width are being validated with design partners. We do not claim a result we have not measured.

Licence

Apache-2.0. See LICENSE.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

retoken-0.1.1.tar.gz (103.3 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

retoken-0.1.1-py3-none-any.whl (80.6 kB view details)

Uploaded Python 3

File details

Details for the file retoken-0.1.1.tar.gz.

File metadata

  • Download URL: retoken-0.1.1.tar.gz
  • Upload date:
  • Size: 103.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.4

File hashes

Hashes for retoken-0.1.1.tar.gz
Algorithm Hash digest
SHA256 d7f05a71f9fccfec6762934c666d00964c1b991f78c99e31c2690905b084b7e4
MD5 8d510b4c917e365334fad68c65bdd8f2
BLAKE2b-256 774573362ec5fa84c921c30bb452f61257b1f3db69c2572cc56f8ea94efdcfd6

See more details on using hashes here.

File details

Details for the file retoken-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: retoken-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 80.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.4

File hashes

Hashes for retoken-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 a20f546d80660075234cdbb9724d2d710fa6f14b5df3c06b5b208cfe353ae494
MD5 068e30f3f15813833d3126d100001f82
BLAKE2b-256 7c278233772454de23fc519649ca906970c430aaf8c09267019d29e93c09647d

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page