Self-hosted, cross-provider LLM token metering reconciliation — independently verify what providers bill you and find where to migrate.

TokenLedger

See how much of your AI bill is actually checkable — and how much you pay on pure trust. Every figure labelled EXACT, BOUNDED, or UNVERIFIABLE.

Website · by Nativerse

Providers self-report token usage and bill you on it. Some of that bill can be checked against what you actually received; the rest, namely the reasoning and cache tokens you never see, cannot be checked by anyone, and on reasoning models that is most of the bill. TokenLedger re-counts the output you received, reconciles it against the reported usage, and above all tells you how much of your bill has any ground truth at all, labelling every figure EXACT, BOUNDED, or UNVERIFIABLE. It is self-hosted: no prompt or response content leaves your machine.

What this is, and is not. Re-counting the output is a consistency check that binds the bill to the artifact you received — it catches substitution and metering bugs, but it is not an independent measure of whether your true cost is fair. The genuine value is measuring the unverifiable share. See docs/known-limitations.md.
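To make "measuring the unverifiable share" concrete, here is a minimal sketch that splits a bill by confidence label. It does not use the retoken API; the `Bucket` class, the `share_by_confidence` helper, and the dollar amounts are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bucket:
    """One billed line item (hypothetical shape, not the retoken data model)."""
    name: str
    confidence: str  # "EXACT", "BOUNDED", or "UNVERIFIABLE"
    billed_usd: float


def share_by_confidence(buckets):
    """Return the fraction of the total bill carried by each confidence label."""
    shares = {"EXACT": 0.0, "BOUNDED": 0.0, "UNVERIFIABLE": 0.0}
    total = sum(b.billed_usd for b in buckets)
    if total == 0:
        return shares  # nothing billed, so every share is zero
    for b in buckets:
        shares[b.confidence] += b.billed_usd / total
    return shares


# Illustrative amounts only; not measurements from any provider.
bill = [
    Bucket("output", "EXACT", 2.0),
    Bucket("input", "BOUNDED", 1.0),
    Bucket("reasoning", "UNVERIFIABLE", 6.0),
    Bucket("cache", "UNVERIFIABLE", 1.0),
]
shares = share_by_confidence(bill)
print(f"unverifiable share: {shares['UNVERIFIABLE']:.0%}")
```

The point of the sketch is the shape of the answer: a single fraction of the bill for which no re-count exists, reported alongside the parts that can be checked.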

See it in action

TokenLedger reconciling two live BytePlus video bills

Two real BytePlus (Seedance) video bills, re-derived from the delivered files. Video tokens follow (width × height / 1024) × frames, so re-deriving the count (246,840 and 108,900, gap 0) is a consistency check: it binds the bill to the file you received and would catch a 1080p-billed / 480p-delivered swap, but a matching number is what an honest provider always produces. The figure that should worry you is the unverifiable majority — reasoning and cache — that no method can check. Reproduce: examples/byteplus_validation.py.
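The formula above can be sketched in a few lines. The function name `video_tokens`, the rounding choice, and the resolutions and frame count below are assumptions: the dimensions are picked because they reproduce the quoted counts, not read from the delivered files. The real derivation lives in examples/byteplus_validation.py.

```python
def video_tokens(width: int, height: int, frames: int) -> int:
    """Token count per the quoted formula: (width * height / 1024) * frames.

    Rounding to the nearest integer is an assumption; the source does not
    say how a fractional result is handled.
    """
    return round(width * height / 1024 * frames)


# Hypothetical inputs that happen to reproduce the two quoted counts.
print(video_tokens(1920, 1088, 121))  # 246840
print(video_tokens(1280, 720, 121))   # 108900

# A billed-high / delivered-low swap shows up as a large gap.
billed = video_tokens(1920, 1088, 121)    # what the invoice claims
delivered = video_tokens(864, 480, 121)   # re-derived from a smaller file
gap = billed - delivered
print(gap)  # 197835
```

A gap of 0 only shows the bill is consistent with the file; a non-zero gap is the case this check exists to catch.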

What it can and cannot check

Bucket	Confidence	How
Output tokens	EXACT	Re-tokenised with the model's real tokenizer (OpenAI via tiktoken; DeepSeek, Qwen, Llama, Mistral, Gemma via their own), pinned by provenance. A gap is a flag to investigate, not a verdict — it can mean over-reporting, the wrong tokenizer, or non-canonical generation.
Input tokens	BOUNDED	Re-counts what you sent plus documented overhead; flags figures outside a tolerance band. Cannot reconstruct hidden server-side additions.
Reasoning tokens	UNVERIFIABLE	Billed but never returned to you. Recorded, never asserted. On reasoning models this is most of the bill.
Cache hit/miss	UNVERIFIABLE	Provider-internal per call. Recorded; verify behaviourally across many calls.
Billing period	THREE-WAY	Captured per-call usage vs the provider's billing/usage-API total. When a provider's own two numbers disagree, no tokenizer is needed.

Every result carries its confidence label. The tool never claims proof it does not have.
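The rows of the table can be read as three small checks. The sketch below is one way to express them under stated assumptions: the function names, the result dictionaries, and the 5% default tolerance are hypothetical and are not the retoken API or its defaults. The re-counted figures are passed in, so no tokenizer is needed to run it.

```python
from typing import Optional


def check_output(reported: int, recounted: Optional[int]) -> dict:
    """Output bucket: exact re-count; a non-zero gap is a flag, not a verdict."""
    if recounted is None:  # no captured text, so nothing to re-count
        return {"confidence": "UNVERIFIABLE", "gap": None, "flag": False}
    gap = reported - recounted
    return {"confidence": "EXACT", "gap": gap, "flag": gap != 0}


def check_input(reported: int, recounted: Optional[int],
                overhead: int = 0, tolerance: float = 0.05) -> dict:
    """Input bucket: re-count plus documented overhead, within a tolerance band."""
    if recounted is None:
        return {"confidence": "UNVERIFIABLE", "band": None, "flag": False}
    expected = recounted + overhead
    low, high = expected * (1 - tolerance), expected * (1 + tolerance)
    return {"confidence": "BOUNDED", "band": (low, high),
            "flag": not (low <= reported <= high)}


def check_period(captured_total: int, usage_api_total: int,
                 billing_total: int) -> dict:
    """Billing period: pairwise differences between the three totals."""
    return {
        "captured_vs_usage_api": captured_total - usage_api_total,
        "captured_vs_billing": captured_total - billing_total,
        "usage_api_vs_billing": usage_api_total - billing_total,
    }


print(check_output(130, 120))             # flagged: gap of 10
print(check_input(1040, 1000, overhead=10))  # inside the band, not flagged
print(check_period(5000, 5000, 5200))     # provider's own two numbers disagree
```

Note that a missing re-count yields UNVERIFIABLE with no flag, which mirrors the rule that absent evidence is recorded rather than treated as a discrepancy.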

Quick start

pip install "retoken[exact]"          # CLI + tiktoken + tokenizers (exact mode)

retoken demo                          # offline demo: plants discrepancies, catches them
open retoken_demo.html                # the dashboard

Without the [exact] extra, retoken runs in estimator mode, where exact-only buckets are labelled BOUNDED instead of EXACT. Tokenisation runs locally; the only network fetch is a one-time download of the public tokenizer file, which you can bundle for air-gapped sites.

Sidecar over an existing LiteLLM gateway

LiteLLM already writes spend logs. Point TokenLedger at them and it audits the numbers from the outside — an out-of-band audit layer that does not route or proxy your traffic, so it adds no latency and no point of failure.

retoken ingest litellm_spendlogs.jsonl --format litellm --db tokenledger.db
retoken report --db tokenledger.db --html report.html --md discrepancy.md
open report.html

Set STORE_PROMPTS_IN_SPEND_LOGS=true on LiteLLM so output tokens can be re-counted exactly. Without captured text, output and input are reported as UNVERIFIABLE, never falsely flagged.

Docker

docker compose run --rm retoken demo
docker compose run --rm retoken ingest litellm_spendlogs.jsonl --format litellm
docker compose run --rm retoken report --html report.html

How it compares

vs LiteLLM / Helicone — those aggregate the provider's reported usage; a wrong number stays wrong on the dashboard. TokenLedger re-counts the output independently — a consistency check against the artifact you received — and measures the share that cannot be checked at all.
vs CoIn (arXiv:2505.13778) — that approach is cooperative, needing the provider to publish commitments and a trusted auditor. TokenLedger is passive: you only need the output you already received.

What it does not claim

It does not claim any provider is overbilling — in the runs above the counts matched exactly. It does not judge whether the unit price is fair. It checks the count, so that you can too, and it is explicit about how much of a modern bill nobody can independently check.

Design principles

No data egress. All counting and reconciliation are local. Content can be stored hashed only (Store(redact=True)).
Honest confidence. Each bucket is EXACT, BOUNDED, or UNVERIFIABLE. No false certainty.
Passive. Logging never changes or breaks the real call; a logging failure is swallowed.
Multi-provider, multi-session, multi-user. Pluggable adapters, every record tagged.
Vendor-neutral. Pricing and tolerances are configuration, not hard-coded assumptions.
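Two of these principles, passive logging and hashed-only storage, can be illustrated with a small decorator. This is a sketch under assumptions, not the retoken recorder: `passive`, `redact`, and the recorder callback signature are hypothetical names, and SHA-256 is one plausible choice of hash.

```python
import functools
import hashlib
import logging

log = logging.getLogger("passive_recorder")


def redact(text: str) -> str:
    """Replace content with its SHA-256 hex digest so no raw text is stored."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def passive(record):
    """Wrap a model call so that `record` runs afterwards and can never break it."""
    def decorator(call):
        @functools.wraps(call)
        def wrapper(prompt, *args, **kwargs):
            # The real call runs first; its own exceptions propagate unchanged.
            response = call(prompt, *args, **kwargs)
            try:
                record(redact(prompt), redact(response))
            except Exception:
                # A logging failure is swallowed; the caller never sees it.
                log.debug("recording failed; ignored", exc_info=True)
            return response
        return wrapper
    return decorator


def broken_recorder(prompt_hash, response_hash):
    raise RuntimeError("disk full")


@passive(broken_recorder)
def fake_model(prompt):
    """Stand-in for a provider call (hypothetical)."""
    return prompt.upper()


print(fake_model("hello"))  # HELLO, even though the recorder raised
```

One consequence of hashing is worth keeping in mind: a digest cannot be re-tokenised, so in a design like this any re-count has to happen before the content is redacted.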

Status

The reconciliation engine, store, recorder, dashboard, discrepancy report, pluggable cost model, and connectors for OpenAI, Anthropic, LiteLLM, and Bedrock usage shapes are working and covered by the test suite. Demand, and the width of the tolerance band for closed models, are still being validated with design partners. We do not claim a result we have not measured.

Licence

Apache-2.0. See LICENSE.

Project details

Release history

0.1.1 (this version): Jun 24, 2026
0.1.0: Jun 24, 2026
