Track net margin on every LLM API call

These details have not been verified by PyPI

Project description

LLMBillingKit

LLMBillingKit helps you measure real profit per LLM call with one line of code and no external infrastructure.

The problem this solves

When you charge end users for AI features, your real margin can drift quickly because token pricing changes often, model mix shifts over time, and provider-specific pricing rules are easy to miss. Most teams either track only revenue or build ad-hoc spreadsheets that do not stay accurate.

LLMBillingKit gives you a local, auditable ledger of what you charged and what each request likely cost, so you can answer: "Are we making money on this feature?" in seconds.

Why local SQLite and zero infrastructure are deliberate

Your usage and customer billing telemetry stays on your machine.
No hosted service to provision, secure, or pay for.
No extra API keys, webhooks, queues, or background workers.
Works offline for local development and incident analysis.

This is a deliberate trade-off: LLMBillingKit is designed to be a lightweight embedded accounting layer, not a hosted analytics platform.

Install

pip install llmbillingkit

Quick usage

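from openai import OpenAI
from LLMBillingKit import track

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)

# Record what you billed the customer for this call; track() derives cost and margin
event = track(response, charged=0.05, customer="user_123")
print(event)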

track() works with OpenAI-compatible responses (usage.prompt_tokens / usage.completion_tokens) and Anthropic-style responses (usage.input_tokens / usage.output_tokens).
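Because the two response shapes name their token fields differently, a tracker has to normalize them before pricing. The sketch below shows one way to do that. `normalize_usage` and `Usage` are hypothetical names, not LLMBillingKit's public API, and the responses are `SimpleNamespace` stand-ins rather than real SDK objects.

```python
from types import SimpleNamespace
from typing import Any, NamedTuple, Optional


class Usage(NamedTuple):
    """Provider-neutral view of one call (hypothetical type for this sketch)."""
    model: str
    input_tokens: int
    output_tokens: int
    request_id: Optional[str]


def normalize_usage(response: Any) -> Usage:
    """Read model, token counts, and id from either response shape.

    OpenAI-compatible chat completions report usage.prompt_tokens and
    usage.completion_tokens; Anthropic messages report usage.input_tokens
    and usage.output_tokens.
    """
    usage = response.usage
    if hasattr(usage, "prompt_tokens"):
        inp, out = usage.prompt_tokens, usage.completion_tokens
    elif hasattr(usage, "input_tokens"):
        inp, out = usage.input_tokens, usage.output_tokens
    else:
        raise ValueError("response.usage has no recognized token fields")
    return Usage(response.model, int(inp), int(out), getattr(response, "id", None))


# Stand-ins for SDK response objects, using the sample export values.
openai_like = SimpleNamespace(
    id="chatcmpl-abc", model="gpt-4o",
    usage=SimpleNamespace(prompt_tokens=450, completion_tokens=120),
)
anthropic_like = SimpleNamespace(
    id="msg-xyz", model="claude-3-5-sonnet-20241022",
    usage=SimpleNamespace(input_tokens=200, output_tokens=80),
)
print(normalize_usage(openai_like))
print(normalize_usage(anthropic_like))
```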

How it works

track(response, charged, customer) extracts model, token usage, and request_id from the response object.
It looks up per-token pricing in the bundled costs.json table.
It computes:

$$ \text{actual\_cost} = (\text{input\_tokens} \times \text{input\_price}) + (\text{output\_tokens} \times \text{output\_price}) $$

$$ \text{margin} = \text{charged} - \text{actual\_cost} $$

It stores the event in a local SQLite database (~/.LLMBillingKit/usage.db).
The CLI reads this table to generate reports and exports.
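The steps above can be sketched with the standard library alone. This is a minimal illustration, not LLMBillingKit's code: the per-token rates in `PRICES`, the `usage_events` table name, and the `record_event` helper are assumptions, and only the column list mirrors the CSV export shown further down. An in-memory database stands in for ~/.LLMBillingKit/usage.db.

```python
import sqlite3
import uuid
from datetime import datetime, timezone

# Assumed USD rates per token, for illustration only (real values live in costs.json).
PRICES = {
    "gpt-4o-mini": {"input": 0.15 / 1_000_000, "output": 0.60 / 1_000_000},
}

SCHEMA = """
CREATE TABLE IF NOT EXISTS usage_events (
    request_id    TEXT PRIMARY KEY,
    timestamp     TEXT NOT NULL,
    customer      TEXT NOT NULL,
    model         TEXT NOT NULL,
    input_tokens  INTEGER NOT NULL,
    output_tokens INTEGER NOT NULL,
    actual_cost   REAL NOT NULL,
    charged       REAL NOT NULL,
    margin        REAL NOT NULL
)
"""
COLUMNS = ("request_id", "timestamp", "customer", "model", "input_tokens",
           "output_tokens", "actual_cost", "charged", "margin")


def record_event(conn, *, customer, model, input_tokens, output_tokens,
                 charged, request_id=None):
    """Price one call, compute its margin, and append it to the ledger."""
    price = PRICES.get(model)
    if price is None:
        return None  # unknown model: skip instead of crashing the caller
    actual_cost = input_tokens * price["input"] + output_tokens * price["output"]
    row = (
        request_id or str(uuid.uuid4()),
        datetime.now(timezone.utc).isoformat(timespec="seconds"),
        customer, model, input_tokens, output_tokens,
        actual_cost, charged, charged - actual_cost,
    )
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO usage_events VALUES (?,?,?,?,?,?,?,?,?)", row)
    return dict(zip(COLUMNS, row))


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
event = record_event(conn, customer="acme", model="gpt-4o-mini",
                     input_tokens=8, output_tokens=9, charged=0.10)
print(f"cost ${event['actual_cost']:.6f} margin ${event['margin']:.6f}")
```

With these assumed rates the script prints `cost $0.000007 margin $0.099993`, the same figures as the `llmbilling add` sample output.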

If a model is missing from the pricing table, track() returns None instead of crashing your app. Pass raise_errors=True to get an explicit TrackingError that describes what went wrong:

from LLMBillingKit import TrackingError, track

try:
    track(response, charged=0.01, customer="user_123", raise_errors=True)
except TrackingError as e:
    print(f"could not track: {e}")

Dated provider snapshots on a small allowlist, each verified to share pricing with its base model (for example gpt-4o-mini-2024-07-18 → gpt-4o-mini), are normalized to that base model automatically. Anthropic canonical IDs already include the date in their pricing key (claude-3-5-sonnet-20241022) and are matched as-is. Any other dated snapshot, including OpenAI snapshots priced differently from their alias (e.g. gpt-4o-2024-05-13), must be added to costs.json with its own prices; it will not be silently collapsed onto another model's rates.
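The lookup rules above can be expressed as a small resolver. This is a sketch under assumptions: `KNOWN_MODELS`, `SNAPSHOT_ALIASES`, `resolve_model`, and the local `UnknownModelError` class are hypothetical names (the library's own exception is TrackingError), and the model sets are an illustrative subset rather than the bundled table.

```python
from typing import Optional

# Pricing keys present in the table (illustrative subset).
KNOWN_MODELS = {"gpt-4o", "gpt-4o-mini", "claude-3-5-sonnet-20241022"}

# Explicit allowlist: dated snapshot -> base model verified to share its pricing.
SNAPSHOT_ALIASES = {"gpt-4o-mini-2024-07-18": "gpt-4o-mini"}


class UnknownModelError(LookupError):
    """Stand-in for the library's TrackingError in this sketch."""


def resolve_model(model: str, raise_errors: bool = False) -> Optional[str]:
    """Return the pricing key for `model`, or None if it cannot be priced."""
    if model in KNOWN_MODELS:  # exact match, including dated Claude IDs
        return model
    alias = SNAPSHOT_ALIASES.get(model)
    if alias is not None:  # allowlisted same-price snapshot
        return alias
    # No pattern-based fallback: gpt-4o-2024-05-13 must not become gpt-4o.
    if raise_errors:
        raise UnknownModelError(f"no pricing entry for model {model!r}")
    return None


print(resolve_model("gpt-4o-mini-2024-07-18"))
print(resolve_model("gpt-4o-2024-05-13"))
```

Stripping a date suffix with a regex would be shorter, but it would silently price gpt-4o-2024-05-13 at gpt-4o's rates; an explicit allowlist avoids that.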

CLI commands and sample output

`llmbilling report`

$ llmbilling report
Customer      Calls    Charged      Cost    Margin
----------  -------  ---------  --------  --------
acme_corp        84  $4.200000  $0.620000  $3.580000
pro_tier         47  $2.350000  $0.190000  $2.160000
trial_user       23  $0.460000  $0.310000  $0.150000
user_free       312  $0.000000  $1.870000  $-1.870000

`llmbilling models`

$ llmbilling models
Model                          Calls    Charged      Cost    Margin
---------------------------  -------  ---------  --------  --------
gpt-4o                           112  $0.560000  $0.094000  $0.466000
gpt-4o-mini                      289  $0.140000  $0.003000  $0.137000
claude-3-5-sonnet-20241022        65  $0.300000  $0.120000  $0.180000
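Both reports are per-group sums over the event ledger. A sketch of the customer and model breakdowns as one SQL GROUP BY, assuming a hypothetical `usage_events` table whose columns are named as in the CSV export; the rows are made up for illustration, and LLMBillingKit's actual query may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE usage_events (customer TEXT, model TEXT, "
    "actual_cost REAL, charged REAL, margin REAL)"
)
# Made-up rows: (customer, model, actual_cost, charged, margin).
rows = [
    ("acme_corp", "gpt-4o", 0.002, 0.05, 0.048),
    ("acme_corp", "gpt-4o-mini", 0.001, 0.05, 0.049),
    ("user_free", "gpt-4o-mini", 0.003, 0.00, -0.003),
]
conn.executemany("INSERT INTO usage_events VALUES (?,?,?,?,?)", rows)


def breakdown(conn, by):
    """Calls, charged, cost, and margin per customer or per model."""
    if by not in ("customer", "model"):  # whitelist: never interpolate user input
        raise ValueError(f"cannot group by {by!r}")
    sql = (
        f"SELECT {by}, COUNT(*), SUM(charged), SUM(actual_cost), SUM(margin) "
        f"FROM usage_events GROUP BY {by} ORDER BY {by}"
    )
    return conn.execute(sql).fetchall()


for name, calls, charged, cost, margin in breakdown(conn, "customer"):
    print(f"{name:<10} {calls:>5}  ${charged:.6f}  ${cost:.6f}  ${margin:.6f}")
```

A free-tier customer with zero charged and nonzero cost shows up with a negative margin, as user_free does in the sample report.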

`llmbilling export --format csv`

$ llmbilling export --format csv
request_id,timestamp,customer,model,input_tokens,output_tokens,actual_cost,charged,margin
chatcmpl-abc,2026-03-25T14:32:10+00:00,acme_corp,gpt-4o,450,120,0.00213,0.05,0.04787
msg-xyz,2026-03-25T14:33:50+00:00,pro_tier,claude-3-5-sonnet-20241022,200,80,0.00105,0.02,0.01895

`llmbilling export --format json`

$ llmbilling export --format json
[
    {
        "request_id": "chatcmpl-abc",
        "timestamp": "2026-03-25T14:32:10+00:00",
        "customer": "acme_corp",
        "model": "gpt-4o",
        "input_tokens": 450,
        "output_tokens": 120,
        "actual_cost": 0.00213,
        "charged": 0.05,
        "margin": 0.04787
    }
]
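The two export formats carry the same columns. A minimal sketch of producing both from a list of event dicts with the standard csv and json modules; `export_events` is a hypothetical helper, and the event is the sample row from the CSV output above.

```python
import csv
import io
import json

FIELDS = ["request_id", "timestamp", "customer", "model", "input_tokens",
          "output_tokens", "actual_cost", "charged", "margin"]


def export_events(events, fmt="csv"):
    """Serialize event dicts as CSV (header plus rows) or as a JSON array."""
    if fmt == "json":
        return json.dumps(events, indent=4)
    if fmt != "csv":
        raise ValueError(f"unsupported format: {fmt}")
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(events)
    return buf.getvalue()


events = [{
    "request_id": "chatcmpl-abc", "timestamp": "2026-03-25T14:32:10+00:00",
    "customer": "acme_corp", "model": "gpt-4o", "input_tokens": 450,
    "output_tokens": 120, "actual_cost": 0.00213, "charged": 0.05,
    "margin": 0.04787,
}]
print(export_events(events, "csv"), end="")
print(export_events(events, "json"))
```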

`llmbilling add`

Record a usage event without writing Python — useful for backfills, manual corrections, or providers that do not return a structured response.

$ llmbilling add \
    --customer acme \
    --model gpt-4o-mini \
    --input-tokens 8 \
    --output-tokens 9 \
    --charged 0.10
Added event:
  request_id: 6a4f...
  customer:   acme
  model:      gpt-4o-mini
  tokens:     in=8 out=9
  charged:    $0.100000
  cost:       $0.000007
  margin:     $0.099993
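The cost figure above can be checked by hand. Assuming gpt-4o-mini is priced at $0.15 per million input tokens and $0.60 per million output tokens (rates that reproduce the sample output; check costs.json for the values your installed version uses), a Decimal-based recomputation looks like this. `cost_and_margin` and `usd` are illustrative helpers, not part of the package.

```python
from decimal import ROUND_HALF_UP, Decimal

# Assumed gpt-4o-mini rates in USD per million tokens.
INPUT_PER_M = Decimal("0.15")
OUTPUT_PER_M = Decimal("0.60")
MILLION = Decimal(1_000_000)


def cost_and_margin(input_tokens: int, output_tokens: int, charged: str):
    """Exact cost and margin for one call, using Decimal to avoid float drift."""
    cost = (input_tokens * INPUT_PER_M + output_tokens * OUTPUT_PER_M) / MILLION
    return cost, Decimal(charged) - cost


def usd(amount: Decimal) -> str:
    """Format to six decimal places, like the CLI output."""
    return f"${amount.quantize(Decimal('0.000001'), rounding=ROUND_HALF_UP)}"


cost, margin = cost_and_margin(8, 9, "0.10")
print(usd(cost), usd(margin))  # $0.000007 $0.099993
```

Here 8 × 0.15 + 9 × 0.60 = 6.60 dollars per million tokens, so the exact cost is $0.0000066, which rounds to the $0.000007 shown.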

`llmbilling update`

Edit the customer or charged amount on an existing record. Margin is recomputed automatically when --charged changes.

llmbilling update --request-id chatcmpl-abc --charged 0.25
llmbilling update --request-id chatcmpl-abc --customer acme-enterprise
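Because margin is derived from charged and actual_cost, changing charged only requires rewriting margin in the same statement. A sketch against a hypothetical `usage_events` table; `update_event` is an illustrative name, not the library's internals.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE usage_events (request_id TEXT PRIMARY KEY, customer TEXT, "
    "actual_cost REAL, charged REAL, margin REAL)"
)
conn.execute(
    "INSERT INTO usage_events VALUES "
    "('chatcmpl-abc', 'acme_corp', 0.00213, 0.05, 0.04787)"
)


def update_event(conn, request_id, *, charged=None, customer=None):
    """Change charged and/or customer; margin is recomputed from actual_cost."""
    with conn:
        # NULL parameters (None) keep the current value via COALESCE;
        # right-hand sides of an UPDATE see the row's old values.
        cur = conn.execute(
            """
            UPDATE usage_events
               SET charged  = COALESCE(?, charged),
                   margin   = COALESCE(?, charged) - actual_cost,
                   customer = COALESCE(?, customer)
             WHERE request_id = ?
            """,
            (charged, charged, customer, request_id),
        )
    if cur.rowcount == 0:
        raise KeyError(f"no event with request_id {request_id!r}")


update_event(conn, "chatcmpl-abc", charged=0.25)
update_event(conn, "chatcmpl-abc", customer="acme-enterprise")
print(conn.execute("SELECT customer, charged, margin FROM usage_events").fetchone())
```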

Bulk-create with `--calls`

llmbilling add --calls N records N equivalent usage events in one command, each with its own UUID request_id. This is useful for testing, demos, or backfilling fixed-shape traffic.

$ llmbilling add \
    --customer Walmart \
    --model gpt-4o-mini \
    --input-tokens 100 \
    --output-tokens 150 \
    --charged 0.15 \
    --calls 10
Added 10 events for customer 'Walmart':
  model:      gpt-4o-mini
  tokens:     in=100 out=150
  per-call:   charged $0.150000 | cost $0.000105 | margin $0.149895
  totals:     charged $1.500000 | cost $0.001050 | margin $1.498950

--request-id cannot be combined with --calls > 1 (each event needs a unique ID).
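Generating N equivalent events amounts to repeating one event shape with a fresh identifier each time, which is why a single fixed --request-id cannot cover more than one event. A sketch; `make_events` is a hypothetical helper, and the cost is passed in rather than looked up from a pricing table.

```python
import uuid
from datetime import datetime, timezone


def make_events(n, *, customer, model, input_tokens, output_tokens,
                charged, actual_cost, request_id=None):
    """Build n same-shape events, each with its own UUID request_id."""
    if request_id is not None and n > 1:
        raise ValueError("request_id cannot be combined with n > 1")
    now = datetime.now(timezone.utc).isoformat(timespec="seconds")
    return [
        {
            "request_id": request_id or str(uuid.uuid4()),
            "timestamp": now,
            "customer": customer,
            "model": model,
            "input_tokens": input_tokens,
            "output_tokens": output_tokens,
            "actual_cost": actual_cost,
            "charged": charged,
            "margin": charged - actual_cost,
        }
        for _ in range(n)
    ]


events = make_events(10, customer="Walmart", model="gpt-4o-mini",
                     input_tokens=100, output_tokens=150,
                     charged=0.15, actual_cost=0.000105)
print(len({e["request_id"] for e in events}), "distinct request IDs")
```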

`llmbilling customer set-calls`

Set the number of usage events recorded for a customer to a target value, adding or removing events as needed.

llmbilling customer set-calls --customer Walmart --calls 10
llmbilling customer set-calls --customer Walmart --calls 1 --yes

Behavior:

Increasing the count clones the customer's existing event shape, using fresh UUIDs and current timestamps.
Decreasing the count deletes the most recent matching events and keeps the oldest history. The command asks for confirmation; pass --yes to skip the prompt.
If the customer has events of multiple shapes (different model / tokens / charged combinations), pass --model, --input-tokens, --output-tokens, and --charged together to disambiguate which shape to adjust.
To create a brand-new customer, provide the full shape filter (--model, --input-tokens, --output-tokens, and --charged).
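The increase/decrease rule can be sketched as follows, assuming a hypothetical `usage_events` table with ISO-8601 UTC timestamps stored as text. `set_calls` is an illustrative name; it only handles an existing customer whose events share exactly one shape, and it omits the confirmation prompt and the shape-filter options.

```python
import sqlite3
import uuid
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE usage_events (request_id TEXT PRIMARY KEY, timestamp TEXT, "
    "customer TEXT, model TEXT, input_tokens INTEGER, output_tokens INTEGER, "
    "actual_cost REAL, charged REAL, margin REAL)"
)
# Seed two events of one shape: model, in, out, actual_cost, charged, margin.
SHAPE = ("gpt-4o-mini", 100, 150, 0.000105, 0.15, 0.149895)
for ts in ("2024-01-01T10:00:00+00:00", "2024-01-01T11:00:00+00:00"):
    conn.execute("INSERT INTO usage_events VALUES (?,?,?,?,?,?,?,?,?)",
                 (str(uuid.uuid4()), ts, "Walmart", *SHAPE))


def set_calls(conn, customer, target):
    """Clone the shape to grow, or delete the newest events to shrink."""
    if target < 0:
        raise ValueError("target must be >= 0")
    with conn:
        rows = conn.execute(
            "SELECT request_id, model, input_tokens, output_tokens, "
            "actual_cost, charged, margin FROM usage_events "
            "WHERE customer = ? ORDER BY timestamp DESC",
            (customer,),
        ).fetchall()
        if len({r[1:] for r in rows}) != 1:
            raise ValueError("need exactly one event shape for this customer")
        current = len(rows)
        if target > current:
            now = datetime.now(timezone.utc).isoformat(timespec="seconds")
            conn.executemany(
                "INSERT INTO usage_events VALUES (?,?,?,?,?,?,?,?,?)",
                [(str(uuid.uuid4()), now, customer, *rows[0][1:])
                 for _ in range(target - current)],
            )
        elif target < current:
            newest = rows[: current - target]  # rows are sorted newest first
            conn.executemany(
                "DELETE FROM usage_events WHERE request_id = ?",
                [(r[0],) for r in newest],
            )


set_calls(conn, "Walmart", 5)
set_calls(conn, "Walmart", 1)
print(conn.execute("SELECT timestamp FROM usage_events").fetchall())
```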

CLI reference

Command	Description
`llmbilling report`	Margin breakdown by customer
`llmbilling report --days 7`	Filter to the last 7 days
`llmbilling report --model gpt-4o`	Filter by model
`llmbilling models`	Margin breakdown by model
`llmbilling models --days 30`	Model report for the last 30 days
`llmbilling export`	Export raw events as CSV
`llmbilling export --format json`	Export raw events as JSON
`llmbilling add --customer <name> --model <model> --input-tokens <n> --output-tokens <n> --charged <amount>`	Record a usage event from the CLI
`llmbilling add ... --calls <N>`	Record N equivalent events in one command
`llmbilling update --request-id <id> --charged <amount>`	Update the charged amount on an existing event (recomputes margin)
`llmbilling update --request-id <id> --customer <customer>`	Reassign an event to a different customer
`llmbilling customer set-calls --customer <name> --calls <N>`	Adjust a customer's event count up or down
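The --days and --model options map naturally onto a parameterized WHERE clause. A sketch assuming ISO-8601 UTC timestamps stored as text (as in the export samples); `build_filter` is hypothetical, and the CLI may compute its cutoff differently.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def build_filter(days: Optional[int] = None, model: Optional[str] = None,
                 now: Optional[datetime] = None):
    """Return a WHERE clause and its bound parameters for report filters."""
    clauses, params = [], []
    if days is not None:
        now = now or datetime.now(timezone.utc)
        cutoff = (now - timedelta(days=days)).isoformat(timespec="seconds")
        clauses.append("timestamp >= ?")  # same-offset ISO strings sort by time
        params.append(cutoff)
    if model is not None:
        clauses.append("model = ?")
        params.append(model)
    where = " WHERE " + " AND ".join(clauses) if clauses else ""
    return where, params


where, params = build_filter(
    days=7, model="gpt-4o",
    now=datetime(2026, 3, 25, 14, 0, tzinfo=timezone.utc),
)
print("SELECT customer, COUNT(*) FROM usage_events" + where, params)
```

Binding values as parameters rather than formatting them into the SQL keeps model names like gpt-4o-mini-2024-07-18 from ever being interpreted as SQL.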

Supported models

Pricing data lives in LLMBillingKit/costs.json and is verified in PRICING_VERIFICATION.md.

The current table includes representative models from:

OpenAI (for example gpt-4o, gpt-4o-mini, o3-mini)
Anthropic (for example claude-sonnet-4-20250514, claude-3-5-haiku-20241022)
Google (for example gemini-2.5-pro, gemini-2.5-flash)
Mistral (for example mistral-large-latest)

Limitations

Cost accuracy depends on the bundled static pricing table and how quickly it is updated.
Some providers have pricing nuances (for example reasoning tokens or tier-based rates) that may not be fully modeled.
Unknown models return None from track() (or raise TrackingError when raise_errors=True) until their pricing is added.
SQLite is local-first by design, so there is no built-in multi-host sync dashboard.
test_e2e.py requires a real provider API key and network access.

Roadmap

Faster pricing table update process and validation automation.
Optional backends beyond local SQLite for teams that need centralized storage.
More built-in analytics views (cohort, endpoint, and trend reporting).
Better tooling around provider-specific pricing edge cases.

Examples

See examples/ for runnable scripts that demonstrate tracking and reporting patterns.

Contributing

Contributions are welcome. Start with CONTRIBUTING.md for setup, test, and PR guidance.

Code of conduct

This project follows the Contributor Covenant. See CODE_OF_CONDUCT.md.

Changelog

Release notes and version history are in CHANGELOG.md.

License

MIT

Project details


Release history

0.1.2 (this version): May 11, 2026
0.1.1: May 10, 2026
0.1.0: Mar 25, 2026

Download files

Source Distribution

llmbillingkit-0.1.2.tar.gz (18.4 kB)

Uploaded May 11, 2026 Source

Built Distribution

llmbillingkit-0.1.2-py3-none-any.whl (22.0 kB)

Uploaded May 11, 2026 Python 3

File details

Details for the file llmbillingkit-0.1.2.tar.gz.

File metadata

Download URL: llmbillingkit-0.1.2.tar.gz
Upload date: May 11, 2026
Size: 18.4 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for llmbillingkit-0.1.2.tar.gz
Algorithm	Hash digest
SHA256	`5b6a690d20f9cc0ff8671cd78b3d218eb6fed6451ee76a3271766ca375c43b6b`
MD5	`c6296ef73c6e85e0d7ca35124ccaa887`
BLAKE2b-256	`dc6462495ed1edd66d1243f0f554a0a9dbfa3352af58ed6af54663cf7cff06af`

Provenance

The following attestation bundles were made for llmbillingkit-0.1.2.tar.gz:

Publisher: publish.yml on dphan8/LLMBillingKit

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: llmbillingkit-0.1.2.tar.gz
- Subject digest: 5b6a690d20f9cc0ff8671cd78b3d218eb6fed6451ee76a3271766ca375c43b6b
- Sigstore transparency entry: 1511717158
- Sigstore integration time: May 11, 2026
Source repository:
- Permalink: dphan8/LLMBillingKit@5d22db4d761fac259e21c454079843ccc5cde462
- Branch / Tag: refs/tags/v0.1.2
- Owner: https://github.com/dphan8
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@5d22db4d761fac259e21c454079843ccc5cde462
- Trigger Event: release

File details

Details for the file llmbillingkit-0.1.2-py3-none-any.whl.

File metadata

Download URL: llmbillingkit-0.1.2-py3-none-any.whl
Upload date: May 11, 2026
Size: 22.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for llmbillingkit-0.1.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`78e41be00c7c1189b43f0f479bd8c8d3e11d1b1ba5c75850b06bbd7c941f1b0c`
MD5	`3b977f1c55ebd07fd49f304108e813e5`
BLAKE2b-256	`e4a56d845a5a3d58b9a85db2f7c979ab82596f8a7dbf40cbb9dd9ae1a3447587`

Provenance

The following attestation bundles were made for llmbillingkit-0.1.2-py3-none-any.whl:

Publisher: publish.yml on dphan8/LLMBillingKit

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: llmbillingkit-0.1.2-py3-none-any.whl
- Subject digest: 78e41be00c7c1189b43f0f479bd8c8d3e11d1b1ba5c75850b06bbd7c941f1b0c
- Sigstore transparency entry: 1511717358
- Sigstore integration time: May 11, 2026
Source repository:
- Permalink: dphan8/LLMBillingKit@5d22db4d761fac259e21c454079843ccc5cde462
- Branch / Tag: refs/tags/v0.1.2
- Owner: https://github.com/dphan8
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@5d22db4d761fac259e21c454079843ccc5cde462
- Trigger Event: release
