Track real LLM model usage and compute live gross margin with Tollgate.

tollgateai (Python SDK)

Track real LLM model usage and compute live gross margin with Tollgate. The SDK reads the actual usage off each provider response — you never hand-count tokens. Zero dependencies.

Published on PyPI: tollgateai (v0.2.0).

Works with OpenAI, Anthropic, AWS Bedrock, and every OpenAI-compatible gateway (OpenRouter, Groq, Together, Nebius, local vLLM, …) — streaming and non-streaming. Cost is computed server-side from the token counts the wrappers capture, so no provider has to return a dollar figure.

pip install tollgateai

Create an API key in Tollgate → Integrations, then set:

export TOLLGATE_API_KEY=tg_live_xxx
# optional, defaults to the hosted app:
export TOLLGATE_BASE_URL=https://tollgateai.vercel.app
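The two variables above can be checked once at startup so that a missing key fails loudly instead of surfacing later as silent tracking errors. The helper below is a minimal sketch, not part of the SDK: the name load_tollgate_env is hypothetical, and the only facts taken from this README are the variable names and the hosted default URL.

```python
import os

# Hosted default named in this README; override with TOLLGATE_BASE_URL.
DEFAULT_BASE_URL = "https://tollgateai.vercel.app"


def load_tollgate_env(env=None):
    """Return (api_key, base_url) from an environment mapping.

    Hypothetical helper for illustration; the SDK reads these itself.
    """
    env = os.environ if env is None else env
    api_key = env.get("TOLLGATE_API_KEY")
    if not api_key:
        raise RuntimeError("TOLLGATE_API_KEY is not set")
    base_url = env.get("TOLLGATE_BASE_URL") or DEFAULT_BASE_URL
    # Strip a trailing slash so later path joins stay predictable.
    return api_key, base_url.rstrip("/")
```

Passing a plain dict instead of os.environ keeps the check easy to unit-test.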

Auto-instrumentation (recommended)

Wrap your provider client once; every call reports real usage in the background.

Anthropic

from anthropic import Anthropic
from tollgate import create_tollgate_client, wrap_anthropic

tollgate = create_tollgate_client()  # reads TOLLGATE_API_KEY

# Pin a run_id to group every call in this run; wrapped calls report cost only, revenue is booked by resolve().
run_id = "ticket_8842"
anthropic = wrap_anthropic(
    Anthropic(), tollgate,
    customer_id="cust_A",     # your end customer
    run_id=run_id,
)

# Use the client normally — usage is tracked automatically.
anthropic.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=512,
    messages=[{"role": "user", "content": "Resolve this ticket…"}],
)

# Book revenue once, when the run finishes — "no outcome, no charge".
tollgate.resolve(
    run_id=run_id,
    customer_id="cust_A",
    outcome="resolved",       # "resolved" | "escalated" | "failed"
    revenue_unit_cents=50,    # charge for this resolved unit ($0.50)
)
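If the agent code can raise, the run still needs exactly one resolve() call. One way to guarantee that is a small try/finally wrapper. This is a sketch under assumptions: run_and_resolve is a hypothetical helper, work() returning True/False is a convention of this example, and passing revenue_unit_cents=0 for non-resolved outcomes is assumed to be accepted by the server. Only the resolve() keyword arguments come from the example above.

```python
def run_and_resolve(tollgate, run_id, customer_id, work, price_cents=50):
    """Run work() and book the outcome exactly once, even if it raises.

    Assumes work() returns True when the ticket was resolved and False
    when it was escalated (a convention of this sketch, not of the SDK).
    """
    outcome = "failed"  # stays "failed" if work() raises
    try:
        outcome = "resolved" if work() else "escalated"
        return outcome
    finally:
        tollgate.resolve(
            run_id=run_id,
            customer_id=customer_id,
            outcome=outcome,
            # Only a resolved run is charged; other outcomes book 0 cents.
            revenue_unit_cents=price_cents if outcome == "resolved" else 0,
        )
```

The exception from work() still propagates to the caller; the wrapper only makes sure the outcome is recorded first.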

Outcome-based pricing

Under per-resolution / outcome pricing, only a resolved run earns revenue — an escalated/failed run earns $0 but its provider cost still counts against you. Wrap your client to meter cost on every call, then call resolve() once at the end of the run to book the outcome. For simple per-call billing you can instead pass revenue_unit_cents in the wrap options and skip resolve().
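To see how failed runs eat into margin, here is a small worked example. The numbers are made up, and the formula is the usual definition of gross margin, (revenue - cost) / revenue; this README does not state how the server computes its figure, so treat the function as an illustration only.

```python
def gross_margin(runs):
    """runs: list of (outcome, price_cents, provider_cost_cents) tuples."""
    # Revenue is earned only by resolved runs.
    revenue = sum(price for outcome, price, _ in runs if outcome == "resolved")
    # Provider cost is paid on every run, whatever the outcome.
    cost = sum(cost for _, _, cost in runs)
    margin = (revenue - cost) / revenue if revenue else None
    return revenue, cost, margin


runs = [
    ("resolved", 50, 12),
    ("escalated", 50, 20),
    ("resolved", 50, 8),
    ("failed", 50, 5),
]
revenue, cost, margin = gross_margin(runs)
print(revenue, cost, margin)  # 100 45 0.55
```

Two resolved runs earn 100 cents, but all four runs cost 45 cents in total, so the margin is 55% rather than the 80% the resolved runs would show on their own.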

OpenAI

from openai import OpenAI
from tollgate import create_tollgate_client, wrap_openai

tollgate = create_tollgate_client()
openai = wrap_openai(OpenAI(), tollgate, customer_id="cust_A")

openai.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
)

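revenue_unit_cents can also be a callable that receives the response and returns the charge in cents, e.g. revenue_unit_cents=lambda res: 50 if res.something else 0, where res.something stands for whatever response field decides whether the call is billable.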
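A named function is often easier to test than a lambda. The sketch below charges only when the first choice finished normally. It assumes the callable receives the OpenAI chat completion object (with choices[0].finish_reason); what it receives for streaming calls is not stated here, and revenue_for is a hypothetical name.

```python
from types import SimpleNamespace


def revenue_for(res, price_cents=50):
    """Return the charge in cents for one chat completion response."""
    choices = getattr(res, "choices", None) or []
    # "stop" is the finish_reason OpenAI reports for a natural completion.
    finished = bool(choices) and choices[0].finish_reason == "stop"
    return price_cents if finished else 0


# Stand-ins for real responses, just to exercise the function.
ok = SimpleNamespace(choices=[SimpleNamespace(finish_reason="stop")])
cut_off = SimpleNamespace(choices=[SimpleNamespace(finish_reason="length")])
print(revenue_for(ok), revenue_for(cut_off))  # 50 0
```

It would then be passed as revenue_unit_cents=revenue_for in the wrap options.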

OpenAI-compatible gateways

Point the OpenAI SDK at any compatible endpoint and pass provider="openai_compatible":

openai = OpenAI(api_key=GROQ_KEY, base_url="https://api.groq.com/openai/v1")
client = wrap_openai(openai, tollgate, customer_id="cust_A", provider="openai_compatible")
client.chat.completions.create(model="llama-3.3-70b-versatile", messages=[...])

Streaming

Streaming is captured automatically. For OpenAI / compatible, pass stream_options={"include_usage": True} (required for a final usage chunk); Anthropic needs no flag. Iterate the stream as usual — usage is reported when it ends.
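The reason the flag matters: on an OpenAI-style stream, chunks carry usage=None unless include_usage is set, in which case one final chunk carries the token counts. The sketch below shows the general pass-through pattern a wrapper can use to pick that chunk up. It is not the SDK's implementation, and usage_from_stream is a hypothetical name.

```python
from types import SimpleNamespace


def usage_from_stream(chunks):
    """Yield chunks unchanged while remembering the last usage seen."""
    seen = {"usage": None}

    def passthrough():
        for chunk in chunks:
            if getattr(chunk, "usage", None) is not None:
                seen["usage"] = chunk.usage
            yield chunk

    return passthrough(), seen


# Simulated stream: only the final chunk carries usage.
chunks = [
    SimpleNamespace(usage=None),
    SimpleNamespace(usage=None),
    SimpleNamespace(usage=SimpleNamespace(prompt_tokens=5, completion_tokens=2)),
]
stream, seen = usage_from_stream(chunks)
for _ in stream:
    pass  # the caller iterates as usual
print(seen["usage"].completion_tokens)  # 2
```

Without the flag no chunk would carry usage, and seen["usage"] would stay None, leaving nothing to report.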

AWS Bedrock

Wrap a boto3 bedrock-runtime client so converse / converse_stream auto-report usage (the model id is read from the call):

import boto3
from tollgate import create_tollgate_client, wrap_bedrock

tollgate = create_tollgate_client()
bedrock = wrap_bedrock(boto3.client("bedrock-runtime", region_name="us-east-1"), tollgate, customer_id="cust_A")
bedrock.converse(modelId="anthropic.claude-3-5-sonnet-20241022-v2:0", messages=[...])

Already have an exact cost?

Pass provider_cost_cents (a number or a callable of the response) and the server uses it verbatim, skipping the rate card.
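For example, with a negotiated contract you might price each response yourself. The sketch below assumes the callable receives an OpenAI-style response with usage.prompt_tokens and usage.completion_tokens; the rates are invented, contract_cost_cents is a hypothetical name, and whether the server accepts fractional cents is not stated in this README.

```python
from types import SimpleNamespace

# Hypothetical negotiated rates, in cents per million tokens.
INPUT_CENTS_PER_MTOK = 200
OUTPUT_CENTS_PER_MTOK = 800


def contract_cost_cents(res):
    """Price one response from its own token counts."""
    usage = res.usage
    total = (usage.prompt_tokens * INPUT_CENTS_PER_MTOK
             + usage.completion_tokens * OUTPUT_CENTS_PER_MTOK)
    return total / 1_000_000


res = SimpleNamespace(usage=SimpleNamespace(prompt_tokens=1200, completion_tokens=450))
print(contract_cost_cents(res))  # 0.6
```

It would be passed as provider_cost_cents=contract_cost_cents in the wrap options.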

Manual tracking

For full control or unusual providers:

from tollgate import create_tollgate_client

tollgate = create_tollgate_client()

tollgate.track({
    "customerId": "cust_A",
    "runId": "run_12345",
    "provider": "anthropic",
    "model": "claude-sonnet-4-6",
    "tokensIn": 1200,
    "tokensOut": 450,
    "reasoningTokens": 0,
    "cachedTokens": 0,
    "revenueUnitCents": 50,
    "idempotencyKey": "run_12345#step_1",  # exactly-once: safe to retry
})
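When tracking manually, the idempotency key has to be the same on every retry of the same step, so it helps to derive it from the run and step rather than from a timestamp or random value. The helper below is a sketch: build_event is a hypothetical name, and the field names and key format are copied from the example above.

```python
def build_event(customer_id, run_id, step, provider, model,
                tokens_in, tokens_out, revenue_unit_cents=0):
    """Build a track() payload with a deterministic idempotency key."""
    return {
        "customerId": customer_id,
        "runId": run_id,
        "provider": provider,
        "model": model,
        "tokensIn": tokens_in,
        "tokensOut": tokens_out,
        "reasoningTokens": 0,
        "cachedTokens": 0,
        "revenueUnitCents": revenue_unit_cents,
        # Same run and step always yield the same key, so a retry dedupes.
        "idempotencyKey": f"{run_id}#step_{step}",
    }


event = build_event("cust_A", "run_12345", 1, "anthropic",
                    "claude-sonnet-4-6", 1200, 450, revenue_unit_cents=50)
print(event["idempotencyKey"])  # run_12345#step_1
```

The resulting dict can be passed to tollgate.track(event) as many times as a retry loop needs.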

Notes

Idempotent. Events dedupe on idempotencyKey (auto-set to the provider response id by the wrappers), so retries never double-count.
No prompt content is ever sent — only token counts and metadata.
Streaming is auto-tracked (OpenAI needs stream_options={"include_usage": True}).
Cost from tokens. The server prices every event from token counts × a rate card that auto-syncs daily from the public LiteLLM registry — unknown models are priced at $0 and flagged in logs. See docs/PRICING.md.
Non-blocking. Auto-instrumented tracking runs on a background thread; failures go to on_error (default: log a warning) and never break your call.
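The non-blocking behaviour described in the last note follows a common pattern: the call path only enqueues an event, and a daemon thread sends it and routes any failure to on_error. The class below is a generic sketch of that pattern, not the SDK's code. BackgroundReporter, its flush() method, and the on_error(exception) signature are all assumptions made for illustration.

```python
import logging
import queue
import threading


class BackgroundReporter:
    """Send events on a worker thread so callers never block or fail."""

    def __init__(self, send, on_error=None):
        self._send = send
        # Default mirrors the README: log a warning and carry on.
        self._on_error = on_error or (
            lambda exc: logging.warning("tracking failed: %s", exc)
        )
        self._queue = queue.Queue()
        threading.Thread(target=self._drain, daemon=True).start()

    def report(self, event):
        self._queue.put(event)  # returns immediately

    def flush(self):
        self._queue.join()  # wait until every queued event was handled

    def _drain(self):
        while True:
            event = self._queue.get()
            try:
                self._send(event)
            except Exception as exc:
                self._on_error(exc)  # never propagates to the caller
            finally:
                self._queue.task_done()
```

In a short-lived script, something like flush() before exit matters, because a daemon thread is stopped when the main thread ends.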

wrap_* accepts customer_id, agent_id, run_id, revenue_unit_cents, provider (override; e.g. "openai_compatible"), provider_cost_cents, on_error.

Licensed for use with Tollgate. Not open source.

Release history

1.0.0 (Jun 28, 2026)
0.9.0 (Jun 27, 2026)
0.8.0 (Jun 26, 2026)
0.7.0 (Jun 25, 2026)
0.6.0 (Jun 25, 2026)
0.5.0 (Jun 25, 2026)
0.4.0 (Jun 24, 2026)
0.3.0 (Jun 24, 2026)
0.2.1 (Jun 24, 2026)
0.2.0 (Jun 24, 2026), this version
0.1.2 (Jun 24, 2026)
0.1.1 (Jun 21, 2026)
0.1.0 (Jun 21, 2026)

Download files

Source Distribution

tollgateai-0.2.0.tar.gz (7.7 kB view details)

Uploaded Jun 24, 2026 Source

Built Distribution

tollgateai-0.2.0-py3-none-any.whl (9.0 kB view details)

Uploaded Jun 24, 2026 Python 3

File details

Details for the file tollgateai-0.2.0.tar.gz.

File metadata

Download URL: tollgateai-0.2.0.tar.gz
Upload date: Jun 24, 2026
Size: 7.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.1

File hashes

Hashes for tollgateai-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`42a949905cf156debcd581d97f1fc11210177154f25d7745e83522c4cf5bf3f2`
MD5	`ad41c37b807d2e9cab3cfd3d54de7af8`
BLAKE2b-256	`2e9505e345227c2f6b5b59e92f640ad29b0305a6673fd1c86236ebe83c7b004c`

File details

Details for the file tollgateai-0.2.0-py3-none-any.whl.

File metadata

Download URL: tollgateai-0.2.0-py3-none-any.whl
Upload date: Jun 24, 2026
Size: 9.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.1

File hashes

Hashes for tollgateai-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`0ca458b776269f136ea40f8ad9a9c9a07fd19056a70ce545b4c4632ceb6efbd8`
MD5	`4b0af23e9692e887db7f75efbaa82512`
BLAKE2b-256	`6077f8609e33a1e54aa88f2ed895663979888fb58dbdaa32740d99c8a8999e29`
