AI API gateway SDK with cost tracking, budget enforcement, and multi-provider routing for LLM applications
Project description
Track what your AI agents cost. One line of code.
Cost tracking for LLM APIs. Works with OpenAI, Anthropic, Gemini, Groq, Mistral, Together, and any OpenAI-compatible SDK. 730+ models priced. Zero config, zero account needed for local tracking.
pip install llmkit-sdk
Zero-config cost tracking
Wrap any OpenAI-compatible client. Costs are estimated locally from a bundled pricing table - no proxy, no account, and no extra network calls for the tracking itself:
from llmkit import tracked
from openai import OpenAI
client = OpenAI(http_client=tracked())
res = client.chat.completions.create(
    model="gpt-4.1",
    messages=[{"role": "user", "content": "explain CQRS"}],
)
# costs calculated automatically from response usage data
Works the same with Anthropic, Gemini, Groq, Mistral, Together, and any OpenAI-compatible SDK:
from anthropic import Anthropic
client = Anthropic(http_client=tracked())
msg = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "explain event sourcing"}],
)
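Under the hood, tracked() presumably returns an httpx client whose transport reads each provider response's usage block and prices it against the bundled table. A minimal sketch of that idea - not the actual llmkit internals; the class name, pricing values, and callback below are illustrative assumptions:
import json
import httpx

# Illustrative per-token prices (USD); assumed values, the real SDK bundles a full table.
PRICES = {"gpt-4.1": (2.0e-6, 8.0e-6)}  # (input, output)

class CostTrackingTransport(httpx.HTTPTransport):
    """Hypothetical transport that prices responses from their usage block."""
    def __init__(self, on_cost=None, **kwargs):
        super().__init__(**kwargs)
        self.on_cost = on_cost

    def handle_request(self, request: httpx.Request) -> httpx.Response:
        response = super().handle_request(request)
        body = response.read()  # buffer the body so usage data can be parsed
        try:
            data = json.loads(body)
            usage = data.get("usage", {})
            p_in, p_out = PRICES.get(data.get("model", ""), (0.0, 0.0))
            cost = (usage.get("prompt_tokens", 0) * p_in
                    + usage.get("completion_tokens", 0) * p_out)
            if self.on_cost:
                self.on_cost(cost)
        except (ValueError, AttributeError):
            pass  # streaming or non-JSON bodies would need separate handling
        return httpx.Response(
            response.status_code,
            headers=response.headers,
            content=body,
            request=request,
        )
In practice such a transport would be wrapped in an httpx.Client and passed to the provider SDK's http_client argument - which is exactly what the tracked() helper above hides.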
Collect costs
costs = []
client = OpenAI(http_client=tracked(on_cost=costs.append))
# ... run your agent ...
total = sum(c.total_cost for c in costs if c.total_cost)
print(f"Agent run cost: ${total:.4f}")
Estimate from any response
from llmkit import estimate_cost
cost = estimate_cost(response)
print(f"~${cost.total_cost:.6f}")
How it compares
| Feature | llmkit-sdk | tokencost | litellm |
|---|---|---|---|
| Zero-config tracking | yes (httpx transport) | no (manual call) | no (callback setup) |
| Works with existing SDK code | yes (drop-in) | no (separate function) | yes (but requires litellm wrapper) |
| Local estimation (no proxy) | yes | yes | no |
| Budget enforcement | yes (via proxy) | no | yes (but 9+ bypass bugs) |
| Streaming cost tracking | yes | no | yes |
| Session grouping | yes | no | no |
| Models priced | 730+ | 400+ | 100+ |
| Install size | ~200KB | ~50KB | ~50MB |
Framework integrations
Drop-in cost tracking for popular agent frameworks:
# LangChain
from llmkit.integrations.langchain import LLMKitCallbackHandler
handler = LLMKitCallbackHandler()
chain.invoke("...", config={"callbacks": [handler]})
print(f"${handler.total_cost:.4f}")
# LlamaIndex
from llmkit.integrations.llamaindex import LLMKitCallbackHandler
from llama_index.core import Settings
Settings.callback_manager.add_handler(LLMKitCallbackHandler())
# Pydantic AI
from llmkit.integrations.pydantic_ai import llmkit_hooks
hooks, tracker = llmkit_hooks()
agent = Agent("openai:gpt-4.1", capabilities=[hooks])
result = await agent.run("...")
print(f"${tracker.total_cost:.4f}")
Frameworks are optional dependencies - install only what you use.
Session tracking
Group costs by agent run:
from llmkit import LLMKit
client = LLMKit(api_key="llmk_...")
agent = client.session()
for task in tasks:
    completion, cost = agent.chat(
        model="gpt-4.1",
        messages=[{"role": "user", "content": task}],
    )
print(f"Session: ${agent.stats.total_cost:.4f} across {agent.stats.request_count} requests")
Streaming
stream = client.chat_stream(
    model="claude-sonnet-4-20250514",
    messages=[{"role": "user", "content": "write a haiku"}],
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="")
print(f"\nCost: ${stream.cost.total_cost:.6f}")
Proxy mode (budget enforcement)
Route through the LLMKit proxy for hard budget limits, per-key rate limiting, and dashboard analytics:
client = LLMKit(api_key="llmk_...")
completion, cost = client.chat(
    model="gpt-4.1",
    messages=[{"role": "user", "content": "hello"}],
)
print(f"${cost.total_cost:.4f} via {cost.provider}")
Set a $10 daily budget in the dashboard. When it's hit, requests get a 402 - not a log message, an actual block. No more runaway agents.
Async
from llmkit import AsyncLLMKit
client = AsyncLLMKit(api_key="llmk_...")
completion, cost = await client.chat(
    model="gpt-4.1",
    messages=[{"role": "user", "content": "hello"}],
)
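A sketch of fanning out concurrent requests and summing their costs, assuming chat() is safe to call concurrently on one client:
import asyncio
from llmkit import AsyncLLMKit

async def main():
    client = AsyncLLMKit(api_key="llmk_...")
    prompts = ["explain CQRS", "explain event sourcing"]
    results = await asyncio.gather(*(
        client.chat(model="gpt-4.1", messages=[{"role": "user", "content": p}])
        for p in prompts
    ))
    total = sum(cost.total_cost for _, cost in results)
    print(f"Batch cost: ${total:.4f}")

asyncio.run(main())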
No SDK needed
LLMKit is OpenAI-compatible. Any client works:
from openai import OpenAI
client = OpenAI(
    base_url="https://llmkit-proxy.smigolsmigol.workers.dev/v1",
    api_key="llmk_...",
)
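Because the proxy speaks the OpenAI wire format, a budget-blocked request surfaces in the stock client as a status error carrying the 402. A sketch of catching it so the agent stops instead of retrying blindly (the exact error body returned by the proxy is an assumption):
import openai

try:
    res = client.chat.completions.create(
        model="gpt-4.1",
        messages=[{"role": "user", "content": "hello"}],
    )
except openai.APIStatusError as e:
    if e.status_code == 402:
        print("LLMKit budget exhausted - halting run")  # hard stop, not a warning
    else:
        raise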
Part of LLMKit
This is the Python SDK for LLMKit, an open-source AI API gateway. The full platform includes:
- Proxy (Cloudflare Workers) - budget enforcement, cost tracking, provider routing
- Dashboard (Next.js) - analytics, API key management, budget configuration
- MCP server - 11 tools for Claude Code, Cursor, and Cline cost tracking
- TypeScript SDK - same features for Node.js/Deno/Bun
- CLI - wrap any command with npx @f3d1/llmkit-cli -- node agent.js
License
MIT
Project details
Download files
File details
Details for the file llmkit_sdk-0.1.9.tar.gz.
File metadata
- Download URL: llmkit_sdk-0.1.9.tar.gz
- Upload date:
- Size: 32.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ca01e02f7d6054d3805e4d6cef2fc9f21c68bd625a0f2fbeb1d79d8a671a901e |
| MD5 | 43ae58796945123240fdf22b15a74cbc |
| BLAKE2b-256 | fc297942c8cfb55ad700acf6c70359296613c3f300ba46a0df19462ef427a7cf |
File details
Details for the file llmkit_sdk-0.1.9-py3-none-any.whl.
File metadata
- Download URL: llmkit_sdk-0.1.9-py3-none-any.whl
- Upload date:
- Size: 23.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e95b8725670051099657afa946fd06d0bc59ff7810da888c88a7f622afb7dd3a |
| MD5 | 0c97e47f82554264c531a08d63f30eb6 |
| BLAKE2b-256 | 3e50ec8562d7b0a92a417731eed04976d809b7a7cdf869d6e3610502aaa84be2 |