AI API gateway SDK with cost tracking, budget enforcement, and multi-provider routing for LLM applications


LLMKit

Track what your AI agents cost. One line of code.



Cost tracking for LLM APIs. Works with OpenAI, Anthropic, Gemini, Groq, Mistral, Together, and any OpenAI-compatible SDK. 730+ models priced. Zero config, zero account needed for local tracking.

pip install llmkit-sdk

Zero-config cost tracking

Wrap any OpenAI-compatible client. Costs are estimated locally from a bundled pricing table - no proxy, no account, no network calls:

from llmkit import tracked
from openai import OpenAI

client = OpenAI(http_client=tracked())
res = client.chat.completions.create(
    model="gpt-4.1",
    messages=[{"role": "user", "content": "explain CQRS"}],
)
# costs calculated automatically from response usage data

Works the same with Anthropic, Gemini, Groq, Mistral, Together, and any OpenAI-compatible SDK:

from anthropic import Anthropic

client = Anthropic(http_client=tracked())
msg = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "explain event sourcing"}],
)

Collect costs

costs = []
client = OpenAI(http_client=tracked(on_cost=costs.append))

# ... run your agent ...

total = sum(c.total_cost for c in costs if c.total_cost)
print(f"Agent run cost: ${total:.4f}")
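Because `on_cost` hands you plain record objects, downstream aggregation is ordinary Python. A minimal sketch of a per-model breakdown, assuming each record exposes `model` and `total_cost` attributes (the `CostRecord` stand-in below is hypothetical, not the SDK's own class):

```python
from collections import defaultdict
from dataclasses import dataclass

# Stand-in for the SDK's cost record; assumes `model` and `total_cost` fields.
@dataclass
class CostRecord:
    model: str
    total_cost: float

def cost_by_model(costs):
    """Sum total_cost per model, skipping records with no cost."""
    totals = defaultdict(float)
    for c in costs:
        if c.total_cost:
            totals[c.model] += c.total_cost
    return dict(totals)

costs = [
    CostRecord("gpt-4.1", 0.0123),
    CostRecord("gpt-4.1", 0.0045),
    CostRecord("claude-sonnet-4-20250514", 0.0210),
]
print(cost_by_model(costs))
```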

Estimate from any response

from llmkit import estimate_cost

# response: any supported provider's response object
cost = estimate_cost(response)
print(f"~${cost.total_cost:.6f}")

How it compares

| Feature | llmkit-sdk | tokencost | litellm |
| --- | --- | --- | --- |
| Zero-config tracking | yes (httpx transport) | no (manual call) | no (callback setup) |
| Works with existing SDK code | yes (drop-in) | no (separate function) | yes (but requires litellm wrapper) |
| Local estimation (no proxy) | yes | yes | no |
| Budget enforcement | yes (via proxy) | no | yes (but 9+ bypass bugs) |
| Streaming cost tracking | yes | no | yes |
| Session grouping | yes | no | no |
| Models priced | 730+ | 400+ | 100+ |
| Install size | ~200 KB | ~50 KB | ~50 MB |

Framework integrations

Drop-in cost tracking for popular agent frameworks:

# LangChain
from llmkit.integrations.langchain import LLMKitCallbackHandler
handler = LLMKitCallbackHandler()
chain.invoke("...", config={"callbacks": [handler]})
print(f"${handler.total_cost:.4f}")

# LlamaIndex
from llmkit.integrations.llamaindex import LLMKitCallbackHandler
from llama_index.core import Settings
Settings.callback_manager.add_handler(LLMKitCallbackHandler())

# Pydantic AI
from llmkit.integrations.pydantic_ai import llmkit_hooks
hooks, tracker = llmkit_hooks()
agent = Agent("openai:gpt-4.1", capabilities=[hooks])
result = await agent.run("...")
print(f"${tracker.total_cost:.4f}")

Frameworks are optional dependencies - install only what you use.

Session tracking

Group costs by agent run:

from llmkit import LLMKit

client = LLMKit(api_key="llmk_...")
agent = client.session()

for task in tasks:
    completion, cost = agent.chat(
        model="gpt-4.1",
        messages=[{"role": "user", "content": task}],
    )

print(f"Session: ${agent.stats.total_cost:.4f} across {agent.stats.request_count} requests")

Streaming

stream = client.chat_stream(
    model="claude-sonnet-4-20250514",
    messages=[{"role": "user", "content": "write a haiku"}],
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="")

# cost is available once the stream has been fully consumed
print(f"\nCost: ${stream.cost.total_cost:.6f}")

Proxy mode (budget enforcement)

Route through the LLMKit proxy for hard budget limits, per-key rate limiting, and dashboard analytics:

client = LLMKit(api_key="llmk_...")
completion, cost = client.chat(
    model="gpt-4.1",
    messages=[{"role": "user", "content": "hello"}],
)
print(f"${cost.total_cost:.4f} via {cost.provider}")

Set a $10 daily budget in the dashboard. When it's hit, requests get a 402 - not a log message, an actual block. No more runaway agents.

Async

from llmkit import AsyncLLMKit

client = AsyncLLMKit(api_key="llmk_...")
completion, cost = await client.chat(
    model="gpt-4.1",
    messages=[{"role": "user", "content": "hello"}],
)

No SDK needed

LLMKit is OpenAI-compatible. Any client works:

from openai import OpenAI

client = OpenAI(
    base_url="https://llmkit-proxy.smigolsmigol.workers.dev/v1",
    api_key="llmk_...",
)

Part of LLMKit

This is the Python SDK for LLMKit, an open-source AI API gateway. The full platform includes:

  • Proxy (Cloudflare Workers) - budget enforcement, cost tracking, provider routing
  • Dashboard (Next.js) - analytics, API key management, budget configuration
  • MCP server - 11 cost-tracking tools for Claude Code, Cursor, and Cline
  • TypeScript SDK - same features for Node.js/Deno/Bun
  • CLI - wrap any command with npx @f3d1/llmkit-cli -- node agent.js

License

MIT

