
AI API gateway SDK with cost tracking, budget enforcement, and multi-provider routing for LLM applications

Project description

LLMKit

Track what your AI agents cost. One line of code.



Cost tracking for LLM APIs. Works with OpenAI, Anthropic, Gemini, Groq, Mistral, Together, and any OpenAI-compatible SDK. 730+ models priced. Zero config, zero account needed for local tracking.

pip install llmkit-sdk

Zero-config cost tracking

Wrap any OpenAI-compatible client. Costs are estimated locally from a bundled pricing table - no proxy, no account, no network calls:

from llmkit import tracked
from openai import OpenAI

client = OpenAI(http_client=tracked())
res = client.chat.completions.create(
    model="gpt-4.1",
    messages=[{"role": "user", "content": "explain CQRS"}],
)
# costs calculated automatically from response usage data
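The local estimate is just token counts multiplied by per-token prices from the bundled table. A minimal sketch of that arithmetic, with made-up per-million-token prices (the real pricing table and its field names are not shown here):

```python
# Hypothetical prices in USD per 1M tokens; real values come from the bundled table.
PRICING = {"gpt-4.1": {"input": 2.00, "output": 8.00}}

def estimate(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate cost in USD from response usage counts, entirely offline."""
    p = PRICING[model]
    return (prompt_tokens * p["input"] + completion_tokens * p["output"]) / 1_000_000

# 1,000 prompt tokens + 500 completion tokens at these assumed rates:
print(f"${estimate('gpt-4.1', 1000, 500):.4f}")  # → $0.0060
```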

Works the same with Anthropic, Gemini, Groq, Mistral, Together, and any OpenAI-compatible SDK:

from anthropic import Anthropic

client = Anthropic(http_client=tracked())
msg = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "explain event sourcing"}],
)

Collect costs

costs = []
client = OpenAI(http_client=tracked(on_cost=costs.append))

# ... run your agent ...

total = sum(c.total_cost for c in costs if c.total_cost)
print(f"Agent run cost: ${total:.4f}")
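The on_cost callback also makes per-model breakdowns straightforward. A sketch, assuming each cost record exposes model and total_cost attributes (shown with a stand-in dataclass rather than the SDK's own type):

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Cost:  # stand-in for the SDK's cost record
    model: str
    total_cost: float

def by_model(costs):
    """Sum total_cost per model name, skipping records with no cost."""
    totals = defaultdict(float)
    for c in costs:
        if c.total_cost:
            totals[c.model] += c.total_cost
    return dict(totals)

costs = [
    Cost("gpt-4.1", 0.012),
    Cost("gpt-4.1", 0.003),
    Cost("claude-sonnet-4-20250514", 0.020),
]
print(by_model(costs))
```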

Estimate from any response

from llmkit import estimate_cost

cost = estimate_cost(response)
print(f"~${cost.total_cost:.6f}")

How it compares

Feature                      | llmkit-sdk            | tokencost              | litellm
Zero-config tracking         | yes (httpx transport) | no (manual call)       | no (callback setup)
Works with existing SDK code | yes (drop-in)         | no (separate function) | yes (requires litellm wrapper)
Local estimation (no proxy)  | yes                   | yes                    | no
Budget enforcement           | yes (via proxy)       | no                     | yes (but 9+ bypass bugs)
Streaming cost tracking      | yes                   | no                     | yes
Session grouping             | yes                   | no                     | no
Models priced                | 730+                  | 400+                   | 100+
Install size                 | ~200 KB               | ~50 KB                 | ~50 MB

Session tracking

Group costs by agent run:

from llmkit import LLMKit

client = LLMKit(api_key="llmk_...")
agent = client.session()

for task in tasks:
    completion, cost = agent.chat(
        model="gpt-4.1",
        messages=[{"role": "user", "content": task}],
    )

print(f"Session: ${agent.stats.total_cost:.4f} across {agent.stats.request_count} requests")

Streaming

stream = client.chat_stream(
    model="claude-sonnet-4-20250514",
    messages=[{"role": "user", "content": "write a haiku"}],
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="")

print(f"\nCost: ${stream.cost.total_cost:.6f}")
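Streaming cost tracking generally works because the final chunk of a stream carries the usage totals (the OpenAI-style stream_options={"include_usage": True} pattern). A rough sketch of that accumulation loop with stand-in chunks, not the SDK's internals:

```python
from types import SimpleNamespace as NS

def consume(stream):
    """Accumulate streamed text and pick up usage from whichever chunk carries it."""
    text, usage = [], None
    for chunk in stream:
        if chunk.content:
            text.append(chunk.content)
        if chunk.usage is not None:
            usage = chunk.usage  # usually only the final chunk has this
    return "".join(text), usage

chunks = [
    NS(content="Hel", usage=None),
    NS(content="lo", usage=None),
    NS(content=None, usage=NS(prompt_tokens=5, completion_tokens=2)),
]
text, usage = consume(chunks)
print(text, usage.completion_tokens)  # → Hello 2
```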

Proxy mode (budget enforcement)

Route through the LLMKit proxy for hard budget limits, per-key rate limiting, and dashboard analytics:

client = LLMKit(api_key="llmk_...")
completion, cost = client.chat(
    model="gpt-4.1",
    messages=[{"role": "user", "content": "hello"}],
)
print(f"${cost.total_cost:.4f} via {cost.provider}")

Set a $10 daily budget in the dashboard. When the budget is hit, requests are rejected with HTTP 402 - not a log message, an actual block. No more runaway agents.
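Client code should be ready for that 402. A hedged sketch of a fallback pattern, assuming the block surfaces as an exception carrying the status code (the exception class here is illustrative, not the SDK's actual API):

```python
class BudgetExceededError(Exception):
    """Illustrative stand-in; check the SDK for the real exception type."""
    status_code = 402

def chat_with_fallback(chat, fallback):
    """Try the budgeted call; degrade gracefully when the proxy blocks with 402."""
    try:
        return chat()
    except BudgetExceededError:
        return fallback()

def blocked():
    raise BudgetExceededError("daily budget exhausted")

print(chat_with_fallback(blocked, lambda: "cached answer"))  # → cached answer
```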

Async

import asyncio

from llmkit import AsyncLLMKit

async def main():
    client = AsyncLLMKit(api_key="llmk_...")
    completion, cost = await client.chat(
        model="gpt-4.1",
        messages=[{"role": "user", "content": "hello"}],
    )

asyncio.run(main())

No SDK needed

LLMKit is OpenAI-compatible. Any client works:

from openai import OpenAI

client = OpenAI(
    base_url="https://llmkit-proxy.smigolsmigol.workers.dev/v1",
    api_key="llmk_...",
)
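Under the hood this is just a standard chat-completions POST to the proxy. A sketch of the request shape using only the standard library (the endpoint path follows the OpenAI-compatible convention; nothing LLMKit-specific is assumed):

```python
import json

def build_chat_request(base_url, api_key, model, messages):
    """Assemble an OpenAI-style chat completion request for the proxy."""
    url = base_url.rstrip("/") + "/chat/completions"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = json.dumps({"model": model, "messages": messages})
    return url, headers, body

url, headers, body = build_chat_request(
    "https://llmkit-proxy.smigolsmigol.workers.dev/v1",
    "llmk_...",
    "gpt-4.1",
    [{"role": "user", "content": "hello"}],
)
print(url)  # → https://llmkit-proxy.smigolsmigol.workers.dev/v1/chat/completions
```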

Part of LLMKit

This is the Python SDK for LLMKit, an open-source AI API gateway. The full platform includes:

  • Proxy (Cloudflare Workers) - budget enforcement, cost tracking, provider routing
  • Dashboard (Next.js) - analytics, API key management, budget configuration
  • MCP server - 11 tools for Claude Code, Cursor, and Cline cost tracking
  • TypeScript SDK - same features for Node.js/Deno/Bun
  • CLI - wrap any command with npx @f3d1/llmkit-cli -- node agent.js

License

MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

llmkit_sdk-0.1.8.tar.gz (31.9 kB)


Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

llmkit_sdk-0.1.8-py3-none-any.whl (23.4 kB)


File details

Details for the file llmkit_sdk-0.1.8.tar.gz.

File metadata

  • Download URL: llmkit_sdk-0.1.8.tar.gz
  • Size: 31.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.4

File hashes

Hashes for llmkit_sdk-0.1.8.tar.gz

  • SHA256: 52919bb9bb25c3c6cea711a4c940951a2a733fe79da50265fef3bf57d3bf5ed5
  • MD5: 1e0c6ba3a919bf0294f78f14057947bc
  • BLAKE2b-256: 74290e3a0f85d382e0b4cce3b799d12cab7cb6df482113ddb96250b6763d5094

See more details on using hashes here.

File details

Details for the file llmkit_sdk-0.1.8-py3-none-any.whl.

File metadata

  • Download URL: llmkit_sdk-0.1.8-py3-none-any.whl
  • Size: 23.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.4

File hashes

Hashes for llmkit_sdk-0.1.8-py3-none-any.whl

  • SHA256: 9643fa8a37e37d388ee691005e9e7b3bdfb3f56c27b1ac2acb20fa64be97b4eb
  • MD5: 6a123fe6a03d507f4d6abd9cad05044d
  • BLAKE2b-256: a53c4e9e42c96b66367e35d541685fab8b82954017521407f784ebcbb1739a02

See more details on using hashes here.
