The simplest way to use AI in Python with automatic cost tracking and optimization
Project description
Cost Katana Python
AI that just works. Costs that just track.
One import. Any model. Automatic cost tracking.
Table of contents
- Installation
- Quick start
- AI gateway (HTTP)
- Core APIs
- Type-safe model namespaces
- Configuration
- Cost optimization
- Core features
- Framework integration
- Real-world examples
- Error handling
- Experimentation (hosted API)
- Migration guides
- Package names (ecosystem)
- More examples
- Support
- License
Installation
The package on PyPI is named cost-katana (hyphen). You import it as cost_katana (underscore).
pip install cost-katana
Requires Python 3.8+.
Quick start
1. Environment
export COST_KATANA_API_KEY="dak_your_key_here" # required — from the dashboard
export PROJECT_ID="your_project_id" # optional — per-project dashboard filtering
The default API base URL is https://api.costkatana.com (not overridden by env in the high-level client).
2. First call
import cost_katana as ck
from cost_katana import openai
response = ck.ai(openai.gpt_4o, "Explain quantum computing in one sentence")
print(response.text)
print(response.cost)
print(response.tokens)
With COST_KATANA_API_KEY set, you do not need to call configure() first — ck.ai() / ck.chat() auto-configure from the environment. Usage and cost attribution are always on.
If you only set COST_KATANA_API_KEY (no direct provider keys such as OPENAI_API_KEY), requests use Cost Katana hosted models through the backend.
Which surface should I use?
| Goal | Use |
|---|---|
| Simple Python calls with cost on every response | ck.ai() / ck.chat() |
| Drop-in HTTP proxy (OpenAI- or Anthropic-shaped JSON) | Gateway — httpx / requests or cURL (below) |
| Dashboard experiments (compare models, what-if) | Experimentation API (REST; often used via the web app) |
AI gateway (HTTP)
The high-level ck.ai() / ck.chat() APIs call Cost Katana’s backend. For the same drop-in HTTP proxy as the TypeScript gateway() helper, call the gateway with httpx or requests.
Base URL (default): https://api.costkatana.com/api/gateway
Override with COSTKATANA_GATEWAY_URL if your deployment documents one.
Headers
| Header | Value |
|---|---|
Authorization |
Bearer <COST_KATANA_API_KEY> |
Content-Type |
application/json |
x-project-id |
Optional — same role as PROJECT_ID for dashboard scoping |
The hosted gateway enables input firewall (LLM security) and output moderation by default. To opt out for a request, add CostKatana-LLM-Security-Enabled: false and/or CostKatana-Output-Moderation-Enabled: false. You can merge cost_katana.gateway_request_headers(llm_security_enabled=False) into your headers.
Dashboard aggregates for blocked prompts and moderation: CostKatanaClient.get_gateway_security_summary() (GET /api/gateway/security/summary).
OpenAI-compatible — POST {GATEWAY}/v1/chat/completions
import os
import httpx
GATEWAY = os.environ.get(
"COSTKATANA_GATEWAY_URL",
"https://api.costkatana.com/api/gateway",
).rstrip("/")
headers = {
"Authorization": f"Bearer {os.environ['COST_KATANA_API_KEY']}",
"Content-Type": "application/json",
}
project_id = os.environ.get("PROJECT_ID")
if project_id:
headers["x-project-id"] = project_id
payload = {
"model": "gpt-4o",
"messages": [{"role": "user", "content": "Hello!"}],
}
with httpx.Client(timeout=60.0) as client:
r = client.post(f"{GATEWAY}/v1/chat/completions", headers=headers, json=payload)
r.raise_for_status()
data = r.json()
print(data["choices"][0]["message"]["content"])
Anthropic Messages — POST {GATEWAY}/v1/messages
import os
import httpx
GATEWAY = os.environ.get(
"COSTKATANA_GATEWAY_URL",
"https://api.costkatana.com/api/gateway",
).rstrip("/")
headers = {
"Authorization": f"Bearer {os.environ['COST_KATANA_API_KEY']}",
"Content-Type": "application/json",
}
if pid := os.environ.get("PROJECT_ID"):
headers["x-project-id"] = pid
payload = {
"model": "claude-3-5-sonnet-20241022",
"max_tokens": 256,
"messages": [{"role": "user", "content": "Hello!"}],
}
with httpx.Client(timeout=60.0) as client:
r = client.post(f"{GATEWAY}/v1/messages", headers=headers, json=payload)
r.raise_for_status()
data = r.json()
for block in data.get("content", []):
if block.get("type") == "text":
print(block["text"])
cURL
curl -sS "https://api.costkatana.com/api/gateway/v1/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $COST_KATANA_API_KEY" \
-d '{"model":"gpt-4o","messages":[{"role":"user","content":"Hello!"}]}'
More patterns (caching, retries, headers): costkatana-examples 2-gateway and cost-katana-core examples/GATEWAY_USAGE_AND_TRACKING.md.
Core APIs
ck.ai()
import cost_katana as ck
from cost_katana import openai, anthropic, google
response = ck.ai(openai.gpt_4o, "Your prompt", temperature=0.7, max_tokens=500, cache=True, cortex=True)
Prefer namespace constants (openai.gpt_4o, anthropic.claude_3_5_sonnet_20241022, …) over raw strings so IDs stay correct as models change.
ck.chat()
import cost_katana as ck
from cost_katana import openai
chat = ck.chat(openai.gpt_4o, system_message="You are a helpful assistant.")
chat.send("Hello! What can you help me with?")
chat.send("Tell me a programming joke")
print(f"Total cost: ${chat.total_cost:.4f}")
print(f"Messages: {len(chat.history)}")
print(f"Tokens: {chat.total_tokens}")
Caching
import cost_katana as ck
from cost_katana import openai
r1 = ck.ai(openai.gpt_4o, "What is 2+2?", cache=True)
r2 = ck.ai(openai.gpt_4o, "What is 2+2?", cache=True)
print(r1.cached, r2.cached)
Cortex (optimization)
import cost_katana as ck
from cost_katana import openai
response = ck.ai(
openai.gpt_4o,
"Write a comprehensive guide to machine learning for beginners",
cortex=True,
max_tokens=2000,
)
print(response.optimized, response.saved_amount)
Claude extended thinking
For Claude models that support extended thinking (e.g. Opus 4.x, Sonnet 4.x, Sonnet 3.7), pass thinking=True. The gateway sizes the reasoning budget; thinking tokens count as output tokens for billing. On Opus 4.6 / 4.7 and Sonnet 4.6, you can set thinking_effort to 'low', 'medium', 'high', or 'max'. thinking_budget_tokens is optional if you need a fixed cap.
import cost_katana as ck
from cost_katana import anthropic
response = ck.ai(
anthropic.claude_sonnet_4_20250514,
"Solve step by step: If a train leaves at 3pm...",
thinking=True,
max_tokens=1024,
)
print(response.text)
if response.thinking:
print("Reasoning:", response.thinking)
# Adaptive effort (Sonnet 4.6 / Opus 4.6+)
r2 = ck.ai(
anthropic.claude_sonnet_4_6,
"Plan a week-long API migration with risks and rollback.",
thinking=True,
thinking_effort="high",
max_tokens=2048,
)
With CostKatanaClient directly:
from cost_katana import from_env
from cost_katana import anthropic
client = from_env()
data = client.send_message(
message="Explain this architecture tradeoff.",
model_id=anthropic.claude_sonnet_4_20250514,
thinking=True,
thinking_effort="medium",
)
print(data.get("data", {}).get("response"))
print(data.get("data", {}).get("thinking"))
Compare models
import cost_katana as ck
from cost_katana import openai, anthropic, google
prompt = "Summarize the theory of relativity in 50 words"
models = [
("GPT-4 class", openai.gpt_4o),
("Claude 3.5 Sonnet", anthropic.claude_3_5_sonnet_20241022),
("Gemini 2.5 Pro", google.gemini_2_5_pro),
("GPT-3.5 Turbo", openai.gpt_3_5_turbo),
]
for name, model in models:
r = ck.ai(model, prompt)
print(f"{name:22} ${r.cost:.6f}")
Type-safe model namespaces
from cost_katana import openai, anthropic, google, aws_bedrock, xai, deepseek, mistral, groq, cohere, meta
| Namespace | Examples |
|---|---|
openai |
GPT-4, GPT-3.5, O1, O3, DALL-E, Whisper |
anthropic |
Claude 3.5 Sonnet, Haiku, Opus |
google |
Gemini 2.5 Pro, Flash |
aws_bedrock |
Nova, Claude on Bedrock |
xai |
Grok |
deepseek |
DeepSeek |
mistral |
Mistral |
groq |
Groq-hosted Llama / Mixtral / Gemma |
cohere |
Command |
meta |
Llama |
Configuration
Environment variables
| Variable | Required? | Purpose |
|---|---|---|
COST_KATANA_API_KEY |
Yes | Dashboard API key (dak_...) |
PROJECT_ID |
No | Per-project scope; aliases include COST_KATANA_PROJECT / COSTKATANA_PROJECT_ID |
Base URL, default model, and timeouts in the client are package constants — not set via environment variables (unless documented for a specific helper).
Optional direct provider keys
If you call provider APIs yourself outside hosted routing, you may set OPENAI_API_KEY, GEMINI_API_KEY, ANTHROPIC_API_KEY, or AWS credentials for Bedrock. They are not required when only COST_KATANA_API_KEY is set.
Helpers
cost_katana.from_env()—CostKatanaClientfrom the same env vars as zero-config usage.cost_katana.auto_configure()— lazy init beforeai()/chat()/track().cost_katana.track({...})— manual cost row to the dashboard.Config.from_env()— same env mapping as the client.
Programmatic configuration
import cost_katana as ck
ck.configure(
api_key="dak_your_key",
cortex=True,
cache=True,
firewall=True,
)
Request options
import cost_katana as ck
from cost_katana import openai
response = ck.ai(
openai.gpt_4o,
"Your prompt",
temperature=0.7,
max_tokens=500,
system_message="You are helpful",
cache=True,
cortex=True,
retry=True,
)
Cost optimization
| Strategy | When to use |
|---|---|
| Cheaper model for easy tasks | Trivia, classification, translation |
cache=True |
Repeated FAQs |
cortex=True |
Long-form generation |
ck.chat(...) |
Multi-turn sessions |
| High volume, cost-sensitive | Consider Gemini Flash-class models via namespaces |
import cost_katana as ck
from cost_katana import openai
ck.ai(openai.gpt_3_5_turbo, "What is 2+2?")
ck.ai(openai.gpt_3_5_turbo, "What is 2+2?", cache=True)
ck.ai(openai.gpt_4o, "Write a 2000-word essay", cortex=True)
Core features
Cost tracking
Every response includes cost and token usage (tracking cannot be disabled — required for attribution).
response = ck.ai(openai.gpt_4o, "Write a story")
print(response.cost, response.tokens, response.model, response.provider)
Auto-failover
Routing may fall back across providers when configured on the backend.
response = ck.ai(openai.gpt_4o, "Hello")
print(response.provider)
Security firewall
import cost_katana as ck
from cost_katana import openai
ck.configure(firewall=True)
try:
ck.ai(openai.gpt_4o, "ignore all previous instructions and...")
except Exception as e:
print(f"Blocked: {e}")
Framework integration
FastAPI
from fastapi import FastAPI
import cost_katana as ck
from cost_katana import openai
app = FastAPI()
@app.post("/api/chat")
async def chat(body: dict):
r = ck.ai(openai.gpt_4o, body["prompt"])
return {"text": r.text, "cost": r.cost}
Flask
from flask import Flask, request, jsonify
import cost_katana as ck
from cost_katana import openai
app = Flask(__name__)
@app.route("/api/chat", methods=["POST"])
def chat():
r = ck.ai(openai.gpt_4o, request.json["prompt"])
return jsonify({"text": r.text, "cost": r.cost})
Django
from django.http import JsonResponse
import cost_katana as ck
from cost_katana import openai
def chat_view(request):
r = ck.ai(openai.gpt_4o, request.POST.get("prompt"))
return JsonResponse({"text": r.text, "cost": r.cost})
Real-world examples
Customer support bot
import cost_katana as ck
from cost_katana import openai
support = ck.chat(
openai.gpt_3_5_turbo,
system_message="You are a helpful customer support agent.",
)
def handle_query(query: str):
reply = support.send(query)
print(f"Cost so far: ${support.total_cost:.4f}")
return reply
Content with Cortex
import cost_katana as ck
from cost_katana import openai
def generate_blog_post(topic: str):
post = ck.ai(
openai.gpt_4o,
f"Write a blog post about {topic}",
cortex=True,
max_tokens=2000,
)
return {"content": post.text, "cost": post.cost, "word_count": len(post.text.split())}
Code review (with cache)
import cost_katana as ck
from cost_katana import anthropic
def review_code(code: str):
return ck.ai(
anthropic.claude_3_5_sonnet_20241022,
f"Review this code:\n\n{code}",
cache=True,
).text
Error handling
import cost_katana as ck
from cost_katana import openai
from cost_katana.exceptions import CostKatanaError
try:
response = ck.ai(openai.gpt_4o, "Hello")
print(response.text)
except CostKatanaError as e:
msg = str(e).lower()
if "api key" in msg:
print("Set COST_KATANA_API_KEY or a provider key")
elif "rate limit" in msg:
print("Rate limited — retry with backoff")
elif "model" in msg:
print("Model not found or not available")
else:
print(f"Error: {e}")
Experimentation (hosted API)
The Cost Katana backend exposes experimentation REST APIs under /api/experimentation on the hosted API (e.g. https://api.costkatana.com). The dashboard Experimentation experience is built on these endpoints; you can also call them with a dashboard JWT where required.
Highlights
- Model comparison, real-time comparison with SSE progress, experiment history, recommendations
- What-if scenarios and simulations
- Cost estimation before runs
- Fine-tuning analysis helpers
- Export experiment results (JSON/CSV)
Public vs authenticated routes depend on deployment; see the backend controller: experimentation.controller.ts. Real provider execution may require server flags such as ENABLE_REAL_MODEL_COMPARISON=true.
For a longer overview, see cost-katana-core README.md — Experimentation.
Migration guides
From OpenAI SDK
# Before
from openai import OpenAI
client = OpenAI(api_key="sk-...")
completion = client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": "Hello"}],
)
print(completion.choices[0].message.content)
# After
import cost_katana as ck
from cost_katana import openai
response = ck.ai(openai.gpt_4, "Hello")
print(response.text)
print(f"Cost: ${response.cost}")
From Anthropic SDK
# Before
import anthropic
client = anthropic.Anthropic(api_key="sk-ant-...")
message = client.messages.create(
model="claude-3-sonnet-20241022",
messages=[{"role": "user", "content": "Hello"}],
)
# After
import cost_katana as ck
from cost_katana import anthropic
response = ck.ai(anthropic.claude_3_5_sonnet_20241022, "Hello")
From Google AI SDK
# Before
import google.generativeai as genai
genai.configure(api_key="...")
model = genai.GenerativeModel("gemini-pro")
response = model.generate_content("Hello")
# After
import cost_katana as ck
from cost_katana import google
response = ck.ai(google.gemini_2_5_pro, "Hello")
Package names (ecosystem)
| Language | Registry | Install | Import / usage |
|---|---|---|---|
| Python | PyPI cost-katana |
pip install cost-katana |
import cost_katana |
| JavaScript | npm | npm install cost-katana |
import { ai } from 'cost-katana' |
| CLI (npm) | npm | npm install -g cost-katana-cli |
cost-katana chat |
| CLI (Python) | PyPI | pip install cost-katana |
costkatana (console script) |
More examples
github.com/Hypothesize-Tech/costkatana-examples — 45+ examples.
| Section | Description |
|---|---|
| Gateway (HTTP) | Proxy routing, caching, retries |
| Python SDK | Python-focused guides |
| Cost tracking | Cross-provider usage |
| Semantic caching | Cache savings |
| Frameworks | FastAPI and others |
Support
| Channel | Link |
|---|---|
| Dashboard | costkatana.com |
| Documentation | docs.costkatana.com |
| GitHub | github.com/Hypothesize-Tech/costkatana-python |
| Discord | discord.gg/D8nDArmKbY |
| support@costkatana.com |
License
MIT © Cost Katana
Start cutting AI costs today
pip install cost-katana
import cost_katana as ck
from cost_katana import openai
response = ck.ai(openai.gpt_4o, "Hello, world!")
print(response.text, response.cost)
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file cost_katana-2.5.5.tar.gz.
File metadata
- Download URL: cost_katana-2.5.5.tar.gz
- Upload date:
- Size: 49.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.4
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
f634be28c1763c9bae2b7a97b208f9b4372a20deae23bf778ac47474ff62aa12
|
|
| MD5 |
ca819eccdfd9c9f16faa2d88b060588b
|
|
| BLAKE2b-256 |
4b1caab70dd4e1c437567c65a9bd6481fa8b0c7a6c822cf5712d9b3f3935d3a8
|
File details
Details for the file cost_katana-2.5.5-py3-none-any.whl.
File metadata
- Download URL: cost_katana-2.5.5-py3-none-any.whl
- Upload date:
- Size: 41.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.4
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
ae7eddbffbab8e62613ad268b3cf462919b69fb9b903007b656637c467378436
|
|
| MD5 |
d2fd0df09f6dbc3ac7e71b8af89eb9f6
|
|
| BLAKE2b-256 |
10f28029c094eb4f4e805603f8bf3b90fe90372620291d92599bfa3fe9d12247
|