The simplest way to use AI in Python with automatic cost tracking and optimization
Cost Katana Python 🥷
AI that just works. Costs that just track.
One import. Any model. Automatic cost tracking.
🚀 Get Started in 60 Seconds
Step 1: Install
pip install costkatana
Step 2: Set environment variables
export COST_KATANA_API_KEY="dak_your_key_here" # required — from the dashboard
export PROJECT_ID="your_project_id" # optional — per-project dashboard filtering
The API base URL is fixed at https://api.costkatana.com (not configurable via env).
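Before the first call, it can help to confirm the environment is set up. The following is a minimal sketch, not part of the SDK: `check_cost_katana_env` is a hypothetical helper, and the `dak_` prefix check is an assumption based on the example key shown above.

```python
import os
from typing import List, Mapping


def check_cost_katana_env(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return human-readable problems; an empty list means the env looks usable."""
    problems = []
    key = env.get("COST_KATANA_API_KEY", "")
    if not key:
        problems.append("COST_KATANA_API_KEY is not set (required)")
    elif not key.startswith("dak_"):
        # Assumption: dashboard keys start with "dak_", as in the example above.
        problems.append("COST_KATANA_API_KEY does not start with 'dak_'")
    if not env.get("PROJECT_ID"):
        problems.append("PROJECT_ID is not set (optional; used for per-project filtering)")
    return problems
```

Calling `check_cost_katana_env()` with no arguments inspects the real process environment; passing a plain dict makes the check easy to unit test.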
Step 3: Make Your First AI Call
import cost_katana as ck
from cost_katana import openai
response = ck.ai(openai.gpt_4o, "Explain quantum computing in one sentence")
print(response.text) # "Quantum computing uses qubits to perform..."
print(response.cost) # 0.0012
print(response.tokens) # 47
That's it. With COST_KATANA_API_KEY set, you do not need to call configure(): ck.ai() and ck.chat() auto-configure from the environment. Usage and cost tracking are always on; there is no option to disable them, because they are required for usage attribution and cost visibility.
If you only set COST_KATANA_API_KEY (no direct provider keys like OPENAI_API_KEY), requests use Cost Katana hosted models through the backend.
📖 Tutorial: Build a Cost-Aware AI App
Part 1: Basic Chat Session
import cost_katana as ck
# Create a persistent chat session
chat = ck.chat('gpt-4')
chat.send('Hello! What can you help me with?')
chat.send('Tell me a programming joke')
chat.send('Now explain it')
# See exactly what you spent
print(f"💰 Total cost: ${chat.total_cost:.4f}")
print(f"📊 Messages: {len(chat.history)}")
print(f"🎯 Tokens used: {chat.total_tokens}")
Part 2: Type-Safe Model Selection
Stop guessing model names. Get autocomplete and catch typos:
import cost_katana as ck
from cost_katana import openai, anthropic, google
# Type-safe model constants (recommended)
response = ck.ai(openai.gpt_4, 'Hello, world!')
# Compare models easily
models = [openai.gpt_4, anthropic.claude_3_5_sonnet_20241022, google.gemini_2_5_pro]
for model in models:
    response = ck.ai(model, 'Explain AI in one sentence')
    print(f"Cost: ${response.cost:.4f}")
Available namespaces:
| Namespace | Models |
|---|---|
| openai | GPT-4, GPT-3.5, O1, O3, DALL-E, Whisper |
| anthropic | Claude 3.5 Sonnet, Haiku, Opus |
| google | Gemini 2.5 Pro, Flash |
| aws_bedrock | Nova, Claude on Bedrock |
| xai | Grok models |
| deepseek | DeepSeek models |
| mistral | Mistral AI models |
| groq | Groq-hosted Llama / Mixtral / Gemma |
| cohere | Command models |
| meta | Llama models |
Part 3: Smart Caching
Cache identical questions to avoid paying twice:
import cost_katana as ck
# First call - hits the API
r1 = ck.ai('gpt-4', 'What is 2+2?', cache=True)
print(f"Cached: {r1.cached}") # False
print(f"Cost: ${r1.cost}") # $0.0008
# Second call - served from cache (FREE!)
r2 = ck.ai('gpt-4', 'What is 2+2?', cache=True)
print(f"Cached: {r2.cached}") # True
print(f"Cost: ${r2.cost}") # $0.0000 🎉
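To make the idea concrete, here is a toy sketch of exact-match caching. It is not Cost Katana's implementation (which the README does not describe); `ExactMatchCache` and `fake_model` are hypothetical names, and the price is made up for illustration.

```python
import hashlib
import json
from typing import Callable, Dict, Tuple


class ExactMatchCache:
    """Toy in-memory cache keyed by a hash of (model, prompt)."""

    def __init__(self, call_model: Callable[[str, str], Tuple[str, float]]):
        self._call_model = call_model  # returns (text, cost_in_dollars)
        self._store: Dict[str, str] = {}

    @staticmethod
    def _key(model: str, prompt: str) -> str:
        payload = json.dumps([model, prompt]).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()

    def ask(self, model: str, prompt: str) -> Tuple[str, float, bool]:
        """Return (text, cost, cached)."""
        key = self._key(model, prompt)
        if key in self._store:
            return self._store[key], 0.0, True  # cache hit: no API call, no cost
        text, cost = self._call_model(model, prompt)
        self._store[key] = text
        return text, cost, False


def fake_model(model: str, prompt: str) -> Tuple[str, float]:
    # Stand-in for a paid API call; the price is invented for this example.
    return "4", 0.0008


cache = ExactMatchCache(fake_model)
first = cache.ask("gpt-4", "What is 2+2?")
second = cache.ask("gpt-4", "What is 2+2?")
print(first)   # ('4', 0.0008, False)
print(second)  # ('4', 0.0, True)
```

Only byte-identical (model, prompt) pairs hit this cache; a rephrased question is treated as a new request.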
Part 4: Cortex Optimization
For long-form content, Cortex compresses prompts intelligently:
import cost_katana as ck
response = ck.ai(
    'gpt-4',
    'Write a comprehensive guide to machine learning for beginners',
    cortex=True,  # Enable 40-75% cost reduction
    max_tokens=2000
)
print(f"Optimized: {response.optimized}")
print(f"Saved: ${response.saved_amount}")
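To see how the reported saving relates to the 40-75% range, the percentage can be derived from the two response fields. This is a sketch under two assumptions that the README does not confirm: `response.cost` is the amount charged after optimization, and `response.saved_amount` is in dollars. `savings_percent` is a hypothetical helper.

```python
def savings_percent(cost: float, saved_amount: float) -> float:
    """Percentage saved relative to the unoptimized cost (cost + saved_amount)."""
    baseline = cost + saved_amount
    if baseline <= 0:
        return 0.0  # nothing was spent or saved, so there is no meaningful ratio
    return 100.0 * saved_amount / baseline


# Example with made-up numbers: charged $0.04, saved $0.06.
print(f"{savings_percent(0.04, 0.06):.1f}%")  # 60.0%
```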
Part 5: Compare Models Side-by-Side
import cost_katana as ck
prompt = 'Summarize the theory of relativity in 50 words'
models = ['gpt-4', 'claude-3-sonnet', 'gemini-pro', 'gpt-3.5-turbo']
print('📊 Model Cost Comparison\n')
for model in models:
    response = ck.ai(model, prompt)
    print(f"{model:20} ${response.cost:.6f}")
Sample Output:
📊 Model Cost Comparison
gpt-4                $0.001200
claude-3-sonnet      $0.000900
gemini-pro           $0.000150
gpt-3.5-turbo        $0.000080
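The sample figures can be turned into relative savings with a few lines of arithmetic. The sketch below uses only the numbers from the sample output; `relative_savings` is a hypothetical helper, and real costs will vary with prompt and response length.

```python
# Costs copied from the sample output above (dollars per request).
sample_costs = {
    "gpt-4": 0.001200,
    "claude-3-sonnet": 0.000900,
    "gemini-pro": 0.000150,
    "gpt-3.5-turbo": 0.000080,
}


def relative_savings(costs: dict) -> dict:
    """Map each model to its percentage saving versus the most expensive model."""
    baseline = max(costs.values())
    return {name: round(100.0 * (1 - cost / baseline), 1) for name, cost in costs.items()}


for name, pct in relative_savings(sample_costs).items():
    print(f"{name:20} {pct:5.1f}% cheaper than the most expensive model")
```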
🎯 Core Features
Cost Tracking
Usage and cost tracking are always on and cannot be disabled. Every response includes cost information:
response = ck.ai('gpt-4', 'Write a story')
print(f"Cost: ${response.cost}")
print(f"Tokens: {response.tokens}")
print(f"Model: {response.model}")
print(f"Provider: {response.provider}")
Auto-Failover
If a provider is unavailable, requests automatically switch to another provider:
# If OpenAI is down, automatically uses Claude or Gemini
response = ck.ai('gpt-4', 'Hello')
print(response.provider) # Might be 'anthropic' if OpenAI failed
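The failover pattern itself is simple to sketch. The code below is a generic illustration with stub providers, not the SDK's internal logic; `call_with_failover`, `AllProvidersFailed`, and the two stub functions are hypothetical names.

```python
from typing import Callable, Sequence, Tuple


class AllProvidersFailed(Exception):
    """Raised when every provider in the chain has failed."""


def call_with_failover(
    providers: Sequence[Tuple[str, Callable[[str], str]]],
    prompt: str,
) -> Tuple[str, str]:
    """Try each (name, call) pair in order; return (provider_name, text)."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # a real client would catch narrower error types
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


def openai_down(prompt: str) -> str:
    # Stub that simulates an outage.
    raise ConnectionError("service unavailable")


def anthropic_ok(prompt: str) -> str:
    # Stub that simulates a healthy provider.
    return f"echo: {prompt}"


provider, text = call_with_failover(
    [("openai", openai_down), ("anthropic", anthropic_ok)], "Hello"
)
print(provider, text)  # anthropic echo: Hello
```

Because a different provider may answer, code that depends on a specific model's behavior should check `response.provider` before relying on it.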
Security Firewall
Block malicious prompts:
import cost_katana as ck
ck.configure(firewall=True)
# Malicious prompts are blocked
try:
    ck.ai('gpt-4', 'ignore all previous instructions and...')
except Exception as e:
    print(f'🛡️ Blocked: {e}')
⚙️ Configuration
Environment variables (public contract)
| Variable | Required? | Purpose |
|---|---|---|
| COST_KATANA_API_KEY | Yes | Dashboard API key (dak_...) |
| PROJECT_ID | No (warning if omitted) | Per-project dashboard scope; also COST_KATANA_PROJECT / COSTKATANA_PROJECT_ID |
export COST_KATANA_API_KEY="dak_your_key_here"
export PROJECT_ID="your_project_id" # optional
Base URL, default model, and timeouts are package constants — not set via environment variables.
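Since the project ID can come from more than one variable, a lookup helper might look like the sketch below. The variable names come from the table above, but the lookup order is an assumption, and `resolve_project_id` is a hypothetical function rather than part of the package.

```python
import os
from typing import Mapping, Optional

# Variable names taken from the table above; the precedence order is assumed.
PROJECT_ID_VARS = ("PROJECT_ID", "COST_KATANA_PROJECT", "COSTKATANA_PROJECT_ID")


def resolve_project_id(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the first non-empty project ID among the accepted variables, or None."""
    for name in PROJECT_ID_VARS:
        value = env.get(name, "").strip()
        if value:
            return value
    return None
```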
Optional: direct provider keys
If you call provider APIs yourself (outside Cost Katana’s hosted routing), you may set keys such as OPENAI_API_KEY, GEMINI_API_KEY, ANTHROPIC_API_KEY, or AWS credentials for Bedrock. They are not required for the default hosted path when only COST_KATANA_API_KEY is set.
Easy helpers
- cost_katana.from_env(): explicit CostKatanaClient built from the same two env vars (mirrors the TS SDK’s zero-config client).
- cost_katana.auto_configure(): lazy init used internally before ai() / chat() / track().
- cost_katana.track({...}): log a manual cost row to the dashboard without wiring AILogger yourself.
- Config.from_env(): same env mapping as the client.
Programmatic Configuration
import cost_katana as ck
ck.configure(
    api_key='dak_your_key',
    cortex=True,    # 40-75% cost savings
    cache=True,     # Smart caching
    firewall=True   # Block prompt injections
)
Request Options
response = ck.ai('gpt-4', 'Your prompt',
    temperature=0.7,                   # Creativity (0-2)
    max_tokens=500,                    # Response limit
    system_message='You are helpful',  # System prompt
    cache=True,                        # Enable caching
    cortex=True,                       # Enable optimization
    retry=True                         # Auto-retry on failures
)
🔌 Framework Integration
FastAPI
from fastapi import FastAPI
import cost_katana as ck
app = FastAPI()
@app.post('/api/chat')
async def chat(request: dict):
    response = ck.ai('gpt-4', request['prompt'])
    return {'text': response.text, 'cost': response.cost}
Flask
from flask import Flask, request, jsonify
import cost_katana as ck
app = Flask(__name__)
@app.route('/api/chat', methods=['POST'])
def chat():
    response = ck.ai('gpt-4', request.json['prompt'])
    return jsonify({'text': response.text, 'cost': response.cost})
Django
from django.http import JsonResponse
import cost_katana as ck
def chat_view(request):
    response = ck.ai('gpt-4', request.POST.get('prompt'))
    return JsonResponse({'text': response.text, 'cost': response.cost})
💡 Real-World Examples
Customer Support Bot
import cost_katana as ck
support = ck.chat('gpt-3.5-turbo',
                  system_message='You are a helpful customer support agent.')

def handle_query(query: str):
    response = support.send(query)
    print(f"Cost so far: ${support.total_cost:.4f}")
    return response
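A long-running bot may also want a spending cap. The sketch below wraps any object that exposes `send(message)` and `total_cost`, as the session above does; `BudgetedSession`, `BudgetExceeded`, and `FakeSession` are hypothetical names, and the flat per-message price is invented so the example runs without network access.

```python
class BudgetExceeded(RuntimeError):
    """Raised when the session has already spent its budget."""


class BudgetedSession:
    """Wrap an object exposing send(message) and total_cost, and enforce a dollar cap."""

    def __init__(self, session, max_cost: float):
        self._session = session
        self._max_cost = max_cost

    def send(self, message: str):
        # The check runs before the call, so spending can overshoot by one message.
        if self._session.total_cost >= self._max_cost:
            raise BudgetExceeded(
                f"spent ${self._session.total_cost:.4f} of ${self._max_cost:.4f}"
            )
        return self._session.send(message)


class FakeSession:
    """Stand-in for a chat session; charges a flat, made-up price per message."""

    def __init__(self):
        self.total_cost = 0.0

    def send(self, message: str) -> str:
        self.total_cost += 0.01
        return f"reply to: {message}"


guarded = BudgetedSession(FakeSession(), max_cost=0.02)
print(guarded.send("Where is my order?"))  # reply to: Where is my order?
```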
Content Generator with Optimization
import cost_katana as ck
def generate_blog_post(topic: str):
    # Use Cortex for long-form content (40-75% savings)
    post = ck.ai('gpt-4', f'Write a blog post about {topic}',
                 cortex=True, max_tokens=2000)
    return {
        'content': post.text,
        'cost': post.cost,
        'word_count': len(post.text.split())
    }
Code Review Assistant
import cost_katana as ck
def review_code(code: str):
    review = ck.ai('claude-3-sonnet',
                   f'Review this code and suggest improvements:\n\n{code}',
                   cache=True)  # Cache for repeated reviews
    return review.text
Translation Service
import cost_katana as ck
def translate(text: str, target_language: str):
    # Use cheaper model for translations
    translated = ck.ai('gpt-3.5-turbo',
                       f'Translate to {target_language}: {text}',
                       cache=True)
    return translated.text
💰 Cost Optimization Cheatsheet
| Strategy | Savings | Code |
|---|---|---|
| Use GPT-3.5 for simple tasks | 90% | ck.ai('gpt-3.5-turbo', ...) |
| Enable caching | 100% on hits | cache=True |
| Enable Cortex | 40-75% | cortex=True |
| Use Gemini for high-volume | 95% vs GPT-4 | ck.ai('gemini-pro', ...) |
| Batch in sessions | 10-20% | ck.chat(...) |
# ❌ Expensive
ck.ai('gpt-4', 'What is 2+2?') # $0.001
# ✅ Smart: Match model to task
ck.ai('gpt-3.5-turbo', 'What is 2+2?') # $0.0001
# ✅ Smarter: Cache common queries
ck.ai('gpt-3.5-turbo', 'What is 2+2?', cache=True) # $0 on repeat
# ✅ Smartest: Cortex for long content
ck.ai('gpt-4', 'Write a 2000-word essay', cortex=True) # 40-75% off
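To estimate what these strategies mean for a monthly bill, the arithmetic can be sketched as follows. `estimated_monthly_cost` is a hypothetical helper; it assumes cache hits cost $0 (as stated in the table above), and the request volume and hit rate are made-up inputs.

```python
def estimated_monthly_cost(
    requests_per_month: int,
    cost_per_request: float,
    cache_hit_rate: float = 0.0,
) -> float:
    """Estimate spend, assuming cached responses are free and all requests cost the same."""
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    billable_requests = requests_per_month * (1.0 - cache_hit_rate)
    return billable_requests * cost_per_request


# 100,000 requests: GPT-4 at $0.001 each vs GPT-3.5 at $0.0001 with a 30% cache hit rate.
baseline = estimated_monthly_cost(100_000, 0.001)
tuned = estimated_monthly_cost(100_000, 0.0001, cache_hit_rate=0.30)
print(f"${baseline:.2f} -> ${tuned:.2f}")  # $100.00 -> $7.00
```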
🔧 Error Handling
import cost_katana as ck
from cost_katana.exceptions import CostKatanaError
try:
    response = ck.ai('gpt-4', 'Hello')
    print(response.text)
except CostKatanaError as e:
    if 'API key' in str(e):
        print('Set COST_KATANA_API_KEY or OPENAI_API_KEY')
    elif 'rate limit' in str(e):
        print('Rate limited. Retrying...')
    elif 'model' in str(e):
        print('Model not found')
    else:
        print(f'Error: {e}')
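The snippet above prints "Retrying..." without actually retrying. The SDK has its own `retry=True` request option; for callers who want explicit control, a generic exponential-backoff wrapper might look like this sketch. `retry_with_backoff` and `flaky_call` are hypothetical names, and the demo uses a stand-in function instead of a real API call.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    call: Callable[[], T],
    is_retryable: Callable[[Exception], bool],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(); on a retryable error wait base_delay * 2**n seconds and try again."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception as exc:
            last_attempt = attempt == max_attempts - 1
            if last_attempt or not is_retryable(exc):
                raise  # give up: out of attempts or not a transient error
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")


# Demo with a stand-in call that is rate limited twice, then succeeds.
attempts = {"count": 0}


def flaky_call() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("rate limit exceeded")
    return "ok"


delays: List[float] = []  # record delays instead of actually sleeping
result = retry_with_backoff(
    flaky_call,
    is_retryable=lambda exc: "rate limit" in str(exc),
    sleep=delays.append,
)
print(result, delays)  # ok [0.5, 1.0]
```

In real use, `call` would be something like `lambda: ck.ai('gpt-4', 'Hello')` and `is_retryable` would inspect the raised `CostKatanaError`.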
🔄 Migration Guides
From OpenAI SDK
# Before
from openai import OpenAI
client = OpenAI(api_key='sk-...')
completion = client.chat.completions.create(
    model='gpt-4',
    messages=[{'role': 'user', 'content': 'Hello'}]
)
print(completion.choices[0].message.content)
# After
import cost_katana as ck
response = ck.ai('gpt-4', 'Hello')
print(response.text)
print(f"Cost: ${response.cost}") # Bonus: cost tracking!
From Anthropic SDK
# Before
import anthropic
client = anthropic.Anthropic(api_key='sk-ant-...')
message = client.messages.create(
    model='claude-3-sonnet-20240229',
    max_tokens=1024,  # required by the Anthropic Messages API
    messages=[{'role': 'user', 'content': 'Hello'}]
)
# After
import cost_katana as ck
response = ck.ai('claude-3-sonnet', 'Hello')
From Google AI SDK
# Before
import google.generativeai as genai
genai.configure(api_key='...')
model = genai.GenerativeModel('gemini-pro')
response = model.generate_content('Hello')
# After
import cost_katana as ck
response = ck.ai('gemini-pro', 'Hello')
📦 Package Names
| Language | Registry | Install | Import / Command |
|---|---|---|---|
| Python | PyPI | pip install costkatana | import cost_katana |
| JavaScript | NPM | npm install cost-katana | import { ai } from 'cost-katana' |
| CLI (NPM) | NPM | npm install -g cost-katana-cli | cost-katana chat |
| CLI (Python) | PyPI | pip install costkatana | costkatana chat |
📚 More Examples
Explore 45+ complete examples:
🔗 github.com/Hypothesize-Tech/costkatana-examples
| Section | Description |
|---|---|
| Python SDK | Complete Python guides |
| Cost Tracking | Track costs across providers |
| Semantic Caching | 30-40% cost reduction |
| FastAPI Integration | Framework examples |
📞 Support
| Channel | Link |
|---|---|
| Dashboard | costkatana.com |
| Documentation | docs.costkatana.com |
| GitHub | github.com/Hypothesize-Tech/costkatana-python |
| Discord | discord.gg/D8nDArmKbY |
| Email | support@costkatana.com |
📄 License
MIT © Cost Katana
Start cutting AI costs today 🥷
pip install costkatana
import cost_katana as ck
response = ck.ai('gpt-4', 'Hello, world!')