The simplest way to use AI in Python with automatic cost tracking and optimization

Cost Katana Python

AI that just works. Costs that just track.

One import. Any model. Automatic cost tracking.

Installation
Quick start
AI gateway (HTTP)
Core APIs
Type-safe model namespaces
Configuration
Cost optimization
Core features
Framework integration
Real-world examples
Error handling
Experimentation (hosted API)
Migration guides
Package names (ecosystem)
More examples
Support
License

Installation

The package on PyPI is named cost-katana (hyphen). You import it as cost_katana (underscore).

pip install cost-katana

Requires Python 3.8+.

Quick start

1. Environment

export COST_KATANA_API_KEY="dak_your_key_here"   # required — from the dashboard
export PROJECT_ID="your_project_id"               # optional — per-project dashboard filtering

The default API base URL is https://api.costkatana.com (not overridden by env in the high-level client).

2. First call

import cost_katana as ck
from cost_katana import openai

response = ck.ai(openai.gpt_4o, "Explain quantum computing in one sentence")

print(response.text)
print(response.cost)
print(response.tokens)

With COST_KATANA_API_KEY set, you do not need to call configure() first — ck.ai() / ck.chat() auto-configure from the environment. Usage and cost attribution are always on.

If you only set COST_KATANA_API_KEY (no direct provider keys such as OPENAI_API_KEY), requests use Cost Katana hosted models through the backend.
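A small preflight check can make the zero-config path explicit before the first call. The sketch below uses only the standard library. The `preflight` function and its return shape are hypothetical and not part of cost_katana; it only reports which variables are present and makes no claim about how the SDK routes when provider keys are set.

```python
import os

# Variable names come from this README; the helper itself is a hypothetical sketch.
PROVIDER_KEYS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY", "GEMINI_API_KEY")


def preflight(env=None):
    """Report whether the environment is ready for zero-config ck.ai() calls."""
    env = os.environ if env is None else env
    if not env.get("COST_KATANA_API_KEY"):
        raise RuntimeError("COST_KATANA_API_KEY is not set")
    present = [name for name in PROVIDER_KEYS if env.get(name)]
    return {
        "project_id": env.get("PROJECT_ID"),  # optional dashboard scoping
        "provider_keys": present,             # direct provider keys found
        "hosted_only": not present,           # True when only the Cost Katana key is set
    }


if __name__ == "__main__":
    print(preflight({"COST_KATANA_API_KEY": "dak_example"}))
```

Calling `preflight()` with no argument reads the real process environment; passing a dict keeps the check easy to test.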

Which surface should I use?

Goal	Use
Simple Python calls with cost on every response	`ck.ai()` / `ck.chat()`
Drop-in HTTP proxy (OpenAI- or Anthropic-shaped JSON)	Gateway — `httpx` / `requests` or cURL (below)
Dashboard experiments (compare models, what-if)	Experimentation API (REST; often used via the web app)

AI gateway (HTTP)

The high-level ck.ai() / ck.chat() APIs call Cost Katana’s backend. For the same drop-in HTTP proxy as the TypeScript gateway() helper, call the gateway with httpx or requests.

Base URL (default): https://api.costkatana.com/api/gateway
Override with COSTKATANA_GATEWAY_URL if your deployment documents one.

Headers

Header	Value
`Authorization`	`Bearer <COST_KATANA_API_KEY>`
`Content-Type`	`application/json`
`x-project-id`	Optional — same role as `PROJECT_ID` for dashboard scoping

The hosted gateway enables an input firewall (LLM security) and output moderation by default. To opt out for a single request, add the header CostKatana-LLM-Security-Enabled: false and/or CostKatana-Output-Moderation-Enabled: false. You can also merge the result of cost_katana.gateway_request_headers(llm_security_enabled=False) into your headers.

To read dashboard aggregates for blocked prompts and moderation, call CostKatanaClient.get_gateway_security_summary() (GET /api/gateway/security/summary).
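If you prefer to assemble the headers by hand, the table and opt-out flags above can be sketched as a single function. This is a hand-rolled illustration, not the package's `gateway_request_headers` helper; the name `build_gateway_headers` and its parameters are assumptions made for this example.

```python
def build_gateway_headers(api_key, project_id=None,
                          llm_security=True, output_moderation=True):
    """Build gateway request headers; opt-out headers are added only when a check is disabled."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    if project_id:
        headers["x-project-id"] = project_id
    # Both checks are on by default, so a header is sent only to switch one off.
    if not llm_security:
        headers["CostKatana-LLM-Security-Enabled"] = "false"
    if not output_moderation:
        headers["CostKatana-Output-Moderation-Enabled"] = "false"
    return headers


if __name__ == "__main__":
    print(build_gateway_headers("dak_example", project_id="demo", llm_security=False))
```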

OpenAI-compatible — POST {GATEWAY}/v1/chat/completions

import os
import httpx

GATEWAY = os.environ.get(
    "COSTKATANA_GATEWAY_URL",
    "https://api.costkatana.com/api/gateway",
).rstrip("/")

headers = {
    "Authorization": f"Bearer {os.environ['COST_KATANA_API_KEY']}",
    "Content-Type": "application/json",
}
project_id = os.environ.get("PROJECT_ID")
if project_id:
    headers["x-project-id"] = project_id

payload = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello!"}],
}

with httpx.Client(timeout=60.0) as client:
    r = client.post(f"{GATEWAY}/v1/chat/completions", headers=headers, json=payload)
    r.raise_for_status()
    data = r.json()
    print(data["choices"][0]["message"]["content"])

Anthropic Messages — POST {GATEWAY}/v1/messages

import os
import httpx

GATEWAY = os.environ.get(
    "COSTKATANA_GATEWAY_URL",
    "https://api.costkatana.com/api/gateway",
).rstrip("/")

headers = {
    "Authorization": f"Bearer {os.environ['COST_KATANA_API_KEY']}",
    "Content-Type": "application/json",
}
if pid := os.environ.get("PROJECT_ID"):
    headers["x-project-id"] = pid

payload = {
    "model": "claude-3-5-sonnet-20241022",
    "max_tokens": 256,
    "messages": [{"role": "user", "content": "Hello!"}],
}

with httpx.Client(timeout=60.0) as client:
    r = client.post(f"{GATEWAY}/v1/messages", headers=headers, json=payload)
    r.raise_for_status()
    data = r.json()
    for block in data.get("content", []):
        if block.get("type") == "text":
            print(block["text"])
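The two gateway routes return differently shaped JSON, as the snippets above show. When one code path handles both, a small normalizer can help. The following is a minimal sketch; `extract_text` is a hypothetical helper, and it assumes responses carry only the fields used in the examples above.

```python
def extract_text(data):
    """Return the assistant text from an OpenAI- or Anthropic-shaped response dict."""
    if "choices" in data:
        # OpenAI chat completions shape
        return data["choices"][0]["message"]["content"]
    # Anthropic Messages shape: join every block of type "text"
    return "".join(
        block["text"]
        for block in data.get("content", [])
        if block.get("type") == "text"
    )


if __name__ == "__main__":
    openai_like = {"choices": [{"message": {"content": "Hello from chat completions"}}]}
    anthropic_like = {"content": [{"type": "text", "text": "Hello from messages"}]}
    print(extract_text(openai_like))
    print(extract_text(anthropic_like))
```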

cURL

curl -sS "https://api.costkatana.com/api/gateway/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $COST_KATANA_API_KEY" \
  -d '{"model":"gpt-4o","messages":[{"role":"user","content":"Hello!"}]}'

More patterns (caching, retries, headers): costkatana-examples 2-gateway and cost-katana-core examples/GATEWAY_USAGE_AND_TRACKING.md.

Core APIs

`ck.ai()`

import cost_katana as ck
from cost_katana import openai, anthropic, google

response = ck.ai(openai.gpt_4o, "Your prompt", temperature=0.7, max_tokens=500, cache=True, cortex=True)

Prefer namespace constants (openai.gpt_4o, anthropic.claude_3_5_sonnet_20241022, …) over raw strings so IDs stay correct as models change.

`ck.chat()`

import cost_katana as ck
from cost_katana import openai

chat = ck.chat(openai.gpt_4o, system_message="You are a helpful assistant.")

chat.send("Hello! What can you help me with?")
chat.send("Tell me a programming joke")

print(f"Total cost: ${chat.total_cost:.4f}")
print(f"Messages: {len(chat.history)}")
print(f"Tokens: {chat.total_tokens}")
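Because a chat session exposes a running `total_cost`, it is straightforward to wrap `send()` with a spending cap. The sketch below is one possible approach, not a feature of the package: `send_within_budget`, `BudgetExceeded`, and `FakeChat` are hypothetical names, and `FakeChat` stands in for a real `ck.chat()` session so the example runs offline with made-up costs.

```python
class BudgetExceeded(RuntimeError):
    """Raised when a session has already spent its budget."""


def send_within_budget(chat, message, budget_usd):
    """Send a message only if the session's running cost is still below the budget."""
    if chat.total_cost >= budget_usd:
        raise BudgetExceeded(f"spent ${chat.total_cost:.4f} of ${budget_usd:.4f}")
    return chat.send(message)


class FakeChat:
    """Offline stand-in exposing the attributes used above (total_cost, history, send)."""

    def __init__(self, cost_per_message):
        self.total_cost = 0.0
        self.history = []
        self._cost = cost_per_message

    def send(self, message):
        self.history.append(message)
        self.total_cost += self._cost
        return f"echo: {message}"


if __name__ == "__main__":
    session = FakeChat(cost_per_message=0.01)
    print(send_within_budget(session, "Hello", budget_usd=0.015))
```

The check runs before each send, so a session can overshoot the cap by at most one message.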

Caching

import cost_katana as ck
from cost_katana import openai

r1 = ck.ai(openai.gpt_4o, "What is 2+2?", cache=True)
r2 = ck.ai(openai.gpt_4o, "What is 2+2?", cache=True)
print(r1.cached, r2.cached)
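To see how often the cache is hit across a batch of calls, you can aggregate the `cached` and `cost` attributes shown above. This is a sketch with stand-in objects; `summarize_cache` is a hypothetical helper and the costs in the sample are invented for illustration.

```python
from types import SimpleNamespace


def summarize_cache(responses):
    """Count cache hits and total the cost across a list of response-like objects."""
    hits = sum(1 for r in responses if r.cached)
    return {
        "calls": len(responses),
        "cache_hits": hits,
        "total_cost": sum(r.cost for r in responses),
    }


if __name__ == "__main__":
    # Stand-ins for ck.ai() responses; the cost values are made up.
    sample = [
        SimpleNamespace(cached=False, cost=0.002),
        SimpleNamespace(cached=True, cost=0.0),
    ]
    print(summarize_cache(sample))
```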

Cortex (optimization)

import cost_katana as ck
from cost_katana import openai

response = ck.ai(
    openai.gpt_4o,
    "Write a comprehensive guide to machine learning for beginners",
    cortex=True,
    max_tokens=2000,
)
print(response.optimized, response.saved_amount)

Claude extended thinking

For Claude models that support extended thinking (e.g. Opus 4.x, Sonnet 4.x, Sonnet 3.7), pass thinking=True. The gateway sizes the reasoning budget; thinking tokens count as output tokens for billing. On Opus 4.6 / 4.7 and Sonnet 4.6, you can set thinking_effort to 'low', 'medium', 'high', or 'max'. thinking_budget_tokens is optional if you need a fixed cap.

import cost_katana as ck
from cost_katana import anthropic

response = ck.ai(
    anthropic.claude_sonnet_4_20250514,
    "Solve step by step: If a train leaves at 3pm...",
    thinking=True,
    max_tokens=1024,
)
print(response.text)
if response.thinking:
    print("Reasoning:", response.thinking)

# Adaptive effort (Sonnet 4.6 / Opus 4.6+)
r2 = ck.ai(
    anthropic.claude_sonnet_4_6,
    "Plan a week-long API migration with risks and rollback.",
    thinking=True,
    thinking_effort="high",
    max_tokens=2048,
)

With CostKatanaClient directly:

from cost_katana import from_env
from cost_katana import anthropic

client = from_env()
data = client.send_message(
    message="Explain this architecture tradeoff.",
    model_id=anthropic.claude_sonnet_4_6,  # thinking_effort applies to Sonnet 4.6 / Opus 4.6+
    thinking=True,
    thinking_effort="medium",
)
print(data.get("data", {}).get("response"))
print(data.get("data", {}).get("thinking"))
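Since `thinking_effort` accepts only a fixed set of values, validating the options locally catches typos before a request is sent. The helper below is a hypothetical sketch: `thinking_options` is not part of the package, and it assumes `thinking_budget_tokens` is passed as a keyword argument in the same way as `thinking_effort`.

```python
ALLOWED_EFFORT = ("low", "medium", "high", "max")


def thinking_options(effort=None, budget_tokens=None):
    """Build keyword arguments for a thinking-enabled call, validating the inputs."""
    if effort is not None and effort not in ALLOWED_EFFORT:
        raise ValueError(f"thinking_effort must be one of {ALLOWED_EFFORT}, got {effort!r}")
    options = {"thinking": True}
    if effort is not None:
        options["thinking_effort"] = effort
    if budget_tokens is not None:
        if budget_tokens <= 0:
            raise ValueError("thinking_budget_tokens must be positive")
        options["thinking_budget_tokens"] = budget_tokens
    return options


if __name__ == "__main__":
    print(thinking_options(effort="high"))
```

The result can then be unpacked into a call, for example `ck.ai(model, prompt, **thinking_options(effort="high"))`.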

Compare models

import cost_katana as ck
from cost_katana import openai, anthropic, google

prompt = "Summarize the theory of relativity in 50 words"
models = [
    ("GPT-4o", openai.gpt_4o),
    ("Claude 3.5 Sonnet", anthropic.claude_3_5_sonnet_20241022),
    ("Gemini 2.5 Pro", google.gemini_2_5_pro),
    ("GPT-3.5 Turbo", openai.gpt_3_5_turbo),
]

for name, model in models:
    r = ck.ai(model, prompt)
    print(f"{name:22} ${r.cost:.6f}")
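Once the loop above has collected a cost per model, ranking the results is a one-liner. The sketch below assumes you gathered `(name, cost)` pairs yourself; `rank_by_cost` is a hypothetical helper and the numbers are invented, not measured prices.

```python
def rank_by_cost(results):
    """Sort (name, cost) pairs from cheapest to most expensive."""
    return sorted(results, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Made-up costs purely for illustration.
    measured = [("model-a", 0.0031), ("model-b", 0.0007), ("model-c", 0.0012)]
    for name, cost in rank_by_cost(measured):
        print(f"{name:22} ${cost:.6f}")
```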

Type-safe model namespaces

from cost_katana import openai, anthropic, google, aws_bedrock, xai, deepseek, mistral, groq, cohere, meta

Namespace	Examples
`openai`	GPT-4, GPT-3.5, o1, o3, DALL-E, Whisper
`anthropic`	Claude 3.5 Sonnet, Haiku, Opus
`google`	Gemini 2.5 Pro, Flash
`aws_bedrock`	Nova, Claude on Bedrock
`xai`	Grok
`deepseek`	DeepSeek
`mistral`	Mistral
`groq`	Groq-hosted Llama / Mixtral / Gemma
`cohere`	Command
`meta`	Llama

Configuration

Environment variables

Variable	Required?	Purpose
`COST_KATANA_API_KEY`	Yes	Dashboard API key (`dak_...`)
`PROJECT_ID`	No	Per-project scope; aliases include `COST_KATANA_PROJECT` / `COSTKATANA_PROJECT_ID`

Base URL, default model, and timeouts in the client are package constants — not set via environment variables (unless documented for a specific helper).
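The project-scoping aliases in the table can be resolved with a simple lookup. The sketch below is an illustration only: `resolve_project_id` is a hypothetical helper, and the precedence order (`PROJECT_ID` first) is an assumption, since this README does not state which alias wins when several are set.

```python
import os

# Order is an assumption for this sketch; the README does not document precedence.
PROJECT_ENV_NAMES = ("PROJECT_ID", "COST_KATANA_PROJECT", "COSTKATANA_PROJECT_ID")


def resolve_project_id(env=None):
    """Return the first non-empty project ID found among the known variable names."""
    env = os.environ if env is None else env
    for name in PROJECT_ENV_NAMES:
        value = (env.get(name) or "").strip()
        if value:
            return value
    return None


if __name__ == "__main__":
    print(resolve_project_id({"COST_KATANA_PROJECT": "demo-project"}))
```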

Optional direct provider keys

If you call provider APIs yourself outside hosted routing, you may set OPENAI_API_KEY, GEMINI_API_KEY, ANTHROPIC_API_KEY, or AWS credentials for Bedrock. They are not required when only COST_KATANA_API_KEY is set.

Helpers

cost_katana.from_env(): returns a CostKatanaClient built from the same environment variables as zero-config usage.
cost_katana.auto_configure(): lazily initializes the client before ai() / chat() / track().
cost_katana.track({...}): sends a manual cost row to the dashboard.
Config.from_env(): uses the same environment mapping as the client.

Programmatic configuration

import cost_katana as ck

ck.configure(
    api_key="dak_your_key",
    cortex=True,
    cache=True,
    firewall=True,
)

Request options

import cost_katana as ck
from cost_katana import openai

response = ck.ai(
    openai.gpt_4o,
    "Your prompt",
    temperature=0.7,
    max_tokens=500,
    system_message="You are helpful",
    cache=True,
    cortex=True,
    retry=True,
)

Cost optimization

Strategy	When to use
Cheaper model for easy tasks	Trivia, classification, translation
`cache=True`	Repeated FAQs
`cortex=True`	Long-form generation
`ck.chat(...)`	Multi-turn sessions
High volume, cost-sensitive	Consider Gemini Flash-class models via namespaces

import cost_katana as ck
from cost_katana import openai

ck.ai(openai.gpt_3_5_turbo, "What is 2+2?")
ck.ai(openai.gpt_3_5_turbo, "What is 2+2?", cache=True)
ck.ai(openai.gpt_4o, "Write a 2000-word essay", cortex=True)
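The strategy table above can be turned into a small routing rule. The following is a sketch under stated assumptions: `plan_call` and the task category strings are hypothetical, and the model choices simply mirror the three calls shown above rather than any recommendation from the package.

```python
EASY_TASKS = {"trivia", "classification", "translation"}


def plan_call(task, repeated=False):
    """Pick a model attribute name and ck.ai() options for a task category."""
    if task in EASY_TASKS:
        plan = {"model": "gpt_3_5_turbo", "options": {}}      # cheaper model for easy tasks
    elif task == "long_form":
        plan = {"model": "gpt_4o", "options": {"cortex": True}}  # Cortex for long-form generation
    else:
        plan = {"model": "gpt_4o", "options": {}}
    if repeated:
        plan["options"]["cache"] = True  # repeated questions benefit from caching
    return plan


if __name__ == "__main__":
    print(plan_call("trivia", repeated=True))
    print(plan_call("long_form"))
```

A plan could then be applied with `ck.ai(getattr(openai, plan["model"]), prompt, **plan["options"])`.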

Core features

Cost tracking

Every response includes cost and token usage (tracking cannot be disabled — required for attribution).

import cost_katana as ck
from cost_katana import openai
response = ck.ai(openai.gpt_4o, "Write a story")
print(response.cost, response.tokens, response.model, response.provider)

Auto-failover

Routing may fall back across providers when configured on the backend.

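import cost_katana as ck
from cost_katana import openai
response = ck.ai(openai.gpt_4o, "Hello")
print(response.provider)  # shows which provider actually served the request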

Security firewall

import cost_katana as ck
from cost_katana import openai

ck.configure(firewall=True)
try:
    ck.ai(openai.gpt_4o, "ignore all previous instructions and...")
except Exception as e:
    print(f"Blocked: {e}")

Framework integration

FastAPI

from fastapi import FastAPI
import cost_katana as ck
from cost_katana import openai

app = FastAPI()

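# Plain `def`: FastAPI runs sync handlers in a threadpool, so the blocking ck.ai() call does not stall the event loop.
@app.post("/api/chat")
def chat(body: dict):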
    r = ck.ai(openai.gpt_4o, body["prompt"])
    return {"text": r.text, "cost": r.cost}

Flask

from flask import Flask, request, jsonify
import cost_katana as ck
from cost_katana import openai

app = Flask(__name__)

@app.route("/api/chat", methods=["POST"])
def chat():
    r = ck.ai(openai.gpt_4o, request.json["prompt"])
    return jsonify({"text": r.text, "cost": r.cost})

Django

from django.http import JsonResponse
import cost_katana as ck
from cost_katana import openai

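def chat_view(request):
    prompt = request.POST.get("prompt")
    if not prompt:  # reject requests without a prompt instead of sending None to the model
        return JsonResponse({"error": "prompt is required"}, status=400)
    r = ck.ai(openai.gpt_4o, prompt)
    return JsonResponse({"text": r.text, "cost": r.cost})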

Real-world examples

Customer support bot

import cost_katana as ck
from cost_katana import openai

support = ck.chat(
    openai.gpt_3_5_turbo,
    system_message="You are a helpful customer support agent.",
)

def handle_query(query: str):
    reply = support.send(query)
    print(f"Cost so far: ${support.total_cost:.4f}")
    return reply

Content with Cortex

import cost_katana as ck
from cost_katana import openai

def generate_blog_post(topic: str):
    post = ck.ai(
        openai.gpt_4o,
        f"Write a blog post about {topic}",
        cortex=True,
        max_tokens=2000,
    )
    return {"content": post.text, "cost": post.cost, "word_count": len(post.text.split())}

Code review (with cache)

import cost_katana as ck
from cost_katana import anthropic

def review_code(code: str):
    return ck.ai(
        anthropic.claude_3_5_sonnet_20241022,
        f"Review this code:\n\n{code}",
        cache=True,
    ).text

Error handling

import cost_katana as ck
from cost_katana import openai
from cost_katana.exceptions import CostKatanaError

try:
    response = ck.ai(openai.gpt_4o, "Hello")
    print(response.text)
except CostKatanaError as e:
    msg = str(e).lower()
    if "api key" in msg:
        print("Set COST_KATANA_API_KEY or a provider key")
    elif "rate limit" in msg:
        print("Rate limited — retry with backoff")
    elif "model" in msg:
        print("Model not found or not available")
    else:
        print(f"Error: {e}")
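The "retry with backoff" branch above can be made concrete with a small wrapper. This is a generic sketch using only the standard library, not a feature of the package (which exposes its own `retry=True` option); `call_with_backoff` and its parameters are hypothetical names chosen for this example.

```python
import time


def call_with_backoff(fn, is_retryable, attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call fn(); on a retryable error wait base_delay * 2**n seconds and try again."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except Exception as exc:
            is_last = attempt == attempts - 1
            if is_last or not is_retryable(exc):
                raise  # give up: out of attempts or not a transient error
            sleep(base_delay * (2 ** attempt))


if __name__ == "__main__":
    state = {"calls": 0}

    def flaky():
        # Fails twice with a rate-limit style message, then succeeds.
        state["calls"] += 1
        if state["calls"] < 3:
            raise RuntimeError("Rate limit exceeded")
        return "ok"

    print(call_with_backoff(flaky, lambda e: "rate limit" in str(e).lower(), base_delay=0.01))
```

With the SDK, `fn` could be `lambda: ck.ai(openai.gpt_4o, "Hello")` and `is_retryable` the same message check used in the error-handling example.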

Experimentation (hosted API)

The Cost Katana backend exposes experimentation REST APIs under /api/experimentation on the hosted API (e.g. https://api.costkatana.com). The dashboard Experimentation experience is built on these endpoints; you can also call them with a dashboard JWT where required.

Highlights

Model comparison, real-time comparison with SSE progress, experiment history, recommendations
What-if scenarios and simulations
Cost estimation before runs
Fine-tuning analysis helpers
Export experiment results (JSON/CSV)

Public vs authenticated routes depend on deployment; see the backend controller: experimentation.controller.ts. Real provider execution may require server flags such as ENABLE_REAL_MODEL_COMPARISON=true.

For a longer overview, see cost-katana-core README.md — Experimentation.

Migration guides

From OpenAI SDK

# Before
from openai import OpenAI
client = OpenAI(api_key="sk-...")
completion = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}],
)
print(completion.choices[0].message.content)

# After
import cost_katana as ck
from cost_katana import openai
response = ck.ai(openai.gpt_4, "Hello")
print(response.text)
print(f"Cost: ${response.cost}")

From Anthropic SDK

# Before
import anthropic
client = anthropic.Anthropic(api_key="sk-ant-...")
message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,  # required by the Anthropic Messages API
    messages=[{"role": "user", "content": "Hello"}],
)

# After
import cost_katana as ck
from cost_katana import anthropic
response = ck.ai(anthropic.claude_3_5_sonnet_20241022, "Hello")

From Google AI SDK

# Before
import google.generativeai as genai
genai.configure(api_key="...")
model = genai.GenerativeModel("gemini-pro")
response = model.generate_content("Hello")

# After
import cost_katana as ck
from cost_katana import google
response = ck.ai(google.gemini_2_5_pro, "Hello")

Package names (ecosystem)

Language	Registry	Install	Import / usage
Python	PyPI `cost-katana`	`pip install cost-katana`	`import cost_katana`
JavaScript	npm	`npm install cost-katana`	`import { ai } from 'cost-katana'`
CLI (npm)	npm	`npm install -g cost-katana-cli`	`cost-katana chat`
CLI (Python)	PyPI	`pip install cost-katana`	`costkatana` (console script)

More examples

github.com/Hypothesize-Tech/costkatana-examples — 45+ examples.

Section	Description
Gateway (HTTP)	Proxy routing, caching, retries
Python SDK	Python-focused guides
Cost tracking	Cross-provider usage
Semantic caching	Cache savings
Frameworks	FastAPI and others

Support

Channel	Link
Dashboard	costkatana.com
Documentation	docs.costkatana.com
GitHub	github.com/Hypothesize-Tech/costkatana-python
Discord	discord.gg/D8nDArmKbY
Email	support@costkatana.com

License

MIT © Cost Katana

Start cutting AI costs today

pip install cost-katana

import cost_katana as ck
from cost_katana import openai

response = ck.ai(openai.gpt_4o, "Hello, world!")
print(response.text, response.cost)

cost-katana 2.5.5
