Reduce LLM costs by 90% - AI recommendations with NO API keys needed!
# LLMOptimize

Cut your AI API costs — automatically. One import. Zero config. No API key required.

```bash
pip install llmoptimize
```
## What It Does
LLMOptimize watches every AI API call your code makes and tells you which cheaper model to switch to — and why, in plain English.
- No API key needed to get recommendations
- Never touches your prompts or responses — read-only
- Works with OpenAI, Anthropic, Groq, Gemini, Mistral, Cohere
- Zero setup — just import it
## Quickstart (2 lines)

```python
import llmoptimize  # ← add this at the top

import openai

client = openai.OpenAI()
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Summarize this article..."}],
)
print(response.choices[0].message.content)  # your real output, unchanged

llmoptimize.report()  # ← add this at the end
```
That's it. Run your code normally. At the end you'll see:
```text
╔══════════════════════════════════════════════════════════════╗
║                                                              ║
║        🚀  L L M O P T I M I Z E   R E P O R T  🚀           ║
║                                                              ║
║              Your AI Cost Optimization Summary               ║
╚══════════════════════════════════════════════════════════════╝

📊 YOUR USAGE SUMMARY

  🚀 Total API Calls Tracked   1         Optimized and analyzed
  💰 Total Cost                $0.0036   Amount spent on AI API calls
  💎 Potential Savings         $0.0034   That's 94% less than you could have spent!

────────────────────────────────────────────────────────────

📋 USAGE BY TYPE

  💬 Chat   1 call   $0.0036  →  gpt-4o-mini (saves 94%)

────────────────────────────────────────────────────────────

💡 PERSONALIZED RECOMMENDATIONS

╭────────────────────────────────────────────────────────────╮
│  #1 Recommendation                                         │
├────────────────────────────────────────────────────────────┤
│  🎯 Switch to: gpt-4o-mini                                 │
│  💰 Save 94% on every call                                 │
│                                                            │
│  💡 Why switch?                                            │
│     You called gpt-4 1 time this session                   │
│     gpt-4o-mini costs 94% less — saves ~$0.18 per 1,000    │
│     calls at this token size                               │
│                                                            │
│  ⚡ How to fix:                                            │
│     Change model="gpt-4" → "gpt-4o-mini"                   │
╰────────────────────────────────────────────────────────────╯
```
## Don't Have an API Key Yet? Use Dry-Run

Test your code flow and get cost advice before spending a dollar.
Wrap your code in `with llmoptimize.report:` — it intercepts calls,
returns mock responses so your code runs fully, and shows the report on exit.
```python
import llmoptimize
import openai

client = openai.OpenAI(api_key="anything")  # not used in dry-run

with llmoptimize.report:
    # No real API calls — mock responses returned automatically
    client.embeddings.create(
        model="text-embedding-3-large",
        input=["RAG systems retrieve relevant documents."],
    )
    client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": "Summarize this."}],
    )
# Report prints automatically when the block exits
```
When you're ready to go live, just remove the `with llmoptimize.report:` line (and dedent the block) — your code is already correct.
## Track a Specific Task

Use `llmoptimize.task()` to get a separate report per pipeline stage.
Each block gets a clean slate, its own label, and optional dry-run mode.
```python
import llmoptimize
import openai

client = openai.OpenAI()

# Track real costs per stage
with llmoptimize.task("rag-pipeline"):
    chunks = client.embeddings.create(model="text-embedding-3-large", input=["..."])
    summary = client.chat.completions.create(model="gpt-4", messages=[...])

# Plan costs before shipping — no real API calls
with llmoptimize.task("cost-planning", dry_run=True):
    client.chat.completions.create(model="gpt-4", messages=[...])
```
Each block prints its own labelled report:
```text
╔══════════════════════════════════════════════════════════════╗
║                                                              ║
║        🚀  L L M O P T I M I Z E   R E P O R T  🚀           ║
║                                                              ║
║                     Task: rag-pipeline                       ║
╚══════════════════════════════════════════════════════════════╝

📌 Task: rag-pipeline

📋 USAGE BY TYPE

  📚 Embedding   1 call   $0.0000  →  text-embedding-3-small (saves 80%)
  💬 Chat        1 call   $0.0005  →  gpt-4o-mini (saves 99%)
  🧠 Reasoning   — not used this session
```
## CLI — Audit a File Before Running It

No code changes needed. Point it at any Python file and get instant advice:

```bash
llmoptimize audit mycode.py
```
```text
╔════════════════════════════════════════════════════════════════╗
║                    🤖 AI CODE AUDIT REPORT                     ║
╚════════════════════════════════════════════════════════════════╝

📄 File: mycode.py

📊 SUMMARY
   API calls found:   7
   Issues detected:   4
   Models used:       gpt-4, claude-3-opus
   Est. monthly cost: $342 (at 1,000 runs/month)
   Potential savings: $298 (87%)

🔍 RECOMMENDATIONS

🔴 Line 42 — claude-3-opus
   Switch to: claude-3-5-haiku | saves 95%
   Why: You're using claude-3-opus ($90/1M tokens). For ticket
        classification claude-3-5-haiku costs $4.80/1M — same accuracy,
        18x cheaper.
```
Options:

```bash
llmoptimize audit mycode.py           # full report
llmoptimize audit mycode.py --quiet   # one-line summary
llmoptimize audit mycode.py --force   # skip cache, always re-analyze
llmoptimize stats                     # show cache statistics
llmoptimize clear-cache               # clear cached results
```
## Supported Providers

`import llmoptimize` automatically patches every AI library you have installed. Nothing else needed.

| Provider | Library | Chat | Embeddings |
|---|---|---|---|
| OpenAI | `openai` | ✅ | ✅ |
| Anthropic | `anthropic` | ✅ | — |
| Groq | `groq` | ✅ | — |
| Google Gemini | `google-generativeai` | ✅ | — |
| Mistral | `mistralai` | ✅ | — |
| Cohere | `cohere` | ✅ | ✅ |
Pricing data for 60+ models including OpenAI, Anthropic, Groq, Gemini, Mistral, Cohere, Voyage AI, Jina AI, and AWS Bedrock.
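The "patches every library" behavior is standard import-time monkey-patching: wrap each installed SDK's request method so token usage is recorded after the call returns, without touching the response. Here is a minimal, self-contained sketch of the idea — `DummyClient`, `record_usage`, and the `TRACKED` log are stand-ins for illustration, not LLMOptimize's actual internals:

```python
import functools

class DummyResponse:
    """Stand-in for an SDK response carrying provider-counted token usage."""
    def __init__(self, prompt_tokens, completion_tokens):
        self.usage = {"prompt_tokens": prompt_tokens,
                      "completion_tokens": completion_tokens}

class DummyClient:
    """Stand-in for an SDK client class (e.g. a chat-completions endpoint)."""
    def create(self, model, messages):
        return DummyResponse(prompt_tokens=42, completion_tokens=7)

TRACKED = []  # session-level usage log

def record_usage(model, usage):
    TRACKED.append({"model": model, **usage})

def patch(cls):
    """Wrap cls.create so every call is logged after it returns."""
    original = cls.create

    @functools.wraps(original)
    def wrapper(self, model, messages):
        response = original(self, model, messages)  # real call happens first
        record_usage(model, response.usage)         # read-only bookkeeping
        return response                             # response is untouched
    cls.create = wrapper

patch(DummyClient)
resp = DummyClient().create(model="gpt-4", messages=[{"role": "user", "content": "hi"}])
print(TRACKED)
```

Because the wrapper only reads `response.usage` after the original call returns, the caller's prompts and responses pass through unchanged — consistent with the read-only claim above.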
## How Recommendations Work
Recommendations are never just the cheapest model. The engine checks capability tiers so you only see alternatives that deliver comparable results:
| Tier | Examples |
|---|---|
| Frontier | gpt-4, claude-3-opus, o1 |
| Strong | gpt-4o, claude-3-5-sonnet, gemini-1.5-pro |
| Capable | gpt-4o-mini, claude-3-haiku, gemini-1.5-flash |
| Lightweight | gemini-1.5-flash-8b, llama-3.1-8b-instant |
It only recommends models at most one tier below what you're using — never a dramatic quality drop.
How reasoning is generated:
- Pricing tables identify which model to switch to and savings %
- AI analysis (Groq, on our server — no key needed from you) explains why in plain English
- If the server is unreachable, cached reasoning from previous sessions is used
- Final fallback: friendly plain-English text computed from pricing data
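The tier constraint described above can be pictured as a simple lookup: consider only models in the current tier or one tier below, then pick the cheapest. The sketch below uses illustrative sample tiers and blended per-1M-token prices, not the library's real 60+-model pricing table:

```python
# Illustrative tier ordering (0 = Frontier, 3 = Lightweight)
TIER = {
    "gpt-4": 0, "claude-3-opus": 0,
    "gpt-4o": 1, "claude-3-5-sonnet": 1,
    "gpt-4o-mini": 2, "claude-3-haiku": 2, "gemini-1.5-flash": 2,
    "gemini-1.5-flash-8b": 3, "llama-3.1-8b-instant": 3,
}
# Sample blended $/1M-token figures, for illustration only
PRICE = {
    "gpt-4": 45.0, "claude-3-opus": 90.0,
    "gpt-4o": 7.5, "claude-3-5-sonnet": 9.0,
    "gpt-4o-mini": 0.375, "claude-3-haiku": 0.75, "gemini-1.5-flash": 0.3,
    "gemini-1.5-flash-8b": 0.15, "llama-3.1-8b-instant": 0.05,
}

def recommend(current):
    """Cheapest model at most one tier below the current one, or None."""
    tier = TIER[current]
    candidates = [
        m for m, t in TIER.items()
        if t in (tier, tier + 1) and PRICE[m] < PRICE[current]
    ]
    return min(candidates, key=lambda m: PRICE[m], default=None)

print(recommend("gpt-4"))  # → "gpt-4o" with these sample prices
```

With these sample numbers a Frontier model is only ever pointed at a Frontier or Strong alternative, matching the "at most one tier below" rule; a model already at the bottom tier with nothing cheaper yields `None`.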
## Free Tier & License

LLMOptimize includes 500 free tracked calls per machine. To continue past the free tier, activate a paid license.

### Activate a paid license

```bash
llmoptimize activate llmopt-xxxxxxxxxxxx
# ✅ License activated! Plan: starter | 500 calls/month
# Valid through: 2026-04
```
The key is validated online and stored locally at `~/.aioptimize/license.json`.
No environment variables needed. Works for all future sessions on this machine.
### Remove a license

```bash
llmoptimize deactivate
# Remove license llmopt-xxxx...? (y/N): y
# ✅ License removed. Free tier limits restored.
```
### For servers / containers

```bash
export AIOPTIMIZE_LICENSE_KEY="llmopt-xxxxxxxxxxxx"
```
## Manual Tracking

For custom or self-hosted models not auto-patched:

```python
import llmoptimize

llmoptimize.track(
    model="my-custom-model",
    prompt_tokens=400,
    completion_tokens=120,
    provider="custom",
)
llmoptimize.report()
```
## Session Management

```python
llmoptimize.new_session()              # clear tracking, start fresh
llmoptimize.report(interactive=False)  # no menu prompt — useful in scripts
```
## Privacy
| Data | Stored locally | Sent to server |
|---|---|---|
| Your prompt text | Never | Never |
| Token counts | Yes | Yes |
| Model names | Yes | Yes |
| Cost figures | Yes | Yes |
| API keys | Never stored | Never sent |
Prompt text never leaves your machine. To disable server tracking entirely:

```bash
export AIOPTIMIZE_SERVER_URL=""
```
## FAQ

**Do I need to configure anything?**
No. `import llmoptimize` is all the setup required.

**Will it slow down my app?**
No. Tracking happens after your response is returned and never blocks the critical path.

**What if the recommendation server is unreachable?**
It falls back to local pricing data instantly. Your app is never affected.

**Does it work with LangChain / LlamaIndex?**
Yes — both use the underlying OpenAI/Anthropic SDKs, which are patched automatically.

**Does it work with streaming?**
Yes. Token counts are recorded from the final usage block after streaming completes.
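The streaming case works because OpenAI-style streams can emit a final chunk carrying the usage totals (cf. OpenAI's `stream_options={"include_usage": True}`), so a tracker only needs to read that last chunk once the stream is exhausted. A self-contained sketch of the mechanism with a fake stream — the chunk shape and `track_stream` name are illustrative, not LLMOptimize's internals:

```python
def fake_stream():
    """Fake streamed chunks: text deltas, then a final chunk with totals."""
    yield {"delta": "Hello", "usage": None}
    yield {"delta": " world", "usage": None}
    yield {"delta": "", "usage": {"prompt_tokens": 12, "completion_tokens": 2}}

def track_stream(stream, log):
    """Pass chunks through untouched; record usage from the final chunk."""
    last_usage = None
    for chunk in stream:
        if chunk["usage"] is not None:
            last_usage = chunk["usage"]
        yield chunk
    if last_usage is not None:
        log.append(last_usage)  # recorded only after streaming completes

log = []
text = "".join(c["delta"] for c in track_stream(fake_stream(), log))
print(text, log)  # "Hello world" streamed through; usage logged at the end
```

The wrapper yields every chunk unmodified, so streaming latency is unaffected and the usage record happens strictly after the consumer has seen the whole stream.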
**Can I use it without an API key at all?**
Yes — use `dry_run=True` or `with llmoptimize.report:`. Your code runs end-to-end with mock responses. No API key, no cost, full recommendations.

**What's the difference between `task()` and `report()`?**
`task("name")` resets the session first (clean slate) and labels the output. `report()` shows everything tracked since the last reset. Use `task()` when benchmarking specific pipeline stages.
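The clean-slate behavior can be pictured as a context manager over a global call log. The toy below only illustrates the reset-and-label semantics described above — `CALLS` and this `task` are a simplified model, not LLMOptimize's implementation:

```python
from contextlib import contextmanager

CALLS = []  # toy stand-in for the session log that report() reads

@contextmanager
def task(name):
    """Clean slate on entry, labelled summary on exit (toy semantics)."""
    CALLS.clear()  # reset, so the block is measured on its own
    yield
    print(f"Task: {name} — {len(CALLS)} call(s) tracked")

CALLS.append("earlier call")  # would NOT appear in the task report
with task("rag-pipeline"):
    CALLS.append("embedding call")
    CALLS.append("chat call")
# the task report covers exactly the 2 calls made inside the block
```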
LLMOptimize v3.2.4 — spend less, build more.