LLMOptimize
Cut your AI API costs by 40–97% — automatically. One import. Zero prompt changes. No infrastructure to run.
pip install llmoptimize
import llmoptimize # done — every AI call is now tracked and optimised
Table of Contents
- What It Does
- Quick Start
- Installation
- Auto-Tracking
- The @track_cost Decorator
- Configuration
- Guardrails
- CLI Audit Tool
- Dashboard & Reports
- Supported Providers & Models
- Privacy
- Plans & Pricing
- FAQ
What It Does
LLMOptimize monitors every AI API call your application makes and tells you when a cheaper model would do the same job just as well.
Your Code → LLMOptimize SDK → Your AI Provider (OpenAI / Anthropic / ...)
│
▼
Recommendation Engine
(hosted — nothing to run)
│
▼
"Use gpt-3.5-turbo instead.
95% cheaper. Minimal quality
impact. 90% confident."
The recommendation engine has three layers that run in order:
- Instant heuristics — task-type detection using your prompt shape and keywords
- ML model — trained on aggregated acceptance signals from all users (gets smarter over time)
- Pattern database — crowd-sourced patterns from millions of real API calls
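The heuristic layer is the easiest to picture. As a rough illustration only (the real rules run server-side and are not published; the keywords and categories below are hypothetical), task-type detection can be sketched as keyword matching over the prompt:

```python
# Illustrative sketch only -- the actual heuristics run on LLMOptimize's servers.
TASK_KEYWORDS = {
    "classification": ("classify", "categorize", "label", "spam or not"),
    "summarization": ("summarize", "tl;dr", "key points"),
    "translation": ("translate", "in french", "in spanish"),
}

def detect_task_type(prompt: str) -> str:
    """Return a coarse task category for a prompt, defaulting to 'other'."""
    lowered = prompt.lower()
    for task, keywords in TASK_KEYWORDS.items():
        if any(kw in lowered for kw in keywords):
            return task
    return "other"
```

A prompt like "Classify this email as spam or not." would map to `classification`, which is exactly the kind of task where cheaper models tend to hold up well.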
Everything runs on our servers. You install the SDK, we handle the rest.
Quick Start
Step 1 — Install
pip install llmoptimize
Step 2 — Import before your AI library
import llmoptimize # one line — patches OpenAI, Anthropic, Groq automatically
import openai
client = openai.OpenAI()
# Your existing code — completely unchanged
response = client.chat.completions.create(
model = "gpt-4",
messages = [{"role": "user", "content": "Classify this email as spam or not."}]
)
Step 3 — See your savings
llmoptimize.report()
╔════════════════════════════════════════════════════════════════════╗
║ SMART RECOMMENDATION ║
╚════════════════════════════════════════════════════════════════════╝
🟢 Confidence: 90%
📊 You used: gpt-4 → $0.012400
✨ Switch to: gpt-3.5-turbo → $0.000620
💰 YOU SAVE: $0.011780 (95%)
📈 Quality impact: MINIMAL
💬 Why: Classification task — cheaper models maintain 95%+ accuracy
That's it. No server to run, no dashboard to set up, no config files.
Installation
Requirements
- Python 3.9 or higher
- At least one AI SDK:
`openai`, `anthropic`, `groq`, `google-generativeai`, `mistralai`, or `cohere`
Install
pip install llmoptimize
Optional environment variables
| Variable | Default | What it does |
|---|---|---|
| `AIOPTIMIZE_SERVER_URL` | Managed cloud | Point to a dedicated instance (enterprise plans) |
| `AIOPTIMIZE_TIMEOUT` | 3 seconds | Max wait for a recommendation before proceeding |
| `AIOPTIMIZE_SHARE_DATA` | `true` | Set to `false` to opt out of anonymised metadata sharing |
Auto-Tracking (Zero Code Changes)
import llmoptimize silently wraps every AI library you have installed. Your existing code, your existing response objects, your existing error handling — all untouched.
Supported libraries:
| Provider | Library | Chat | Embeddings | Async |
|---|---|---|---|---|
| OpenAI | `openai` | ✅ | ✅ | ✅ |
| Anthropic | `anthropic` | ✅ | — | ✅ |
| Groq | `groq` | ✅ | — | ✅ |
| Google | `google-generativeai` | ✅ | — | ✅ |
| Mistral | `mistralai` | ✅ | — | ✅ |
| Cohere | `cohere` | ✅ | ✅ | — |
Guarantees:
- Your API response is returned exactly as the provider sends it — nothing is modified
- If LLMOptimize encounters any internal error, it fails silently and your call goes through normally
- No added latency on the critical path — tracking and recommendations happen asynchronously
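The fail-silent guarantee amounts to a fail-open wrapper: the tracking hook is allowed to throw, and the original provider call always wins. A minimal sketch of the pattern (hypothetical names, not the SDK's internals):

```python
import functools

def fail_open(original_call, on_result):
    """Wrap a provider call so that tracking errors can never break it."""
    @functools.wraps(original_call)
    def wrapper(*args, **kwargs):
        response = original_call(*args, **kwargs)  # the provider call happens first
        try:
            on_result(response)  # tracking hook -- best effort only
        except Exception:
            pass  # any internal error is swallowed; the caller is unaffected
        return response
    return wrapper
```

The key property is ordering: the response is obtained before the hook runs, so even a crashing tracker cannot change what the caller receives.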
The @track_cost Decorator
For more control, wrap specific functions directly.
Basic tracking
from llmoptimize import track_cost
@track_cost(model="gpt-4")
def classify_ticket(text: str):
return client.chat.completions.create(
model = "gpt-4",
messages = [{"role": "user", "content": text}]
)
Show recommendations before the call
@track_cost(model="gpt-4", smart_suggestions=True)
def analyze_document(text: str):
...
Auto-switch when confident
When auto_optimize=True, the SDK automatically uses the cheaper model when confidence is 90% or higher — no human needed in the loop:
@track_cost(
model = "gpt-4",
smart_suggestions = True,
auto_optimize = True,
)
def batch_classify(items: list):
...
# Console output:
# ✨ Auto-optimized: gpt-4 → gpt-3.5-turbo
# Savings: $0.0114 (92%) | Confidence: 94%
Full decorator options
@track_cost(
model = "gpt-4", # the model your code calls
smart_suggestions = False, # show cheaper alternative before the call
auto_optimize = False, # auto-switch at >= 90% confidence
config = None, # AIOptimizeConfig for better recommendations
enable_guardrails = False, # PII scanning + budget enforcement
daily_budget = None, # float — block calls if daily spend exceeds this
monthly_budget = None, # float — block calls if monthly spend exceeds this
)
Works identically on async def functions with no extra setup.
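Handling both `def` and `async def` from a single decorator is a standard Python pattern. As an illustration of how such dual dispatch typically works (a sketch, not the SDK source):

```python
import asyncio
import functools
import inspect

def track(func):
    """Dispatch to an async or sync wrapper depending on the decorated function."""
    if inspect.iscoroutinefunction(func):
        @functools.wraps(func)
        async def async_wrapper(*args, **kwargs):
            result = await func(*args, **kwargs)
            # ... record usage here ...
            return result
        return async_wrapper

    @functools.wraps(func)
    def sync_wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        # ... record usage here ...
        return result
    return sync_wrapper
```

`inspect.iscoroutinefunction` is what lets one decorator serve both call styles without any extra setup on the caller's side.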
Configuration
AIOptimizeConfig gives the recommendation engine context about your use case, which improves suggestion accuracy — especially for industry-specific quality tradeoffs.
from llmoptimize import track_cost, AIOptimizeConfig
config = AIOptimizeConfig(
user_id = "your-company-id", # anonymised before it leaves your machine
industry = "healthcare", # tunes quality vs cost tradeoffs
company_size = "startup",
use_case = "summarization",
share_data = True, # helps the model improve for everyone
)
@track_cost(model="gpt-4", smart_suggestions=True, config=config)
def my_function(prompt: str):
...
Config options
| Field | Options | Effect |
|---|---|---|
| `industry` | `saas`, `ecommerce`, `healthcare`, `finance`, `legal`, `education`, `marketing`, `engineering`, `media`, `other` | Adjusts quality sensitivity thresholds |
| `company_size` | `solo`, `startup`, `mid`, `enterprise` | Influences cost vs reliability weighting |
| `use_case` | `customer_support`, `rag`, `content`, `coding`, `analytics`, `translation`, `summarization`, `classification`, `automation`, `chatbot`, `other` | Directly informs task-type detection |
| `share_data` | `True` / `False` | Whether to contribute anonymised usage to the shared ML model |
`share_prompts` is always `False`, regardless of what you pass. Prompt text never leaves your machine. See Privacy.
Guardrails
Security scanning
Enable guardrails to automatically scan every prompt for sensitive data before it reaches any AI provider.
@track_cost(model="gpt-4", enable_guardrails=True)
def process_user_input(text: str):
...
What gets detected:
| Data type | Action |
|---|---|
| API keys (OpenAI, Anthropic, AWS, etc.) | 🔴 Call blocked |
| Private / cryptographic keys | 🔴 Call blocked |
| Credit card numbers | 🔴 Call blocked |
| Social Security Numbers | 🔴 Call blocked |
| Email addresses | 🟠 Warning shown |
| Phone numbers | 🟠 Warning shown |
When a critical issue is found the call never reaches your AI provider. A detailed report explains exactly what was detected and where.
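The detection step is essentially pattern matching before dispatch. As a simplified illustration of the idea (the shipped scanner covers many more formats than these), blocking rules can be expressed as regexes:

```python
import re

# Illustrative patterns only -- the real scanner's rule set is more extensive.
BLOCKING_PATTERNS = {
    "openai_api_key": re.compile(r"sk-[A-Za-z0-9]{20,}"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def scan_prompt(prompt: str) -> list[str]:
    """Return the names of any blocking patterns found in the prompt."""
    return [name for name, pat in BLOCKING_PATTERNS.items() if pat.search(prompt)]
```

If `scan_prompt` returns a non-empty list, the call would be refused before any network request is made.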
Budget enforcement
@track_cost(
model = "gpt-4",
enable_guardrails = True,
daily_budget = 10.00,
monthly_budget = 150.00,
)
def my_function(prompt: str):
...
When a call would push you over budget it is blocked before it's made:
❌ BLOCKED: Would exceed daily budget of $10.00
Spent today: $9.94
Remaining: $0.06
Estimated cost of this call: $0.18
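The enforcement logic is a pre-flight check: the estimated cost of the pending call is added to spend so far and compared against the cap before the request is sent. A minimal sketch (hypothetical helper names, not the SDK's implementation):

```python
class BudgetExceeded(Exception):
    """Raised instead of making an API call that would blow the budget."""

def check_budget(spent_today: float, estimated_cost: float, daily_budget: float) -> None:
    """Raise before the API call if it would push spend over the daily cap."""
    if spent_today + estimated_cost > daily_budget:
        remaining = daily_budget - spent_today
        raise BudgetExceeded(
            f"Would exceed daily budget of ${daily_budget:.2f} "
            f"(spent ${spent_today:.2f}, remaining ${remaining:.2f}, "
            f"this call ~${estimated_cost:.2f})"
        )
```

With the numbers from the example above, `check_budget(9.94, 0.18, 10.00)` raises, because $9.94 + $0.18 exceeds the $10.00 cap.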
Runaway loop protection
If more than 100 calls are detected within any 5-minute window, further calls are blocked automatically and you're alerted. This catches bugs such as infinite retry loops and agent runaways before they cause a surprise bill.
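A sliding-window counter is the usual way to implement this kind of circuit breaker. A self-contained sketch of the idea (the thresholds mirror the ones above; the class name is hypothetical):

```python
import time
from collections import deque
from typing import Optional

class RunawayGuard:
    """Block further calls once more than `max_calls` land inside a sliding window."""

    def __init__(self, max_calls: int = 100, window_seconds: float = 300.0):
        self.max_calls = max_calls
        self.window = window_seconds
        self.timestamps: deque = deque()

    def allow(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        # Drop timestamps that have aged out of the window.
        while self.timestamps and now - self.timestamps[0] > self.window:
            self.timestamps.popleft()
        if len(self.timestamps) >= self.max_calls:
            return False  # over the limit: block this call
        self.timestamps.append(now)
        return True
```

Because old timestamps expire as the window slides, the guard unblocks itself once the burst has passed, which matches the "temporary block, then alert" behaviour described above.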
CLI Audit Tool
Scan any Python file to find AI cost optimisation opportunities without executing it.
llmoptimize audit mycode.py
╔════════════════════════════════════════════════════════════════════╗
║ 🤖 AI CODE AUDIT REPORT ║
╚════════════════════════════════════════════════════════════════════╝
📄 File: mycode.py
📊 ANALYSIS SUMMARY
────────────────────────────────────────────────────────────────────
Total API Calls: 7
Issues Found: 4
Models Used: gpt-4, claude-3-opus-20240229
Est. Monthly Cost: $342.00 (at 1,000 runs/month)
POTENTIAL SAVINGS: $298.00 (87%)
🔍 DETAILED RECOMMENDATIONS
🔴 ISSUE #1: Line 42
You're using: claude-3-opus-20240229
For: Classifying support ticket urgency
✨ SWITCH TO: claude-3-haiku-20240307
Saves: 95% | Quality impact: MINIMAL | Confidence: 90%
CLI commands
# Audit a file (AI-powered analysis, no API key needed)
llmoptimize audit myfile.py
# Rule-based only — completely free, no network call
llmoptimize audit myfile.py --no-ai
# Force fresh analysis (skip cache)
llmoptimize audit myfile.py --force
# One-line summary
llmoptimize audit myfile.py --quiet
# Cache management
llmoptimize stats
llmoptimize clear-cache
Dashboard & Reports
In-code report
import llmoptimize
# ... your application code ...
llmoptimize.report()
Prints a full session breakdown:
════════════════════════════════════════════════════════════════════
📊 SESSION SUMMARY
════════════════════════════════════════════════════════════════════
Total Calls: 284
Total Cost: $4.2180
Total Tokens: 421,800
Avg Cost/Call: $0.014852
Duration: 0:18:42
MODEL BREAKDOWN:
────────────────────────────────────────────────────────────────────
gpt-4:
Calls: 212 Cost: $3.8960 Tokens: 318,000
gpt-3.5-turbo:
Calls: 72 Cost: $0.3220 Tokens: 103,800
════════════════════════════════════════════════════════════════════
Manual tracking
For providers not auto-patched, or for tracking custom inference:
llmoptimize.track(
model = "gpt-4",
prompt_tokens = 400,
completion_tokens = 120,
provider = "openai",
)
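Under the hood, turning token counts into a dollar figure is a per-token price multiplication. A worked sketch with illustrative prices (real figures come from the server-side pricing table, not from this snippet):

```python
# Illustrative per-million-token prices -- NOT live pricing data.
PRICES_PER_MILLION = {
    "gpt-4": {"prompt": 30.00, "completion": 60.00},
}

def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate call cost in dollars from token counts and per-million-token prices."""
    p = PRICES_PER_MILLION[model]
    return (prompt_tokens * p["prompt"] + completion_tokens * p["completion"]) / 1_000_000
```

For the manual-tracking example above (400 prompt tokens, 120 completion tokens), these illustrative prices give 400 × $30/1M + 120 × $60/1M = $0.0192.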
Supported Providers & Models
LLMOptimize has pricing data for 50+ models across all major providers. A selection:
OpenAI
gpt-4o · gpt-4o-mini · gpt-4-turbo · gpt-4 · gpt-3.5-turbo · o1 · o1-mini · text-embedding-3-small · text-embedding-3-large
Anthropic
claude-3-5-sonnet-20241022 · claude-3-5-haiku-20241022 · claude-3-opus-20240229 · claude-3-sonnet-20240229 · claude-3-haiku-20240307
Groq
llama-3.3-70b-versatile · llama-3.1-70b-versatile · llama-3.1-8b-instant · gemma2-9b-it · mixtral-8x7b-32768
Google
gemini-1.5-pro · gemini-1.5-flash · gemini-1.0-pro
Mistral
mistral-large-latest · mistral-small-latest · open-mixtral-8x7b
Cohere
command-r-plus · command-r · command-light
Pricing data is kept up to date on the server — the SDK always uses the latest figures without requiring an update.
Privacy
LLMOptimize is built privacy-first. Here is exactly what goes where:
| Data | Stored locally | Sent to server |
|---|---|---|
| Prompt text | ❌ Never | ❌ Never |
| Prompt category (e.g. "classification") | ✅ Yes | ✅ Yes |
| Token counts | ✅ Yes | ✅ Yes |
| Cost figures | ✅ Yes | ✅ Yes |
| Model names | ✅ Yes | ✅ Yes |
| Your `user_id` | SHA-256 hashed | First 16 chars of hash only |
| API keys | Detected in prompts, blocked | ❌ Never |
The guarantee: share_prompts is always False. The code enforces this — it cannot be overridden. Your prompt text is classified locally on your machine and only the resulting label (e.g. "summarization") is ever transmitted.
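The `user_id` anonymisation described above can be reproduced with the standard library. Conceptually it is a SHA-256 digest truncated to a 16-character hex prefix (a sketch of the scheme, not the SDK source):

```python
import hashlib

def anonymise_user_id(user_id: str) -> str:
    """Hash a user id locally; only a 16-char hex prefix would leave the machine."""
    return hashlib.sha256(user_id.encode("utf-8")).hexdigest()[:16]
```

The hash is deterministic, so the same `user_id` always maps to the same token, while the original string is never transmitted.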
To opt out of all data sharing entirely:
export AIOPTIMIZE_SHARE_DATA=false
Or in code:
config = AIOptimizeConfig(user_id="me", share_data=False)
Plans & Pricing
| | Free | Pro | Enterprise |
|---|---|---|---|
| Tracked calls / month | 10,000 | Unlimited | Unlimited |
| Recommendation engine | Heuristic | Heuristic + ML | Heuristic + ML + Custom |
| Code audit | 5 files / month | Unlimited | Unlimited |
| Guardrails | ✅ | ✅ | ✅ |
| Auto-optimize | ✅ | ✅ | ✅ |
| Dedicated server instance | ❌ | ❌ | ✅ |
| SSO / SAML | ❌ | ❌ | ✅ |
| SLA | — | 99.9% | 99.99% |
| Support | Community | Dedicated Slack | |
| Price | Free | $49 / month | Contact us |
Get started free → · View full pricing →
FAQ
Does it work with streaming responses? Yes. The SDK intercepts the completed response after streaming finishes and records usage from the final usage block. Your streaming code is unaffected.
Does it add latency to my API calls? No. Tracking and recommendation calls happen after your response is returned — they never sit on your critical path.
What if the recommendation server is unreachable? The SDK falls back to local heuristics instantly and your API call proceeds normally. There is no scenario where an LLMOptimize failure blocks your application.
Does auto_optimize change my prompt or my response?
No. It only changes the model parameter on the API call. The prompt you wrote and the response you receive are identical — just generated by a cheaper model.
Can I use this with a self-hosted or fine-tuned model?
Use llmoptimize.track() to manually record calls to any endpoint. Recommendations won't be available for unknown models, but cost tracking will work.
Is there a usage cap on the free tier? 10,000 tracked calls per month. The SDK continues to work above this limit — recommendations are paused until the next billing cycle.
Does it support LangChain / LlamaIndex? Yes. Both frameworks use the underlying OpenAI / Anthropic SDKs, which are patched automatically on import.
Can I audit files in CI?
# .github/workflows/cost-check.yml
- name: AI cost audit
run: llmoptimize audit src/ --quiet
The CLI exits with code 1 if critical issues are found, making it easy to fail a pipeline.
Support
- Docs: docs.aioptimize.dev
- Email: support@aioptimize.dev
- Status: status.aioptimize.dev
LLMOptimize — spend less, build more.