
Reduce LLM costs by 90% - AI recommendations with NO API keys needed!


LLMOptimize

Cut your AI API costs — automatically. One import. Zero config. No API key required.

pip install llmoptimize

What It Does

LLMOptimize watches every AI API call your code makes and tells you which cheaper model to switch to — and why, in plain English.

  • No API key needed to get recommendations
  • Never touches your prompts or responses — read-only
  • Works with OpenAI, Anthropic, Groq, Gemini, Mistral, Cohere
  • Zero setup — just import it

Quickstart (2 lines)

import llmoptimize          # ← add this at the top

import openai
client = openai.OpenAI()

response = client.chat.completions.create(
    model    = "gpt-4",
    messages = [{"role": "user", "content": "Summarize this article..."}],
)
print(response.choices[0].message.content)   # your real output, unchanged

llmoptimize.report()        # ← add this at the end

That's it. Run your code normally. At the end you'll see:

╔══════════════════════════════════════════════════════════════╗
║                                                              ║
║     🚀  L L M O P T I M I Z E   R E P O R T  🚀            ║
║                                                              ║
║  Your AI Cost Optimization Summary                           ║
╚══════════════════════════════════════════════════════════════╝

📊 YOUR USAGE SUMMARY

🚀  Total API Calls Tracked
   1
   Optimized and analyzed

💰  Total Cost
   $0.0036
   Amount spent on AI API calls

💎  Potential Savings
   $0.0034
   That's 94% less than you could have spent!

────────────────────────────────────────────────────────────

📋 USAGE BY TYPE

  💬 Chat      1 call    $0.0036    → gpt-4o-mini (saves 94%)

────────────────────────────────────────────────────────────

💡 PERSONALIZED RECOMMENDATIONS

╭────────────────────────────────────────────────────────────╮
│ #1 Recommendation                                          │
├────────────────────────────────────────────────────────────┤
│ 🎯 Switch to: gpt-4o-mini                                  │
│ 💰 Save 94% on every call                                  │
│                                                            │
│ 💡 Why switch?                                             │
│   You called gpt-4 1 time this session                     │
│   gpt-4o-mini costs 94% less — saves ~$0.18 per 1,000      │
│   calls at this token size                                 │
│                                                            │
│ ⚡ How to fix:                                              │
│   Change  model="gpt-4"  →  "gpt-4o-mini"                  │
╰────────────────────────────────────────────────────────────╯

Don't Have an API Key Yet? Use Dry-Run

Test your code flow and get cost advice before spending a dollar. Wrap your code in a with llmoptimize.report: block. It intercepts API calls, returns mock responses so your code runs end to end, and prints the report when the block exits.

import llmoptimize
import openai

client = openai.OpenAI(api_key="anything")   # not used in dry-run

with llmoptimize.report:
    # No real API calls — mock responses returned automatically
    client.embeddings.create(
        model = "text-embedding-3-large",
        input = ["RAG systems retrieve relevant documents."],
    )
    client.chat.completions.create(
        model    = "gpt-4",
        messages = [{"role": "user", "content": "Summarize this."}],
    )
# Report prints automatically when the block exits

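When you're ready to go live, remove the with llmoptimize.report: line and dedent the calls beneath it; they need no other changes. To keep seeing the summary, call llmoptimize.report() at the end, as in the Quickstart.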
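The dry-run idea (intercept the call, hand back a canned response, note which model would have been billed) is a standard Python pattern. The sketch below shows that general pattern with a made-up FakeCompletions class and a hypothetical dry_run context manager. It is not LLMOptimize's code and does not touch the real openai package.

```python
from contextlib import contextmanager
from types import SimpleNamespace


class FakeCompletions:
    """Hypothetical stand-in for an SDK resource like client.chat.completions."""

    def create(self, model, messages):
        raise RuntimeError("a real (billable) API call was attempted")


@contextmanager
def dry_run(resource, log):
    """Swap resource.create for a mock while the block runs (hypothetical helper)."""
    original = resource.create

    def mock_create(model, messages, **kwargs):
        log.append(model)  # record which model would have been billed
        # Shape the mock like a chat response so downstream code keeps working.
        message = SimpleNamespace(content="[dry-run mock response]")
        return SimpleNamespace(choices=[SimpleNamespace(message=message)])

    resource.create = mock_create  # instance attribute shadows the real method
    try:
        yield log
    finally:
        resource.create = original  # always restore, even if the block raises


completions = FakeCompletions()
calls = []
with dry_run(completions, calls):
    resp = completions.create(model="gpt-4", messages=[{"role": "user", "content": "Summarize this."}])
    print(resp.choices[0].message.content)  # [dry-run mock response]
print(calls)  # ['gpt-4']
```

Restoring the method in a finally clause matters: if the block raises, later code still reaches the real client instead of silently getting mocks.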


Track a Specific Task

Use llmoptimize.task() to get a separate report per pipeline stage. Each block gets a clean slate, its own label, and optional dry-run mode.

import llmoptimize
import openai

client = openai.OpenAI()

# Track real costs per stage
with llmoptimize.task("rag-pipeline"):
    embeddings = client.embeddings.create(model="text-embedding-3-large", input=["..."])
    summary    = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": "..."}],
    )

# Plan costs before shipping: no real API calls are made
with llmoptimize.task("cost-planning", dry_run=True):
    client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": "..."}],
    )

Each block prints its own labelled report:

╔══════════════════════════════════════════════════════════════╗
║                                                              ║
║     🚀  L L M O P T I M I Z E   R E P O R T  🚀            ║
║                                                              ║
║  Task: rag-pipeline                                          ║
╚══════════════════════════════════════════════════════════════╝

📌 Task: rag-pipeline

📋 USAGE BY TYPE

  📚 Embedding     1 call     $0.0000    → text-embedding-3-small (saves 80%)
  💬 Chat          1 call     $0.0005    → gpt-4o-mini (saves 99%)
  🧠 Reasoning     — not used this session
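The per-task behaviour (clear the log on entry, print a labelled summary on exit) can be modeled with a small context manager. Tracker, record and task below are hypothetical names for illustration, and the costs are dummy values. LLMOptimize's internals may work differently.

```python
from contextlib import contextmanager


class Tracker:
    """Hypothetical in-memory call log; not LLMOptimize's real tracker."""

    def __init__(self):
        self.calls = []  # list of (kind, model, cost_usd)

    def record(self, kind, model, cost_usd):
        self.calls.append((kind, model, cost_usd))

    def reset(self):
        self.calls.clear()

    def summary(self, label):
        total = sum(cost for _, _, cost in self.calls)
        return f"Task: {label} | {len(self.calls)} call(s) | ${total:.4f}"


tracker = Tracker()


@contextmanager
def task(label):
    tracker.reset()  # clean slate: earlier calls don't leak into this report
    try:
        yield tracker
    finally:
        print(tracker.summary(label))  # labelled report when the block exits


tracker.record("chat", "gpt-4", 0.01)  # recorded before the task starts
with task("rag-pipeline"):
    tracker.record("embedding", "text-embedding-3-large", 0.0001)
    tracker.record("chat", "gpt-4", 0.0005)
# prints: Task: rag-pipeline | 2 call(s) | $0.0006
```

The call recorded before the with block does not appear in the task's report, which is the "clean slate" behaviour described above.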

CLI — Audit a File Before Running It

No code changes needed. Point it at any Python file and get instant advice:

llmoptimize audit mycode.py

╔════════════════════════════════════════════════════════════════╗
║                   🤖 AI CODE AUDIT REPORT                     ║
╚════════════════════════════════════════════════════════════════╝

📄 File: mycode.py

📊 SUMMARY
   API calls found:    7
   Issues detected:    4
   Models used:        gpt-4, claude-3-opus

   Est. monthly cost:  $342  (at 1,000 runs/month)
   Potential savings:  $298  (87%)

🔍 RECOMMENDATIONS

🔴 Line 42 — claude-3-opus
   Switch to: claude-3-5-haiku  |  saves 95%
   Why: You're using claude-3-opus ($90/1M tokens). For ticket
   classification claude-3-5-haiku costs $4.80/1M — same accuracy,
   18x cheaper.
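The percentages in the sample recommendation above follow directly from the two per-million-token rates it quotes. They can be checked by hand using only the figures shown in that output:

```python
# Rates quoted in the sample audit output, in USD per 1M tokens.
current_rate = 90.00      # claude-3-opus
alternative_rate = 4.80   # claude-3-5-haiku

savings_fraction = 1 - alternative_rate / current_rate
cost_ratio = current_rate / alternative_rate

print(f"saves {savings_fraction:.0%}")   # saves 95%
print(f"{cost_ratio:.2f}x cheaper")      # 18.75x cheaper

# Summary line: $298 potential savings on a $342 estimated monthly cost.
print(f"{298 / 342:.0%}")                # 87%
```

So "18x cheaper" in the sample is the 18.75x ratio rounded down, and "saves 95%" is 94.7% rounded up.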

Options:

llmoptimize audit mycode.py             # full report
llmoptimize audit mycode.py --quiet     # one-line summary
llmoptimize audit mycode.py --force     # skip cache, always re-analyze
llmoptimize stats                       # show cache statistics
llmoptimize clear-cache                 # clear cached results
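A static audit can find call sites without running the file by walking its syntax tree. The sketch below uses Python's standard ast module to list every call that passes a string literal as model=, together with its line number. find_model_calls is a hypothetical helper; this is not necessarily how llmoptimize audit works.

```python
import ast


def find_model_calls(source):
    """Return sorted (line, model) pairs for calls with a string-literal model= keyword.

    Hypothetical helper for illustration only.
    """
    found = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        for kw in node.keywords:
            if (
                kw.arg == "model"
                and isinstance(kw.value, ast.Constant)
                and isinstance(kw.value.value, str)
            ):
                found.append((node.lineno, kw.value.value))
    return sorted(found)  # ast.walk is breadth-first, so sort into line order


sample = '''
import openai
client = openai.OpenAI()
client.chat.completions.create(model="gpt-4", messages=[])
client.embeddings.create(model="text-embedding-3-large", input=["..."])
'''

for line, model in find_model_calls(sample):
    print(f"Line {line}: {model}")
# Line 4: gpt-4
# Line 5: text-embedding-3-large
```

For a real file, pass find_model_calls(open("mycode.py").read()). A purely syntactic scan like this misses models passed through variables or config, which is one reason a full audit needs more than pattern matching.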

Supported Providers

import llmoptimize automatically patches each of the libraries below that you have installed. Nothing else is needed.

Provider        Library               Chat    Embeddings
OpenAI          openai
Anthropic       anthropic
Groq            groq
Google Gemini   google-generativeai
Mistral         mistralai
Cohere          cohere

Pricing data covers 60+ models from OpenAI, Anthropic, Groq, Gemini, Mistral, Cohere, Voyage AI, Jina AI, and AWS Bedrock.
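Auto-patching of this kind is usually done by wrapping an SDK method so each call is observed after it returns. The sketch below shows that general technique with a made-up FakeChat class instead of a real SDK. patch_method and records are hypothetical names; LLMOptimize's actual patching code is not shown in this README.

```python
import functools
from types import SimpleNamespace

records = []  # (model, prompt_tokens, completion_tokens) per observed call


class FakeChat:
    """Hypothetical SDK resource whose responses carry a usage block."""

    def create(self, model, messages):
        usage = SimpleNamespace(prompt_tokens=12, completion_tokens=30)
        return SimpleNamespace(model=model, usage=usage, text="hi")


def patch_method(cls, name):
    """Wrap cls.<name> so every call is recorded; the response is returned untouched."""
    original = getattr(cls, name)

    @functools.wraps(original)
    def wrapper(self, *args, **kwargs):
        response = original(self, *args, **kwargs)  # the real call happens first
        usage = getattr(response, "usage", None)
        if usage is not None:  # read-only: only metadata is inspected
            records.append((getattr(response, "model", None),
                            usage.prompt_tokens, usage.completion_tokens))
        return response

    setattr(cls, name, wrapper)


patch_method(FakeChat, "create")
reply = FakeChat().create(model="gpt-4", messages=[{"role": "user", "content": "Hi"}])
print(reply.text)  # hi
print(records)     # [('gpt-4', 12, 30)]
```

Because the wrapper only reads response.usage and response.model, the caller receives exactly the object the SDK produced.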


How Recommendations Work

Recommendations are never just the cheapest model. The engine checks capability tiers, so you only see alternatives of comparable capability:

Tier          Examples
Frontier      gpt-4, claude-3-opus, o1
Strong        gpt-4o, claude-3-5-sonnet, gemini-1.5-pro
Capable       gpt-4o-mini, claude-3-haiku, gemini-1.5-flash
Lightweight   gemini-1.5-flash-8b, llama-3.1-8b-instant

It only recommends models at most one tier below what you're using — never a dramatic quality drop.
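The tier rule can be expressed as a filter followed by a price comparison. In this sketch the tier assignments come from the table above, but the PRICES values are invented relative cost units, not real rates. recommend is a hypothetical function, not LLMOptimize's engine.

```python
TIERS = ["Frontier", "Strong", "Capable", "Lightweight"]  # most to least capable

MODEL_TIER = {
    "gpt-4": "Frontier", "claude-3-opus": "Frontier", "o1": "Frontier",
    "gpt-4o": "Strong", "claude-3-5-sonnet": "Strong", "gemini-1.5-pro": "Strong",
    "gpt-4o-mini": "Capable", "claude-3-haiku": "Capable", "gemini-1.5-flash": "Capable",
    "gemini-1.5-flash-8b": "Lightweight", "llama-3.1-8b-instant": "Lightweight",
}

# Relative cost units invented for illustration; NOT real prices.
PRICES = {"gpt-4": 100.0, "gpt-4o": 20.0, "gpt-4o-mini": 2.0,
          "claude-3-5-sonnet": 25.0, "gemini-1.5-pro": 15.0, "gemini-1.5-flash": 1.0}


def recommend(model, max_drop=1):
    """Cheapest priced model at most `max_drop` tiers below `model`, or None.

    `model` itself must have an entry in PRICES.
    """
    level = TIERS.index(MODEL_TIER[model])
    candidates = [
        m for m, tier in MODEL_TIER.items()
        if m != model and m in PRICES
        and level <= TIERS.index(tier) <= level + max_drop
        and PRICES[m] < PRICES[model]
    ]
    return min(candidates, key=PRICES.get, default=None)


print(recommend("gpt-4"))             # gemini-1.5-pro
print(recommend("gpt-4o"))            # gemini-1.5-flash
print(recommend("gemini-1.5-flash"))  # None
```

Filtering by tier before comparing prices is what keeps a much cheaper but much weaker model (such as a Lightweight one for a Frontier caller) out of the result.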

How reasoning is generated:

  1. Pricing tables identify which model to switch to and savings %
  2. AI analysis (Groq, on our server — no key needed from you) explains why in plain English
  3. If the server is unreachable, cached reasoning from previous sessions is used
  4. Final fallback: friendly plain-English text computed from pricing data
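These steps form a fallback chain: each source is tried in order and the first one that produces text wins. The sketch below uses hypothetical stand-ins (fetch_from_server, a cache dict, local_text). They are illustrative names, not LLMOptimize APIs, and the server here always fails so the fallbacks can be seen.

```python
def fetch_from_server(model, alternative):
    """Hypothetical remote explanation; here it simulates an unreachable server."""
    raise ConnectionError("recommendation server unreachable")


def local_text(model, alternative, savings):
    """Final fallback: plain-English text computed from pricing data alone."""
    return f"Switching {model} to {alternative} saves about {savings:.0%} per call."


def explain(model, alternative, savings, cache):
    """Return reasoning text; `savings` comes from step 1 (pricing tables)."""
    key = (model, alternative)
    try:
        text = fetch_from_server(model, alternative)  # step 2: server-side analysis
    except OSError:  # ConnectionError and TimeoutError are both OSError subclasses
        if key in cache:  # step 3: reasoning cached in an earlier session
            return cache[key]
        return local_text(model, alternative, savings)  # step 4: computed text
    cache[key] = text  # keep a copy for future offline sessions
    return text


cache = {}
print(explain("gpt-4", "gpt-4o-mini", 0.94, cache))
# Switching gpt-4 to gpt-4o-mini saves about 94% per call.
cache[("gpt-4", "gpt-4o-mini")] = "Cached reasoning from an earlier run."
print(explain("gpt-4", "gpt-4o-mini", 0.94, cache))
# Cached reasoning from an earlier run.
```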

Free Tier & License

LLMOptimize includes 500 free tracked calls per machine.

🎉 Upgrade to continue:
   llmoptimize activate YOUR_LICENSE_KEY

Activate a paid license

llmoptimize activate llmopt-xxxxxxxxxxxx
# ✅ License activated!  Plan: starter  |  500 calls/month
#    Valid through: 2026-04

The key is validated online and stored locally at ~/.aioptimize/license.json. No environment variables needed. Works for all future sessions on this machine.

Remove a license

llmoptimize deactivate
# Remove license llmopt-xxxx...? (y/N): y
# ✅ License removed. Free tier limits restored.

For servers / containers

export AIOPTIMIZE_LICENSE_KEY="llmopt-xxxxxxxxxxxx"

Manual Tracking

For custom or self-hosted models that aren't patched automatically, record each call yourself:

llmoptimize.track(
    model             = "my-custom-model",
    prompt_tokens     = 400,
    completion_tokens = 120,
    provider          = "custom",
)

llmoptimize.report()
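Manual tracking only needs token counts because cost can be derived from them. Model pricing is commonly quoted per million input and per million output tokens. The sketch below applies that arithmetic to the 400/120-token call above. The rates are hypothetical placeholders for a self-hosted model, and this formula is an assumption, not confirmed as the one LLMOptimize uses.

```python
def call_cost(prompt_tokens, completion_tokens, input_per_m, output_per_m):
    """USD cost of one call given per-1M-token input and output rates."""
    return (prompt_tokens * input_per_m + completion_tokens * output_per_m) / 1_000_000


# Hypothetical rates for "my-custom-model" (USD per 1M tokens); placeholders only.
cost = call_cost(prompt_tokens=400, completion_tokens=120, input_per_m=0.50, output_per_m=1.50)
print(f"${cost:.6f}")  # $0.000380
```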

Session Management

llmoptimize.new_session()              # clear tracking, start fresh
llmoptimize.report(interactive=False)  # no menu prompt — useful in scripts

Privacy

Data               Stored locally    Sent to server
Your prompt text   Never             Never
Token counts       Yes               Yes
Model names        Yes               Yes
Cost figures       Yes               Yes
API keys           Never stored      Never sent

Prompt text never leaves your machine. To disable server tracking entirely:

export AIOPTIMIZE_SERVER_URL=""

FAQ

Do I need to configure anything? No. import llmoptimize is all the setup required.

Will it slow down my app? No. Tracking happens after your response is returned and never blocks the critical path.
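A common way to keep tracking off the critical path is to hand each record to a background thread through a queue, so the caller only pays for a put(). The sketch below uses Python's standard queue and threading modules. record_call, worker and the sent list are hypothetical; this is not necessarily how LLMOptimize is built.

```python
import queue
import threading

events = queue.Queue()
sent = []  # stands in for whatever the worker would upload or store


def worker():
    while True:
        item = events.get()
        if item is None:  # sentinel: shut down
            events.task_done()
            break
        sent.append(item)  # slow work (e.g. a network call) would go here
        events.task_done()


thread = threading.Thread(target=worker, daemon=True)
thread.start()


def record_call(model, prompt_tokens, completion_tokens):
    """Called right after a response is returned; never waits on I/O."""
    events.put_nowait({"model": model, "prompt_tokens": prompt_tokens,
                       "completion_tokens": completion_tokens})


record_call("gpt-4", 400, 120)
events.put(None)  # ask the worker to stop
thread.join()     # demo only: wait so the result can be inspected
print(sent)       # [{'model': 'gpt-4', 'prompt_tokens': 400, 'completion_tokens': 120}]
```

The daemon flag means a stuck worker can never keep the host application from exiting.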

What if the recommendation server is unreachable? It falls back to local pricing data instantly. Your app is never affected.

Does it work with LangChain / LlamaIndex? Yes — both use the underlying OpenAI/Anthropic SDKs which are patched automatically.

Does it work with streaming? Yes. Token counts are recorded from the final usage block after streaming completes.
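With a streamed response, usage is only known once the stream ends. A tracker therefore has to pass chunks through unchanged and read usage from the last chunk that carries it. The sketch below uses plain dicts with a simplified, hypothetical chunk shape. Note that OpenAI's Chat Completions API only sends a final usage chunk when the request sets stream_options={"include_usage": True}.

```python
def tracked_stream(chunks, on_usage):
    """Yield chunks unchanged; report the last usage block after the stream ends."""
    usage = None
    for chunk in chunks:
        if chunk.get("usage") is not None:
            usage = chunk["usage"]  # usually only the final chunk carries usage
        yield chunk
    if usage is not None:
        on_usage(usage)


# Simplified, hypothetical chunk shape for illustration.
fake_stream = [
    {"delta": "Hello", "usage": None},
    {"delta": " world", "usage": None},
    {"delta": "", "usage": {"prompt_tokens": 9, "completion_tokens": 2}},
]

seen = []
text = "".join(c["delta"] for c in tracked_stream(fake_stream, seen.append))
print(text)  # Hello world
print(seen)  # [{'prompt_tokens': 9, 'completion_tokens': 2}]
```

on_usage fires only after the consumer has drained the stream, so the caller sees every chunk at its normal pace.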

Can I use it without an API key at all? Yes. Use llmoptimize.task("name", dry_run=True) or a with llmoptimize.report: block. Your code runs end to end with mock responses: no real API key, no cost, full recommendations.

What's the difference between task() and report()? task("name") resets the session first (clean slate) and labels the output. report() shows everything tracked since the last reset. Use task() when benchmarking specific pipeline stages.


LLMOptimize v3.2.4 — spend less, build more.



Download files

Download the file for your platform.

Source Distribution

llmoptimize-3.2.6.tar.gz (25.3 kB)

Uploaded Source

Built Distribution


llmoptimize-3.2.6-py3-none-any.whl (25.1 kB)

Uploaded Python 3

File details

Details for the file llmoptimize-3.2.6.tar.gz.

File metadata

  • Download URL: llmoptimize-3.2.6.tar.gz
  • Upload date:
  • Size: 25.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.12

File hashes

Hashes for llmoptimize-3.2.6.tar.gz
Algorithm Hash digest
SHA256 3557659aeb50cc29e68fd22ffdac610862b3bbeaaf58d31b72af1d591070a4ba
MD5 584e19d55b585ff7a6a38f69ae5bd403
BLAKE2b-256 b6649f674bdc35b45bea75fa270fe6ad5aec7e09b76e88fa075a636deef9e4dc


File details

Details for the file llmoptimize-3.2.6-py3-none-any.whl.

File metadata

  • Download URL: llmoptimize-3.2.6-py3-none-any.whl
  • Upload date:
  • Size: 25.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.12

File hashes

Hashes for llmoptimize-3.2.6-py3-none-any.whl
Algorithm Hash digest
SHA256 dba40d948c43fe56bd5d3d957cd97a1a1ace0b2ffa818f24115af0cb2f05efd2
MD5 aba691da694cf3ecd9d0b2b5625a878c
BLAKE2b-256 303c9cb4ae7ac0aad609b57d5e3de42d5f600041d51bf421d2a2d9ba258ae251

