Reduce LLM costs by 90% - AI recommendations with NO API keys needed!

LLMOptimize

Cut your AI API costs — automatically. One import. Zero config. No API key required.

pip install llmoptimize

What It Does

LLMOptimize watches every AI API call your code makes and tells you which cheaper model to switch to — and why, in plain English.

  • No API key needed to get recommendations
  • Never touches your prompts or responses — read-only
  • Works with OpenAI, Anthropic, Groq, Gemini, Mistral, Cohere
  • Zero setup — just import it

Quickstart (2 lines)

import llmoptimize          # ← add this at the top

import openai
client = openai.OpenAI()

response = client.chat.completions.create(
    model    = "gpt-4",
    messages = [{"role": "user", "content": "Summarize this article..."}],
)
print(response.choices[0].message.content)   # your real output, unchanged

llmoptimize.report()        # ← add this at the end

That's it. Run your code normally. At the end you'll see:

╔══════════════════════════════════════════════════════════════╗
║           🚀  LLMOPTIMIZE REPORT                             ║
╚══════════════════════════════════════════════════════════════╝

📊 YOUR USAGE SUMMARY
   Total Calls:      1
   Total Cost:       $0.0036
   Potential Savings: $0.0034  (94% cheaper!)

📋 USAGE BY TYPE
  💬 Chat      1 call    $0.0036    → gpt-4o-mini (saves 94%)

💡 TOP RECOMMENDATION
   Switch model="gpt-4"  →  "gpt-4o-mini"
   You're using gpt-4 ($60/1M tokens). For most chat tasks
   gpt-4o-mini gives the same results at $0.75/1M tokens —
   that's 80x cheaper.
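The savings figures above come down to per-token arithmetic. A minimal sketch, using illustrative list prices (USD per 1M input/output tokens) and made-up token counts — these numbers are assumptions for the sketch, not the library's internal pricing table:

```python
# Illustrative prices in USD per 1M tokens (input, output) -- placeholders,
# not the library's actual pricing data.
PRICES = {
    "gpt-4":       (30.00, 60.00),
    "gpt-4o-mini": (0.15,  0.60),
}

def call_cost(model, prompt_tokens, completion_tokens):
    """Cost of one call: tokens / 1M times the per-direction price."""
    p_in, p_out = PRICES[model]
    return prompt_tokens / 1e6 * p_in + completion_tokens / 1e6 * p_out

current = call_cost("gpt-4", 80, 40)        # 80 in / 40 out: made-up counts
cheaper = call_cost("gpt-4o-mini", 80, 40)
savings_pct = round(100 * (1 - cheaper / current))

print(f"gpt-4:       ${current:.4f}")
print(f"gpt-4o-mini: ${cheaper:.4f}")
print(f"savings:     {savings_pct}%")
```

The exact percentage depends on your input/output token mix, since the two directions are priced differently.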

Don't Have an API Key Yet? Use Dry-Run

Test your code flow and get cost advice before spending a dollar. Wrap your code in a with llmoptimize.report: block. It intercepts every call, returns mock responses so your code runs end-to-end, and prints the report when the block exits.

import llmoptimize
import openai

client = openai.OpenAI(api_key="anything")   # not used in dry-run

with llmoptimize.report:
    # No real API calls — mock responses returned automatically
    client.embeddings.create(
        model = "text-embedding-3-large",
        input = ["RAG systems retrieve relevant documents."],
    )
    client.chat.completions.create(
        model    = "gpt-4",
        messages = [{"role": "user", "content": "Summarize this."}],
    )
# Report prints automatically when the block exits

When you're ready to go live, just remove the with llmoptimize.report: line — your code is already correct.


Track a Specific Task

Use llmoptimize.task() to get a separate report per pipeline stage. Each block gets a clean slate, its own label, and optional dry-run mode.

import llmoptimize
import openai

client = openai.OpenAI()

# Track real costs per stage
with llmoptimize.task("rag-pipeline"):
    chunks  = client.embeddings.create(model="text-embedding-3-large", input=["..."])
    summary = client.chat.completions.create(model="gpt-4", messages=[...])

# Plan costs before shipping — no real API calls
with llmoptimize.task("cost-planning", dry_run=True):
    client.chat.completions.create(model="gpt-4", messages=[...])

Each block prints its own labelled report:

📋 TASK: rag-pipeline

  📚 Embedding     8 calls    $0.0002    → text-embedding-3-small (saves 85%)
  💬 Chat          2 calls    $0.0180    → gpt-4o-mini (saves 94%)
  🧠 Reasoning     — not used this session

CLI — Audit a File Before Running It

No code changes needed. Point it at any Python file and get instant advice:

llmoptimize audit mycode.py
╔════════════════════════════════════════════════════════════════╗
║                   🤖 AI CODE AUDIT REPORT                     ║
╚════════════════════════════════════════════════════════════════╝

📄 File: mycode.py

📊 SUMMARY
   API calls found:    7
   Issues detected:    4
   Models used:        gpt-4, claude-3-opus

   Est. monthly cost:  $342  (at 1,000 runs/month)
   Potential savings:  $298  (87%)

🔍 RECOMMENDATIONS

🔴 Line 42 — claude-3-opus
   Switch to: claude-3-5-haiku  |  saves 95%
   Why: You're using claude-3-opus ($90/1M tokens). For ticket
   classification claude-3-5-haiku costs $4.80/1M — same accuracy,
   18x cheaper.

Options:

llmoptimize audit mycode.py             # full report
llmoptimize audit mycode.py --quiet     # one-line summary
llmoptimize audit mycode.py --force     # skip cache, always re-analyze
llmoptimize stats                       # show cache statistics
llmoptimize clear-cache                 # clear cached results

Supported Providers

A single import llmoptimize automatically patches every supported AI library you have installed. Nothing else is needed.

Provider        Library
OpenAI          openai
Anthropic       anthropic
Groq            groq
Google Gemini   google-generativeai
Mistral         mistralai
Cohere          cohere

Pricing data for 60+ models including OpenAI, Anthropic, Groq, Gemini, Mistral, Cohere, Voyage AI, Jina AI, and AWS Bedrock.
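Patching an installed SDK at import time is a standard monkey-patching pattern. A generic sketch of the idea using a dummy client class (not any real provider library): wrap the method, record the usage block after the call returns, and hand back the original result untouched:

```python
import functools

class DummyClient:
    """Stand-in for an SDK client -- not a real provider library."""
    def create(self, model, messages):
        return {"model": model,
                "usage": {"prompt_tokens": 12, "completion_tokens": 5}}

CALLS = []  # usage records collected by the patch

def patch(cls, method_name):
    original = getattr(cls, method_name)

    @functools.wraps(original)
    def wrapper(self, *args, **kwargs):
        result = original(self, *args, **kwargs)   # the real call runs first
        CALLS.append({"model": kwargs.get("model"), **result["usage"]})
        return result                              # response is untouched

    setattr(cls, method_name, wrapper)

patch(DummyClient, "create")
resp = DummyClient().create(model="gpt-4", messages=[])
print(CALLS)  # one record: model name plus token counts
```

Because the wrapper only reads the usage metadata after the response exists, the caller's code path and return values are unchanged.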


How Recommendations Work

Recommendations are never just the cheapest model. The engine checks capability tiers so you only see alternatives that deliver comparable results:

Tier          Examples
Frontier      gpt-4, claude-3-opus, o1
Strong        gpt-4o, claude-3-5-sonnet, gemini-1.5-pro
Capable       gpt-4o-mini, claude-3-haiku, gemini-1.5-flash
Lightweight   gemini-1.5-flash-8b, llama-3.1-8b-instant

It only recommends models at most one tier below what you're using — never a dramatic quality drop.
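The one-tier rule can be sketched as a simple filter: order the tiers, and only consider candidates in the current tier or the one directly below. Tier membership and the blended per-1M-token prices below are illustrative assumptions, not the library's actual tables:

```python
# Tiers from strongest to lightest, with illustrative members and blended
# prices (USD per 1M tokens) -- placeholder numbers for this sketch.
TIERS = [
    ("frontier",    {"gpt-4": 45.0, "claude-3-opus": 45.0}),
    ("strong",      {"gpt-4o": 6.25, "claude-3-5-sonnet": 9.0}),
    ("capable",     {"gpt-4o-mini": 0.375, "claude-3-haiku": 0.75}),
    ("lightweight", {"gemini-1.5-flash-8b": 0.1}),
]

def recommend(model):
    """Cheapest alternative at most one tier below the current model."""
    for i, (_, members) in enumerate(TIERS):
        if model in members:
            pool = dict(members)              # same tier...
            if i + 1 < len(TIERS):
                pool.update(TIERS[i + 1][1])  # ...plus the tier directly below
            pool.pop(model)
            if not pool:
                return None
            cheapest = min(pool, key=pool.get)
            return cheapest if pool[cheapest] < members[model] else None
    raise KeyError(f"unknown model: {model}")

print(recommend("claude-3-5-sonnet"))  # a Capable model, never a Lightweight one
```

The filter guarantees a recommendation can save money without jumping more than one capability tier.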

How reasoning is generated:

  1. Pricing tables identify which model to switch to and savings %
  2. AI analysis (Groq, on our server — no key needed from you) explains why in plain English
  3. If the server is unreachable, cached reasoning from previous sessions is used
  4. Final fallback: friendly plain-English text computed from pricing data
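The three-step fallback above can be sketched as a chain of guarded attempts. All function names here are hypothetical stand-ins for illustration, not the library's internals, and the "server" is simulated as unreachable:

```python
def fetch_remote_reasoning(model, target):
    """Hypothetical server call -- simulated as unreachable for this sketch."""
    raise ConnectionError("recommendation server unreachable")

CACHE = {}  # reasoning cached from previous sessions, keyed by (model, target)

def local_reasoning(model, target, pct):
    """Final fallback: plain-English text computed from pricing data alone."""
    return f"Switch {model} -> {target} to save about {pct}% per call."

def explain(model, target, pct):
    try:
        text = fetch_remote_reasoning(model, target)    # 1. server
    except ConnectionError:
        text = CACHE.get((model, target))               # 2. cached reasoning
        if text is None:
            text = local_reasoning(model, target, pct)  # 3. pricing-based text
    CACHE[(model, target)] = text
    return text

print(explain("gpt-4", "gpt-4o-mini", 94))
```

Each stage degrades gracefully, so a recommendation string is always produced even fully offline.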

Free Tier & License

LLMOptimize includes 200 free tracked calls per machine.

🎉 Upgrade to continue:
   llmoptimize activate YOUR_LICENSE_KEY

Activate a paid license

llmoptimize activate llmopt-xxxxxxxxxxxx
# ✅ License activated!  Plan: starter  |  500 calls/month
#    Valid through: 2026-04

The key is validated online and stored locally at ~/.aioptimize/license.json. No environment variables needed. Works for all future sessions on this machine.

Remove a license

llmoptimize deactivate
# Remove license llmopt-xxxx...? (y/N): y
# ✅ License removed. Free tier limits restored.

For servers / containers

export AIOPTIMIZE_LICENSE_KEY="llmopt-xxxxxxxxxxxx"

Manual Tracking

For custom or self-hosted models not auto-patched:

llmoptimize.track(
    model             = "my-custom-model",
    prompt_tokens     = 400,
    completion_tokens = 120,
    provider          = "custom",
)

llmoptimize.report()
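For a model the library has no pricing entry for, cost attribution works the same way once you supply rates. A toy sketch of what manual tracking could aggregate internally — the my-custom-model rates are placeholders, and SESSION/track/report here are simplified stand-ins, not the library's code:

```python
# Placeholder rates in USD per 1M tokens (input, output) for a custom model.
RATES = {"my-custom-model": (1.00, 2.00)}

SESSION = []  # calls recorded since the session started

def track(model, prompt_tokens, completion_tokens, provider):
    p_in, p_out = RATES.get(model, (0.0, 0.0))
    cost = prompt_tokens / 1e6 * p_in + completion_tokens / 1e6 * p_out
    SESSION.append({"model": model, "provider": provider, "cost": cost})

def report():
    total = sum(c["cost"] for c in SESSION)
    return f"{len(SESSION)} call(s), total ${total:.4f}"

track("my-custom-model", prompt_tokens=400, completion_tokens=120, provider="custom")
print(report())  # 1 call(s), total $0.0006
```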

Session Management

llmoptimize.new_session()              # clear tracking, start fresh
llmoptimize.report(interactive=False)  # no menu prompt — useful in scripts

Privacy

Data               Stored locally   Sent to server
Your prompt text   Never            Never
Token counts       Yes              Yes
Model names        Yes              Yes
Cost figures       Yes              Yes
API keys           Never            Never

Prompt text never leaves your machine. To disable server tracking entirely:

export AIOPTIMIZE_SERVER_URL=""

FAQ

Do I need to configure anything? No. import llmoptimize is all the setup required.

Will it slow down my app? No. Tracking happens after your response is returned and never blocks the critical path.

What if the recommendation server is unreachable? It falls back to local pricing data instantly. Your app is never affected.

Does it work with LangChain / LlamaIndex? Yes — both call the underlying OpenAI and Anthropic SDKs, which are patched automatically.

Does it work with streaming? Yes. Token counts are recorded from the final usage block after streaming completes.

Can I use it without an API key at all? Yes — use dry_run=True or with llmoptimize.report:. Your code runs end-to-end with mock responses. No API key, no cost, full recommendations.

What's the difference between task() and report()? task("name") resets the session first (clean slate) and labels the output. report() shows everything tracked since the last reset. Use task() when benchmarking specific pipeline stages.
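The task()/report() distinction boils down to when the session buffer is cleared. A toy model of those semantics (not the library's implementation):

```python
from contextlib import contextmanager

SESSION = []  # calls tracked since the last reset

def new_session():
    SESSION.clear()

def report(label=None):
    prefix = f"[{label}] " if label else ""
    return f"{prefix}{len(SESSION)} call(s) tracked"

@contextmanager
def task(name):
    new_session()            # clean slate on entry
    try:
        yield
    finally:
        print(report(name))  # labelled report on exit

SESSION.append("call before task")   # would show up in a plain report()
with task("rag-pipeline"):
    SESSION.append("embedding call")
    SESSION.append("chat call")
# prints a report covering only the 2 calls inside the block; the earlier
# call was cleared by the reset on entry
```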


LLMOptimize v3.2.2 — spend less, build more.
