Reduce LLM costs by 90% - AI recommendations with NO API keys needed!

These details have not been verified by PyPI

Project links

Project description

LLMOptimize

Cut your AI API costs — automatically. One import. Zero config. No API key required.

pip install llmoptimize

What It Does

LLMOptimize is a complete AI cost optimization SDK that silently watches every AI API call your code makes and surfaces actionable savings — without ever touching your prompts or responses.

Zero setup — just import llmoptimize
No API key needed for recommendations
Never reads your prompt text — only token counts and model names
Works with OpenAI, Anthropic, Groq and any framework built on them (LangChain, CrewAI, LlamaIndex, etc.)

What it analyzes:

Feature	Description
💰 Cost tracking	Real token usage → exact cost per call
💡 Model recommendations	Heuristic + ML engine finds cheaper alternatives
🔁 Loop detection	Catches agent loops before they drain your budget
📚 RAG pattern detection	Identifies RAG pipelines and embedding savings
⚡ Cache opportunities	Finds repeated prompts that should be cached
🧠 ML model	Learns from your usage and improves over time
🤖 Agent workflow	Multi-step tracking, context growth, step analytics
📏 Context optimizer	Detects context window growth, compression tips
🛡️ Security guardrails	Flags if API keys or sensitive data appear in prompts

Quickstart (2 lines)

import llmoptimize          # ← add this at the top

import openai
client = openai.OpenAI()

response = client.chat.completions.create(
    model    = "gpt-4",
    messages = [{"role": "user", "content": "Summarize this article..."}],
)
print(response.choices[0].message.content)   # your real output, unchanged

llmoptimize.report()        # ← add this at the end

That's it. Your code runs exactly as before. At the end you'll see a full cost + optimization report.

What the Report Shows

Running llmoptimize.report() produces a full interactive terminal report covering:

╔══════════════════════════════════════════════════════════════╗
║     🚀  L L M O P T I M I Z E   R E P O R T  🚀            ║
╚══════════════════════════════════════════════════════════════╝

📊 YOUR USAGE SUMMARY
  🚀 Total API Calls Tracked    3
  💰 Total Cost                 $0.0041
  💎 Potential Savings          $0.0039  (94% less!)

📋 USAGE BY TYPE
  💬 Chat        2 calls    $0.0040    → gpt-4o-mini (saves 94%)
  📚 Embedding   1 call     $0.0001    → text-embedding-3-small (saves 80%)

💡 PERSONALIZED RECOMMENDATIONS
  ╭────────────────────────────────────────────────────╮
  │ #1 Switch to: gpt-4o-mini   💰 Save 94%           │
  │ Why: You called gpt-4 2x — gpt-4o-mini costs 94%  │
  │      less, saves ~$0.18 per 1,000 calls            │
  │ Fix: model="gpt-4"  →  "gpt-4o-mini"              │
  ╰────────────────────────────────────────────────────╯

🤖 AGENT WORKFLOW ANALYSIS
  Steps tracked:         3
  Models used:           2 (multi-model workflow)
  Avg tokens/step:       420
  Context growth:        1.2x
  💡 Multi-model workflow — heuristic engine can recommend
     the cheapest model per step automatically

⚡ CACHING OPPORTUNITIES
  Found 1 repeated prompt pattern — caching could save ~$0.0008

🧠 ML MODEL STATUS
  ML collecting training data (12 samples — activates after 50+)
  Track more calls to unlock ML-powered model selection

📏 CONTEXT WINDOW OPTIMIZER
  Prompt sizes stable (avg 380 tokens) — context is well-managed

🔧 SDK UTILITIES
  llmoptimize.select_model(code)      → pick cheapest Groq model
  llmoptimize.check_loop(actions)     → detect agent loops
  llmoptimize.analyze(prompt, model)  → instant recommendation

Dry-Run Mode — Plan Costs Before Spending

Test your full code flow and get savings advice before spending a dollar. Wrap your code with with llmoptimize.report: — real API calls are intercepted, mock responses are returned so your code runs fully, and the report prints on exit.

import llmoptimize
import openai

client = openai.OpenAI(api_key="anything")   # not used in dry-run

with llmoptimize.report:
    # No real API calls — mock responses returned automatically
    client.embeddings.create(
        model = "text-embedding-3-large",
        input = ["RAG systems retrieve relevant documents."],
    )
    client.chat.completions.create(
        model    = "gpt-4",
        messages = [{"role": "user", "content": "Summarize this."}],
    )
# Report prints automatically when the block exits

When you're ready to go live, just remove the with llmoptimize.report: line — your code is already correct.

Named Task Sessions

Use llmoptimize.task() to get a separate labelled report per pipeline stage. Each block gets a clean slate, its own label, and optional dry-run mode.

import llmoptimize
import openai

client = openai.OpenAI()

# Track real costs per stage
with llmoptimize.task("rag-pipeline"):
    chunks  = client.embeddings.create(model="text-embedding-3-large", input=["..."])
    summary = client.chat.completions.create(model="gpt-4", messages=[...])

# Plan costs before shipping — no real API calls
with llmoptimize.task("cost-planning", dry_run=True):
    client.chat.completions.create(model="gpt-4", messages=[...])

SDK Utility Functions

These work anywhere in your code — no extra setup, no API key.

Instant Model Recommendation

result = llmoptimize.analyze(
    prompt = "Classify this support ticket as urgent or normal: ...",
    model  = "gpt-4o",
)

# result["recommendation"]["suggested_model"]           → "gpt-4o-mini"
# result["recommendation"]["estimated_savings_percent"] → 96
# result["recommendation"]["reasoning"]                 → "Classification task — cheaper models maintain 95%+ accuracy"

Smart Model Selector for Code Tasks

Automatically picks the cheapest Groq model that can handle your code complexity. Saves up to 84% vs always using the 70B model.

result = llmoptimize.select_model("""
    Extract the user name and email from this JSON string.
    Return as a Python dict.
""")

# result["selected_model"]                  → "llama-3.1-8b-instant"
# result["complexity_level"]                → "simple"
# result["model_info"]["best_for"]          → "Simple scripts, single API calls, basic logic"
# result["vs_heavy_model"]["savings_pct"]   → 84
# result["vs_heavy_model"]["message"]       → "Simple task — llama-3.1-8b-instant is 84% cheaper..."

Complexity	Model Selected	Use Case
`simple`	`llama-3.1-8b-instant`	Single API calls, basic logic, extraction
`medium`	`openai/gpt-oss-20b`	Multi-step workflows, moderate complexity
`complex`	`llama-3.3-70b-versatile`	Complex agents, advanced reasoning

Agent Loop Detection

Catches repetitive agent behavior before it drains your budget.

result = llmoptimize.check_loop([
    "search web for python docs",
    "read python docs",
    "search web for python docs",   # repeated!
    "read python docs",
    "search web for python docs",   # repeated again!
])

# result["loop_detected"]              → True
# result["loops"][0]["pattern_type"]   → "exact_repeat"
# result["loops"][0]["severity"]       → "warning"
# result["loops"][0]["recommendation"] → "Add a stop condition..."
# result["message"]                    → "⚠️ 1 loop pattern detected..."

Detects:

Exact repeats — same action 3+ times
Circular patterns — A→B→C→A→B→C
Alternating loops — A→B→A→B

LangChain & CrewAI

No changes needed to your agents or chains. Just add import llmoptimize at the top — all LLM calls inside chains and agents are tracked automatically.

import llmoptimize       # ← one line at the top

from langchain_openai import ChatOpenAI
from langchain.chains import LLMChain

llm   = ChatOpenAI(model="gpt-4")
chain = LLMChain(llm=llm, prompt=my_prompt)
chain.invoke({"input": "..."})

llmoptimize.report()    # see exactly what the chain spent and how to cut it

import llmoptimize       # ← one line at the top

from crewai import Agent, Task, Crew

researcher = Agent(role="Researcher", llm="gpt-4", ...)
crew = Crew(agents=[researcher], tasks=[...])
crew.kickoff()

llmoptimize.report()    # full agent workflow analysis included

CLI — Audit a File Before Running It

No code changes needed. Point it at any Python file:

llmoptimize audit mycode.py

╔════════════════════════════════════════════════════════════════╗
║                   🤖 AI CODE AUDIT REPORT                     ║
╚════════════════════════════════════════════════════════════════╝

📄 File: mycode.py

📊 SUMMARY
   API calls found:    7
   Issues detected:    4
   Models used:        gpt-4, claude-3-opus

   Est. monthly cost:  $342  (at 1,000 runs/month)
   Potential savings:  $298  (87%)

🔍 RECOMMENDATIONS

🔴 Line 42 — claude-3-opus
   Switch to: claude-3-5-haiku  |  saves 95%
   Why: Classification task — claude-3-5-haiku is 18x cheaper
        with comparable accuracy for structured output.

Options:

llmoptimize audit mycode.py             # full report
llmoptimize audit mycode.py --quiet     # one-line summary
llmoptimize audit mycode.py --force     # skip cache, always re-analyze
llmoptimize stats                       # show cache statistics
llmoptimize clear-cache                 # clear cached results

Supported Providers

import llmoptimize automatically patches every AI library you have installed.

Provider	Library	Chat	Embeddings
OpenAI	`openai`	✅	✅
Anthropic	`anthropic`	✅	—
Groq	`groq`	✅	—

Pricing data for 60+ models: OpenAI, Anthropic, Groq, Gemini, Mistral, Cohere, Voyage AI, Jina AI, AWS Bedrock.

How Recommendations Work

Recommendations use a 3-layer engine — never just the cheapest model:

1. Heuristic engine (5ms)
   ↓ Keyword-based task detection (classification → gpt-4o-mini, etc.)

2. ML model (10ms)
   ↓ Trained on real accept/reject decisions from all SDK users
   ↓ Learns which models work best per prompt category + complexity

3. Crowd-sourced patterns (instant)
   ↓ Global anonymised data: which model won for this task type?

Capability tiers are respected — you'll never see a recommendation that drops more than one quality tier:

Tier	Examples
Frontier	`gpt-4`, `claude-3-opus`, `o1`
Strong	`gpt-4o`, `claude-3-5-sonnet`, `gemini-1.5-pro`
Capable	`gpt-4o-mini`, `claude-3-haiku`, `gemini-1.5-flash`
Lightweight	`gemini-1.5-flash-8b`, `llama-3.1-8b-instant`

Session Management

llmoptimize.new_session()              # clear tracking, start fresh
llmoptimize.report(interactive=False)  # no menu prompt — useful in scripts

Jupyter / VS Code Interactive Window note: The Python kernel stays alive between cells, so llmoptimize accumulates calls across all cells. Call llmoptimize.new_session() before each test run:
import llmoptimize
llmoptimize.new_session()   # ← reset before testing a new model

resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
llmoptimize.report()
Regular .py scripts reset automatically on each run.

Manual Tracking

For custom or self-hosted models not auto-patched:

llmoptimize.track(
    model             = "my-custom-model",
    prompt_tokens     = 400,
    completion_tokens = 120,
    provider          = "custom",
)

llmoptimize.report()

Free Tier & License

LLMOptimize includes 500 free tracked calls per machine.

Activate a paid license

llmoptimize activate llmopt-xxxxxxxxxxxx
# ✅ License activated!  Plan: starter  |  500 calls/month
#    Valid through: 2026-04

Remove a license

llmoptimize deactivate
# ✅ License removed. Free tier limits restored.

For servers / containers

export AIOPTIMIZE_LICENSE_KEY="llmopt-xxxxxxxxxxxx"

Privacy

Data	Stored locally	Sent to server
Your prompt text	Never	Never
Token counts	Yes	Yes (anonymised)
Model names	Yes	Yes
Cost figures	Yes	Yes
API keys	Never stored	Never sent

Only the first 100 chars of your prompt are optionally sent for category classification (e.g. "classification" vs "summarization") — never the full text, never stored.

To disable server tracking entirely:

export AIOPTIMIZE_SERVER_URL=""

Full API Reference

# ── Core ──────────────────────────────────────────────────────
import llmoptimize

llmoptimize.report()                   # interactive report
llmoptimize.report(interactive=False)  # plain text, no menu
llmoptimize.new_session()              # reset all tracking
llmoptimize.track(model, prompt_tokens, completion_tokens)

# ── Dry-run / named sessions ──────────────────────────────────
with llmoptimize.report:               # dry-run + report on exit
    ...

with llmoptimize.report(interactive=False):
    ...

with llmoptimize.task("pipeline-name"):       # named session, live calls
    ...

with llmoptimize.task("plan", dry_run=True):  # dry-run + labelled report
    ...

# ── Smart utilities ───────────────────────────────────────────
result = llmoptimize.analyze(prompt, model)   # instant recommendation
result = llmoptimize.select_model(code)       # pick cheapest Groq model
result = llmoptimize.check_loop(actions)      # detect agent loops

FAQ

Do I need to configure anything? No. import llmoptimize is all the setup required.

Will it slow down my app? No. Tracking happens after your response is returned and never blocks the critical path. All server calls are fire-and-forget.

What if the recommendation server is unreachable? It falls back to local pricing data instantly. Your app is never affected.

Does it work with LangChain / LlamaIndex / CrewAI? Yes — they all use the underlying OpenAI/Anthropic/Groq SDKs which are patched automatically.

Does it work with streaming? Yes. Token counts are recorded from the final usage block after streaming completes.

Can I use it without an API key at all? Yes — use dry_run=True or with llmoptimize.report:. Your code runs end-to-end with mock responses. No API key, no cost, full recommendations.

What's the difference between task() and report()? task("name") resets the session first and labels the output. report() shows everything tracked since the last reset. Use task() when benchmarking specific pipeline stages.

What does select_model() do? It sends your code/task description to the server's GroqModelSelector — a hybrid rule-based + caching engine that picks the cheapest Groq model capable of handling the complexity. No API key needed.

What does check_loop() do? It sends a list of action strings to the server's LoopDetector, which runs rule-based checks for exact repeats, circular patterns (A→B→C→A), and alternating loops. Returns which steps are looping and a recommendation to fix it.

LLMOptimize v3.3.0 — spend less, build more.

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

This version

3.3.0

Mar 4, 2026

3.2.8

Mar 4, 2026

3.2.7

Mar 4, 2026

3.2.6

Mar 4, 2026

3.2.5

Mar 4, 2026

3.2.4

Mar 4, 2026

3.2.3

Mar 4, 2026

3.2.2

Mar 3, 2026

3.2.1

Mar 2, 2026

3.2.0

Mar 2, 2026

3.1.0

Mar 2, 2026

2.1.1

Feb 28, 2026

2.1.0

Feb 28, 2026

1.0.6

Feb 27, 2026

1.0.5

Feb 24, 2026

1.0.4

Feb 24, 2026

1.0.3

Feb 24, 2026

1.0.2

Feb 24, 2026

1.0.1

Feb 24, 2026

1.0.0

Feb 24, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

llmoptimize-3.3.0.tar.gz (31.2 kB view details)

Uploaded Mar 4, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

llmoptimize-3.3.0-py3-none-any.whl (30.9 kB view details)

Uploaded Mar 4, 2026 Python 3

File details

Details for the file llmoptimize-3.3.0.tar.gz.

File metadata

Download URL: llmoptimize-3.3.0.tar.gz
Upload date: Mar 4, 2026
Size: 31.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.12

File hashes

Hashes for llmoptimize-3.3.0.tar.gz
Algorithm	Hash digest
SHA256	`e8dfd1576f1805bfd05af1a660b0962c2a62217c685e4c55124e69fdf64d4ed0`
MD5	`ef6d24ca259ffff20fcd1e681426a42f`
BLAKE2b-256	`096ebb0991a835217c86f68c693fdd3b12cd8dd943a810bdcc0b542c15f32751`

See more details on using hashes here.

File details

Details for the file llmoptimize-3.3.0-py3-none-any.whl.

File metadata

Download URL: llmoptimize-3.3.0-py3-none-any.whl
Upload date: Mar 4, 2026
Size: 30.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.12

File hashes

Hashes for llmoptimize-3.3.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`6306e40e8edb09fa6e77043c5e1d11713d91a2f209f5cfb601a44251830750d1`
MD5	`b9b4d407290fe4861f6bd7198da93008`
BLAKE2b-256	`786502a36d8d2c8cd1ea3fa5d390ddd291f10d3d882729d02fc325ed4f43650c`

See more details on using hashes here.

llmoptimize 3.3.0

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

LLMOptimize

What It Does

Quickstart (2 lines)

What the Report Shows

Dry-Run Mode — Plan Costs Before Spending

Named Task Sessions

SDK Utility Functions

Instant Model Recommendation

Smart Model Selector for Code Tasks

Agent Loop Detection

LangChain & CrewAI

CLI — Audit a File Before Running It

Supported Providers

How Recommendations Work

Session Management

Manual Tracking

Free Tier & License

Activate a paid license

Remove a license

For servers / containers

Privacy

Full API Reference

FAQ

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes