
Reduce LLM costs by 90% - AI recommendations with NO API keys needed!

Project description

LLMOptimize

Cut your AI API costs by 40–97% — automatically. One import. Zero prompt changes. No infrastructure to run.

pip install llmoptimize
import llmoptimize   # done — every AI call is now tracked and optimised

What It Does

LLMOptimize monitors every AI API call your application makes and tells you when a cheaper model would do the same job just as well.

Your Code  →  LLMOptimize SDK  →  Your AI Provider (OpenAI / Anthropic / ...)
                    │
                    ▼
            Recommendation Engine
            (hosted — nothing to run)
                    │
                    ▼
         "Use gpt-3.5-turbo instead.
          95% cheaper. Minimal quality
          impact. 90% confident."

The recommendation engine has three layers that run in order:

  1. Instant heuristics — task-type detection using your prompt shape and keywords
  2. ML model — trained on aggregated acceptance signals from all users (gets smarter over time)
  3. Pattern database — crowd-sourced patterns from millions of real API calls

Everything runs on our servers. You install the SDK, we handle the rest.
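The first layer can be pictured as a lightweight keyword classifier over the prompt. A minimal sketch of the idea (the real engine is hosted; the categories and keywords below are illustrative assumptions, not the actual rules):

```python
# Illustrative sketch of layer 1: keyword-based task-type detection.
# The categories and keyword lists are assumptions for illustration only.
TASK_KEYWORDS = {
    "classification": ("classify", "categorize", "label", "spam or not"),
    "summarization": ("summarize", "tl;dr", "key points"),
    "translation": ("translate", "in french", "in spanish"),
}

def detect_task_type(prompt: str) -> str:
    """Return the first task type whose keywords appear in the prompt."""
    text = prompt.lower()
    for task, keywords in TASK_KEYWORDS.items():
        if any(kw in text for kw in keywords):
            return task
    return "general"
```

A classification prompt like the one in the Quick Start would fall into the "classification" bucket, which is what triggers the gpt-3.5-turbo suggestion.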


Quick Start

Step 1 — Install

pip install llmoptimize

Step 2 — Import before your AI library

import llmoptimize          # one line — patches OpenAI, Anthropic, Groq automatically

import openai
client = openai.OpenAI()

# Your existing code — completely unchanged
response = client.chat.completions.create(
    model    = "gpt-4",
    messages = [{"role": "user", "content": "Classify this email as spam or not."}]
)

Step 3 — See your savings

llmoptimize.report()
╔════════════════════════════════════════════════════════════════════╗
║                      SMART RECOMMENDATION                        ║
╚════════════════════════════════════════════════════════════════════╝

🟢 Confidence: 90%

📊 You used:   gpt-4          →  $0.012400
✨ Switch to:  gpt-3.5-turbo  →  $0.000620

💰 YOU SAVE:   $0.011780  (95%)
📈 Quality impact: MINIMAL

💬 Why: Classification task — cheaper models maintain 95%+ accuracy

That's it. No server to run, no dashboard to set up, no config files.


Installation

Requirements

  • Python 3.9 or higher
  • At least one AI SDK: openai, anthropic, groq, google-generativeai, mistralai, or cohere

Install

pip install llmoptimize

Optional environment variables

Variable                Default         What it does
AIOPTIMIZE_SERVER_URL   Managed cloud   Point to a dedicated instance (enterprise plans)
AIOPTIMIZE_TIMEOUT      3 seconds       Max wait for a recommendation before proceeding
AIOPTIMIZE_SHARE_DATA   true            Set to false to opt out of anonymised metadata sharing

Auto-Tracking (Zero Code Changes)

import llmoptimize silently wraps every AI library you have installed. Your existing code, your existing response objects, your existing error handling — all untouched.

Supported libraries:

Provider    Library
OpenAI      openai
Anthropic   anthropic
Groq        groq
Google      google-generativeai
Mistral     mistralai
Cohere      cohere

Guarantees:

  • Your API response is returned exactly as the provider sends it — nothing is modified
  • If LLMOptimize encounters any internal error, it fails silently and your call goes through normally
  • No added latency on the critical path — tracking and recommendations happen asynchronously
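The fail-silent guarantee can be sketched as a wrapper of roughly this shape (a hand-rolled illustration of the pattern, not the SDK's actual implementation):

```python
import functools

def transparent_wrap(original, on_result):
    """Wrap an API method so that tracking can never break the call itself."""
    @functools.wraps(original)
    def wrapper(*args, **kwargs):
        response = original(*args, **kwargs)   # the real call, untouched
        try:
            on_result(response)                # tracking happens after the fact
        except Exception:
            pass                               # any internal error fails silently
        return response
    return wrapper
```

Even if the tracking callback raises, the caller still receives the provider's response unchanged.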

The @track_cost Decorator

For more control, wrap specific functions directly.

Basic tracking

from llmoptimize import track_cost

@track_cost(model="gpt-4")
def classify_ticket(text: str):
    return client.chat.completions.create(
        model    = "gpt-4",
        messages = [{"role": "user", "content": text}]
    )

Show recommendations before the call

@track_cost(model="gpt-4", smart_suggestions=True)
def analyze_document(text: str):
    ...

Auto-switch when confident

When auto_optimize=True, the SDK automatically uses the cheaper model when confidence is 90% or higher — no human in the loop:

@track_cost(
    model             = "gpt-4",
    smart_suggestions = True,
    auto_optimize     = True,
)
def batch_classify(items: list):
    ...

# Console output:
# ✨ Auto-optimized: gpt-4 → gpt-3.5-turbo
#    Savings: $0.0114 (92%)  |  Confidence: 94%

Full decorator options

@track_cost(
    model              = "gpt-4",        # the model your code calls
    smart_suggestions  = False,          # show cheaper alternative before the call
    auto_optimize      = False,          # auto-switch at >= 90% confidence
    config             = None,           # AIOptimizeConfig for better recommendations
    enable_guardrails  = False,          # PII scanning + budget enforcement
    daily_budget       = None,           # float — block calls if daily spend exceeds this
    monthly_budget     = None,           # float — block calls if monthly spend exceeds this
)

Works identically on async def functions with no extra setup.


Configuration

AIOptimizeConfig gives the recommendation engine context about your use case, which improves suggestion accuracy — especially for industry-specific quality tradeoffs.

from llmoptimize import track_cost, AIOptimizeConfig

config = AIOptimizeConfig(
    user_id      = "your-company-id",    # anonymised before it leaves your machine
    industry     = "healthcare",         # tunes quality vs cost tradeoffs
    company_size = "startup",
    use_case     = "summarization",
    share_data   = True,                 # helps the model improve for everyone
)

@track_cost(model="gpt-4", smart_suggestions=True, config=config)
def my_function(prompt: str):
    ...

Config options

  • industry: saas, ecommerce, healthcare, finance, legal, education, marketing, engineering, media, other. Adjusts quality sensitivity thresholds.
  • company_size: solo, startup, mid, enterprise. Influences cost vs reliability weighting.
  • use_case: customer_support, rag, content, coding, analytics, translation, summarization, classification, automation, chatbot, other. Directly informs task-type detection.
  • share_data: True / False. Whether to contribute anonymised usage to the shared ML model.

share_prompts is always False regardless of what you pass. Prompt text never leaves your machine. See Privacy.
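The always-False guarantee can be expressed with a read-only property, roughly like this (a sketch of the technique, not the library's source):

```python
class ConfigSketch:
    """Illustrates pinning a field to False regardless of what callers pass."""
    def __init__(self, share_prompts: bool = False, share_data: bool = True):
        self.share_data = share_data
        # share_prompts is deliberately ignored

    @property
    def share_prompts(self) -> bool:
        return False  # read-only: cannot be overridden or assigned
```

Because the attribute is a property with no setter, even `cfg.share_prompts = True` raises an AttributeError.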


Guardrails

Security scanning

Enable guardrails to automatically scan every prompt for sensitive data before it reaches any AI provider.

@track_cost(model="gpt-4", enable_guardrails=True)
def process_user_input(text: str):
    ...

What gets detected:

Data type                                  Action
API keys (OpenAI, Anthropic, AWS, etc.)    🔴 Call blocked
Private / cryptographic keys               🔴 Call blocked
Credit card numbers                        🔴 Call blocked
Social Security Numbers                    🔴 Call blocked
Email addresses                            🟠 Warning shown
Phone numbers                              🟠 Warning shown

When a critical issue is found, the call never reaches your AI provider. A detailed report explains exactly what was detected and where.
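The scanning step can be approximated with a small regex table (the patterns below are simplified illustrations, not the scanner's actual rules):

```python
import re

# Illustrative patterns only — a real scanner uses far more robust rules.
PATTERNS = {
    "openai_api_key": (re.compile(r"sk-[A-Za-z0-9]{20,}"), "block"),
    "ssn":            (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "block"),
    "email":          (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"), "warn"),
}

def scan_prompt(prompt: str) -> list:
    """Return (finding, action) pairs; any 'block' action stops the call."""
    return [(name, action)
            for name, (pattern, action) in PATTERNS.items()
            if pattern.search(prompt)]
```

A prompt with any "block" finding would be rejected before the request is sent; "warn" findings let the call proceed with a notice.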

Budget enforcement

@track_cost(
    model             = "gpt-4",
    enable_guardrails = True,
    daily_budget      = 10.00,
    monthly_budget    = 150.00,
)
def my_function(prompt: str):
    ...

When a call would push you over budget, it is blocked before it is made:

❌ BLOCKED: Would exceed daily budget of $10.00
   Spent today:   $9.94
   Remaining:     $0.06
   Estimated cost of this call: $0.18
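The gate itself is simple arithmetic; a sketch using the figures from the blocked call above (the function name is illustrative, not the SDK's API):

```python
def check_budget(spent_today: float, daily_budget: float,
                 estimated_cost: float) -> bool:
    """Return True if the call can proceed without exceeding the daily budget."""
    return spent_today + estimated_cost <= daily_budget
```

With $9.94 already spent against a $10.00 budget, a $0.18 call is rejected while a $0.05 one would still go through.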

Runaway loop protection

If more than 100 calls are detected within any 5-minute window, further calls are blocked automatically and you're alerted. This catches bugs — infinite retry loops, agent runaways — before they cause a surprise bill.
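The protection can be sketched as a sliding-window counter (an illustration of the technique, not the SDK's internals):

```python
import collections
import time

class RunawayGuard:
    """Block calls once more than max_calls land inside a rolling window."""
    def __init__(self, max_calls: int = 100, window_seconds: float = 300.0):
        self.max_calls = max_calls
        self.window = window_seconds
        self.timestamps = collections.deque()

    def allow(self, now=None) -> bool:
        now = time.monotonic() if now is None else now
        # Drop timestamps that have aged out of the window
        while self.timestamps and now - self.timestamps[0] > self.window:
            self.timestamps.popleft()
        if len(self.timestamps) >= self.max_calls:
            return False
        self.timestamps.append(now)
        return True
```

Once the window fills, calls are refused until enough old timestamps age out, which is exactly the behaviour that stops a retry loop from compounding.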


CLI Audit Tool

Scan any Python file to find AI cost optimisation opportunities without executing it.

llmoptimize audit mycode.py
╔════════════════════════════════════════════════════════════════════╗
║                    🤖 AI CODE AUDIT REPORT                       ║
╚════════════════════════════════════════════════════════════════════╝

📄 File: mycode.py

📊 ANALYSIS SUMMARY
────────────────────────────────────────────────────────────────────
Total API Calls:         7
Issues Found:            4
Models Used:             gpt-4, claude-3-opus-20240229

Est. Monthly Cost:       $342.00  (at 1,000 runs/month)
POTENTIAL SAVINGS:       $298.00  (87%)

🔍 DETAILED RECOMMENDATIONS

🔴 ISSUE #1: Line 42
   You're using:     claude-3-opus-20240229
   For:              Classifying support ticket urgency

   ✨ SWITCH TO:     claude-3-haiku-20240307
   Saves:            95%  |  Quality impact: MINIMAL  |  Confidence: 90%

CLI commands

# Audit a file (AI-powered analysis, no API key needed)
llmoptimize audit myfile.py

# Rule-based only — completely free, no network call
llmoptimize audit myfile.py --no-ai

# Force fresh analysis (skip cache)
llmoptimize audit myfile.py --force

# One-line summary
llmoptimize audit myfile.py --quiet

# Cache management
llmoptimize stats
llmoptimize clear-cache

Dashboard & Reports

In-code report

import llmoptimize

# ... your application code ...

llmoptimize.report()

Prints a full session breakdown:

════════════════════════════════════════════════════════════════════
📊 SESSION SUMMARY
════════════════════════════════════════════════════════════════════
Total Calls:      284
Total Cost:       $4.2180
Total Tokens:     421,800
Avg Cost/Call:    $0.014852
Duration:         0:18:42

MODEL BREAKDOWN:
────────────────────────────────────────────────────────────────────
gpt-4:
  Calls:   212     Cost: $3.8960     Tokens: 318,000
gpt-3.5-turbo:
  Calls:   72      Cost: $0.3220     Tokens: 103,800
════════════════════════════════════════════════════════════════════

Manual tracking

For providers not auto-patched, or for tracking custom inference:

llmoptimize.track(
    model             = "gpt-4",
    prompt_tokens     = 400,
    completion_tokens = 120,
    provider          = "openai",
)
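Cost then follows from token counts multiplied by per-token rates. A sketch using illustrative rates (the figures below are examples only, not live pricing, which the server keeps current):

```python
# Example rates in USD per 1K tokens: (prompt, completion).
# Illustrative figures only — not the live pricing data the SDK uses.
EXAMPLE_RATES = {"gpt-4": (0.03, 0.06)}

def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Token-count * rate cost estimate for a single call."""
    prompt_rate, completion_rate = EXAMPLE_RATES[model]
    return (prompt_tokens / 1000) * prompt_rate \
         + (completion_tokens / 1000) * completion_rate
```

At these example rates, the tracked call above (400 prompt + 120 completion tokens on gpt-4) works out to $0.0192.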

Supported Providers & Models

LLMOptimize has pricing data for 50+ models across all major providers. A selection:

OpenAI gpt-4o · gpt-4o-mini · gpt-4-turbo · gpt-4 · gpt-3.5-turbo · o1 · o1-mini · text-embedding-3-small · text-embedding-3-large

Anthropic claude-3-5-sonnet-20241022 · claude-3-5-haiku-20241022 · claude-3-opus-20240229 · claude-3-sonnet-20240229 · claude-3-haiku-20240307

Groq llama-3.3-70b-versatile · llama-3.1-70b-versatile · llama-3.1-8b-instant · gemma2-9b-it · mixtral-8x7b-32768

Google gemini-1.5-pro · gemini-1.5-flash · gemini-1.0-pro

Mistral mistral-large-latest · mistral-small-latest · open-mixtral-8x7b

Cohere command-r-plus · command-r · command-light

Pricing data is kept up to date on the server — the SDK always uses the latest figures without requiring an update.


Privacy

LLMOptimize is built privacy-first. Here is exactly what goes where:

Data                                      Stored locally                 Sent to server
Prompt text                               ❌ Never                       ❌ Never
Prompt category (e.g. "classification")   ✅ Yes                         ✅ Yes
Token counts                              ✅ Yes                         ✅ Yes
Cost figures                              ✅ Yes                         ✅ Yes
Model names                               ✅ Yes                         ✅ Yes
Your user_id                              SHA-256 hashed                 First 16 chars of hash only
API keys                                  Detected in prompts, blocked   ❌ Never

The guarantee: share_prompts is always False. The code enforces this — it cannot be overridden. Your prompt text is classified locally on your machine and only the resulting label (e.g. "summarization") is ever transmitted.

To opt out of all data sharing entirely:

export AIOPTIMIZE_SHARE_DATA=false

Or in code:

config = AIOptimizeConfig(user_id="me", share_data=False)

Plans & Pricing

                            Free              Pro               Enterprise
Tracked calls / month       10,000            Unlimited         Unlimited
Recommendation engine       Heuristic         Heuristic + ML    Heuristic + ML + Custom
Code audit                  5 files / month   Unlimited         Unlimited
Guardrails
Auto-optimize
Dedicated server instance
SSO / SAML
SLA                                           99.9%             99.99%
Support                     Community         Email             Dedicated Slack
Price                       Free              $49 / month       Contact us



FAQ

Does it work with streaming responses? Yes. The SDK intercepts the completed response after streaming finishes and records usage from the final usage block. Your streaming code is unaffected.

Does it add latency to my API calls? No. Tracking and recommendation calls happen after your response is returned — they never sit on your critical path.

What if the recommendation server is unreachable? The SDK falls back to local heuristics instantly and your API call proceeds normally. There is no scenario where an LLMOptimize failure blocks your application.

Does auto_optimize change my prompt or my response? No. It only changes the model parameter on the API call. The prompt you wrote and the response you receive are identical — just generated by a cheaper model.

Can I use this with a self-hosted or fine-tuned model? Use llmoptimize.track() to manually record calls to any endpoint. Recommendations won't be available for unknown models, but cost tracking will work.

Is there a usage cap on the free tier? 10,000 tracked calls per month. The SDK continues to work above this limit — recommendations are paused until the next billing cycle.

Does it support LangChain / LlamaIndex? Yes. Both frameworks use the underlying OpenAI / Anthropic SDKs, which are patched automatically on import.

Can I audit files in CI?

# .github/workflows/cost-check.yml
- name: AI cost audit
  run: llmoptimize audit src/ --quiet

The CLI exits with code 1 if critical issues are found, making it easy to fail a pipeline.


Support


LLMOptimize — spend less, build more.

Project details


Download files

Download the file for your platform.

Source Distribution

llmoptimize-3.2.2.tar.gz (17.2 kB view details)

Uploaded Source

Built Distribution


llmoptimize-3.2.2-py3-none-any.whl (15.7 kB view details)

Uploaded Python 3

File details

Details for the file llmoptimize-3.2.2.tar.gz.

File metadata

  • Download URL: llmoptimize-3.2.2.tar.gz
  • Upload date:
  • Size: 17.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.12

File hashes

Hashes for llmoptimize-3.2.2.tar.gz
Algorithm Hash digest
SHA256 d4edbd3243baf5f72ad889113fef3b05d7c54ed23d13e5f2c5590f385aeb933a
MD5 0398a559e1fdbf66956d2419c83df529
BLAKE2b-256 bdfa5a576b246f4ccef3c79181318b03675bc44f840dad99e949f8d9a3b9e2dc


File details

Details for the file llmoptimize-3.2.2-py3-none-any.whl.

File metadata

  • Download URL: llmoptimize-3.2.2-py3-none-any.whl
  • Upload date:
  • Size: 15.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.12

File hashes

Hashes for llmoptimize-3.2.2-py3-none-any.whl
Algorithm Hash digest
SHA256 cd43e695eebf57af3487359803b0c0bdd6f55cc8b6c2095e364db4f9bc21c414
MD5 f6bba90e808f7cc13294b0da9b7b8a13
BLAKE2b-256 7fc3adec791afcb1bc4b18f0daf88e387966b5972b3c3220623bac94edc8aa5a

