
Reduce LLM costs by 90% - AI recommendations with NO API keys needed!

Project description

LLMOptimize

Cut your AI API costs by 40–97% — automatically. One import. Zero prompt changes. No infrastructure to run.

pip install llmoptimize
import llmoptimize   # done — every AI call is now tracked and optimised

What It Does

LLMOptimize monitors every AI API call your application makes and tells you when a cheaper model would do the same job just as well.

Your Code  →  LLMOptimize SDK  →  Your AI Provider (OpenAI / Anthropic / ...)
                    │
                    ▼
            Recommendation Engine
            (hosted — nothing to run)
                    │
                    ▼
         "Use gpt-3.5-turbo instead.
          95% cheaper. Minimal quality
          impact. 90% confident."

The recommendation engine has three layers that run in order:

  1. Instant heuristics — task-type detection using your prompt shape and keywords
  2. ML model — trained on aggregated acceptance signals from all users (gets smarter over time)
  3. Pattern database — crowd-sourced patterns from millions of real API calls

Everything runs on our servers. You install the SDK, we handle the rest.
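The first layer can be pictured as a lightweight keyword classifier over the prompt. A minimal sketch of the idea (the real engine is hosted; the categories and keywords below are illustrative assumptions, not the actual rules):

```python
# Illustrative sketch of layer 1: keyword-based task-type detection.
# The categories and keyword lists are assumptions for illustration only.
TASK_KEYWORDS = {
    "classification": ("classify", "categorize", "label", "spam or not"),
    "summarization": ("summarize", "tl;dr", "key points"),
    "translation": ("translate", "in french", "in spanish"),
}

def detect_task_type(prompt: str) -> str:
    """Return the first task type whose keywords appear in the prompt."""
    text = prompt.lower()
    for task, keywords in TASK_KEYWORDS.items():
        if any(kw in text for kw in keywords):
            return task
    return "general"
```

A classification prompt like the one in the Quick Start would fall into the "classification" bucket, which is what triggers the gpt-3.5-turbo suggestion.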


Quick Start

Step 1 — Install

pip install llmoptimize

Step 2 — Import before your AI library

import llmoptimize          # one line — patches OpenAI, Anthropic, Groq automatically

import openai
client = openai.OpenAI()

# Your existing code — completely unchanged
response = client.chat.completions.create(
    model    = "gpt-4",
    messages = [{"role": "user", "content": "Classify this email as spam or not."}]
)

Step 3 — See your savings

llmoptimize.report()
╔════════════════════════════════════════════════════════════════════╗
║                      SMART RECOMMENDATION                        ║
╚════════════════════════════════════════════════════════════════════╝

🟢 Confidence: 90%

📊 You used:   gpt-4          →  $0.012400
✨ Switch to:  gpt-3.5-turbo  →  $0.000620

💰 YOU SAVE:   $0.011780  (95%)
📈 Quality impact: MINIMAL

💬 Why: Classification task — cheaper models maintain 95%+ accuracy

That's it. No server to run, no dashboard to set up, no config files.


Installation

Requirements

  • Python 3.9 or higher
  • At least one AI SDK: openai, anthropic, groq, google-generativeai, mistralai, or cohere

Install

pip install llmoptimize

Optional environment variables

Variable                Default         What it does
AIOPTIMIZE_SERVER_URL   Managed cloud   Point to a dedicated instance (enterprise plans)
AIOPTIMIZE_TIMEOUT      3 seconds       Max wait for a recommendation before proceeding
AIOPTIMIZE_SHARE_DATA   true            Set to false to opt out of anonymised metadata sharing

Auto-Tracking (Zero Code Changes)

import llmoptimize silently wraps every AI library you have installed. Your existing code, your existing response objects, your existing error handling — all untouched.

Supported libraries:

Provider    Library
OpenAI      openai
Anthropic   anthropic
Groq        groq
Google      google-generativeai
Mistral     mistralai
Cohere      cohere

Guarantees:

  • Your API response is returned exactly as the provider sends it — nothing is modified
  • If LLMOptimize encounters any internal error, it fails silently and your call goes through normally
  • No added latency on the critical path — tracking and recommendations happen asynchronously
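The fail-silent guarantee can be sketched as a wrapper of roughly this shape (a hand-rolled illustration of the pattern, not the SDK's actual implementation):

```python
import functools

def transparent_wrap(original, on_result):
    """Wrap an API method so that tracking can never break the call itself."""
    @functools.wraps(original)
    def wrapper(*args, **kwargs):
        response = original(*args, **kwargs)   # the real call, untouched
        try:
            on_result(response)                # tracking happens after the fact
        except Exception:
            pass                               # any internal error fails silently
        return response
    return wrapper
```

Even if the tracking callback raises, the caller still receives the provider's response unchanged.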

The @track_cost Decorator

For more control, wrap specific functions directly.

Basic tracking

from llmoptimize import track_cost

@track_cost(model="gpt-4")
def classify_ticket(text: str):
    return client.chat.completions.create(
        model    = "gpt-4",
        messages = [{"role": "user", "content": text}]
    )

Show recommendations before the call

@track_cost(model="gpt-4", smart_suggestions=True)
def analyze_document(text: str):
    ...

Auto-switch when confident

When auto_optimize=True, the SDK automatically uses the cheaper model when confidence is 90% or higher — no human in the loop:

@track_cost(
    model             = "gpt-4",
    smart_suggestions = True,
    auto_optimize     = True,
)
def batch_classify(items: list):
    ...

# Console output:
# ✨ Auto-optimized: gpt-4 → gpt-3.5-turbo
#    Savings: $0.0114 (92%)  |  Confidence: 94%

Full decorator options

@track_cost(
    model              = "gpt-4",        # the model your code calls
    smart_suggestions  = False,          # show cheaper alternative before the call
    auto_optimize      = False,          # auto-switch at >= 90% confidence
    config             = None,           # AIOptimizeConfig for better recommendations
    enable_guardrails  = False,          # PII scanning + budget enforcement
    daily_budget       = None,           # float — block calls if daily spend exceeds this
    monthly_budget     = None,           # float — block calls if monthly spend exceeds this
)

Works identically on async def functions with no extra setup.


Configuration

AIOptimizeConfig gives the recommendation engine context about your use case, which improves suggestion accuracy — especially for industry-specific quality tradeoffs.

from llmoptimize import track_cost, AIOptimizeConfig

config = AIOptimizeConfig(
    user_id      = "your-company-id",    # anonymised before it leaves your machine
    industry     = "healthcare",         # tunes quality vs cost tradeoffs
    company_size = "startup",
    use_case     = "summarization",
    share_data   = True,                 # helps the model improve for everyone
)

@track_cost(model="gpt-4", smart_suggestions=True, config=config)
def my_function(prompt: str):
    ...

Config options

  • industry: saas, ecommerce, healthcare, finance, legal, education, marketing, engineering, media, other. Adjusts quality sensitivity thresholds.
  • company_size: solo, startup, mid, enterprise. Influences cost vs reliability weighting.
  • use_case: customer_support, rag, content, coding, analytics, translation, summarization, classification, automation, chatbot, other. Directly informs task-type detection.
  • share_data: True / False. Whether to contribute anonymised usage to the shared ML model.

share_prompts is always False regardless of what you pass. Prompt text never leaves your machine. See Privacy.
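The always-False guarantee can be expressed with a read-only property, roughly like this (a sketch of the technique, not the library's source):

```python
class ConfigSketch:
    """Illustrates pinning a field to False regardless of what callers pass."""
    def __init__(self, share_prompts: bool = False, share_data: bool = True):
        self.share_data = share_data
        # share_prompts is deliberately ignored

    @property
    def share_prompts(self) -> bool:
        return False  # read-only: cannot be overridden or assigned
```

Because the attribute is a property with no setter, even `cfg.share_prompts = True` raises an AttributeError.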


Guardrails

Security scanning

Enable guardrails to automatically scan every prompt for sensitive data before it reaches any AI provider.

@track_cost(model="gpt-4", enable_guardrails=True)
def process_user_input(text: str):
    ...

What gets detected:

Data type                                  Action
API keys (OpenAI, Anthropic, AWS, etc.)    🔴 Call blocked
Private / cryptographic keys               🔴 Call blocked
Credit card numbers                        🔴 Call blocked
Social Security Numbers                    🔴 Call blocked
Email addresses                            🟠 Warning shown
Phone numbers                              🟠 Warning shown

When a critical issue is found, the call never reaches your AI provider. A detailed report explains exactly what was detected and where.
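The scanning step can be approximated with a small regex table (the patterns below are simplified illustrations, not the scanner's actual rules):

```python
import re

# Illustrative patterns only — a real scanner uses far more robust rules.
PATTERNS = {
    "openai_api_key": (re.compile(r"sk-[A-Za-z0-9]{20,}"), "block"),
    "ssn":            (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "block"),
    "email":          (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"), "warn"),
}

def scan_prompt(prompt: str) -> list:
    """Return (finding, action) pairs; any 'block' action stops the call."""
    return [(name, action)
            for name, (pattern, action) in PATTERNS.items()
            if pattern.search(prompt)]
```

A prompt with any "block" finding would be rejected before the request is sent; "warn" findings let the call proceed with a notice.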

Budget enforcement

@track_cost(
    model             = "gpt-4",
    enable_guardrails = True,
    daily_budget      = 10.00,
    monthly_budget    = 150.00,
)
def my_function(prompt: str):
    ...

When a call would push you over budget, it is blocked before it is made:

❌ BLOCKED: Would exceed daily budget of $10.00
   Spent today:   $9.94
   Remaining:     $0.06
   Estimated cost of this call: $0.18
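The gate itself is simple arithmetic; a sketch using the figures from the blocked call above (the function name is illustrative, not the SDK's API):

```python
def check_budget(spent_today: float, daily_budget: float,
                 estimated_cost: float) -> bool:
    """Return True if the call can proceed without exceeding the daily budget."""
    return spent_today + estimated_cost <= daily_budget
```

With $9.94 already spent against a $10.00 budget, a $0.18 call is rejected while a $0.05 one would still go through.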

Runaway loop protection

If more than 100 calls are detected within any 5-minute window, further calls are blocked automatically and you're alerted. This catches bugs — infinite retry loops, agent runaways — before they cause a surprise bill.
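The protection can be sketched as a sliding-window counter (an illustration of the technique, not the SDK's internals):

```python
import collections
import time

class RunawayGuard:
    """Block calls once more than max_calls land inside a rolling window."""
    def __init__(self, max_calls: int = 100, window_seconds: float = 300.0):
        self.max_calls = max_calls
        self.window = window_seconds
        self.timestamps = collections.deque()

    def allow(self, now=None) -> bool:
        now = time.monotonic() if now is None else now
        # Drop timestamps that have aged out of the window
        while self.timestamps and now - self.timestamps[0] > self.window:
            self.timestamps.popleft()
        if len(self.timestamps) >= self.max_calls:
            return False
        self.timestamps.append(now)
        return True
```

Once the window fills, calls are refused until enough old timestamps age out, which is exactly the behaviour that stops a retry loop from compounding.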


CLI Audit Tool

Scan any Python file to find AI cost optimisation opportunities without executing it.

llmoptimize audit mycode.py
╔════════════════════════════════════════════════════════════════════╗
║                    🤖 AI CODE AUDIT REPORT                       ║
╚════════════════════════════════════════════════════════════════════╝

📄 File: mycode.py

📊 ANALYSIS SUMMARY
────────────────────────────────────────────────────────────────────
Total API Calls:         7
Issues Found:            4
Models Used:             gpt-4, claude-3-opus-20240229

Est. Monthly Cost:       $342.00  (at 1,000 runs/month)
POTENTIAL SAVINGS:       $298.00  (87%)

🔍 DETAILED RECOMMENDATIONS

🔴 ISSUE #1: Line 42
   You're using:     claude-3-opus-20240229
   For:              Classifying support ticket urgency

   ✨ SWITCH TO:     claude-3-haiku-20240307
   Saves:            95%  |  Quality impact: MINIMAL  |  Confidence: 90%

CLI commands

# Audit a file (AI-powered analysis, no API key needed)
llmoptimize audit myfile.py

# Rule-based only — completely free, no network call
llmoptimize audit myfile.py --no-ai

# Force fresh analysis (skip cache)
llmoptimize audit myfile.py --force

# One-line summary
llmoptimize audit myfile.py --quiet

# Cache management
llmoptimize stats
llmoptimize clear-cache

Dashboard & Reports

In-code report

import llmoptimize

# ... your application code ...

llmoptimize.report()

Prints a full session breakdown:

════════════════════════════════════════════════════════════════════
📊 SESSION SUMMARY
════════════════════════════════════════════════════════════════════
Total Calls:      284
Total Cost:       $4.2180
Total Tokens:     421,800
Avg Cost/Call:    $0.014852
Duration:         0:18:42

MODEL BREAKDOWN:
────────────────────────────────────────────────────────────────────
gpt-4:
  Calls:   212     Cost: $3.8960     Tokens: 318,000
gpt-3.5-turbo:
  Calls:   72      Cost: $0.3220     Tokens: 103,800
════════════════════════════════════════════════════════════════════

Manual tracking

For providers not auto-patched, or for tracking custom inference:

llmoptimize.track(
    model             = "gpt-4",
    prompt_tokens     = 400,
    completion_tokens = 120,
    provider          = "openai",
)
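Cost then follows from token counts multiplied by per-token rates. A sketch using illustrative rates (the figures below are examples only, not live pricing, which the server keeps current):

```python
# Example rates in USD per 1K tokens: (prompt, completion).
# Illustrative figures only — not the live pricing data the SDK uses.
EXAMPLE_RATES = {"gpt-4": (0.03, 0.06)}

def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Token-count * rate cost estimate for a single call."""
    prompt_rate, completion_rate = EXAMPLE_RATES[model]
    return (prompt_tokens / 1000) * prompt_rate \
         + (completion_tokens / 1000) * completion_rate
```

At these example rates, the tracked call above (400 prompt + 120 completion tokens on gpt-4) works out to $0.0192.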

Supported Providers & Models

LLMOptimize has pricing data for 50+ models across all major providers. A selection:

OpenAI gpt-4o · gpt-4o-mini · gpt-4-turbo · gpt-4 · gpt-3.5-turbo · o1 · o1-mini · text-embedding-3-small · text-embedding-3-large

Anthropic claude-3-5-sonnet-20241022 · claude-3-5-haiku-20241022 · claude-3-opus-20240229 · claude-3-sonnet-20240229 · claude-3-haiku-20240307

Groq llama-3.3-70b-versatile · llama-3.1-70b-versatile · llama-3.1-8b-instant · gemma2-9b-it · mixtral-8x7b-32768

Google gemini-1.5-pro · gemini-1.5-flash · gemini-1.0-pro

Mistral mistral-large-latest · mistral-small-latest · open-mixtral-8x7b

Cohere command-r-plus · command-r · command-light

Pricing data is kept up to date on the server — the SDK always uses the latest figures without requiring an update.


Privacy

LLMOptimize is built privacy-first. Here is exactly what goes where:

Data                                      Stored locally                 Sent to server
Prompt text                               ❌ Never                       ❌ Never
Prompt category (e.g. "classification")   ✅ Yes                         ✅ Yes
Token counts                              ✅ Yes                         ✅ Yes
Cost figures                              ✅ Yes                         ✅ Yes
Model names                               ✅ Yes                         ✅ Yes
Your user_id                              SHA-256 hashed                 First 16 chars of hash only
API keys                                  Detected in prompts, blocked   ❌ Never

The guarantee: share_prompts is always False. The code enforces this — it cannot be overridden. Your prompt text is classified locally on your machine and only the resulting label (e.g. "summarization") is ever transmitted.

To opt out of all data sharing entirely:

export AIOPTIMIZE_SHARE_DATA=false

Or in code:

config = AIOptimizeConfig(user_id="me", share_data=False)

Plans & Pricing

                            Free              Pro               Enterprise
Tracked calls / month       10,000            Unlimited         Unlimited
Recommendation engine       Heuristic         Heuristic + ML    Heuristic + ML + Custom
Code audit                  5 files / month   Unlimited         Unlimited
Guardrails
Auto-optimize
Dedicated server instance
SSO / SAML
SLA                                           99.9%             99.99%
Support                     Community         Email             Dedicated Slack
Price                       Free              $49 / month       Contact us



FAQ

Does it work with streaming responses? Yes. The SDK intercepts the completed response after streaming finishes and records usage from the final usage block. Your streaming code is unaffected.

Does it add latency to my API calls? No. Tracking and recommendation calls happen after your response is returned — they never sit on your critical path.

What if the recommendation server is unreachable? The SDK falls back to local heuristics instantly and your API call proceeds normally. There is no scenario where an LLMOptimize failure blocks your application.

Does auto_optimize change my prompt or my response? No. It only changes the model parameter on the API call. The prompt you wrote and the response you receive are identical — just generated by a cheaper model.

Can I use this with a self-hosted or fine-tuned model? Use llmoptimize.track() to manually record calls to any endpoint. Recommendations won't be available for unknown models, but cost tracking will work.

Is there a usage cap on the free tier? 10,000 tracked calls per month. The SDK continues to work above this limit — recommendations are paused until the next billing cycle.

Does it support LangChain / LlamaIndex? Yes. Both frameworks use the underlying OpenAI / Anthropic SDKs, which are patched automatically on import.

Can I audit files in CI?

# .github/workflows/cost-check.yml
- name: AI cost audit
  run: llmoptimize audit src/ --quiet

The CLI exits with code 1 if critical issues are found, making it easy to fail a pipeline.


Support


LLMOptimize — spend less, build more.

Project details


Download files

Download the file for your platform.

Source Distribution

llmoptimize-3.2.2.tar.gz (17.2 kB view details)

Uploaded Source

Built Distribution


llmoptimize-3.2.2-py3-none-any.whl (15.7 kB view details)

Uploaded Python 3

File details

Details for the file llmoptimize-3.2.2.tar.gz.

File metadata

  • Download URL: llmoptimize-3.2.2.tar.gz
  • Upload date:
  • Size: 17.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.12

File hashes

Hashes for llmoptimize-3.2.2.tar.gz
Algorithm Hash digest
SHA256 d4edbd3243baf5f72ad889113fef3b05d7c54ed23d13e5f2c5590f385aeb933a
MD5 0398a559e1fdbf66956d2419c83df529
BLAKE2b-256 bdfa5a576b246f4ccef3c79181318b03675bc44f840dad99e949f8d9a3b9e2dc


File details

Details for the file llmoptimize-3.2.2-py3-none-any.whl.

File metadata

  • Download URL: llmoptimize-3.2.2-py3-none-any.whl
  • Upload date:
  • Size: 15.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.12

File hashes

Hashes for llmoptimize-3.2.2-py3-none-any.whl
Algorithm Hash digest
SHA256 cd43e695eebf57af3487359803b0c0bdd6f55cc8b6c2095e364db4f9bc21c414
MD5 f6bba90e808f7cc13294b0da9b7b8a13
BLAKE2b-256 7fc3adec791afcb1bc4b18f0daf88e387966b5972b3c3220623bac94edc8aa5a

