llm-devproxy
LLM development debug layer — every API call recorded, nothing lost.
A local debug layer that solves the common pain points of LLM app development.
- Auto-records every API call — nothing is ever lost
- Cache eliminates redundant spend — identical requests are served from the local DB
- Prevents cost explosions — mock responses when limit is reached
- Rewind like Git — "go back to step 3 and try again" in seconds
- 🧠 Reasoning token visibility — o1/o3, Claude thinking, Gemini 2.5 thinking tracked and visualized
- 🔔 Smart alerts — cost warnings, high-spend single requests, reasoning ratio alerts
- 📥 Export — CSV/JSON export via CLI and Web UI
- 🌊 Streaming support — stream=True works transparently; responses are recorded after completion
Install
pip install llm-devproxy # minimal
pip install "llm-devproxy[openai]" # with OpenAI
pip install "llm-devproxy[anthropic]" # with Anthropic
pip install "llm-devproxy[gemini]" # with Gemini
pip install "llm-devproxy[proxy]" # with proxy server
pip install "llm-devproxy[all]" # everything
Usage — Library
OpenAI
import openai
from llm_devproxy import DevProxy
proxy = DevProxy(daily_limit_usd=1.0)
proxy.start_session("my_agent")
# Just wrap your existing client
client = proxy.wrap_openai(openai.OpenAI(api_key="sk-..."))
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}]
)
print(response.choices[0].message.content)
Anthropic
import anthropic
from llm_devproxy import DevProxy
proxy = DevProxy(daily_limit_usd=1.0)
proxy.start_session("my_agent")
client = proxy.wrap_anthropic(anthropic.Anthropic(api_key="sk-ant-..."))
response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}]
)
print(response.content[0].text)
Gemini
import google.generativeai as genai
from llm_devproxy import DevProxy
genai.configure(api_key="AI...")
proxy = DevProxy(daily_limit_usd=1.0)
proxy.start_session("my_agent")
model = proxy.wrap_gemini(genai.GenerativeModel("gemini-1.5-flash"))
response = model.generate_content("Hello")
print(response.text)
Usage — Proxy Server
llm-devproxy start --port 8080 --limit 1.0
Just change base_url in your app — nothing else needs to change:
# OpenAI
client = openai.OpenAI(
    api_key="sk-...",
    base_url="http://localhost:8080/openai/v1",
)

# Anthropic
client = anthropic.Anthropic(
    api_key="sk-ant-...",
    base_url="http://localhost:8080/anthropic/v1",
)
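Once the client points at the proxy, calls work exactly as before — the proxy forwards them to the real API and records each one locally. For the Anthropic client above:

# Standard call, now routed through the local proxy
response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.content[0].text)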
CLI
# List recent sessions
llm-devproxy history
# Show all steps in a session
llm-devproxy show my_agent
# Search through recorded prompts
llm-devproxy search "keyword"
# Rewind to step 3 (original history preserved)
llm-devproxy rewind my_agent --step 3
# Rewind and start a new branch
llm-devproxy rewind my_agent --step 3 --branch new_idea
# Show cost stats
llm-devproxy stats
# Export to CSV/JSON (v0.3.0)
llm-devproxy export -f csv -o requests.csv
llm-devproxy export -f json --provider openai --model o1
# Launch web dashboard (v0.3.0)
llm-devproxy web --port 8765
Streaming (v0.3.0)
stream=True just works — chunks are passed through transparently, then recorded after the stream completes.
# OpenAI streaming
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
)
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
# Anthropic streaming
stream = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
)
for event in stream:
    # events pass through as-is
    pass
# All streamed responses are automatically recorded with full token counts
Reasoning Token Tracking (v0.3.0)
Reasoning tokens from o1/o3, Claude extended thinking, and Gemini 2.5 thinking are automatically tracked, costed, and visualized.
# Terminal output when reasoning tokens are used:
# 🧠 Reasoning tokens: 2,400 (83% of output) | Output: 500 | Cost: $0.034500
# Access reasoning stats
stats = proxy.engine.reasoning_stats()
# {'total_reasoning': 2400, 'total_output': 500, 'reasoning_pct': 82.8, ...}
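The returned dict can feed your own checks — for instance, mirroring the alert_reasoning_ratio setting in a test. A minimal sketch built only on the keys shown above:

# Flag sessions where reasoning tokens dominate the generated output
stats = proxy.engine.reasoning_stats()
if stats["reasoning_pct"] > 70:
    print(f"Reasoning-heavy session: {stats['reasoning_pct']}% of output tokens "
          f"({stats['total_reasoning']} reasoning vs {stats['total_output']} output)")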
The Web UI shows reasoning tokens with visual bars on every page — history, detail, costs, and session comparison.
Time Travel Use Cases
Resume an agent from the middle
proxy = DevProxy()
# Rewind yesterday's run to step 8
proxy.rewind("my_agent", step=8)
# Tweak the prompt and re-run → recorded as a new branch
client = proxy.wrap_openai(openai.OpenAI())
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Improved prompt"}]
)
Find something from days ago
llm-devproxy search "approach A"
# → session=my_agent, step=5
llm-devproxy rewind my_agent --step 5 --branch "revisit"
Zero API cost in CI/CD
# Same requests return from SQLite cache
# No API charges in GitHub Actions
proxy = DevProxy(cache_enabled=True)
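As a sketch, an integration test can issue the exact same request twice: the first run (done once locally) hits the API and is recorded, and the identical request in CI is answered from the SQLite cache with no charge.

import openai
from llm_devproxy import DevProxy

proxy = DevProxy(cache_enabled=True)
client = proxy.wrap_openai(openai.OpenAI())  # API key from OPENAI_API_KEY

messages = [{"role": "user", "content": "Summarize the changelog"}]
# First call hits the API and is recorded in the local DB
first = client.chat.completions.create(model="gpt-4o", messages=messages)
# Identical call is served from the SQLite cache — no API charge
second = client.chat.completions.create(model="gpt-4o", messages=messages)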
Config
proxy = DevProxy(
    db_path=".llm_devproxy.db",      # SQLite path
    daily_limit_usd=1.0,             # daily cost limit
    session_limit_usd=None,          # per-session limit (optional)
    on_exceed="mock",                # "mock" or "block"
    cache_enabled=True,
    compress_after_days=30,
    # Alert settings (v0.3.0)
    alert_daily_threshold=0.8,       # warn at 80% of daily limit
    alert_session_threshold=0.8,     # warn at 80% of session limit
    alert_reasoning_ratio=0.7,       # warn if reasoning > 70% of output
    alert_single_cost_usd=0.10,      # warn if single request > $0.10
)
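For example, a stricter profile for reasoning-heavy workloads might block at the limit instead of mocking and tighten the alert thresholds — a sketch using only the parameters documented above:

from llm_devproxy import DevProxy

# Stricter profile: refuse further calls at the limit, alert earlier
proxy = DevProxy(
    daily_limit_usd=5.0,
    on_exceed="block",             # refuse calls instead of returning mocks
    alert_daily_threshold=0.5,     # warn at 50% of the daily limit
    alert_single_cost_usd=0.50,    # flag any single request over $0.50
    alert_reasoning_ratio=0.5,     # flag when reasoning > 50% of output
)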
All data stays local
Everything is stored in .llm_devproxy.db (SQLite) on your machine.
Nothing is sent to any external server.
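Because the store is a plain SQLite file, you can inspect it with the standard library. The schema is internal and not part of the documented API, so discover table names rather than assuming them:

import sqlite3

# Open the local store and list its tables
con = sqlite3.connect(".llm_devproxy.db")
for (name,) in con.execute("SELECT name FROM sqlite_master WHERE type='table'"):
    print(name)
con.close()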
Roadmap
- Phase 1: Cache, cost guard, auto-record everything
- Phase 2: Proxy server (OpenAI/Anthropic/Gemini compatible), CLI
- Phase 3: Rewind, branches, tags, memos
- Phase 4: Semantic cache
- Phase 5: Web UI (history browser, cost dashboard, session comparison)
- Phase 6: Reasoning token tracking, alerts, CSV/JSON export, streaming
- Phase 7: Team sharing (cloud edition)
License
MIT