Lightweight prompt versioning & evaluation tracker for LLM engineers. Track every prompt change, measure impact, compare side-by-side.
PromptTrace
Stop losing your best prompts.
One decorator. Automatic versioning. Local SQLite. Beautiful dashboard.
Quick Start • Features • Dashboard • API Reference • Configuration
The Problem
You iterate on prompts 50 times a day. You had a great system prompt last Tuesday that got 92% accuracy — but you lost it. You changed one word and everything broke, but you can't remember which word.
Your eval scores live in scattered notebooks and print() statements.
PromptTrace fixes this. → pip install prompttrace → done.
📦 Installation
pip install prompttrace
Requirements: Python 3.9+ · Single dependency: rich
🚀 Quick Start
1 → Decorate your LLM calls
import openai
from prompttrace import trace

@trace(experiment="my-chatbot", model="gpt-4o")
def generate(prompt, temperature=0.7):
    # The decorated function makes the actual provider call
    response = openai.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature,
    )
    return response.choices[0].message.content

# Every call is now automatically tracked
generate("Explain quantum computing in one sentence.", temperature=0.3)
generate("Explain quantum computing in one sentence.", temperature=0.9)
2 → Launch the dashboard
from prompttrace import dashboard
dashboard() # → http://127.0.0.1:8777
Or from the terminal:
prompttrace
That's it. Every prompt, output, latency, model, and generation parameter is logged and visualized.
✨ Features
| | Feature | Description |
|---|---|---|
| 🎯 | @trace decorator | Wrap any LLM call; auto-logs prompt, output, latency, params |
| 📝 | log_call() function | Manual logging for when you can't use a decorator |
| 📊 | Auto eval | Pass an eval_fn to score outputs automatically |
| 🔀 | Prompt versioning | Every unique prompt gets a hash — see how changes affect results |
| ⚖️ | Side-by-side compare | Diff two prompts word-by-word, see outputs and metrics |
| 🖥️ | Web dashboard | Modern UI with animated charts, tables, filters — zero JS deps |
| 🔒 | Local-only | Everything in SQLite. No cloud. No API keys. No telemetry |
| 🎨 | Rich terminal logs | Colorful, emoji-powered console output via rich |
| 🔄 | Real-time updates | Dashboard auto-refreshes every 2s — no manual reload |
| 🗑️ | Experiment management | Delete experiments, filter dashboard by experiment |
| 📤 | CSV export | One-click export of all traces for external analysis |
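
The prompt versioning row above says every unique prompt gets a hash. The exact scheme PromptTrace uses is not documented here, so the following is only a sketch of the idea, assuming SHA-256 over the UTF-8 prompt text truncated to 12 hex characters. `prompt_version` is a hypothetical helper, not part of the package.

```python
import hashlib


def prompt_version(prompt: str, length: int = 12) -> str:
    """Return a short, stable identifier for a prompt's exact text."""
    digest = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    return digest[:length]


v1 = prompt_version("Explain quantum computing in one sentence.")
v2 = prompt_version("Explain quantum computing in one sentence.")
v3 = prompt_version("Explain quantum computing in two sentences.")

print(v1 == v2)  # identical text maps to the same version
print(v1 == v3)  # a changed word maps to a different version
```

Because the identifier depends only on the text, re-running an old prompt groups its results with the earlier runs of that same prompt.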
🖥️ Dashboard
Launch from the terminal with prompttrace, or from Python with from prompttrace import dashboard; dashboard().
Three views:
| View | What it does |
|---|---|
| Dashboard | Stats cards, latency chart, status donut, model usage — filterable by experiment |
| Traces | Full table of all logged calls with search, filter, delete, and CSV export |
| Compare | Select two prompts → word-level diff highlighting with outputs side-by-side |
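
The Compare view highlights word-level differences between two prompts. How the dashboard computes them is not stated, so this is a sketch of one common approach, assuming whitespace tokenization and the standard library's `difflib.SequenceMatcher`. `word_diff` is a hypothetical name.

```python
import difflib


def word_diff(old: str, new: str) -> list:
    """Return (op, word) pairs: ' ' kept, '-' removed, '+' added."""
    old_words, new_words = old.split(), new.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    result = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            result.extend((" ", w) for w in old_words[i1:i2])
        else:
            # 'replace', 'delete' and 'insert' reduce to removals plus additions.
            result.extend(("-", w) for w in old_words[i1:i2])
            result.extend(("+", w) for w in new_words[j1:j2])
    return result


diff = word_diff(
    "Explain quantum computing in one sentence.",
    "Explain quantum computing in two sentences.",
)
for op, word in diff:
    print(op, word)
```

This is the kind of output that answers the "which word did I change?" question from The Problem section.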
📖 Usage Guide
The @trace Decorator
from prompttrace import trace

@trace(
    experiment="summarizer",         # Group related traces
    model="claude-3-sonnet",         # Model identifier
    tags=["prod", "v2"],             # Optional tags
    description="Q3 summary bot",    # Optional experiment description
)
def summarize(prompt, temperature=0.5, max_tokens=500):
    # Your LLM call here; call_llm is a placeholder for your provider's client
    llm_response = call_llm(prompt, temperature=temperature, max_tokens=max_tokens)
    return llm_response
What gets logged automatically:
Prompt text · Output · Latency · Generation parameters (temperature, top_p, max_tokens, etc.) · Input variables · Status (success/error) · Error messages · Approximate token counts
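
To make the list above concrete, here is a minimal sketch of how a tracing decorator can capture those fields. It is not PromptTrace's implementation: `simple_trace` is a hypothetical stand-in that appends plain dicts to a list instead of writing to SQLite, and it only records keyword arguments that the caller passes explicitly.

```python
import functools
import time


def simple_trace(records: list):
    """Hypothetical stand-in for @trace: append one dict per call to records."""

    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(prompt, **params):
            start = time.perf_counter()
            record = {"prompt": prompt, "params": params, "status": "success",
                      "output": None, "error_message": ""}
            try:
                record["output"] = fn(prompt, **params)
                return record["output"]
            except Exception as exc:
                # Record the failure, then let the caller see the exception.
                record["status"] = "error"
                record["error_message"] = str(exc)
                raise
            finally:
                record["latency_ms"] = (time.perf_counter() - start) * 1000
                records.append(record)

        return wrapper

    return decorator


calls = []


@simple_trace(calls)
def fake_llm(prompt, temperature=0.7):
    # Stand-in for a real model call.
    return prompt.upper()


fake_llm("hello", temperature=0.3)
print(calls[0]["status"], calls[0]["params"])
```

The `finally` block is what lets latency and status be logged for failed calls as well as successful ones.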
Returning Metadata
Return a dict to include token counts:
import openai
from prompttrace import trace

@trace(experiment="qa", model="gpt-4o")
def answer(prompt):
    resp = openai.chat.completions.create(...)
    # Return the text under "output" plus the provider's token usage
    return {
        "output": resp.choices[0].message.content,
        "token_count_input": resp.usage.prompt_tokens,
        "token_count_output": resp.usage.completion_tokens,
    }
Auto Evaluation
Pass an eval_fn to score every output automatically:
def my_eval(prompt, output):
    """Return a dict of metric_name: score."""
    return {
        # compute_relevance is a placeholder for your own scoring function
        "relevance": compute_relevance(prompt, output),
        "length_ok": 1.0 if 50 < len(output) < 500 else 0.0,
        "has_citation": 1.0 if "[source]" in output else 0.0,
    }

@trace(experiment="research-bot", model="gpt-4o", eval_fn=my_eval)
def research(prompt):
    # call_llm is a placeholder for your provider call
    return call_llm(prompt)
Metrics appear in the terminal and the dashboard.
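
The example above leaves `compute_relevance` undefined. As a self-contained sketch, the version below swaps in a crude keyword-overlap score as a stand-in for relevance. `keyword_overlap` is a hypothetical helper and a rough proxy only; the `(prompt, output) -> dict` shape follows the eval_fn signature listed in the API Reference.

```python
import re


def keyword_overlap(prompt: str, output: str) -> float:
    """Fraction of distinct prompt words (4+ letters) that also appear in the output."""
    prompt_words = set(re.findall(r"[a-z]{4,}", prompt.lower()))
    if not prompt_words:
        return 0.0
    output_words = set(re.findall(r"[a-z]{4,}", output.lower()))
    return len(prompt_words & output_words) / len(prompt_words)


def my_eval(prompt: str, output: str) -> dict:
    """Score one output; every value is a float so metrics can be averaged."""
    return {
        "relevance": keyword_overlap(prompt, output),
        "length_ok": 1.0 if 50 < len(output) < 500 else 0.0,
        "has_citation": 1.0 if "[source]" in output else 0.0,
    }


scores = my_eval(
    "Explain quantum computing briefly.",
    "Quantum computing uses qubits to explore many states at once. [source]",
)
print(scores)
```

Keeping each metric in the 0.0 to 1.0 range makes scores from different prompt versions easier to compare.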
Manual Logging with log_call()
For cases where a decorator doesn't fit:
from prompttrace import log_call
import time

prompt = "Translate to French: Hello world"

# Time the call yourself; my_llm_pipeline is a placeholder for your own code
start = time.perf_counter()
output = my_llm_pipeline(prompt)  # e.g. "Bonjour le monde"
elapsed = (time.perf_counter() - start) * 1000

log_call(
    prompt=prompt,
    output=output,
    experiment="translation",
    model="gpt-4o-mini",
    generation_params={"temperature": 0.2},
    latency_ms=elapsed,
    token_count_input=8,
    token_count_output=5,
    tags=["translation", "french"],
    eval_metrics={"bleu": 0.95, "fluency": 0.88},
)
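
The API Reference lists status and error_message parameters for log_call(), which suggests failed calls can be logged too. The sketch below shows one way to do that around an arbitrary pipeline. `logged_call` is a hypothetical wrapper, and the logger is passed in as `log` so the example runs without the package; in real use you would pass prompttrace's log_call, assuming it accepts the keyword arguments listed in the API Reference.

```python
import time


def logged_call(pipeline, prompt, log, **log_kwargs):
    """Run pipeline(prompt), then report the outcome through log(...)."""
    start = time.perf_counter()
    output, status, error_message = "", "success", ""
    try:
        output = pipeline(prompt)
        return output
    except Exception as exc:
        status, error_message = "error", str(exc)
        raise
    finally:
        # Runs on success and on failure, so every call leaves a record.
        log(
            prompt=prompt,
            output=output,
            latency_ms=(time.perf_counter() - start) * 1000,
            status=status,
            error_message=error_message,
            **log_kwargs,
        )


# Stand-in logger so the sketch runs on its own.
captured = []


def fake_log(**fields):
    captured.append(fields)


def failing_pipeline(prompt):
    raise TimeoutError("model did not respond")


try:
    logged_call(failing_pipeline, "Translate to French: Hello world",
                log=fake_log, experiment="translation")
except TimeoutError:
    pass

print(captured[0]["status"], captured[0]["error_message"])
```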
CLI
# Default (localhost:8777)
prompttrace
# Custom port
prompttrace --port 9000
# Accessible from network
prompttrace --host 0.0.0.0 --port 8777
📋 API Reference
@trace(...)
| Parameter | Type | Default | Description |
|---|---|---|---|
| experiment | str | "default" | Experiment name for grouping |
| model | str | "unknown" | Model identifier |
| tags | list[str] | None | Optional tags |
| eval_fn | callable | None | fn(prompt, output) → dict[str, float] |
| description | str | "" | Experiment description |
log_call(...)
| Parameter | Type | Default | Description |
|---|---|---|---|
| prompt | str | required | The prompt template |
| output | str | required | The LLM output |
| experiment | str | "default" | Experiment name |
| model | str | "unknown" | Model identifier |
| generation_params | dict | None | e.g. {"temperature": 0.7} |
| input_variables | dict | None | Template variables |
| latency_ms | float | 0 | Response time in ms |
| token_count_input | int | 0 | Input token count |
| token_count_output | int | 0 | Output token count |
| status | str | "success" | "success" or "error" |
| error_message | str | "" | Error details |
| tags | list[str] | None | Optional tags |
| eval_metrics | dict | None | {"metric": score} |
dashboard(host, port)
Launches the web UI. Blocks until Ctrl+C.
⚙️ Configuration
Database Location
By default, traces are stored in .prompttrace/traces.db in the current directory.
# Override via environment variable
export PROMPTTRACE_DB=/path/to/my/traces.db
# Override programmatically
from prompttrace import set_db_path
set_db_path("/path/to/my/traces.db")
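
The lookup described above can be pictured as a simple fallback chain. This sketch is an assumption about ordering, not the package's code: it treats an explicit path as highest priority, then the PROMPTTRACE_DB environment variable, then the default under the current directory. `resolve_db_path` is a hypothetical helper.

```python
import os
from pathlib import Path
from typing import Optional


def resolve_db_path(explicit: Optional[str] = None) -> Path:
    """Pick the database path: explicit argument, then env var, then default."""
    if explicit:
        return Path(explicit)
    env_value = os.environ.get("PROMPTTRACE_DB")
    if env_value:
        return Path(env_value)
    return Path.cwd() / ".prompttrace" / "traces.db"


os.environ.pop("PROMPTTRACE_DB", None)
print(resolve_db_path())  # default under the current directory

os.environ["PROMPTTRACE_DB"] = "/tmp/experiments/traces.db"
print(resolve_db_path())  # environment variable wins over the default
print(resolve_db_path("/tmp/other.db"))  # explicit path wins over both
```

Since the default is relative to the current directory, running scripts from different folders produces separate databases unless one of the overrides is set.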
📁 Project Structure
your-project/
├── pyproject.toml
├── README.md
├── example.py
└── prompttrace/
    ├── __init__.py       # Public API exports
    ├── core.py           # @trace decorator, log_call, dashboard launcher
    ├── db.py             # SQLite database layer
    ├── server.py         # Built-in HTTP server + JSON API
    ├── cli.py            # CLI entry point
    ├── dashboard.html    # Single-file web dashboard (zero JS deps)
    └── logo.png          # App logo
📄 License
MIT — use it however you want.