
Lightweight prompt versioning & evaluation tracker for LLM engineers. Track every prompt change, measure impact, compare side-by-side.



PromptTrace


Stop losing your best prompts.

Lightweight prompt versioning & evaluation tracker for LLM engineers.
One decorator. Automatic versioning. Local SQLite. Beautiful dashboard.





Quick Start  •   Features  •   Dashboard  •   API Reference  •   Configuration




The Problem

You iterate on prompts 50 times a day. You had a great system prompt last Tuesday that got 92% accuracy — but you lost it. You changed one word and everything broke, but you can't remember which word.

Your eval scores live in scattered notebooks and print() statements.

PromptTrace fixes this. pip install prompttrace → done.


📦 Installation

pip install prompttrace

Requirements: Python 3.9+ · Single dependency: rich


🚀 Quick Start

1 → Decorate your LLM calls

import openai  # the example below calls the OpenAI SDK directly
from prompttrace import trace

@trace(experiment="my-chatbot", model="gpt-4o")
def generate(prompt, temperature=0.7):
    response = openai.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature,
    )
    return response.choices[0].message.content

# Every call is now automatically tracked
generate("Explain quantum computing in one sentence.", temperature=0.3)
generate("Explain quantum computing in one sentence.", temperature=0.9)

2 → Launch the dashboard

from prompttrace import dashboard

dashboard()  # → http://127.0.0.1:8777

Or from the terminal:

prompttrace

That's it. Every prompt, output, latency, model, and generation parameter is logged and visualized.


✨ Features

| Feature | Description |
| --- | --- |
| 🎯 @trace decorator | Wrap any LLM call; auto-logs prompt, output, latency, params |
| 📝 log_call() function | Manual logging for when you can't use a decorator |
| 📊 Auto eval | Pass an eval_fn to score outputs automatically |
| 🔀 Prompt versioning | Every unique prompt gets a hash, so you can see how changes affect results |
| ⚖️ Side-by-side compare | Diff two prompts word-by-word, see outputs and metrics |
| 🖥️ Web dashboard | Modern UI with animated charts, tables, filters; zero JS deps |
| 🔒 Local-only | Everything in SQLite. No cloud. No API keys. No telemetry |
| 🎨 Rich terminal logs | Colorful, emoji-powered console output via rich |
| 🔄 Real-time updates | Dashboard auto-refreshes every 2s; no manual reload |
| 🗑️ Experiment management | Delete experiments, filter dashboard by experiment |
| 📤 CSV export | One-click export of all traces for external analysis |
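
The "every unique prompt gets a hash" idea can be illustrated with a content hash. This is a minimal sketch, not PromptTrace's actual implementation: the use of SHA-256, the whitespace normalization, the 12-character truncation, and the helper name prompt_version are all assumptions made for illustration.

```python
import hashlib


def prompt_version(prompt: str, length: int = 12) -> str:
    """Return a short, stable identifier for a prompt's text.

    Hypothetical helper: PromptTrace's real hashing scheme may differ.
    """
    # Collapse runs of whitespace so purely cosmetic edits do not create a new version.
    normalized = " ".join(prompt.split())
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
    return digest[:length]


v1 = prompt_version("Explain quantum computing in one sentence.")
v2 = prompt_version("Explain quantum computing in one  sentence.")  # extra space only
v3 = prompt_version("Explain quantum computing in two sentences.")  # wording changed

print(v1 == v2)  # identical after normalization, so the same version
print(v1 == v3)  # different wording, so a different version
```

Hashing the text rather than numbering edits means the same prompt always maps to the same identifier, which is what lets results from separate runs be grouped together.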

🖥️ Dashboard

Launch it from the terminal with the prompttrace command, or from Python with from prompttrace import dashboard; dashboard().

Three views:

| View | What it does |
| --- | --- |
| Dashboard | Stats cards, latency chart, status donut, model usage; filterable by experiment |
| Traces | Full table of all logged calls with search, filter, delete, and CSV export |
| Compare | Select two prompts → word-level diff highlighting with outputs side-by-side |
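
To give a feel for what a word-level diff produces, here is a small sketch built on the standard library's difflib. It is not the dashboard's own code (which is not shown in this README); word_diff is a hypothetical helper, and splitting on whitespace is an assumption.

```python
import difflib


def word_diff(old: str, new: str) -> list:
    """Return (op, text) pairs where op is 'equal', 'delete', or 'insert'."""
    old_words, new_words = old.split(), new.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    result = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            result.append(("equal", " ".join(old_words[i1:i2])))
            continue
        # A 'replace' opcode is reported as a delete followed by an insert.
        if tag in ("delete", "replace"):
            result.append(("delete", " ".join(old_words[i1:i2])))
        if tag in ("insert", "replace"):
            result.append(("insert", " ".join(new_words[j1:j2])))
    return result


diff = word_diff(
    "Summarize the article in one sentence.",
    "Summarize the article in two sentences.",
)
for op, text in diff:
    print(f"{op:>6}: {text}")
```

A UI would typically render the delete segments in one color and the insert segments in another, which makes a one-word prompt change easy to spot.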

📖 Usage Guide

The @trace Decorator

from prompttrace import trace

@trace(
    experiment="summarizer",       # Group related traces
    model="claude-3-sonnet",       # Model identifier
    tags=["prod", "v2"],           # Optional tags
    description="Q3 summary bot",  # Optional experiment description
)
def summarize(prompt, temperature=0.5, max_tokens=500):
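    # Your LLM call here; call_llm is a placeholder for your own client code
    llm_response = call_llm(prompt, temperature=temperature, max_tokens=max_tokens)
    return llm_response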

What gets logged automatically:

Prompt text · Output · Latency · Generation parameters (temperature, top_p, max_tokens, etc.) · Input variables · Status (success / error) · Error messages · Approximate token counts
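
For readers curious how a decorator can capture these fields, the following is a simplified sketch using only the standard library. It is not PromptTrace's source: trace_sketch is a hypothetical stand-in that appends plain dicts to a list instead of writing to SQLite, and treating every non-prompt argument as a generation parameter is an assumption.

```python
import functools
import inspect
import time


def trace_sketch(records: list):
    """Hypothetical stand-in for @trace that appends one dict per call to `records`."""
    def decorator(fn):
        sig = inspect.signature(fn)

        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            bound = sig.bind(*args, **kwargs)
            bound.apply_defaults()  # include defaults such as temperature=0.5
            params = dict(bound.arguments)
            record = {
                "prompt": params.pop("prompt", None),
                "generation_params": params,
                "status": "success",
                "error_message": "",
                "output": None,
            }
            start = time.perf_counter()
            try:
                record["output"] = fn(*args, **kwargs)
                return record["output"]
            except Exception as exc:
                record["status"] = "error"
                record["error_message"] = str(exc)
                raise  # the caller still sees the original exception
            finally:
                # Runs on both success and failure, so every call is recorded.
                record["latency_ms"] = (time.perf_counter() - start) * 1000
                records.append(record)
        return wrapper
    return decorator


calls = []


@trace_sketch(calls)
def summarize(prompt, temperature=0.5, max_tokens=500):
    return prompt.upper()  # placeholder for a real LLM call


summarize("short text", temperature=0.2)
print(calls[0]["generation_params"])
```

Binding the arguments against the function signature is what lets default values (here max_tokens=500) show up in the record even when the caller never passed them.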


Returning Metadata

To record exact token counts rather than the approximate ones logged automatically, return a dict:

@trace(experiment="qa", model="gpt-4o")
def answer(prompt):
    resp = openai.chat.completions.create(...)
    return {
        "output": resp.choices[0].message.content,
        "token_count_input": resp.usage.prompt_tokens,
        "token_count_output": resp.usage.completion_tokens,
    }
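
The return contract above (a plain value or a dict with an output key) can be pictured with a tiny helper. This is only a sketch of how such a value might be unpacked; split_result is hypothetical and the README does not describe how PromptTrace handles missing keys.

```python
from typing import Any, Optional, Tuple


def split_result(result: Any) -> Tuple[str, Optional[int], Optional[int]]:
    """Separate the output text from optional exact token counts.

    Hypothetical helper illustrating the return contract described above.
    """
    if isinstance(result, dict):
        return (
            str(result.get("output", "")),
            result.get("token_count_input"),
            result.get("token_count_output"),
        )
    # A plain return value is treated as the output text; no exact counts are known.
    return str(result), None, None


print(split_result("Paris"))
print(split_result({"output": "Paris", "token_count_input": 12, "token_count_output": 1}))
```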

Auto Evaluation

Pass an eval_fn to score every output automatically:

def my_eval(prompt, output):
    """Return a dict of metric_name: score."""
    return {
        "relevance": compute_relevance(prompt, output),
        "length_ok": 1.0 if 50 < len(output) < 500 else 0.0,
        "has_citation": 1.0 if "[source]" in output else 0.0,
    }

@trace(experiment="research-bot", model="gpt-4o", eval_fn=my_eval)
def research(prompt):
    return call_llm(prompt)

Metrics appear in the terminal and the dashboard.
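
Since compute_relevance in the example above is left to the reader, here is one self-contained eval function that follows the same fn(prompt, output) → dict contract. The metrics are illustrative heuristics chosen for this sketch (keyword overlap is a crude proxy, not a real relevance score), and overlap_eval is a hypothetical name.

```python
import re


def _words(text: str) -> set:
    """Lowercased alphanumeric tokens in `text`."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_eval(prompt: str, output: str) -> dict:
    """Score an output with cheap, deterministic heuristics."""
    prompt_words, output_words = _words(prompt), _words(output)
    # Fraction of prompt words that reappear in the output.
    if prompt_words:
        overlap = len(prompt_words & output_words) / len(prompt_words)
    else:
        overlap = 0.0
    return {
        "keyword_overlap": overlap,
        "length_ok": 1.0 if 50 < len(output) < 500 else 0.0,
        "non_empty": 1.0 if output.strip() else 0.0,
    }


scores = overlap_eval(
    "Explain quantum computing in one sentence.",
    "Quantum computing uses qubits to explore many states at once, in one sentence.",
)
print(scores)
```

It would be passed to the decorator as eval_fn=overlap_eval. Keeping eval functions deterministic makes scores comparable across prompt versions.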


Manual Logging with log_call()

For cases where a decorator doesn't fit:

from prompttrace import log_call
import time

prompt = "Translate to French: Hello world"

start = time.perf_counter()
output = my_llm_pipeline(prompt)  # your own pipeline; returns the model's text
elapsed = (time.perf_counter() - start) * 1000  # milliseconds

log_call(
    prompt=prompt,
    output=output,
    experiment="translation",
    model="gpt-4o-mini",
    generation_params={"temperature": 0.2},
    latency_ms=elapsed,
    token_count_input=8,
    token_count_output=5,
    tags=["translation", "french"],
    eval_metrics={"bleu": 0.95, "fluency": 0.88},
)
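
When a pipeline can fail, it may be useful to log the failure too, using the status and error_message parameters. The sketch below only builds the keyword arguments, so it runs without PromptTrace installed; timed_call and fake_pipeline are hypothetical names, and only the dictionary keys are taken from the log_call() parameter list.

```python
import time
from typing import Callable


def timed_call(pipeline: Callable[[str], str], prompt: str) -> dict:
    """Run `pipeline(prompt)` and return keyword arguments for log_call()."""
    fields = {"prompt": prompt, "output": "", "status": "success", "error_message": ""}
    start = time.perf_counter()
    try:
        fields["output"] = pipeline(prompt)
    except Exception as exc:
        # Record the failure instead of losing the trace.
        fields["status"] = "error"
        fields["error_message"] = f"{type(exc).__name__}: {exc}"
    fields["latency_ms"] = (time.perf_counter() - start) * 1000
    return fields


def fake_pipeline(prompt: str) -> str:
    """Placeholder for a real LLM pipeline."""
    return "Bonjour le monde"


fields = timed_call(fake_pipeline, "Translate to French: Hello world")
print(fields["status"], fields["output"])

# With prompttrace installed, the result could be forwarded like this:
# log_call(experiment="translation", model="gpt-4o-mini", **fields)
```

Note that this helper swallows the exception after recording it; re-raise inside the except block if callers need to see the failure.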

CLI

# Default (localhost:8777)
prompttrace

# Custom port
prompttrace --port 9000

# Accessible from network
prompttrace --host 0.0.0.0 --port 8777

📋 API Reference

@trace(...)

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| experiment | str | "default" | Experiment name for grouping |
| model | str | "unknown" | Model identifier |
| tags | list[str] | None | Optional tags |
| eval_fn | callable | None | fn(prompt, output) → dict[str, float] |
| description | str | "" | Experiment description |

log_call(...)

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| prompt | str | required | The prompt template |
| output | str | required | The LLM output |
| experiment | str | "default" | Experiment name |
| model | str | "unknown" | Model identifier |
| generation_params | dict | None | e.g. {"temperature": 0.7} |
| input_variables | dict | None | Template variables |
| latency_ms | float | 0 | Response time in ms |
| token_count_input | int | 0 | Input token count |
| token_count_output | int | 0 | Output token count |
| status | str | "success" | "success" or "error" |
| error_message | str | "" | Error details |
| tags | list[str] | None | Optional tags |
| eval_metrics | dict | None | {"metric": score} |

dashboard(host, port)

Launches the web UI and blocks until you press Ctrl+C. Called with no arguments, it serves at http://127.0.0.1:8777.


⚙️ Configuration

Database Location

By default, traces are stored in .prompttrace/traces.db in the current directory.

Override via environment variable (shell):

export PROMPTTRACE_DB=/path/to/my/traces.db

Override programmatically (Python):

from prompttrace import set_db_path
set_db_path("/path/to/my/traces.db")
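
Because the store is an ordinary SQLite file, it can be inspected with Python's built-in sqlite3 module. The sketch below resolves the path from PROMPTTRACE_DB with the documented default as a fallback, then lists whatever tables exist. The table names and schema are not documented in this README, so the code deliberately assumes nothing about them; resolve_db_path and list_tables are hypothetical helpers, and how the environment variable interacts with set_db_path() is not covered here.

```python
import os
import sqlite3
from pathlib import Path


def resolve_db_path() -> Path:
    """Return PROMPTTRACE_DB if set, else the documented default location."""
    return Path(os.environ.get("PROMPTTRACE_DB", ".prompttrace/traces.db"))


def list_tables(db_path) -> list:
    """List table names in a SQLite file without assuming any schema."""
    conn = sqlite3.connect(str(db_path))
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


if __name__ == "__main__":
    path = resolve_db_path()
    # Check first: sqlite3.connect() would otherwise create an empty file.
    if path.exists():
        print(path, list_tables(path))
    else:
        print(f"No database at {path} yet")
```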

📁 Project Structure

your-project/
├── pyproject.toml
├── README.md
├── example.py
└── prompttrace/
    ├── __init__.py          # Public API exports
    ├── core.py              # @trace decorator, log_call, dashboard launcher
    ├── db.py                # SQLite database layer
    ├── server.py            # Built-in HTTP server + JSON API
    ├── cli.py               # CLI entry point
    ├── dashboard.html       # Single-file web dashboard (zero JS deps)
    └── logo.png             # App logo

📄 License

MIT — use it however you want.






