Lightweight prompt versioning & evaluation tracker for LLM engineers. Track every prompt change, measure impact, compare side-by-side.

`PromptTrace`

Stop losing your best prompts.

Lightweight prompt versioning & evaluation tracker for LLM engineers.
One decorator. Automatic versioning. Local SQLite. Beautiful dashboard.

Quick Start • Features • Dashboard • API Reference • Configuration

The Problem

You iterate on prompts 50 times a day. You had a great system prompt last Tuesday that got 92% accuracy — but you lost it. You changed one word and everything broke, but you can't remember which word.

Your eval scores live in scattered notebooks and print() statements.

PromptTrace fixes this. → pip install prompttrace → done.

📦 Installation

pip install prompttrace

Requirements: Python 3.9+ · Single dependency: rich

🚀 Quick Start

1 → Decorate your LLM calls

import openai  # assumes the openai>=1.0 SDK with OPENAI_API_KEY set in the environment

from prompttrace import trace

@trace(experiment="my-chatbot", model="gpt-4o")
def generate(prompt, temperature=0.7):
    response = openai.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature,
    )
    return response.choices[0].message.content

# Every call is now automatically tracked
generate("Explain quantum computing in one sentence.", temperature=0.3)
generate("Explain quantum computing in one sentence.", temperature=0.9)

2 → Launch the dashboard

from prompttrace import dashboard

dashboard()  # → http://127.0.0.1:8777

Or from the terminal:

prompttrace

That's it. Every prompt, output, latency, model, and generation parameter is logged and visualized.

✨ Features

| | Feature | Description |
|---|---|---|
| 🎯 | `@trace` decorator | Wrap any LLM call: auto-logs prompt, output, latency, params |
| 📝 | `log_call()` function | Manual logging for when you can't use a decorator |
| 📊 | Auto eval | Pass an `eval_fn` to score outputs automatically |
| 🔀 | Prompt versioning | Every unique prompt gets a hash, so you can see how changes affect results |
| ⚖️ | Side-by-side compare | Diff two prompts word-by-word, see outputs and metrics |
| 🖥️ | Web dashboard | Modern UI with animated charts, tables, filters; zero JS deps |
| 🔒 | Local-only | Everything in SQLite. No cloud. No API keys. No telemetry |
| 🎨 | Rich terminal logs | Colorful, emoji-powered console output via `rich` |
| 🔄 | Real-time updates | Dashboard auto-refreshes every 2s, no manual reload |
| 🗑️ | Experiment management | Delete experiments, filter dashboard by experiment |
| 📤 | CSV export | One-click export of all traces for external analysis |
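
The "Prompt versioning" feature says every unique prompt gets a hash. The sketch below shows the general idea only. It assumes SHA-256 over whitespace-normalized text, truncated to 12 characters; the scheme PromptTrace actually uses is not documented here, and `prompt_version` is a hypothetical helper, not part of the package.

```python
import hashlib


def prompt_version(prompt: str, length: int = 12) -> str:
    """Return a short, stable identifier for a prompt's text (hypothetical helper)."""
    # Collapse runs of whitespace so reformatting alone does not create a new version.
    normalized = " ".join(prompt.split())
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
    return digest[:length]


v1 = prompt_version("Explain quantum computing in one sentence.")
v2 = prompt_version("Explain quantum computing in one  sentence.")  # extra space only
v3 = prompt_version("Explain quantum computing in two sentences.")  # wording changed

print(v1 == v2, v1 == v3)  # → True False
```

Hashing the normalized text means a one-word change yields a new identifier, while cosmetic whitespace edits do not.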

🖥️ Dashboard

Launch with `prompttrace` from the terminal, or with `from prompttrace import dashboard; dashboard()` in Python.

Three views:

| View | What it does |
|---|---|
| Dashboard | Stats cards, latency chart, status donut, model usage; filterable by experiment |
| Traces | Full table of all logged calls with search, filter, delete, and CSV export |
| Compare | Select two prompts → word-level diff highlighting with outputs side-by-side |
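
The Compare view highlights word-level differences between two prompts. As a rough illustration of that kind of diff, and not the dashboard's own code, here is a sketch using the standard library's `difflib`. The `word_diff` helper and its output format are assumptions made for this example.

```python
import difflib


def word_diff(old: str, new: str) -> list[str]:
    """Return word-level changes: '-word' removed, '+word' added, ' word' unchanged."""
    tokens = difflib.ndiff(old.split(), new.split())
    # ndiff also emits '? ' hint lines for similar tokens; they are not words, so drop them.
    return [t[0] + t[2:] for t in tokens if not t.startswith("?")]


changes = word_diff(
    "You are a helpful assistant.",
    "You are a concise assistant.",
)
# Keep only the words that were added or removed.
changed_only = [c for c in changes if c[0] in "+-"]
print(changed_only)
```

Splitting on whitespace keeps the example short; a real diff view would likely also handle punctuation and line breaks.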

📖 Usage Guide

The `@trace` Decorator

from prompttrace import trace

@trace(
    experiment="summarizer",       # Group related traces
    model="claude-3-sonnet",       # Model identifier
    tags=["prod", "v2"],           # Optional tags
    description="Q3 summary bot",  # Optional experiment description
)
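def summarize(prompt, temperature=0.5, max_tokens=500):
    # call_llm is a placeholder for your own LLM client call, not part of prompttrace
    llm_response = call_llm(prompt, temperature=temperature, max_tokens=max_tokens)
    return llm_response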

What gets logged automatically:

Prompt text · Output · Latency · Generation parameters (temperature, top_p, max_tokens, etc.) · Input variables · Status (success / error) · Error messages · Approximate token counts
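
To make the list above concrete, here is a minimal sketch of how a decorator can capture these fields. It is not PromptTrace's implementation: `sketch_trace` and the in-memory `RECORDS` list are hypothetical stand-ins, and the sketch assumes the prompt is passed as an argument named `prompt`.

```python
import functools
import inspect
import time

RECORDS = []  # hypothetical stand-in for the SQLite store


def sketch_trace(experiment="default", model="unknown"):
    """Illustrative decorator: records prompt, params, output, latency, and status."""

    def decorator(fn):
        sig = inspect.signature(fn)

        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            # Resolve positional, keyword, and default arguments into one mapping.
            bound = sig.bind(*args, **kwargs)
            bound.apply_defaults()
            params = dict(bound.arguments)
            prompt = params.pop("prompt", None)  # assumption: argument is named "prompt"
            record = {
                "experiment": experiment,
                "model": model,
                "prompt": prompt,
                "generation_params": params,
            }
            start = time.perf_counter()
            try:
                output = fn(*args, **kwargs)
                record.update(output=output, status="success", error_message="")
                return output
            except Exception as exc:
                record.update(output=None, status="error", error_message=str(exc))
                raise
            finally:
                # Runs on both success and failure, so every call is recorded.
                record["latency_ms"] = (time.perf_counter() - start) * 1000
                RECORDS.append(record)

        return wrapper

    return decorator


@sketch_trace(experiment="demo", model="fake-model")
def echo(prompt, temperature=0.7):
    return prompt.upper()


echo("hello", temperature=0.3)
print(RECORDS[-1]["generation_params"])  # → {'temperature': 0.3}
```

Binding the call against the function signature is what lets defaults such as `temperature=0.7` be logged even when the caller does not pass them.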

Returning Metadata

Return a dict to include token counts:

@trace(experiment="qa", model="gpt-4o")
def answer(prompt):
    resp = openai.chat.completions.create(...)
    return {
        "output": resp.choices[0].message.content,
        "token_count_input": resp.usage.prompt_tokens,
        "token_count_output": resp.usage.completion_tokens,
    }
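
A traced function can therefore return either a plain string or a dict. The sketch below shows one way a tracker could tell the two forms apart, using the key names from the example above. `normalize_result` is a hypothetical helper for illustration; how PromptTrace handles this internally is not documented here.

```python
def normalize_result(result):
    """Split a traced function's return value into (output, input tokens, output tokens)."""
    if isinstance(result, dict):
        return (
            str(result.get("output", "")),
            result.get("token_count_input"),
            result.get("token_count_output"),
        )
    # Plain return values carry no token information.
    return str(result), None, None


print(normalize_result("Paris"))
print(normalize_result({"output": "Paris", "token_count_input": 12, "token_count_output": 1}))
```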

Auto Evaluation

Pass an `eval_fn` to score every output automatically:

def my_eval(prompt, output):
    """Return a dict of metric_name: score."""
    return {
        "relevance": compute_relevance(prompt, output),
        "length_ok": 1.0 if 50 < len(output) < 500 else 0.0,
        "has_citation": 1.0 if "[source]" in output else 0.0,
    }

@trace(experiment="research-bot", model="gpt-4o", eval_fn=my_eval)
def research(prompt):
    return call_llm(prompt)

Metrics appear in the terminal and the dashboard.
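
The example above relies on an undefined `compute_relevance`. As a dependency-free stand-in, here is a sketch of an evaluation function with the same `fn(prompt, output) → dict[str, float]` shape. The metric names and the keyword-overlap heuristic are illustrative choices, not something PromptTrace prescribes.

```python
def keyword_eval(prompt: str, output: str) -> dict:
    """Score an output with cheap, deterministic checks (illustrative metrics)."""
    # Treat longer prompt words as keywords; ignore short function words.
    prompt_words = {w.strip(".,?!").lower() for w in prompt.split() if len(w) > 3}
    output_words = {w.strip(".,?!").lower() for w in output.split()}
    overlap = len(prompt_words & output_words) / len(prompt_words) if prompt_words else 0.0
    return {
        "keyword_overlap": round(overlap, 2),
        "length_ok": 1.0 if 50 < len(output) < 500 else 0.0,
        "ends_cleanly": 1.0 if output.rstrip().endswith((".", "!", "?")) else 0.0,
    }


scores = keyword_eval(
    "Explain quantum computing in one sentence.",
    "Quantum computing uses qubits to explore many states at once, "
    "speeding up some computing tasks.",
)
print(scores)
# → {'keyword_overlap': 0.5, 'length_ok': 1.0, 'ends_cleanly': 1.0}
```

Because it matches the documented signature, a function like this could be passed as `eval_fn=keyword_eval`.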

Manual Logging with `log_call()`

For cases where a decorator doesn't fit:

from prompttrace import log_call
import time

prompt = "Translate to French: Hello world"

start = time.perf_counter()
output = my_llm_pipeline(prompt)  # placeholder for your own pipeline, e.g. returns "Bonjour le monde"
elapsed = (time.perf_counter() - start) * 1000  # seconds → milliseconds

log_call(
    prompt=prompt,
    output=output,
    experiment="translation",
    model="gpt-4o-mini",
    generation_params={"temperature": 0.2},
    latency_ms=elapsed,
    token_count_input=8,
    token_count_output=5,
    tags=["translation", "french"],
    eval_metrics={"bleu": 0.95, "fluency": 0.88},
)
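
Manual logging also means handling timing and failures yourself. The sketch below wraps a call and returns a dict whose keys match `log_call` parameters from the API Reference (`prompt`, `output`, `latency_ms`, `status`, `error_message`). `timed_call` and `flaky_pipeline` are hypothetical names used only for this example.

```python
import time


def timed_call(fn, prompt):
    """Run fn(prompt) and return timing and status fields named after log_call's parameters."""
    start = time.perf_counter()
    try:
        output, status, error_message = fn(prompt), "success", ""
    except Exception as exc:  # record the failure instead of losing it
        output, status, error_message = "", "error", str(exc)
    return {
        "prompt": prompt,
        "output": output,
        "latency_ms": (time.perf_counter() - start) * 1000,
        "status": status,
        "error_message": error_message,
    }


def flaky_pipeline(prompt):
    """Stand-in for a pipeline call that fails."""
    raise TimeoutError("upstream model timed out")


fields = timed_call(flaky_pipeline, "Translate to French: Hello world")
print(fields["status"], fields["error_message"])  # → error upstream model timed out

# With prompttrace installed, the fields could then be logged with:
#   log_call(experiment="translation", model="gpt-4o-mini", **fields)
```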

CLI

# Default (localhost:8777)
prompttrace

# Custom port
prompttrace --port 9000

# Accessible from network
prompttrace --host 0.0.0.0 --port 8777

📋 API Reference

`@trace(...)`

| Parameter | Type | Default | Description |
|---|---|---|---|
| `experiment` | `str` | `"default"` | Experiment name for grouping |
| `model` | `str` | `"unknown"` | Model identifier |
| `tags` | `list[str]` | `None` | Optional tags |
| `eval_fn` | `callable` | `None` | `fn(prompt, output) → dict[str, float]` |
| `description` | `str` | `""` | Experiment description |

`log_call(...)`

| Parameter | Type | Default | Description |
|---|---|---|---|
| `prompt` | `str` | required | The prompt template |
| `output` | `str` | required | The LLM output |
| `experiment` | `str` | `"default"` | Experiment name |
| `model` | `str` | `"unknown"` | Model identifier |
| `generation_params` | `dict` | `None` | e.g. `{"temperature": 0.7}` |
| `input_variables` | `dict` | `None` | Template variables |
| `latency_ms` | `float` | `0` | Response time in ms |
| `token_count_input` | `int` | `0` | Input token count |
| `token_count_output` | `int` | `0` | Output token count |
| `status` | `str` | `"success"` | `"success"` or `"error"` |
| `error_message` | `str` | `""` | Error details |
| `tags` | `list[str]` | `None` | Optional tags |
| `eval_metrics` | `dict` | `None` | `{"metric": score}` |

`dashboard(host, port)`

Launches the web UI, by default at http://127.0.0.1:8777. Blocks until interrupted with Ctrl+C.
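
Because the call blocks, a script or notebook that should keep running needs to start it elsewhere. One possible approach is a daemon thread, sketched below with a generic helper. `run_in_background` is hypothetical, and whether `dashboard()` tolerates running off the main thread is an untested assumption.

```python
import threading


def run_in_background(blocking_fn, *args, **kwargs):
    """Start a blocking callable in a daemon thread and return the thread."""
    thread = threading.Thread(target=blocking_fn, args=args, kwargs=kwargs, daemon=True)
    thread.start()
    return thread


# Demonstration with a harmless callable in place of the dashboard.
started = threading.Event()
worker = run_in_background(started.set)
worker.join(timeout=2)
print(started.is_set())  # → True

# With prompttrace installed (assumes host and port are accepted as keywords):
#   from prompttrace import dashboard
#   run_in_background(dashboard, host="127.0.0.1", port=8777)
```

A daemon thread exits with the main program, so the server would stop as soon as the script finishes.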

⚙️ Configuration

Database Location

By default, traces are stored in `.prompttrace/traces.db` in the current directory.

# Shell: override via environment variable
export PROMPTTRACE_DB=/path/to/my/traces.db

# Python: override programmatically
from prompttrace import set_db_path
set_db_path("/path/to/my/traces.db")
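
Since traces live in an ordinary SQLite file, they can be inspected with Python's built-in `sqlite3` module. The sketch below only lists table names, because the schema is not documented here. `resolve_db_path` and `list_tables` are hypothetical helpers, and the sketch does not account for a path set via `set_db_path`.

```python
import os
import sqlite3


def resolve_db_path() -> str:
    """Return PROMPTTRACE_DB if set, else the documented default location."""
    return os.environ.get("PROMPTTRACE_DB", os.path.join(".prompttrace", "traces.db"))


def list_tables(db_path: str) -> list[str]:
    """List table names in a SQLite file without assuming any schema."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


# Only inspect an existing file: sqlite3.connect would otherwise create an empty one.
if os.path.exists(resolve_db_path()):
    print(list_tables(resolve_db_path()))
```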

📁 Project Structure

your-project/
├── pyproject.toml
├── README.md
├── example.py
└── prompttrace/
    ├── __init__.py          # Public API exports
    ├── core.py              # @trace decorator, log_call, dashboard launcher
    ├── db.py                # SQLite database layer
    ├── server.py            # Built-in HTTP server + JSON API
    ├── cli.py               # CLI entry point
    ├── dashboard.html       # Single-file web dashboard (zero JS deps)
    └── logo.png             # App logo

📄 License

MIT — use it however you want.


This version: 0.1.0, released Mar 26, 2026

Download files

Source Distribution

prompttrace-0.1.0.tar.gz (214.2 kB view details)

Uploaded Mar 26, 2026 Source

Built Distribution

prompttrace-0.1.0-py3-none-any.whl (210.3 kB view details)

Uploaded Mar 26, 2026 Python 3

File details

Details for the file prompttrace-0.1.0.tar.gz.

File metadata

Download URL: prompttrace-0.1.0.tar.gz
Upload date: Mar 26, 2026
Size: 214.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for prompttrace-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`53282af59ea90f58751aa516f93dd694d7fca92a32e19120f62655cdcdc96f7b`
MD5	`f32f5f8eecfb2673eb469b73acb035ab`
BLAKE2b-256	`a82865f1c37db6bb67a2323041a00d17c24b6e46d442510560f8160913743eaa`

File details

Details for the file prompttrace-0.1.0-py3-none-any.whl.

File metadata

Download URL: prompttrace-0.1.0-py3-none-any.whl
Upload date: Mar 26, 2026
Size: 210.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for prompttrace-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`a0c710fdee9bfe88c5726b0a7d724c85c915f177049257d8b4d6e706811c92f2`
MD5	`6fcb8868131ec30e29780c4c19dbe022`
BLAKE2b-256	`69ffc76c32b4ae4d6443963f2971b58d0755d239eb13234485b356d8ed1803f5`
