promptmetrics

Production radar for LLM apps. Capture a baseline of live traffic and get alerted when latency, cost, or behavior drifts.

promptmetrics records every LLM call to a local SQLite database, computes a statistical fingerprint of "what good looked like at deploy time," and tells you when the recent window has drifted. Single file, pip-installable, no account, no SaaS bill.

Install

pip install promptmetrics

Requires Python 3.10+.

5-minute quickstart

1. Decorate the call you care about

from openai import OpenAI
from promptmetrics import track

client = OpenAI()

@track("summarize_v1", model="gpt-4o-mini")
def summarize(text: str):
    return client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": f"Summarize: {text}"}],
    )

That's it. Every call is appended to ~/.promptmetrics/promptmetrics.db with its input, output, latency, and token counts. The decorator never raises if storage fails, so your app keeps running.
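As a rough illustration of the behavior described above (time the call, append a row to SQLite, never let a storage error reach the caller), here is a minimal sketch. It is not promptmetrics' actual implementation: track_sketch, _store, the traces table, and its columns are hypothetical names chosen for this example.

```python
import functools
import sqlite3
import time


def _store(db_path, name, call_input, output, latency_ms):
    """Append one trace row; the table layout here is an assumption."""
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS traces ("
            "name TEXT, input TEXT, output TEXT, latency_ms REAL, created_at REAL)"
        )
        conn.execute(
            "INSERT INTO traces VALUES (?, ?, ?, ?, ?)",
            (name, call_input, output, latency_ms, time.time()),
        )
        conn.commit()
    finally:
        conn.close()


def track_sketch(name, db_path="traces_sketch.db"):
    """Hypothetical stand-in for a tracking decorator."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            result = fn(*args, **kwargs)  # errors from the wrapped call still propagate
            latency_ms = (time.perf_counter() - start) * 1000.0
            try:
                _store(db_path, name, repr(args), repr(result), latency_ms)
            except Exception:
                pass  # a storage failure is swallowed so the caller is unaffected
            return result
        return wrapper
    return decorator
```

The design point is the split between the two failure modes: an exception from the wrapped function is the application's business and is re-raised untouched, while an exception from the recording step is dropped.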

2. Capture a baseline once you have history

promptmetrics baseline summarize_v1 --window 168

The window is given in hours, so 168 covers the last 7 days. The command summarises those traces (mean / p50 / p95 / p99 latency, mean tokens) and stores the summary as the active baseline.
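To make the baseline summary concrete, the sketch below computes the same kinds of statistics from two plain lists using only the standard library. The function name summarize_baseline, the dictionary keys, and the choice of the "inclusive" quantile method are assumptions for illustration; promptmetrics may compute its percentiles differently.

```python
import statistics


def summarize_baseline(latencies_ms, total_tokens):
    """Summarise latency and token samples (needs at least two latency samples)."""
    # quantiles(n=100) returns 99 cut points: index 49 is p50, 94 is p95, 98 is p99.
    cuts = statistics.quantiles(latencies_ms, n=100, method="inclusive")
    return {
        "samples": len(latencies_ms),
        "latency_mean": statistics.fmean(latencies_ms),
        "latency_p50": cuts[49],
        "latency_p95": cuts[94],
        "latency_p99": cuts[98],
        "tokens_mean": statistics.fmean(total_tokens),
    }


# Illustrative data only: latencies of 1..101 ms and a constant token count.
summary = summarize_baseline(list(range(1, 102)), [500] * 101)
print(summary["latency_p95"])
```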

3. Check for drift

promptmetrics check summarize_v1 --window 1

Compares the most recent hour against the baseline and prints a report. Exits non-zero on DRIFTED so it composes with cron, CI, and shell pipelines.
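Because the result is carried by the exit status, a scheduler or deploy script only needs to look at the return code. The helper below is a small sketch of that pattern from Python; run_drift_check is a hypothetical name, and the sketch treats any non-zero status as "needs attention" since the exact exit codes are not documented here.

```python
import subprocess


def run_drift_check(command):
    """Run a check command; return (exited_with_zero, captured_stdout)."""
    completed = subprocess.run(command, capture_output=True, text=True)
    return completed.returncode == 0, completed.stdout


# Example call, left as a comment because it requires promptmetrics and a stored baseline:
# ok, report = run_drift_check(["promptmetrics", "check", "summarize_v1", "--window", "1"])
# if not ok:
#     print(report)
```

Note that a non-zero status can also come from an unrelated failure, so a caller may want to print the captured report before deciding what to do.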

Try it without an LLM

git clone https://github.com/pallaprolus/promptmetrics && cd promptmetrics
pip install -e .
python demo.py
promptmetrics baseline demo --db ./demo.db --window 24 --min-samples 100
promptmetrics check    demo --db ./demo.db --window 1

The demo.py script seeds 300 healthy traces and 60 deliberately drifted ones so you can see a real DRIFTED report on your first run.

What it detects

| Detector | Method | Default threshold |
| Latency | Kolmogorov–Smirnov test on the latency distribution plus a percentile-ratio check on p95 | WARNING at +15% p95, DRIFTED at +30% p95 |
| Cost | Mean total-tokens ratio vs baseline | WARNING at +15%, DRIFTED at +30% |

The KS test only fires when the recent window is slower than the baseline — a faster system is good news, not an alert.
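The two ideas above can be sketched in a few lines. classify_ratio applies the +15% / +30% thresholds to a recent-versus-baseline ratio (usable for p95 latency or mean tokens), and ks_slower_statistic computes a one-sided two-sample KS statistic that is only large when the recent sample is shifted toward slower values. Both function names, the "OK" label, and the use of >= at the boundaries are assumptions; the sketch also stops at the statistic and does not convert it to a p-value the way a full test would.

```python
from bisect import bisect_right

WARNING_RATIO = 1.15  # +15% over baseline
DRIFTED_RATIO = 1.30  # +30% over baseline


def classify_ratio(baseline_value, recent_value):
    """Classify a recent metric against its baseline by simple ratio."""
    ratio = recent_value / baseline_value
    if ratio >= DRIFTED_RATIO:
        return "DRIFTED"
    if ratio >= WARNING_RATIO:
        return "WARNING"
    return "OK"  # assumed label for "no drift"


def ks_slower_statistic(baseline, recent):
    """One-sided KS statistic: max over x of F_baseline(x) - F_recent(x).

    It is large when recent values tend to exceed baseline values (slower),
    and 0.0 when the recent sample is equal or uniformly faster.
    """
    b, r = sorted(baseline), sorted(recent)

    def ecdf(sorted_xs, x):
        # Fraction of samples less than or equal to x.
        return bisect_right(sorted_xs, x) / len(sorted_xs)

    return max(ecdf(b, x) - ecdf(r, x) for x in b + r)


print(classify_ratio(100.0, 140.0))
print(ks_slower_statistic([100, 110, 120, 130], [200, 210, 220, 230]))
```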

Programmatic API

from promptmetrics import PromptMetrics

with PromptMetrics() as r:
    baseline = r.capture_baseline("summarize_v1", window_hours=168)
    report = r.check_drift("summarize_v1", window_hours=1)
    print(report.severity)
    for result in report.results:
        print(result.drift_type, result.severity, result.detail)

Custom token / output extractors

If your call returns something promptmetrics can't introspect, pass extractors:

@track(
    "rag_query",
    extract_output=lambda r: r.answer,
    extract_tokens=lambda r: (r.usage.input_tokens, r.usage.output_tokens),
)
def rag_query(question: str): ...

OpenAI- and Anthropic-style usage objects are detected automatically.
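For a sense of what automatic detection could involve: OpenAI chat completion responses expose usage.prompt_tokens and usage.completion_tokens, while Anthropic message responses expose usage.input_tokens and usage.output_tokens. The sketch below checks for each shape in turn. detect_tokens is a hypothetical helper, not the library's internal function, and the stand-in objects only mimic the attribute names.

```python
from types import SimpleNamespace


def detect_tokens(response):
    """Return (input_tokens, output_tokens), or None if the shape is unknown."""
    usage = getattr(response, "usage", None)
    if usage is None:
        return None
    # OpenAI chat completions style
    if hasattr(usage, "prompt_tokens") and hasattr(usage, "completion_tokens"):
        return usage.prompt_tokens, usage.completion_tokens
    # Anthropic messages style
    if hasattr(usage, "input_tokens") and hasattr(usage, "output_tokens"):
        return usage.input_tokens, usage.output_tokens
    return None


# Stand-in objects that only mimic the attribute names of each SDK's response.
openai_like = SimpleNamespace(usage=SimpleNamespace(prompt_tokens=12, completion_tokens=30))
anthropic_like = SimpleNamespace(usage=SimpleNamespace(input_tokens=8, output_tokens=21))
print(detect_tokens(openai_like), detect_tokens(anthropic_like))
```

A response that matches neither shape returns None, which is the case where passing explicit extractors becomes necessary.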

What's deliberately out of scope (for v0.1)

  • Slack / Discord / PagerDuty alerting
  • Semantic / quality drift (LLM-as-judge, embedding similarity)
  • Hosted dashboard
  • Multi-baseline versioning, A/B comparison
  • Cloud sync

These are planned for v0.2+. The schema already reserves loop_id and step_index columns for the next feature on the roadmap: agent-loop drift detection for multi-step agents.

License

MIT
