Production radar for LLM apps — capture a baseline, detect when latency, cost, or behavior drifts.

promptmetrics

Production radar for LLM apps. Capture a baseline of live traffic, get alerted when latency, cost, or behavior drifts.

promptmetrics records every LLM call to a local SQLite database, computes a statistical fingerprint of "what good looked like at deploy time," and tells you when the recent window has drifted. Single file, pip-installable, no account, no SaaS bill.

Install

pip install promptmetrics

Requires Python 3.10+.

5-minute quickstart

1. Decorate the call you care about

from openai import OpenAI
from promptmetrics import track

client = OpenAI()

@track("summarize_v1", model="gpt-4o-mini")
def summarize(text: str):
    return client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": f"Summarize: {text}"}],
    )

That's it. Every call is appended to ~/.promptmetrics/promptmetrics.db with input, output, latency, and token counts. The decorator never raises if storage fails — your app keeps running.
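To make the "never raises if storage fails" behavior concrete, here is a minimal sketch of how a decorator of this kind could work. It is not promptmetrics' actual implementation: the `sketch_track` name, the `traces` table, and its columns are assumptions made for illustration.

```python
import functools
import sqlite3
import time


def sketch_track(name: str, db_path: str):
    """Hypothetical stand-in for a tracking decorator; not the library's code."""

    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            result = fn(*args, **kwargs)  # errors from the wrapped call still propagate
            latency_ms = (time.perf_counter() - start) * 1000.0
            try:
                conn = sqlite3.connect(db_path)
                try:
                    with conn:  # commits on success, rolls back on error
                        # Assumed schema: one row per call.
                        conn.execute(
                            "CREATE TABLE IF NOT EXISTS traces "
                            "(name TEXT, latency_ms REAL, output TEXT)"
                        )
                        conn.execute(
                            "INSERT INTO traces VALUES (?, ?, ?)",
                            (name, latency_ms, str(result)),
                        )
                finally:
                    conn.close()
            except sqlite3.Error:
                # Storage failures are swallowed so the caller still gets its result.
                pass
            return result

        return wrapper

    return decorator
```

The point of the design is that only the bookkeeping sits inside the `try` block; the wrapped function itself runs outside it, so its own exceptions are never hidden.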

2. Capture a baseline once you have history

promptmetrics baseline summarize_v1 --window 168

Summarizes the last 7 days (168 hours) of traces as mean, p50, p95, and p99 latency plus mean token count, and stores that summary as the active baseline.
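For a feel of what such a summary contains, the sketch below computes the same latency statistics with the standard library. The function name and the choice of the "inclusive" percentile method are assumptions; promptmetrics may compute its percentiles differently.

```python
import statistics


def summarize_latencies(latencies_ms: list[float]) -> dict[str, float]:
    """Summary statistics of the kind a baseline might store (illustrative only)."""
    # quantiles(n=100) returns 99 cut points; index k-1 is the k-th percentile.
    cuts = statistics.quantiles(latencies_ms, n=100, method="inclusive")
    return {
        "mean": statistics.fmean(latencies_ms),
        "p50": cuts[49],
        "p95": cuts[94],
        "p99": cuts[98],
    }
```

Tail percentiles such as p99 are only meaningful with enough samples, which is presumably why the CLI offers a `--min-samples` option.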

3. Check for drift

promptmetrics check summarize_v1 --window 1

Compares the most recent hour against the baseline and prints a report. Exits non-zero on DRIFTED so it composes with cron, CI, and shell pipelines.
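Because the result is signalled through the exit status, any scheduler that can run a command can gate on it. The helper below is a small sketch of doing that from Python; `drift_check_passed` is a hypothetical name, and it relies only on the documented behavior that DRIFTED produces a non-zero status (the status for WARNING is not stated here, so the sketch makes no assumption about it).

```python
import subprocess


def drift_check_passed(command: list[str]) -> bool:
    """Run a drift-check command and report whether it exited with status 0."""
    completed = subprocess.run(command, capture_output=True, text=True)
    # Re-print the report so it still shows up in cron mail or CI logs.
    print(completed.stdout, end="")
    return completed.returncode == 0
```

A scheduled job could call `drift_check_passed(["promptmetrics", "check", "summarize_v1", "--window", "1"])` and page someone, or fail the pipeline, when it returns `False`.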

Try it without an LLM

git clone https://github.com/pallaprolus/promptmetrics && cd promptmetrics
pip install -e .
python demo.py
promptmetrics baseline demo --db ./demo.db --window 24 --min-samples 100
promptmetrics check    demo --db ./demo.db --window 1

The demo.py script seeds 300 healthy traces and 60 deliberately drifted ones so you can see a real DRIFTED report on your first run.

What it detects

Detector	Method	Default threshold
Latency	Kolmogorov–Smirnov test on the latency distribution plus a percentile-ratio check on p95	`WARNING` at +15% p95, `DRIFTED` at +30% p95
Cost	Mean total-tokens ratio vs baseline	`WARNING` at +15%, `DRIFTED` at +30%

The KS test only fires when the recent window is slower than the baseline — a faster system is good news, not an alert.
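The two ideas in the table can be sketched in a few lines. This is an illustration under assumptions, not the library's code: the function names, the "OK" label, and the inclusive comparisons are made up here, and the one-sided KS statistic is just one way to make a KS check react only to slowdowns.

```python
import bisect


def classify_ratio(recent: float, baseline: float,
                   warn: float = 0.15, drift: float = 0.30) -> str:
    """Map a relative increase over the baseline to a severity label."""
    ratio = recent / baseline
    if ratio >= 1.0 + drift:
        return "DRIFTED"
    if ratio >= 1.0 + warn:
        return "WARNING"
    return "OK"  # assumed label for "no drift"


def ks_slower_statistic(baseline: list[float], recent: list[float]) -> float:
    """One-sided KS statistic: max over x of F_baseline(x) - F_recent(x).

    The value is large when recent latencies are shifted toward higher values
    and stays at 0 when the recent window is faster than the baseline.
    """
    base_sorted, recent_sorted = sorted(baseline), sorted(recent)
    best = 0.0
    for x in base_sorted + recent_sorted:
        # Empirical CDFs: fraction of samples less than or equal to x.
        f_base = bisect.bisect_right(base_sorted, x) / len(base_sorted)
        f_recent = bisect.bisect_right(recent_sorted, x) / len(recent_sorted)
        best = max(best, f_base - f_recent)
    return best
```

The same `classify_ratio` helper covers both rows of the table: pass p95 latencies for the latency check, or mean total tokens for the cost check.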

Programmatic API

from promptmetrics import PromptMetrics

with PromptMetrics() as r:
    baseline = r.capture_baseline("summarize_v1", window_hours=168)
    report = r.check_drift("summarize_v1", window_hours=1)
    print(report.severity)
    for result in report.results:
        print(result.drift_type, result.severity, result.detail)

Custom token / output extractors

If your call returns something promptmetrics can't introspect, pass extractors:

@track(
    "rag_query",
    extract_output=lambda r: r.answer,
    extract_tokens=lambda r: (r.usage.input_tokens, r.usage.output_tokens),
)
def rag_query(question: str): ...

OpenAI- and Anthropic-style usage objects are detected automatically.
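For reference, the two usage shapes differ only in field names: OpenAI chat completions expose `usage.prompt_tokens` and `usage.completion_tokens`, while Anthropic messages expose `usage.input_tokens` and `usage.output_tokens`. A sketch of how automatic detection might distinguish them is shown below; `sketch_extract_tokens` is a hypothetical helper, not the function promptmetrics uses internally.

```python
from types import SimpleNamespace


def sketch_extract_tokens(response):
    """Return (input_tokens, output_tokens), or None if no usage is found."""
    usage = getattr(response, "usage", None)
    if usage is None:
        return None
    # OpenAI-style usage object.
    if hasattr(usage, "prompt_tokens") and hasattr(usage, "completion_tokens"):
        return usage.prompt_tokens, usage.completion_tokens
    # Anthropic-style usage object.
    if hasattr(usage, "input_tokens") and hasattr(usage, "output_tokens"):
        return usage.input_tokens, usage.output_tokens
    return None


# Stand-in response objects; real SDK responses carry many more fields.
openai_like = SimpleNamespace(
    usage=SimpleNamespace(prompt_tokens=12, completion_tokens=30)
)
anthropic_like = SimpleNamespace(
    usage=SimpleNamespace(input_tokens=8, output_tokens=21)
)
print(sketch_extract_tokens(openai_like))     # (12, 30)
print(sketch_extract_tokens(anthropic_like))  # (8, 21)
```

Anything that matches neither shape is where a custom `extract_tokens` callable comes in.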

Sensitive data: prompts and outputs are stored verbatim

By default, @track writes the full input and output of every call to the local SQLite database. If your prompts contain PII, secrets, customer data, or anything you wouldn't want sitting in ~/.promptmetrics/ indefinitely, scrub it with the redact_input / redact_output hooks:

import re
from promptmetrics import track

EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def scrub(text: str) -> str:
    text = EMAIL.sub("[EMAIL]", text)
    text = SSN.sub("[SSN]", text)
    return text

@track("support_reply", redact_input=scrub, redact_output=scrub)
def reply(customer_message: str): ...

The redactor runs before the trace is written, so the raw values never touch disk. If your redactor raises, the trace is recorded with an empty string and the error is logged — pass raise_on_error=True to fail loudly instead.
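It is worth calling a redactor directly on a few representative strings before wiring it into @track. The snippet below repeats the `scrub` function from above so that it runs on its own; the sample message is made up.

```python
import re

EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def scrub(text: str) -> str:
    text = EMAIL.sub("[EMAIL]", text)
    text = SSN.sub("[SSN]", text)
    return text


sample = "Contact jane@example.com, SSN 123-45-6789, about order 42."
print(scrub(sample))
# prints: Contact [EMAIL], SSN [SSN], about order 42.
```

These two patterns only cover email addresses and US Social Security numbers in the `NNN-NN-NNNN` form; phone numbers, API keys, and names would need their own rules.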

The DB is a plain SQLite file at ~/.promptmetrics/promptmetrics.db (override with PromptMetrics(db_path=...) or --db). Treat it like any other file with sensitive data: back it up, encrypt the volume, or delete it on a schedule.

Strict mode for CI

@track("nightly_eval", raise_on_error=True)
def eval_run(): ...

By default the decorator never raises — observability shouldn't break production. In CI or eval pipelines where silent metric corruption is worse than a crash, set raise_on_error=True so extractor, redactor, and storage failures all surface as exceptions.

What's deliberately out of scope (for v0.1)

Slack / Discord / PagerDuty alerting
Semantic / quality drift (LLM-as-judge, embedding similarity)
Hosted dashboard
Multi-baseline versioning, A/B comparison
Cloud sync

These are planned for v0.2+. The schema already reserves loop_id and step_index columns for the next feature on the roadmap: agent-loop drift detection for multi-step agents.

License

MIT

Project details

Maintainers

spallaprolu

Release history

0.1.1: May 3, 2026
0.1.0: May 3, 2026

Download files

Source Distribution

promptmetrics-0.1.1.tar.gz (19.7 kB)

Uploaded May 3, 2026 Source

Built Distribution

promptmetrics-0.1.1-py3-none-any.whl (16.7 kB)

Uploaded May 3, 2026 Python 3

File details

Details for the file promptmetrics-0.1.1.tar.gz.

File metadata

Download URL: promptmetrics-0.1.1.tar.gz
Upload date: May 3, 2026
Size: 19.7 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for promptmetrics-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`f6963b3929e890edc1abe06ccf456561b54745ad81621d96b150397b6163aeba`
MD5	`1d8d1a7c31e31fc4d0ce48080f415cbd`
BLAKE2b-256	`a5fee86c3815e86ea6f61fa1dd50e81ba47f935299b9da5e87af217267e8e91f`

Provenance

The following attestation bundles were made for promptmetrics-0.1.1.tar.gz:

Publisher: publish.yml on pallaprolus/promptmetrics

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: promptmetrics-0.1.1.tar.gz
- Subject digest: f6963b3929e890edc1abe06ccf456561b54745ad81621d96b150397b6163aeba
- Sigstore transparency entry: 1430371952
- Sigstore integration time: May 3, 2026
Source repository:
- Permalink: pallaprolus/promptmetrics@fdf15f49c316f3804ad3ff475d065855408c1883
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/pallaprolus
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@fdf15f49c316f3804ad3ff475d065855408c1883
- Trigger Event: push

File details

Details for the file promptmetrics-0.1.1-py3-none-any.whl.

File metadata

Download URL: promptmetrics-0.1.1-py3-none-any.whl
Upload date: May 3, 2026
Size: 16.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for promptmetrics-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`0b4c582ccd9a0ead65c6592b8a71578e540007ce67400fb179c5e25aced0a8e4`
MD5	`bfcc83af153ba97d4033d2dbb4d960a3`
BLAKE2b-256	`aa1dd2e61177a630c3788f63ae72f29cbfa297ca78b5f573c3a9045456cea229`

Provenance

The following attestation bundles were made for promptmetrics-0.1.1-py3-none-any.whl:

Publisher: publish.yml on pallaprolus/promptmetrics

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: promptmetrics-0.1.1-py3-none-any.whl
- Subject digest: 0b4c582ccd9a0ead65c6592b8a71578e540007ce67400fb179c5e25aced0a8e4
- Sigstore transparency entry: 1430372408
- Sigstore integration time: May 3, 2026
Source repository:
- Permalink: pallaprolus/promptmetrics@fdf15f49c316f3804ad3ff475d065855408c1883
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/pallaprolus
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@fdf15f49c316f3804ad3ff475d065855408c1883
- Trigger Event: push
