Performance regression monitor for ML inference projects

perf-guard

perf-guard is a zero-intrusion command-line tool that turns the result directories of your existing benchmark / eval scripts into a versioned performance history. Record snapshots, compare against one or many baselines, and plot long-term trends.

Highlights

  • Zero intrusion. Does not run your benchmarks; only parses the output.
  • Config-driven. All metric extraction is declared in perf_guard.yaml.
  • Multi-baseline compare. Compare the current run against baseline, last_week, main, or any tag in one command.
  • Trend charts. Render the whole .perf_history/ as an interactive HTML chart, or print ASCII sparklines in the terminal.
  • CI-ready. Exits with code 1 on regression so pipelines fail automatically.

Install

pip install perf-guard

Quick start

  1. Drop a perf_guard.yaml at the project root:

    results_dir: ./results
    
    metrics:
      - name: total_eval_time
        file: summary.txt
        pattern: "Eval time:\\s+([\\d.]+)s"
        threshold_pct: 5
        direction: lower_is_better
    
      - name: pusht_success_rate
        file: pusht/eval.log
        pattern: "success_rate=([\\d.]+)%"
        threshold_pct: 2
        direction: higher_is_better
    
  2. Record your reference run and tag it:

    perf-guard record results/20260421_054042 --tag baseline
    
  3. After your next run, compare and detect regressions:

    perf-guard compare                               # latest vs baseline
    perf-guard compare --base baseline --base last_week
    perf-guard compare --base main --current latest
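Each metric entry in the config is just a file plus a regex with one capturing group. A minimal sketch of that extraction step (`extract_metric` is a hypothetical helper, not perf-guard's own code), using the two patterns from the config above:

```python
import re

def extract_metric(text: str, pattern: str) -> float:
    """Return the first capture group of `pattern` in `text` as a float."""
    m = re.search(pattern, text)
    if m is None:
        raise ValueError(f"pattern not found: {pattern!r}")
    return float(m.group(1))

# The patterns declared in perf_guard.yaml above:
summary = "Eval time:   372.4s"
print(extract_metric(summary, r"Eval time:\s+([\d.]+)s"))    # 372.4

log_line = "epoch done, success_rate=87.5% over 200 episodes"
print(extract_metric(log_line, r"success_rate=([\d.]+)%"))   # 87.5
```

Because `threshold_pct` and `direction` live next to the pattern, adding a metric never touches the benchmark code, only the YAML.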
    

Commands

Command                                                       Purpose
perf-guard record <dir> [--tag …]                             Extract metrics from a result dir and store a snapshot.
perf-guard compare [--base …] [--current …]                   Compare current against one or more baselines.
perf-guard list                                               List all recorded snapshots.
perf-guard report <ref>                                       Show full metrics for one snapshot.
perf-guard trend [--ref …] [--metric …] [-o path] [--ascii]   Plot metric trends across history.
perf-guard install-hook                                       Install a git post-commit hook.

Ref resolution

Anywhere a <ref> is accepted, the following are valid:

  • latest — the most recently recorded snapshot
  • A user tag (created with record --tag)
  • A dirname (e.g. 20260421_054042)
  • yesterday, last_week, last_month — closest snapshot to that time
  • <N>d_ago or <N>_days_ago — closest snapshot to N days ago
  • latest~<N> — N records before the latest
  • HEAD, HEAD~N, branch names, tags, short SHAs — any git revision whose commit has been recorded (see Git integration below)
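The time-based refs presumably resolve to the snapshot whose timestamp is closest to the requested point in time. A rough sketch of that lookup under that assumption (`resolve_days_ago` is a hypothetical helper, not perf-guard's implementation):

```python
import re
from datetime import datetime, timedelta

def resolve_days_ago(ref: str, snapshots: dict, now: datetime) -> str:
    """Resolve '<N>d_ago' / '<N>_days_ago' to the name of the snapshot
    whose timestamp is closest to N days before `now`."""
    m = re.fullmatch(r"(\d+)(?:d_ago|_days_ago)", ref)
    if m is None:
        raise ValueError(f"not a relative ref: {ref!r}")
    target = now - timedelta(days=int(m.group(1)))
    # Pick the snapshot with the smallest absolute time distance.
    return min(snapshots, key=lambda name: abs(snapshots[name] - target))

snaps = {
    "20260418_090000": datetime(2026, 4, 18, 9, 0),
    "20260421_054042": datetime(2026, 4, 21, 5, 40),
}
print(resolve_days_ago("3d_ago", snaps, datetime(2026, 4, 21, 9)))
# 20260418_090000
```

Note that "closest" means a ref like `3d_ago` still resolves when no run happened exactly three days ago, which keeps the refs usable with irregular benchmark schedules.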

Git integration

When perf-guard record runs inside a git working tree, each snapshot also stores the current commit:

"git": {
  "sha": "a1b2c3d",
  "sha_full": "a1b2c3d4e5f6...",
  "branch": "main",
  "dirty": false,
  "subject": "jepa: fix fp16 cache alignment",
  "author": "huxie",
  "committed_at": "2026-04-21T05:12:00+08:00"
}

This unlocks git-style ref resolution for every command that takes a <ref>:

perf-guard compare --base HEAD~1         # vs the previous commit's run
perf-guard compare --base HEAD~5         # vs 5 commits ago
perf-guard compare --base main           # vs latest main-branch run
perf-guard compare --base v1.0           # vs a tagged release
perf-guard compare --base a1b2c3d        # vs a specific short SHA
perf-guard report HEAD                   # detail view of the HEAD run

perf-guard list and perf-guard report also display the commit SHA, branch and dirty flag so you can trace each snapshot back to its code state.

Multi-baseline compare

perf-guard compare --base baseline --base last_week --current latest

Prints one table per baseline. Exits with 1 if any baseline shows a regression.
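Given the `threshold_pct` and `direction` fields in `perf_guard.yaml`, the regression check is plausibly a per-metric percentage-change test against each baseline. A sketch under that assumption (`is_regression` is hypothetical, not perf-guard's API):

```python
def is_regression(current: float, baseline: float,
                  threshold_pct: float, direction: str) -> bool:
    """A change is a regression when the metric moves in the bad
    direction by more than threshold_pct percent of the baseline."""
    change_pct = (current - baseline) / baseline * 100
    if direction == "lower_is_better":
        return change_pct > threshold_pct       # got slower / bigger
    if direction == "higher_is_better":
        return -change_pct > threshold_pct      # got worse / smaller
    raise ValueError(f"unknown direction: {direction!r}")

# total_eval_time: threshold_pct=5, lower_is_better
print(is_regression(390.0, 360.0, 5, "lower_is_better"))    # True  (+8.3%)
# pusht_success_rate: threshold_pct=2, higher_is_better
print(is_regression(87.0, 88.0, 2, "higher_is_better"))     # False (-1.1%)
```

With multiple `--base` flags this check simply runs once per baseline, and one failing pair is enough to produce exit code 1.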

Trend chart

# HTML chart (Chart.js via CDN) — default
perf-guard trend -o trend.html

# Restrict to a few snapshots / metrics
perf-guard trend --ref baseline --ref dtype-fix --ref compile-opt \
                 --metric total_eval_time --metric pusht_success_rate

# ASCII sparklines in the terminal
perf-guard trend --ascii
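The `--ascii` mode presumably maps each metric's history onto Unicode block glyphs, scaled between the series minimum and maximum. A self-contained sketch of such a sparkline (not perf-guard's actual renderer):

```python
def sparkline(values: list) -> str:
    """Render a list of floats as one block glyph per value,
    scaled so min maps to the lowest bar and max to the highest."""
    blocks = "▁▂▃▄▅▆▇█"
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0          # avoid division by zero on flat series
    return "".join(blocks[int((v - lo) / span * 7)] for v in values)

print(sparkline([372.4, 371.9, 380.2, 368.0, 369.5]))   # ▃▃█▁▁
```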

Exit codes

Code   Meaning
0      Success, no regressions
1      Config error, ref not found, or regression detected
2      Result directory does not exist
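These exit codes make CI wiring trivial: any non-zero exit fails the job, so no output parsing is needed. A hypothetical GitHub Actions fragment (the `bench/run_all.py` script and the pre-recorded `baseline` tag are assumptions, not part of perf-guard):

```yaml
# Hypothetical CI steps: the job fails automatically because
# `perf-guard compare` exits 1 on any regression.
- name: Run benchmarks
  run: python bench/run_all.py --out results/${{ github.sha }}
- name: Check for performance regressions
  run: |
    pip install perf-guard
    perf-guard record results/${{ github.sha }}
    perf-guard compare --base baseline
```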

Development

git clone https://github.com/huxie/perf-guard
cd perf-guard
pip install -e .
perf-guard --version

CI (.github/workflows/ci.yml) runs the CLI smoke-test matrix on Python 3.9–3.12 and builds/validates the distribution on every push and PR.

Releasing

Releases are fully automated by .github/workflows/publish.yml. Any tag of the form v* triggers:

  1. build sdist + wheel, check with twine check --strict;
  2. publish to PyPI via Trusted Publishing (no long-lived tokens);
  3. create a GitHub Release with the matching CHANGELOG.md section and the built artifacts attached.

To cut a release:

scripts/release.sh 0.3.1

That script bumps the version in pyproject.toml and perf_guard/__init__.py, inserts a CHANGELOG stub, asks for confirmation, then commits, tags and pushes. The publish workflow takes it from there.

One-time setup: PyPI Trusted Publishing

On pypi.org add a pending publisher for the perf-guard project:

Field           Value
Owner           your GitHub username / org
Repository      perf-guard
Workflow name   publish.yml
Environment     pypi

Also create a GitHub environment named pypi in the repository settings. No secrets or tokens are needed — authentication happens via short-lived OIDC tokens minted per run.

License

MIT
