torch-surgeon

Real-time gradient pathology detection for PyTorch — in 2 lines.

A loss curve is a lagging indicator: by the time it shows a problem, vanishing or exploding gradients have been compounding for hundreds of steps. torch-surgeon attaches diagnostic hooks to your model and surfaces per-layer pathologies in real time, before they snowball into an unrecoverable run.

Install

pip install torch-surgeon

Usage

from torch_surgeon import Surgeon

surgeon = Surgeon(model, rules="default")
surgeon.attach()

# ... your existing training loop, unchanged ...
for epoch in range(epochs):
    loss = criterion(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

report = surgeon.report()   # per-layer stats dict
surgeon.detach()            # clean removal of all hooks
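
As a quick illustration of consuming the report: it is a dict keyed by layer, so you can iterate it directly. The key names inside each per-layer entry below are assumptions for illustration (the mean/std/norm stats are described under "How it works"), not documented API:

for layer, stats in report.items():
    print(layer, stats)   # e.g. {'norm': ..., 'mean': ..., 'std': ...}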

What it detects

Pathology            Detection method
-------------------  ----------------------------------------------------------------
Vanishing gradients  Per-layer norm ratio drops below threshold vs. EMA baseline
Exploding gradients  Per-layer norm ratio exceeds threshold vs. EMA baseline
Stagnant layers      Gradient norm near zero for N consecutive steps (layer stopped learning)

Custom rules

surgeon = Surgeon(model, rules={
    "vanishing_threshold": 0.01,   # default
    "exploding_threshold": 100.0,  # default
    "stagnant_steps": 50,          # default
    "log_every": 10,               # print summary every N steps
    "plot": True,                  # live matplotlib plot
    "verbose": True,
})

How it works

torch-surgeon uses PyTorch's register_full_backward_hook API to intercept gradients at every leaf layer during the backward pass. Statistics (mean, std, norm) are computed inside the hook and the raw gradient tensor is discarded immediately — keeping overhead under 1% on typical training loops.
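
A minimal sketch of that mechanism (illustrative only, not torch-surgeon's actual source; the stats dict and hook body here are assumptions):

import torch
import torch.nn as nn

stats = {}  # layer name -> list of per-step gradient norms

def make_hook(name):
    def hook(module, grad_input, grad_output):
        g = grad_output[0]
        if g is not None:
            # Reduce to a scalar inside the hook; the gradient tensor itself is not kept.
            stats.setdefault(name, []).append(g.norm().item())
    return hook

model = nn.Sequential(nn.Linear(10, 10), nn.ReLU(), nn.Linear(10, 1))
handles = [
    m.register_full_backward_hook(make_hook(name))
    for name, m in model.named_modules()
    if len(list(m.children())) == 0  # leaf layers only
]

loss = model(torch.randn(4, 10)).sum()
loss.backward()

for h in handles:
    h.remove()  # clean removal, as surgeon.detach() does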

Pathology detection uses an exponential moving average (EMA) baseline per layer rather than fixed thresholds — so it generalises across architectures without manual tuning.
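
In sketch form, the per-layer check might look like the following. The threshold names mirror the defaults under "Custom rules"; the smoothing factor alpha is an assumption, and the internal implementation may differ:

def update_and_check(layer, norm, ema, alpha=0.1, vanishing=0.01, exploding=100.0):
    # alpha is an assumed EMA smoothing factor, not a documented default.
    baseline = ema.get(layer)
    if baseline is not None and baseline > 0:
        ratio = norm / baseline
        if ratio < vanishing:
            print(f"{layer}: vanishing (ratio {ratio:.2e})")
        elif ratio > exploding:
            print(f"{layer}: exploding (ratio {ratio:.2e})")
    # Update the baseline after the comparison so a single spike
    # does not immediately shift the reference point.
    ema[layer] = norm if baseline is None else (1 - alpha) * baseline + alpha * norm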

Performance

Sub-1% training overhead on standard loops, validated with 100-step timing benchmarks on Linear/ReLU networks. Statistics are computed inside the hook; no gradient tensors are stored between steps.
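
One way to reproduce that kind of measurement yourself (a sketch using the documented Surgeon API; the project's own benchmark harness may differ):

import time
import torch
import torch.nn as nn
from torch_surgeon import Surgeon

model = nn.Sequential(nn.Linear(256, 256), nn.ReLU(), nn.Linear(256, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.01)
crit = nn.CrossEntropyLoss()
x, y = torch.randn(64, 256), torch.randint(0, 10, (64,))

def run(steps=100):
    t0 = time.perf_counter()
    for _ in range(steps):
        opt.zero_grad()
        crit(model(x), y).backward()
        opt.step()
    return time.perf_counter() - t0

base = run()                 # bare training loop
surgeon = Surgeon(model, rules="default")
surgeon.attach()
hooked = run()               # same loop with hooks attached
surgeon.detach()
print(f"overhead: {(hooked - base) / base:.1%}")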

License

MIT
