trainpulse

Lightweight training health monitor. Detect loss spikes, gradient explosions, NaN/Inf, and plateaus — 2 lines of code, no server, no signup.

Why trainpulse?

Feature	W&B / Neptune	TensorBoard	trainpulse
Setup	Account + API key	TF dependency	`pip install trainpulse`
NaN/Inf detection	Manual	No	Automatic
Loss spike alerts	Manual	No	Automatic
Gradient monitoring	Manual	Manual	Automatic
Plateau detection	No	No	Automatic
Zero dependencies	No	No	Yes
Works offline	No	Yes	Yes

Install

pip install trainpulse

With PyTorch integration (the quotes keep shells such as zsh from interpreting the brackets):

pip install "trainpulse[torch]"

With CLI:

pip install "trainpulse[cli]"

Quick Start

Minimal — 2 lines

from trainpulse import Monitor

# num_steps and train_step() come from your own training code
monitor = Monitor()
for step in range(num_steps):
    loss = train_step()
    monitor.log("loss", step, loss)

report = monitor.report()
print(f"Health: {report.health_score:.0%}")

Full training loop

from trainpulse import Monitor, MonitorConfig

config = MonitorConfig(
    loss_spike_threshold=5.0,    # Alert if loss > 5x rolling average
    grad_norm_threshold=100.0,   # Alert if gradient norm > 100
    plateau_patience=200,        # Alert after 200 steps without improvement
)

monitor = Monitor(config)

for step in range(num_steps):
    monitor.step_start()

    loss = train_step()
    grad_norm = get_grad_norm()
    lr = scheduler.get_last_lr()[0]

    monitor.log("loss", step, loss)
    monitor.log("grad_norm", step, grad_norm)
    monitor.log("learning_rate", step, lr)
    monitor.step_end(step)

report = monitor.report()
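
The loop above calls a `get_grad_norm()` helper that this README does not define. A minimal sketch of one way to compute a global L2 norm, assuming the gradients are available as plain sequences of floats (with PyTorch you would typically read `p.grad` for each parameter instead). The name `global_grad_norm` is hypothetical and not part of trainpulse:

```python
import math
from typing import Iterable, Sequence


def global_grad_norm(grads: Iterable[Sequence[float]]) -> float:
    """Return the L2 norm over all gradient values, treated as one flat vector."""
    total = 0.0
    for g in grads:
        for v in g:
            total += v * v
    return math.sqrt(total)


# Two parameter gradients flattened to lists: [3, 0] and [4]
print(global_grad_norm([[3.0, 0.0], [4.0]]))  # 5.0
```

The resulting float is what you would pass to `monitor.log("grad_norm", step, grad_norm)`.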

Callback API

from trainpulse import TrainingCallback

cb = TrainingCallback()
for step in range(num_steps):
    cb.on_step_begin(step)
    loss = train_step()
    grad_norm = get_grad_norm()
    lr = scheduler.get_last_lr()[0]
    cb.on_step_end(step, loss=loss, grad_norm=grad_norm, lr=lr)

report = cb.report()

Real-time alerts

from trainpulse import Monitor, MonitorConfig

def my_alert_handler(alert):
    print(f"⚠ {alert}")
    # Or send to Slack, Discord, email...

config = MonitorConfig(alert_callbacks=[my_alert_handler])
monitor = Monitor(config)
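
A callback can also persist alerts for later inspection. The sketch below assumes only what the example above shows: the handler receives one alert object that has a readable string form. The `make_file_handler` name and the log line layout are illustrative, not part of trainpulse:

```python
import os
import tempfile
from datetime import datetime, timezone


def make_file_handler(path):
    """Build an alert handler that appends one timestamped line per alert."""
    def handler(alert):
        stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
        with open(path, "a", encoding="utf-8") as f:
            f.write(f"{stamp} {alert}\n")
    return handler


# Stand-in alert; in real use trainpulse would pass its own alert object.
log_path = os.path.join(tempfile.mkdtemp(), "alerts.log")
handler = make_file_handler(log_path)
handler("loss spike at step 120")
```

Such a handler would be registered the same way as `my_alert_handler`, through `MonitorConfig(alert_callbacks=[...])`.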

Detectors

Detector	What it catches	Default threshold
NaN/Inf	NaN or Inf in any metric	Always on
Loss spike	Sudden loss increase vs rolling average	5x
Gradient explosion	Gradient norm too large	100.0
Gradient vanishing	Gradient norm too small	1e-7
LR anomaly	Learning rate jumps	10x change
Plateau	No loss improvement	100 steps
Step time	Unusually slow steps	3x average
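
To make the NaN/Inf and loss spike rows concrete, here is a minimal sketch of comparing each value against a rolling average. It is not trainpulse's implementation: the class name `SpikeDetector` and the choice to compare against the mean of the previous values (excluding the current one) are assumptions. The defaults mirror the documented 5x threshold and the 50-step window.

```python
import math
from collections import deque


class SpikeDetector:
    def __init__(self, threshold=5.0, window=50):
        self.threshold = threshold
        self.history = deque(maxlen=window)  # keeps only the last `window` values

    def check(self, value):
        """Return 'nan_inf', 'spike', or None for a new loss value."""
        if not math.isfinite(value):
            return "nan_inf"  # do not let NaN/Inf pollute the rolling mean
        result = None
        if self.history:
            mean = sum(self.history) / len(self.history)
            if mean > 0 and value > self.threshold * mean:
                result = "spike"
        self.history.append(value)
        return result


det = SpikeDetector()
losses = [2.0, 2.0, 2.0, 11.0, float("nan")]
print([det.check(x) for x in losses])
# [None, None, None, 'spike', 'nan_inf']
```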

CLI

Analyze training logs (JSONL format):

trainpulse analyze train.jsonl
trainpulse analyze train.jsonl --json-out report.json
trainpulse show report.json

Expected JSONL format:

{"step": 0, "loss": 2.5, "grad_norm": 1.2, "learning_rate": 0.001}
{"step": 1, "loss": 2.3, "grad_norm": 1.1, "learning_rate": 0.001}
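
A training script can produce this file with the standard library alone. A minimal sketch, assuming one JSON object per line with the keys shown above. The `log_step` helper name is hypothetical:

```python
import json
import os
import tempfile


def log_step(f, step, loss, grad_norm, learning_rate):
    """Append one training step as a single JSON line."""
    record = {
        "step": step,
        "loss": loss,
        "grad_norm": grad_norm,
        "learning_rate": learning_rate,
    }
    f.write(json.dumps(record) + "\n")


path = os.path.join(tempfile.mkdtemp(), "train.jsonl")
with open(path, "w", encoding="utf-8") as f:
    log_step(f, 0, 2.5, 1.2, 0.001)
    log_step(f, 1, 2.3, 1.1, 0.001)

with open(path, encoding="utf-8") as f:
    print(f.readline().strip())
# {"step": 0, "loss": 2.5, "grad_norm": 1.2, "learning_rate": 0.001}
```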

Health Score

The health score ranges from 0.0 to 1.0. Each alert lowers it by a penalty that depends on severity:

Critical alerts (NaN, gradient explosion): −0.15 each
Warning alerts (spikes, plateaus): −0.05 each
Info alerts: −0.01 each

A score above 0.80 generally indicates healthy training.
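
The penalties above can be sketched as a small function. This assumes the score starts at 1.0 and is clamped so it never drops below 0.0, which the stated range implies but this README does not spell out. `health_score` here is an illustrative function, not the library's code:

```python
PENALTY = {"critical": 0.15, "warning": 0.05, "info": 0.01}


def health_score(n_critical=0, n_warnings=0, n_info=0):
    """Subtract a fixed penalty per alert from 1.0, never going below 0.0."""
    score = 1.0
    score -= PENALTY["critical"] * n_critical
    score -= PENALTY["warning"] * n_warnings
    score -= PENALTY["info"] * n_info
    return max(0.0, score)


# One critical alert and two warnings: 1.0 - 0.15 - 0.10 = 0.75
print(f"{health_score(n_critical=1, n_warnings=2):.2f}")  # 0.75
```

Under these assumptions a single critical alert plus two warnings already puts a run below the 0.80 mark.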

API Reference

`Monitor(config=None)`

Main class. Call `.log(name, step, value)` to record metrics and `.report()` to get a `TrainingReport`.

`MonitorConfig`

Parameter	Default	Description
`loss_spike_threshold`	5.0	Multiplier over rolling average
`loss_spike_window`	50	Rolling window size
`grad_norm_threshold`	100.0	Max acceptable gradient norm
`grad_vanish_threshold`	1e-7	Min acceptable gradient norm
`check_nan`	True	Enable NaN/Inf detection
`lr_change_threshold`	10.0	Max LR change ratio per step
`plateau_patience`	100	Steps without improvement
`plateau_min_delta`	1e-5	Minimum improvement delta
`step_time_spike_threshold`	3.0	Step time spike multiplier
`alert_callbacks`	[]	Functions called on each alert
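
As an illustration of how `plateau_patience` and `plateau_min_delta` could interact, the sketch below counts the steps since the best loss last improved by at least the minimum delta. This is an assumed reading of the two parameters, not trainpulse's source, and `PlateauDetector` is a hypothetical name:

```python
class PlateauDetector:
    def __init__(self, patience=100, min_delta=1e-5):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.stale_steps = 0

    def update(self, loss):
        """Return True once `patience` consecutive steps brought no improvement."""
        if loss < self.best - self.min_delta:
            self.best = loss
            self.stale_steps = 0
        else:
            self.stale_steps += 1
        return self.stale_steps >= self.patience


det = PlateauDetector(patience=3)
flags = [det.update(x) for x in [1.0, 0.9, 0.9, 0.9, 0.9, 0.5]]
print(flags)  # [False, False, False, False, True, False]
```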

`TrainingReport`

Property	Type	Description
`.health_score`	float	0.0 (terrible) to 1.0 (perfect)
`.is_healthy`	bool	True if no critical alerts
`.n_warnings`	int	Number of warning alerts
`.n_critical`	int	Number of critical alerts
`.alerts`	list[Alert]	All triggered alerts
`.metrics_summary`	dict	Per-metric min/max/mean/last
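
These properties make it straightforward to gate a pipeline on training health. A minimal sketch that uses a stand-in object with the documented attribute names instead of a real `TrainingReport`. The `summarize` helper and the 0.80 cutoff (taken from the Health Score section) are illustrative choices:

```python
from types import SimpleNamespace


def summarize(report, min_score=0.80):
    """Return (ok, message) based on the documented report properties."""
    ok = report.is_healthy and report.health_score >= min_score
    message = (
        f"health={report.health_score:.0%} "
        f"critical={report.n_critical} warnings={report.n_warnings}"
    )
    return ok, message


# Stand-in for a TrainingReport, for demonstration only.
fake = SimpleNamespace(health_score=0.9, is_healthy=True, n_critical=0, n_warnings=2)
print(summarize(fake))  # (True, 'health=90% critical=0 warnings=2')
```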

See Also

Project	What it does
tokonomics	Token counting & cost management for LLM APIs
datacrux	Training data quality — dedup, PII, contamination
castwright	Synthetic instruction data generation
datamix	Dataset mixing & curriculum optimization
toksight	Tokenizer analysis & comparison
ckpt	Checkpoint inspection, diffing & merging
quantbench	Quantization quality analysis
infermark	Inference benchmarking
modeldiff	Behavioral regression testing
vibesafe	AI-generated code safety scanner
injectionguard	Prompt injection detection

License

Apache-2.0

Release history

Version	Released
0.4.0 (this version)	Apr 11, 2026
0.3.0	Apr 10, 2026
0.2.0	Apr 10, 2026

