Auto-diagnosis for training curves
SK-AutoD 🩺
Auto-diagnose your training curves in seconds.
Stop manually eyeballing loss curves. SK-AutoD analyzes your training data and instantly detects 10+ common pathologies: overfitting, exploding gradients, learning rate issues, underfitting, and more.
Website: tutorials and documentation
The Problem
Every ML practitioner spends hours staring at loss curves during training:
- "Is this overfitting?"
- "Did my learning rate explode?"
- "Why is my loss stuck?"
- "Should I have stopped earlier?"
Current workflow: Manual eyeballing + Slack screenshots + tribal knowledge.
SK-AutoD solves this: Paste in your arrays → Get instant, rule-based diagnosis.
Quick Start
Installation
# From PyPI (once published)
pip install sk-autod
# From source (recommended for now)
pip install git+https://github.com/shamiquekhan/SK-AutoD-ML-Library-for-Training-Curve-Auto-Diagnostician.git
Basic Usage
from sk_autod import diagnose
# Your training curves
train_loss = [2.3, 1.9, 1.4, 0.9, 0.5, 0.3, 0.15]
val_loss = [2.4, 2.0, 1.8, 1.9, 2.3, 2.8, 3.4]
# Get instant diagnosis
report = diagnose(train_loss, val_loss)
# Print human-readable summary
print(report.summary())
Output:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
sk_autod Diagnosis Report
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
CRITICAL: Classic overfitting
Detected at epoch 4 (94% confidence)
Val loss rose while train loss fell for 3+ consecutive epochs.
Fix: Add dropout (p=0.3–0.5), L2 regularisation, or reduce model capacity.
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Summary: 1 critical issue found.
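The rule quoted in the report (val loss rising while train loss falls for 3+ consecutive epochs) can be illustrated with a short standalone sketch. This is not SK-AutoD's actual implementation: the function name `first_divergence_epoch` and the `min_run` parameter are hypothetical, and no smoothing or confidence scoring is attempted.

```python
from typing import Optional, Sequence


def first_divergence_epoch(
    train_loss: Sequence[float],
    val_loss: Sequence[float],
    min_run: int = 3,
) -> Optional[int]:
    """Return the 1-indexed epoch at which val loss starts rising while
    train loss keeps falling for at least `min_run` consecutive epochs.
    Returns None when no such run exists."""
    n = min(len(train_loss), len(val_loss))
    run_start = 0  # 0-based index of the first epoch in the current run
    run_len = 0
    for i in range(1, n):
        diverging = (
            val_loss[i] > val_loss[i - 1] and train_loss[i] < train_loss[i - 1]
        )
        if diverging:
            if run_len == 0:
                run_start = i
            run_len += 1
            if run_len >= min_run:
                return run_start + 1  # convert to a 1-indexed epoch
        else:
            run_len = 0
    return None


train_loss = [2.3, 1.9, 1.4, 0.9, 0.5, 0.3, 0.15]
val_loss = [2.4, 2.0, 1.8, 1.9, 2.3, 2.8, 3.4]
print(first_divergence_epoch(train_loss, val_loss))  # 4
```

On the Quick Start arrays the run begins at the fourth value of `val_loss`, which agrees with the "epoch 4" in the report if epochs are counted from 1.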
Features
10 Diagnostic Detectors (v0.1.0+)
| Detector | Severity | Description |
|---|---|---|
| Classic overfitting | CRITICAL | Val loss rises while train loss falls |
| Exploding gradient | CRITICAL | Loss spikes >300% in a single epoch |
| LR too high | HIGH | Loss oscillates without clear downtrend |
| LR too low | MEDIUM | Loss decreases extremely slowly |
| Underfitting | HIGH | Both losses plateau at high values |
| Dying ReLU proxy | HIGH | Loss flatlines early at high value |
| Noisy training | MEDIUM | Jagged loss with frequent up-down flips |
| Data leakage proxy | HIGH | Val loss consistently lower than train |
| Missed early stopping | WARNING | Val minimum not used as final checkpoint |
| Label noise floor | MEDIUM | Loss can't drop below suspiciously high threshold |
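A few of the descriptions in the table translate almost directly into code. The sketch below is only an illustration of the stated rules, not the library's detectors: all three function names are hypothetical, ">300%" is interpreted here as a relative increase of more than 3.0 over the previous epoch, and both input curves are assumed to be non-empty.

```python
from typing import List, Optional, Sequence


def spike_epochs(loss: Sequence[float], threshold: float = 3.0) -> List[int]:
    """0-based epochs where loss rose by more than `threshold` (3.0 = 300%)
    relative to the previous epoch. Assumed reading of 'spikes >300%'."""
    return [
        i
        for i in range(1, len(loss))
        if loss[i - 1] > 0 and (loss[i] - loss[i - 1]) / loss[i - 1] > threshold
    ]


def missed_best_epoch(val_loss: Sequence[float]) -> Optional[int]:
    """0-based epoch of the validation minimum, or None if the final
    epoch already is the minimum (nothing was missed)."""
    best = min(range(len(val_loss)), key=val_loss.__getitem__)
    return best if best != len(val_loss) - 1 else None


def val_below_train_fraction(
    train_loss: Sequence[float], val_loss: Sequence[float]
) -> float:
    """Fraction of epochs in which val loss is lower than train loss."""
    pairs = list(zip(train_loss, val_loss))
    return sum(v < t for t, v in pairs) / len(pairs)


print(spike_epochs([1.0, 0.8, 4.0, 0.9]))                        # [2]
print(missed_best_epoch([2.4, 2.0, 1.8, 1.9, 2.3, 2.8, 3.4]))    # 2
```

A real detector would also need a confidence score and a decision threshold (for example, how large a fraction counts as "consistently lower"); those are left out because the README does not state them.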
Multiple Output Formats
- Text: Pretty-printed summaries (with colors)
- JSON: Programmatic access to findings
- HTML: Interactive report with loss curve visualization (v0.2+)
Flexible APIs
# 1. Full diagnosis (rich report)
report = diagnose(train_loss, val_loss)
print(report.summary())  # → formatted text
data = report.to_dict()  # → JSON-serializable dict
html = report.to_html()  # → standalone HTML (v0.2+)
# 2. One-liner for notebooks
from sk_autod import quick_check
print(quick_check(train_loss, val_loss))  # → "[CRITICAL] Classic overfitting"
# 3. In-training callback (v0.3+)
from sk_autod import AutoDCallback
cb = AutoDCallback(min_epochs=10, print_live=True)
for epoch in range(100):
    # ... your training loop ...
    cb.on_epoch_end(epoch, train_loss, val_loss)
Confidence-Scored Findings
Each diagnosis includes a confidence score (0.0–1.0), helping you prioritize fixes:
for finding in report.findings:
    print(f"{finding.detector_name}: {finding.confidence:.1%}")
    print(f"  → {finding.fix_recommendation}")
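One way to act on the confidence scores is to filter and rank findings before reading them. The sketch below uses a stand-in `Finding` dataclass so that it runs without the library; only the attribute names (`detector_name`, `severity`, `confidence`, `fix_recommendation`) come from the snippets in this README. The constructor, the `prioritize` helper, the 0.5 cutoff, and the sample values are all made up for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Finding:
    # Stand-in for the library's Finding; the real class may differ.
    detector_name: str
    severity: str
    confidence: float
    fix_recommendation: str


def prioritize(findings: List[Finding], min_confidence: float = 0.5) -> List[Finding]:
    """Drop findings below `min_confidence` and list the most confident first."""
    kept = [f for f in findings if f.confidence >= min_confidence]
    return sorted(kept, key=lambda f: f.confidence, reverse=True)


# Illustrative values only, not output from SK-AutoD.
findings = [
    Finding("LR too high", "HIGH", 0.62, "Reduce the learning rate"),
    Finding("Noisy training", "MEDIUM", 0.31, "Increase the batch size"),
    Finding("Classic overfitting", "CRITICAL", 0.94, "Add dropout or L2 regularisation"),
]

for f in prioritize(findings):
    print(f"{f.detector_name}: {f.confidence:.1%}")
```

With the real library, the same helper could presumably be applied to `report.findings`, provided its items expose a numeric `confidence` attribute as shown above.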
Architecture
User Input (loss arrays)
↓
Preprocessor (align, smooth, compute stats)
↓
[Detector 1] [Detector 2] ... [Detector N] (run in parallel)
↓
DiagnosisReport (deduplicate, sort by severity)
↓
Formatters (text, JSON, HTML)
↓
Output
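The middle of this pipeline (run detectors, deduplicate, sort by severity) can be sketched in a few lines. This is a simplified stand-in, not the library's `DiagnosticsRunner`: detectors are plain functions run sequentially rather than in parallel, the `Finding` tuple and both toy detectors are hypothetical, and the severity ordering is an assumption because the README lists the four levels without ranking them.

```python
from typing import Callable, Dict, List, NamedTuple, Sequence


class Finding(NamedTuple):
    name: str
    severity: str
    confidence: float


# Assumed ranking, most severe first.
SEVERITY_RANK = {"CRITICAL": 0, "HIGH": 1, "MEDIUM": 2, "WARNING": 3}

Detector = Callable[[Sequence[float], Sequence[float]], List[Finding]]


def run_detectors(
    train: Sequence[float], val: Sequence[float], detectors: Sequence[Detector]
) -> List[Finding]:
    """Run each detector, keep the most confident finding per name,
    then sort by severity and, within a severity, by confidence."""
    best: Dict[str, Finding] = {}
    for detect in detectors:
        for finding in detect(train, val):
            current = best.get(finding.name)
            if current is None or finding.confidence > current.confidence:
                best[finding.name] = finding
    return sorted(
        best.values(), key=lambda f: (SEVERITY_RANK[f.severity], -f.confidence)
    )


def late_minimum(train, val):
    # Toy detector: final val loss is above the best val loss seen.
    return [Finding("Missed early stopping", "WARNING", 0.7)] if val[-1] > min(val) else []


def large_gap(train, val):
    # Toy detector: final val loss is more than twice the final train loss.
    return [Finding("Classic overfitting", "CRITICAL", 0.9)] if val[-1] > 2 * train[-1] else []


train = [2.3, 1.9, 1.4, 0.9, 0.5, 0.3, 0.15]
val = [2.4, 2.0, 1.8, 1.9, 2.3, 2.8, 3.4]
# large_gap is passed twice to show that duplicates collapse to one finding.
for f in run_detectors(train, val, [late_minimum, large_gap, large_gap]):
    print(f.severity, f.name)
```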
Key components:
- Finding & DiagnosisReport: Core data models
- Preprocessor: Validates, aligns, smooths with EMA, computes rolling stats
- BaseDetector: Abstract interface for all detectors
- DiagnosticsRunner: Orchestrates detectors, deduplicates findings
- Formatters: Text, JSON, HTML output channels
See ARCHITECTURE.md for complete design details.
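For readers unfamiliar with the preprocessing steps named above, here is a minimal sketch of aligning two curves and smoothing one with an exponential moving average (EMA). The function names, the truncate-to-common-length alignment, and the default `alpha=0.3` are assumptions for illustration; the library's Preprocessor may use different conventions.

```python
from typing import List, Sequence, Tuple


def align(
    train: Sequence[float], val: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Truncate both curves to their common length."""
    n = min(len(train), len(val))
    return list(train[:n]), list(val[:n])


def ema_smooth(values: Sequence[float], alpha: float = 0.3) -> List[float]:
    """EMA with s[0] = x[0] and s[t] = alpha * x[t] + (1 - alpha) * s[t-1].
    Larger alpha tracks the raw curve more closely."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    smoothed: List[float] = []
    for x in values:
        if not smoothed:
            smoothed.append(float(x))  # seed with the first observation
        else:
            smoothed.append(alpha * x + (1.0 - alpha) * smoothed[-1])
    return smoothed


print(ema_smooth([1.0, 2.0, 3.0], alpha=0.5))  # [1.0, 1.5, 2.25]
print(align([2.3, 1.9, 1.4], [2.4, 2.0]))      # ([2.3, 1.9], [2.4, 2.0])
```

Smoothing before running threshold-based rules reduces the chance that a single noisy epoch triggers a detector, at the cost of reacting a little later to real changes.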
CLI Usage
# Command-line diagnosis
sk_autod diagnose \
--train-loss 2.3 1.9 1.4 0.9 0.5 0.3 0.15 \
--val-loss 2.4 2.0 1.8 1.9 2.3 2.8 3.4 \
--output json
# From CSV files
sk_autod diagnose --train-file train_losses.csv --val-file val_losses.csv
# From stdin (pipe-friendly)
echo "2.3 1.9 1.4 0.9" | sk_autod diagnose --train-loss -
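The `-` convention in the last command (read values from stdin instead of the argument list) can be handled with a small helper like the one below. This is a sketch of the general pattern, not the CLI's actual code; `parse_losses` is a hypothetical name, and `io.StringIO` stands in for `sys.stdin` so the example runs anywhere.

```python
import io
from typing import List, Sequence, TextIO


def parse_losses(tokens: Sequence[str], stdin: TextIO) -> List[float]:
    """Convert CLI tokens to floats. A lone "-" means: read
    whitespace-separated values from `stdin` instead."""
    if list(tokens) == ["-"]:
        tokens = stdin.read().split()
    return [float(tok) for tok in tokens]


# Values given directly on the command line.
print(parse_losses(["2.3", "1.9", "1.4"], io.StringIO("")))
# Values piped in, as in: echo "2.3 1.9 1.4 0.9" | sk_autod diagnose --train-loss -
print(parse_losses(["-"], io.StringIO("2.3 1.9 1.4 0.9\n")))
```

In a real entry point, `sys.stdin` would be passed as the second argument.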
Examples
Example 1: Well-Trained Model
train = [2.3, 1.9, 1.4, 0.9, 0.5, 0.3, 0.15]
val = [2.4, 2.0, 1.6, 1.4, 1.3, 1.2, 1.2]
report = diagnose(train, val)
print(report.summary())
# → No issues found!
Example 2: Classic Overfitting
train = [2.3, 1.9, 1.4, 0.9, 0.5, 0.3, 0.15]
val = [2.4, 2.0, 1.8, 1.9, 2.3, 2.8, 3.4] # diverges
report = diagnose(train, val)
# → CRITICAL: Classic overfitting at epoch 4 (94% confidence)
# → Fix: Add dropout, L2 regularisation, reduce capacity
Example 3: Learning Rate Too High
train = [2.3, 1.8, 1.6, 1.9, 1.5, 1.7, 1.4] # oscillates
val = [2.5, 2.0, 1.9, 2.1, 1.8, 2.0, 1.9]
report = diagnose(train, val)
# → HIGH: LR too high (oscillations detected, variance 2.3× baseline)
# → Fix: Reduce LR by 5–10×, add warmup schedule
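One simple way to quantify the "oscillates" in Example 3 is to count how often the epoch-to-epoch change reverses direction. The sketch below is an illustration only: `flip_ratio` is a hypothetical helper, and it does not reproduce the variance-based figure shown in the report above or any threshold the library uses.

```python
from typing import List, Sequence


def flip_ratio(loss: Sequence[float]) -> float:
    """Fraction of consecutive epoch-to-epoch changes that reverse direction.
    0.0 means monotonic; 1.0 means the loss zigzags at every step."""
    values: List[float] = list(loss)
    # Ignore flat steps so they count as neither up nor down.
    deltas = [b - a for a, b in zip(values, values[1:]) if b != a]
    if len(deltas) < 2:
        return 0.0
    flips = sum((d1 > 0) != (d2 > 0) for d1, d2 in zip(deltas, deltas[1:]))
    return flips / (len(deltas) - 1)


oscillating = [2.3, 1.8, 1.6, 1.9, 1.5, 1.7, 1.4]  # train curve from Example 3
smooth = [2.3, 1.9, 1.4, 0.9, 0.5, 0.3, 0.15]      # train curve from Example 1

print(flip_ratio(oscillating))  # 0.8
print(flip_ratio(smooth))       # 0.0
```

A flip ratio alone cannot separate a too-high learning rate from ordinary mini-batch noise, which is presumably why the detector table lists "LR too high" and "Noisy training" as separate checks.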
Installation & Setup
From Source (recommended until PyPI publication)
git clone https://github.com/shamiquekhan/SK-AutoD-ML-Library-for-Training-Curve-Auto-Diagnostician.git
cd SK-AutoD-ML-Library-for-Training-Curve-Auto-Diagnostician
pip install -e ".[dev]"
# Run tests
pytest tests/ -v
Project Layout
This repository now includes the pieces you would expect from a polished open-source ML library:
- docs/ for installation, quickstart, detectors, API reference, and architecture notes
- examples/ for runnable usage snippets
- notebooks/ for a minimal Jupyter walkthrough
- scripts/ for maintenance and utility helpers
- benchmarks/ for repeatable performance checks
- .github/workflows/tests.yml for CI checks
- .github/workflows/lint.yml for style and formatting checks
- .github/workflows/publish.yml for tagged release publishing
- CHANGELOG.md for release notes
- Makefile for common development commands
Contributing
We welcome contributions! See CONTRIBUTING.md for guidelines.
Areas to contribute:
- Add new detectors (submit an issue first!)
- Improve threshold tuning on real datasets
- Framework integrations (MLflow, W&B, Kubeflow)
- Documentation & examples
- Bug reports and feature requests
Philosophy & Design
- Rule-based, not ML-based: Detectors use hand-crafted heuristics, not neural networks.
  - Interpretable, debuggable, no training needed
  - Works offline, no API calls
- Zero configuration: Works out-of-the-box with sensible defaults.
  - Thresholds are data-agnostic, tuned on 100+ synthetic curves
- Fail gracefully: Unknown patterns → no false alarms, just silence.
- Fast & lightweight: Diagnoses a typical 100-epoch curve in well under 1 ms (see Performance).
Performance
Benchmarks on typical curves (100 epochs):
Diagnose 1 curve: 0.2 ms
Diagnose 1000 curves: 180 ms
Memory per curve: ~2 KB
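To reproduce this kind of measurement on your own machine, a timing harness along the following lines is usually enough. It is a sketch, not the repository's benchmark script: `mean_ms_per_call` is a hypothetical helper, and `stand_in_diagnose` is a placeholder workload so the block runs without the library installed.

```python
import time
from typing import Callable, List, Sequence, Tuple

Curve = Tuple[Sequence[float], Sequence[float]]


def mean_ms_per_call(
    fn: Callable[[Sequence[float], Sequence[float]], object],
    curves: List[Curve],
    repeats: int = 5,
) -> float:
    """Best-of-`repeats` mean wall-clock milliseconds per call of `fn`."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for train, val in curves:
            fn(train, val)
        best = min(best, time.perf_counter() - start)
    return best * 1000.0 / len(curves)


def stand_in_diagnose(train, val):
    # Placeholder workload; swap in sk_autod.diagnose to time the real library.
    return max(v - t for t, v in zip(train, val))


# 1000 synthetic curves of 100 epochs each.
train_curve = [1.0 / (epoch + 1) for epoch in range(100)]
val_curve = [1.2 / (epoch + 1) for epoch in range(100)]
curves = [(train_curve, val_curve)] * 1000

print(f"{mean_ms_per_call(stand_in_diagnose, curves):.4f} ms per curve")
```

Taking the best of several repeats reduces the influence of background activity on the machine; absolute numbers will still vary with hardware and Python version.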
FAQ
Q: Why not use machine learning for detection?
A: ML-based detection would require training data (which curves to flag?), add latency, and reduce interpretability. Rule-based detection is faster, more debuggable, and works offline.
Q: Can I customize detectors?
A: Yes! Subclass BaseDetector and pass to diagnose():
class MyDetector(BaseDetector):
    name = "Custom issue"

    def detect(self, report):
        # your logic here
        return [Finding(...)]

report = diagnose(train, val, detectors=[MyDetector()])
Q: Does it support multi-task or multi-metric curves?
A: v0.1 supports 1D loss arrays. Multi-task support in v0.3+.
Q: What if my curves are short (5 epochs)?
A: SK-AutoD requires at least 5 epochs. On runs that short, some detectors may not fire because they need more history (e.g., missed early stopping).
Q: Can I integrate this with my training pipeline?
A: Yes! Callbacks coming in v0.3. For now:
# After each epoch, inside the training loop (epoch is assumed to be 0-indexed)
report = diagnose(train_losses[:epoch + 1], val_losses[:epoch + 1])
if any(f.severity == "CRITICAL" for f in report.findings):
    break  # stop training, or adjust hyperparameters instead
License
MIT License © 2026 Shamique Khan
See LICENSE file for details.
Acknowledgments
- Inspired by discussions in ML communities (r/MachineLearning, FastAI forums)
- Threshold tuning validated on Kaggle competition curves
- Special thanks to early testers and contributors
Contact & Support
- GitHub Issues: Report bugs or request features
- Twitter: @shamiquekhan
- Email: shamiquekhan18@gmail.com
Citation
If SK-AutoD helps your research, please cite:
@software{sk_autod2026,
author = {Khan, Shamique},
title = {SK-AutoD: Auto-Diagnostic System for Training Curves},
year = {2026},
url = {https://github.com/shamiquekhan/SK-AutoD-ML-Library-for-Training-Curve-Auto-Diagnostician}
}
SK-AutoD: because your time is more valuable than manual eyeballing.
Star us on GitHub if you find this helpful!