Minimal-decision tools for reproducible, debuggable training experiments.

TrainKeeper

Training-Time System Guardrails for Reliable AI

TrainKeeper is a training-time reliability framework for machine learning systems.
It adds lightweight guardrails around existing training code to make experiments:

reproducible
debuggable
data-safe
training-stable
and system-verifiable

without replacing your stack.

TrainKeeper focuses on what most frameworks ignore:
👉 what happens inside the training loop.

🚨 Why TrainKeeper exists

Many of the most damaging training failures are silent:

non-deterministic experiments
unnoticed data corruption or drift
exploding / vanishing gradients
NaN loss propagation
broken resumes and unreproducible results
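
To see why a NaN loss counts as silent: under IEEE 754 every ordered comparison with NaN is false, so a naive threshold check never fires, and the NaN then contaminates anything computed from it. The sketch below uses only the Python standard library; `naive_alarm` and `finite_alarm` are hypothetical names chosen for illustration, not TrainKeeper functions.

```python
import math

def naive_alarm(loss: float, threshold: float = 1e3) -> bool:
    """Fires only when the loss exceeds a threshold; NaN slips through."""
    return loss > threshold

def finite_alarm(loss: float, threshold: float = 1e3) -> bool:
    """Fires on NaN, +/-inf, or a loss above the threshold."""
    return (not math.isfinite(loss)) or loss > threshold

nan_loss = float("nan")
# NaN also propagates into derived values, e.g. a running average.
running_avg = 0.9 * 0.25 + 0.1 * nan_loss

print(naive_alarm(nan_loss))    # False
print(finite_alarm(nan_loss))   # True
print(math.isnan(running_avg))  # True
```

An explicit finiteness check is the usual way to surface this class of failure at the step where it first occurs.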

TrainKeeper turns training into a controlled system rather than a script.

It does this by providing:

experiment control
data integrity checks
training-time instrumentation
automatic failure capture
and system-level validation scenarios

📦 Install

pip install trainkeeper

Optional extras:

pip install "trainkeeper[torch]"
pip install "trainkeeper[wandb]"
pip install "trainkeeper[mlflow]"

⚡ Quick start

from trainkeeper.experiment import run_reproducible

@run_reproducible(auto_capture_git=True)
def train():
    print("TrainKeeper is running.")
    # your normal training loop

if __name__ == "__main__":
    train()

Each run automatically produces:

experiment.yaml, run.json
system.json, env.txt
seeds.json, run.sh
checkpoints and failure reports

No pipeline rewrite. No framework lock-in.
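
As a rough illustration of the idea behind a reproducibility decorator, the sketch below seeds Python's `random` module and writes two small metadata files before calling the wrapped function. It is not TrainKeeper's implementation: `reproducible_sketch`, its parameters, and the JSON field names are assumptions made for this example; only the file names `seeds.json` and `system.json` come from the list above.

```python
import functools
import json
import platform
import random
import sys
import tempfile
from pathlib import Path

def reproducible_sketch(seed: int, out_dir: Path):
    """Hypothetical decorator: seed the RNG and record run metadata."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            out_dir.mkdir(parents=True, exist_ok=True)
            # A real tool would also seed NumPy, PyTorch, and so on.
            random.seed(seed)
            (out_dir / "seeds.json").write_text(
                json.dumps({"python_random": seed})
            )
            (out_dir / "system.json").write_text(json.dumps({
                "python": sys.version.split()[0],
                "platform": platform.platform(),
            }))
            return fn(*args, **kwargs)
        return wrapper
    return decorator

run_dir = Path(tempfile.mkdtemp()) / "run-0001"

@reproducible_sketch(seed=1234, out_dir=run_dir)
def train():
    # Stand-in for a training loop: draw a few pseudo-random numbers.
    return [random.random() for _ in range(3)]

first, second = train(), train()
print(first == second)                            # True
print(sorted(p.name for p in run_dir.iterdir()))  # ['seeds.json', 'system.json']
```

Because the seed is reset on every call, two invocations draw the same numbers, which is the property a replay would depend on.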

🧠 Core runtime modules

Module	Purpose
`experiment`	reproducible runs, environment capture, replay
`datacheck`	schema enforcement, drift detection, data profiling
`debugger`	training hooks, instability detection, failure snapshots
`trainutils`	deterministic dataloaders, mixed precision, checkpoints
`monitor`	runtime metrics and behavior tracking
`pkg`	export helpers (ONNX, TorchScript, packaging)
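
To make "schema enforcement" and "drift detection" concrete, here is a minimal standard-library sketch of both ideas. It does not use or mirror the `datacheck` API; `SCHEMA`, `schema_violations`, and `mean_shift` are hypothetical names, and the mean-shift score is only one simple way to quantify drift.

```python
from statistics import mean, pstdev

SCHEMA = {"age": float, "label": int}  # hypothetical schema for the example

def schema_violations(rows, schema):
    """Return (row_index, column, reason) for each violation found."""
    problems = []
    for i, row in enumerate(rows):
        for col, typ in schema.items():
            if col not in row:
                problems.append((i, col, "missing"))
            elif not isinstance(row[col], typ):
                problems.append((i, col, f"expected {typ.__name__}"))
    return problems

def mean_shift(reference, current):
    """Shift of the current mean, in units of the reference std deviation."""
    spread = pstdev(reference)
    if spread == 0:
        return 0.0 if mean(current) == mean(reference) else float("inf")
    return abs(mean(current) - mean(reference)) / spread

reference = [{"age": 30.0, "label": 0}, {"age": 40.0, "label": 1},
             {"age": 50.0, "label": 0}]
current = [{"age": 61.0, "label": 1}, {"age": "n/a", "label": 0},
           {"age": 59.0}]

print(schema_violations(current, SCHEMA))
# [(1, 'age', 'expected float'), (2, 'label', 'missing')]

ref_ages = [r["age"] for r in reference]
cur_ages = [r["age"] for r in current if isinstance(r.get("age"), float)]
print(round(mean_shift(ref_ages, cur_ages), 2))  # 2.45
```

A shift of roughly 2.45 reference standard deviations would be flagged by most reasonable thresholds, even though every individual value still looks like a plausible age.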

🖥 CLI

tk init
tk run -- python train.py
tk replay <exp-id> -- python train.py
tk compare <exp-a> <exp-b>
tk repro-summary <runs-dir>
tk doctor

The CLI exposes TrainKeeper as a system tool, not just a library.
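
The README does not describe what `tk compare` reports. As one plausible reading, comparing two experiments can be reduced to diffing their recorded metadata. The sketch below is an assumption-laden illustration: `diff_runs` and the field names (`seed`, `git_commit`, `lr`, `amp`) are hypothetical and do not reflect the actual `run.json` layout.

```python
def diff_runs(a: dict, b: dict) -> dict:
    """Map each differing key to its (value_in_a, value_in_b) pair."""
    missing = object()  # sentinel so an absent key differs from a None value
    diff = {}
    for key in sorted(a.keys() | b.keys()):
        va, vb = a.get(key, missing), b.get(key, missing)
        if va != vb:
            diff[key] = (
                None if va is missing else va,
                None if vb is missing else vb,
            )
    return diff

# Hypothetical metadata for two runs.
run_a = {"seed": 1234, "git_commit": "abc123", "lr": 0.001}
run_b = {"seed": 1234, "git_commit": "def456", "lr": 0.001, "amp": True}

print(diff_runs(run_a, run_b))
# {'amp': (None, True), 'git_commit': ('abc123', 'def456')}
```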

🧪 System validation (what makes TrainKeeper different)

TrainKeeper is not only a framework.
It is validated through a multi-scenario reliability suite (in the GitHub repo):

Scenario 1 — Reproducibility Lab
Deterministic execution, resume behavior, experiment traceability.

Scenario 2 — Data Corruption Lab
Schema violations, NaNs, label shift, silent distribution drift.

Scenario 3 — Training Robustness Lab
Exploding gradients, NaN loss, optimizer instability, bad batch capture.

These scenarios are orchestrated by a system hardening layer that produces:

unified summaries
failure matrices
cross-scenario system reports

TrainKeeper therefore tests itself.
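
The idea of a controlled failure experiment can be sketched in a few lines: inject known faults into a sequence of training steps and confirm that a detector labels exactly those steps. The code below is a standard-library illustration, not the repository's scenario suite; `global_grad_norm`, `check_step`, the failure labels, and the `max_norm` threshold are all assumptions.

```python
import math

def global_grad_norm(grads):
    """L2 norm over all gradient values, flattened across parameters."""
    return math.sqrt(sum(g * g for layer in grads for g in layer))

def check_step(loss, grads, max_norm=100.0):
    """Return a failure label for this step, or None if it looks healthy."""
    if not math.isfinite(loss):
        return "non_finite_loss"
    norm = global_grad_norm(grads)
    if not math.isfinite(norm):
        return "non_finite_gradient"
    if norm > max_norm:
        return "exploding_gradient"
    return None

# One healthy step followed by two injected faults.
steps = [
    (0.90, [[0.1, -0.2], [0.05]]),
    (0.85, [[3e3, -4e3], [0.0]]),         # injected: gradient norm of 5000
    (float("nan"), [[0.1, 0.1], [0.1]]),  # injected: NaN loss
]
report = {i: check_step(loss, grads) for i, (loss, grads) in enumerate(steps)}
print(report)
# {0: None, 1: 'exploding_gradient', 2: 'non_finite_loss'}
```

A report like this, collected per scenario, is the kind of raw material a failure matrix could be built from.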

The PyPI package contains the runtime framework only.
The scenarios and system tests live in the GitHub repository.

🏗 Architecture

TrainKeeper inserts a guardrail layer between your training code and the system.

User Training Code
        ↓
TrainKeeper Runtime (experiment, datacheck, debugger, trainutils)
        ↓
Structured Artifacts & Reports
        ↓
System Validation Layer (scenarios + system tests)

(Full architecture diagram is available in the GitHub repository.)

🎓 Typical use cases

research reproducibility & experiment audits
training-time debugging
data integrity enforcement
reliability testing for ML systems
controlled failure experiments
AI systems research platforms

🔗 Project links

GitHub: https://github.com/mosh3eb/TrainKeeper
Issues & roadmap: https://github.com/mosh3eb/TrainKeeper/issues

Release history

0.3.0 (Feb 18, 2026)
0.2.3 (Jan 29, 2026), this version

Download files

Source Distribution

trainkeeper-0.2.3.tar.gz (28.1 kB), uploaded Jan 29, 2026

Built Distribution

trainkeeper-0.2.3-py3-none-any.whl (30.4 kB), uploaded Jan 29, 2026, Python 3

File details

Details for the file trainkeeper-0.2.3.tar.gz.

File metadata

Download URL: trainkeeper-0.2.3.tar.gz
Upload date: Jan 29, 2026
Size: 28.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.10.0

File hashes

Hashes for trainkeeper-0.2.3.tar.gz
Algorithm	Hash digest
SHA256	`27f8441723ad17aa13b9ad7f16bf8ed087fff150b0492b63fc28bfed43f1f53b`
MD5	`c1b669942bbd23b7b3c4b8d6eba37dca`
BLAKE2b-256	`eb923ab231f50a3e452f79a2b4aee2f4022e775f25b46ed64e298a28844852ce`
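
A downloaded file can be checked against the SHA256 digest in the table above with Python's `hashlib`. The following is a minimal sketch; `sha256_of` and `matches` are helper names introduced here, and the demo runs on a temporary file so that it is self-contained.

```python
import hashlib
import tempfile
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches(path: Path, expected_hex: str) -> bool:
    """Compare the file's digest with a published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()

# Demo on a temporary file. For the real check, point the path at the
# downloaded trainkeeper-0.2.3.tar.gz and pass the published SHA256 digest.
demo = Path(tempfile.mkdtemp()) / "demo.bin"
demo.write_bytes(b"hello")
print(matches(demo, hashlib.sha256(b"hello").hexdigest()))  # True
print(matches(demo, "0" * 64))                              # False
```

Reading in chunks keeps memory use constant regardless of file size.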

File details

Details for the file trainkeeper-0.2.3-py3-none-any.whl.

File metadata

Download URL: trainkeeper-0.2.3-py3-none-any.whl
Upload date: Jan 29, 2026
Size: 30.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.10.0

File hashes

Hashes for trainkeeper-0.2.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`e6bc1589ba4377d892bf5fd36e617e6329d83d4d3f71fb35cc8beba367c0a81e`
MD5	`63078e3917ac2085f878825fde4805d7`
BLAKE2b-256	`587a12ddb32c8636288a23d267ce4baf2057131d7912508bd2e6685777106d17`
