
Never lose loop progress again — auto-saves and resumes any Python loop from exactly where it crashed.

loopz — Never lose loop progress again.

Add one decorator. Any Python loop auto-resumes from exactly where it crashed.


The Problem

You are processing 100,000 images. Your Colab session drops at 60,000.
You are training a model for 50 epochs. Your laptop dies at epoch 30.
You start over. Every single time.

loopz fixes this.


Install

pip install loopz

Quick Start

import loopz

@loopz.track("process_images", save_every=100)
def process(image_path):
    extract_and_save_features(image_path)

process(all_image_paths)   # 💥 crash at 60k?  run again → resumes at 60k ✅

That is the entire API for the common case.
One decorator, one job name, one checkpoint interval. Done.


ML Training — Full State Save

loopz saves and restores your model weights, optimizer state, LR scheduler, GradScaler, and any accumulators living inside the loop — all automatically.

import loopz
import torch

model     = MyModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=50)

running_loss = [0.0]   # wrap in a list so loopz can restore it in-place
best_acc     = [0.0]

@loopz.track(
    "training",
    save_every  = 1,                                          # save every epoch
    state       = {"model": model, "optimizer": optimizer,
                   "scheduler": scheduler},
    loop_vars   = {"running_loss": running_loss,
                   "best_acc":     best_acc},
    notify      = print,                                      # or send a Telegram/webhook
)
def train(epoch):
    loss, acc = train_one_epoch(model, train_loader, optimizer, scheduler)
    running_loss[0] += loss
    best_acc[0]      = max(best_acc[0], acc)
    print(f"Epoch {epoch} | loss={loss:.4f} | acc={acc:.4f}")

train(range(50))
# 💥 crashes at epoch 12?  run the script again →
# 🔁 loopz: Resuming 'training' from 12/50 (24.0% done)
#    State     : ['model', 'optimizer', 'scheduler'] ✅
#    Loop vars : ['running_loss', 'best_acc'] ✅

What Gets Saved

Supported objects:

  • torch.nn.Module
  • torch.nn.DataParallel
  • torch.nn.parallel.DistributedDataParallel
  • torch.optim.Optimizer (Adam, SGD, AdamW, …)
  • torch.optim.lr_scheduler.*
  • torch.cuda.amp.GradScaler
  • torch.Tensor
  • numpy.ndarray
  • sklearn estimator
  • Plain Python object (any picklable)
  • Python / NumPy / PyTorch / CUDA random state
  • Variables inside the loop (running_loss, best_acc, …)
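
Why saving random state matters can be shown with the standard library alone. The sketch below is not loopz's code; capture_rng_state and restore_rng_state are hypothetical helper names, and only Python's built-in random module is covered (NumPy, PyTorch, and CUDA have their own state objects). It shows that restoring a saved generator state makes a resumed run draw the same numbers the original run would have drawn.

```python
import random


def capture_rng_state():
    """Snapshot the state of Python's global random generator."""
    return random.getstate()


def restore_rng_state(state):
    """Put the global random generator back into a previously saved state."""
    random.setstate(state)


random.seed(0)
saved = capture_rng_state()            # what a checkpoint would store
first_run = [random.random() for _ in range(3)]

restore_rng_state(saved)               # what a resume would do
second_run = [random.random() for _ in range(3)]

# Both runs drew from the same generator state, so the sequences match.
print(first_run == second_run)
```

Without this step, shuffles and augmentations after a resume would differ from the ones an uninterrupted run would have produced.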

API Reference

@loopz.track(...)

@loopz.track(
    job_name   = "my_job",    # unique name — used for resume lookup
    save_every = 10,          # checkpoint every N iterations
    state      = {...},       # ML objects to save (optional)
    loop_vars  = {...},       # accumulators inside the loop (optional)
    notify     = callable,    # called on completion or crash (optional)
)
def process(item):
    ...

process(my_list)
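
The notify argument takes any callable. The README only shows notify = print, so the sketch below assumes the callback receives a single message argument; that signature is an assumption, not documented behaviour. combine_notifiers is a hypothetical helper that forwards one message to several callbacks, so you could print and log at the same time.

```python
def combine_notifiers(*callbacks):
    """Return one callable that forwards a message to every callback."""
    def notify(message):
        for callback in callbacks:
            try:
                callback(message)
            except Exception as exc:
                # A broken notifier should not hide the event being reported.
                print(f"notifier {callback!r} failed: {exc}")
    return notify


received = []                                    # stands in for a log or webhook queue
notify = combine_notifiers(print, received.append)
notify("training finished")                      # printed and appended to `received`
```

The combined callable can then be passed as notify = notify in the decorator, under the same single-argument assumption.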

loopz.status()

Print a summary of all incomplete (saved) jobs.

📋 loopz — 1 saved job(s):

  🔁 training
     Progress : 12/50 (24.0%)
     Saved at : 2026-03-22 14:30:00
     Crashed  : training crash at epoch 12

loopz.reset("job_name")

Delete all saved data for a job — it will start fresh next run.

loopz.reset_all()

Delete all saved data for every job.


How It Works

  1. On every save_every-th iteration loopz atomically writes:

    • your loop position (JSON)
    • your ML object weights (.state)
    • your loop variables (.vars)
    • the full random number generator state (Python + NumPy + PyTorch + CUDA)
  2. On crash or KeyboardInterrupt, it saves one final checkpoint then re-raises the original exception so your stack trace is still visible.

  3. On the next run, loopz detects the saved position, restores all state, and resumes the loop from exactly that index.

  4. On clean completion, all saved files are deleted automatically.
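
The four steps above can be sketched in plain Python. This is a minimal illustration of the idea, not loopz's implementation: resumable, atomic_write_json, and the next_index key are hypothetical names, and only the loop position is saved (no model state, loop variables, or random state).

```python
import functools
import json
import os
import tempfile


def atomic_write_json(path, payload):
    """Write JSON to a temp file, then rename it over the target."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(payload, fh)
            fh.flush()
            os.fsync(fh.fileno())
        os.replace(tmp_path, path)     # atomic replacement on POSIX
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def resumable(checkpoint_path, save_every=10):
    """Hypothetical stand-in for the idea behind @loopz.track (position only)."""
    def decorator(func):
        @functools.wraps(func)
        def run(items):
            items = list(items)
            start = 0
            if os.path.exists(checkpoint_path):           # step 3: resume
                with open(checkpoint_path, encoding="utf-8") as fh:
                    start = json.load(fh)["next_index"]
            index = start
            try:
                for index in range(start, len(items)):
                    func(items[index])
                    if (index + 1) % save_every == 0:     # step 1: periodic save
                        atomic_write_json(checkpoint_path,
                                          {"next_index": index + 1})
            except BaseException:                         # step 2: final save
                atomic_write_json(checkpoint_path, {"next_index": index})
                raise                                     # keep the stack trace
            if os.path.exists(checkpoint_path):           # step 4: clean up
                os.remove(checkpoint_path)
        return run
    return decorator
```

Catching BaseException rather than Exception is what lets the final save also run on KeyboardInterrupt. A hard kill or power loss skips that handler entirely, so in that case a resume can only start from the last periodic save.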


Limitations

  • Primitives as loop_vars: int, float, and str cannot be mutated in place in Python. Wrap them in a list: loss = [0.0], not loss = 0.0.
  • Distributed training (multi-node) — DDP on a single machine is supported; multi-node DDP across separate machines is not.
  • Custom C++ extensions — if your model uses custom CUDA ops with non-standard state, manual checkpointing is needed alongside loopz.
  • Non-picklable objects — if an object in state= cannot be pickled, loopz will print a warning and skip it.
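
The first limitation comes from how Python binds names, and a short sketch makes it concrete. restore_in_place is a hypothetical helper, not loopz's restore code; it assumes every loop variable is a list, as the README recommends.

```python
def restore_in_place(loop_vars, saved):
    """Copy saved values into the existing lists without rebinding any name."""
    for name, container in loop_vars.items():
        container[:] = saved[name]


best_acc = [0.0]                        # mutable wrapper, as recommended above
loop_vars = {"best_acc": best_acc}

restore_in_place(loop_vars, {"best_acc": [0.91]})
print(best_acc[0])                      # the caller's list now holds the restored value

plain = 0.0
holder = {"plain": plain}
holder["plain"] = 0.91                  # rebinding the dict entry...
print(plain)                            # ...leaves the original float untouched
```

A list works because the restore writes into the same object the loop body reads from. A bare float has no contents to overwrite, so assigning a new value only changes what the dictionary entry points to.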

License

MIT © Shivrajsinh Jadeja
