

Project description

Gradient

Git-like deterministic checkpointing for ML training with anchor + delta checkpoints, forkable branches, and a workspace/repo hierarchy.

Watch the Demo Video

Visit the site

Read the docs

Highlights

  • Anchor + delta checkpointing to reduce storage by up to 80%.
  • Deterministic resume (model state, RNG, optimizer, scheduler).
  • Branching and forking from any checkpoint ref.
  • Workspace/repo hierarchy for organizing multiple models.
  • Auto-create mode: just specify a workspace and repo, and everything is created automatically.
  • Git-style CLI with workspace and repo management commands.
  • Manifest-based run metadata for dashboards and tooling.
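
Conceptually, anchor + delta checkpointing works like incremental backups: a full snapshot (anchor) is written occasionally, and later checkpoints store only what changed since it. Gradient's actual on-disk format is not documented here, so the sketch below is purely illustrative, using plain dicts of floats in place of tensor state dicts; `make_delta` and `apply_delta` are hypothetical helpers, not part of the gradient API.

```python
def make_delta(anchor, current, atol=0.0):
    """Keep only the entries that changed relative to the anchor."""
    return {k: v for k, v in current.items()
            if k not in anchor or abs(v - anchor[k]) > atol}

def apply_delta(anchor, delta):
    """Reconstruct the full state from an anchor plus one delta."""
    state = dict(anchor)
    state.update(delta)
    return state

anchor = {"w0": 1.0, "w1": 2.0, "w2": 3.0}
step10 = {"w0": 1.0, "w1": 2.5, "w2": 3.0}   # only w1 moved since the anchor
delta = make_delta(anchor, step10)
assert delta == {"w1": 2.5}                   # far smaller than a full snapshot
assert apply_delta(anchor, delta) == step10   # lossless reconstruction
```

When most parameters change little between commits, storing deltas instead of full snapshots is where the storage savings come from.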

Install

pip install gradient-desc

Required dependency: torch.

Quick Start

Zero Setup (Auto-Create Mode)

The simplest way to get started is to specify a workspace and repo name:

import torch
import torch.nn as nn
import torch.optim as optim
from gradient import GradientEngine

model = nn.Linear(4, 1)
opt = optim.Adam(model.parameters(), lr=1e-3)

# Both workspace and repo are auto-created!
engine = GradientEngine.attach(
    model, opt,
    workspace="./my_workspace",
    repo="my_model"
)
engine.autocommit(every=5)

start = engine.current_step
for step in range(start + 1, start + 21):
    loss = (model(torch.randn(32, 4)) ** 2).mean()
    loss.backward()
    opt.step()
    opt.zero_grad(set_to_none=True)
    engine.maybe_commit(step)

CLI-Initialized Workflow

For more control, initialize workspace and repo explicitly:

# Initialize workspace
gradient workspace init ./ml-experiments

# Create a repo for your model
cd ml-experiments
gradient repo init gpt4 --description "GPT-4 training runs"

# Check status
gradient workspace status

Then in your training script:

from gradient import GradientEngine

# Auto-discovers workspace/repo from current directory
engine = GradientEngine.attach(model, optimizer)

Workspace/Repo Hierarchy

Gradient organizes checkpoints in a Git-like hierarchy:

my_workspace/           # Workspace (contains multiple repos)
├── .gradient/          # Workspace marker
│   └── config.json
├── gpt4/               # Repo (one model)
│   ├── .gradient-repo/ # Repo marker
│   │   └── config.json
│   ├── manifest.json
│   ├── ckpt_main_s0.pt
│   └── ckpt_main_s100.pt
└── llama/              # Another repo
    └── ...

  • Workspace: Contains multiple repos (one per model/project)
  • Repo: Contains branches and checkpoints for a single model
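
Given the marker directories shown in the tree (.gradient for a workspace, .gradient-repo for a repo), tooling can discover repos by scanning the workspace. This is a hedged sketch of that idea; `find_repos` is a hypothetical helper, not the library's own discovery function.

```python
import tempfile
from pathlib import Path

def find_repos(workspace: Path):
    """List repo names in a workspace by looking for the repo marker dir."""
    if not (workspace / ".gradient").is_dir():
        raise ValueError(f"{workspace} is not a Gradient workspace")
    return sorted(p.name for p in workspace.iterdir()
                  if (p / ".gradient-repo").is_dir())

# Demo against a throwaway workspace laid out like the tree above
with tempfile.TemporaryDirectory() as tmp:
    ws = Path(tmp)
    (ws / ".gradient").mkdir()
    for name in ("gpt4", "llama"):
        (ws / name / ".gradient-repo").mkdir(parents=True)
    assert find_repos(ws) == ["gpt4", "llama"]
```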

CLI

Workspace Commands

gradient workspace init [path]        # Initialize a new workspace
gradient workspace status             # Show all repos in workspace

Repo Commands

gradient repo init <name> [-d DESC]   # Create a new repo in workspace
gradient repo list                    # List all repos

Auth Commands

gradient login [--token TOKEN] [--verify-url URL]

gradient login verifies your access token with the remote auth endpoint, stores the token in your OS keyring (gradient-cli service), and writes non-secret session metadata to ~/.gradient/auth.json.

Training Commands

gradient status                       # Show current repo status
gradient resume <ref> -- python train.py
gradient fork <from_ref> <new_branch> [--reset-optimizer] [--seed N] -- python train.py

Checkpoint Refs

Refs use the format branch@step:

  • main@100 - step 100 on main branch
  • experiment@50 - step 50 on experiment branch
  • latest - most recent checkpoint on current branch
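
Parsing this ref format is straightforward; the sketch below shows one way to do it. `parse_ref` is an illustrative helper (the library's own ref resolution is not shown here), returning `(None, None)` as a stand-in for "latest on the current branch".

```python
def parse_ref(ref: str):
    """Split a branch@step ref; 'latest' defers to the current branch."""
    if ref == "latest":
        return None, None
    branch, sep, step = ref.partition("@")
    if not sep or not step.isdigit():
        raise ValueError(f"invalid ref: {ref!r}")
    return branch, int(step)

assert parse_ref("main@100") == ("main", 100)
assert parse_ref("experiment@50") == ("experiment", 50)
assert parse_ref("latest") == (None, None)
```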

Environment Variables

Set by the CLI for training script handoff:

  • GRADIENT_WORKSPACE: workspace path
  • GRADIENT_REPO: repo name
  • GRADIENT_RESUME_REF: checkpoint ref to resume from
  • GRADIENT_BRANCH: branch name override
  • GRADIENT_AUTOCOMMIT: auto-commit interval

Optional override for gradient login:

  • GRADIENT_AUTH_VERIFY_URL: token verification endpoint URL (defaults to production endpoint)
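
A training script launched outside the CLI can still honor the same handoff variables. The sketch below shows one way to collect them; `read_handoff` and the `"main"` fallback branch are assumptions for illustration, not documented behavior.

```python
import os

def read_handoff(env=os.environ):
    """Gather Gradient's CLI handoff variables with illustrative fallbacks."""
    return {
        "workspace": env.get("GRADIENT_WORKSPACE"),
        "repo": env.get("GRADIENT_REPO"),
        "resume_ref": env.get("GRADIENT_RESUME_REF"),
        "branch": env.get("GRADIENT_BRANCH", "main"),   # assumed default
        "autocommit": int(env["GRADIENT_AUTOCOMMIT"])
                      if "GRADIENT_AUTOCOMMIT" in env else None,
    }

cfg = read_handoff({"GRADIENT_WORKSPACE": "./ws", "GRADIENT_REPO": "gpt4",
                    "GRADIENT_AUTOCOMMIT": "10"})
assert cfg["workspace"] == "./ws" and cfg["autocommit"] == 10
```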

Public API

Import Surface

from gradient import (
    GradientEngine,
    GradientConfig,
    # Workspace/Repo management
    WorkspaceConfig,
    RepoConfig,
    init_workspace,
    init_repo,
    find_workspace,
    find_repo,
    resolve_context,
)

GradientEngine.attach

Attach to a model and optimizer for checkpointing:

# Auto-create mode (simplest)
engine = GradientEngine.attach(
    model, optimizer,
    workspace="./my_workspace",
    repo="my_model"
)

# Auto-discover from current directory
engine = GradientEngine.attach(model, optimizer)

# With explicit config
engine = GradientEngine.attach(
    model, optimizer,
    scheduler=lr_scheduler,
    config=GradientConfig(
        workspace_path="./my_workspace",
        repo_name="my_model",
        branch="experiment",
    )
)

Behavior:

  • Auto-creates workspace and repo if both are explicitly provided
  • Auto-discovers from current directory if inside an initialized repo
  • Respects CLI environment variables for handoff
  • Creates manifest.json on first attach
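
The bullets above imply a resolution order: explicit arguments win, then the CLI's environment handoff, then directory discovery. A hedged sketch of that precedence follows; `resolve` is hypothetical, and the real `resolve_context` may order things differently.

```python
import os

def resolve(workspace=None, repo=None, env=os.environ, discovered=None):
    """Pick workspace/repo: explicit args > env handoff > discovery.
    `discovered` stands in for a find_workspace/find_repo result."""
    found_ws, found_repo = discovered if discovered else (None, None)
    ws = workspace or env.get("GRADIENT_WORKSPACE") or found_ws
    rp = repo or env.get("GRADIENT_REPO") or found_repo
    if ws is None or rp is None:
        raise RuntimeError("no workspace/repo: pass them explicitly "
                           "or run inside an initialized repo")
    return ws, rp

assert resolve("./ws", "m", env={}) == ("./ws", "m")                    # explicit wins
assert resolve(env={"GRADIENT_WORKSPACE": "./e",
                    "GRADIENT_REPO": "r"}) == ("./e", "r")               # env handoff
assert resolve(env={}, discovered=("./d", "gpt4")) == ("./d", "gpt4")    # discovery
```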

Checkpoint Operations

engine.commit(step, message="")      # Write checkpoint (anchor or delta)
engine.resume("main@100")            # Resume from ref
engine.resume_latest()               # Resume latest on current branch
engine.fork(
    from_ref="main@100",
    new_branch="experiment",
    reset_optimizer=False,
    reset_scheduler=False,
    reset_rng_seed=None,
    message=""
)

Training + Commit Patterns

# Periodic auto-commit
engine.autocommit(every=10)
start = engine.current_step

for step in range(start + 1, start + 1001):
    loss = train_step(...)
    engine.maybe_commit(step)

# Manual milestone commits
for step in range(start + 1, start + 501):
    loss = train_step(...)

    if step in {1, 50, 100, 250, 500}:
        engine.commit(step, message=f"milestone step {step}")

Training Loop Helpers

engine.autocommit(every=10)          # Set auto-commit interval
engine.maybe_commit(step)            # Commit if step matches interval
engine.current_step                  # Step resumed from (0 for fresh run)

Properties

engine.workspace_path                # Path to workspace
engine.repo_name                     # Current repo name
engine.repo_path                     # Full path to repo
engine.branch                        # Current branch name

Extensibility

Register external state (RL envs, curriculum, etc.):

engine.register_state(
    "env_state",
    getter=lambda: env.get_state(),
    setter=lambda s: env.set_state(s)
)

GradientConfig

GradientConfig(
    workspace_path="./my_workspace",
    repo_name="my_model",
    branch="main",
    reanchor_interval=None,
    compression="auto",  # "off" | "auto" | "aggressive"
)

Notes:

  • reanchor_interval: force new anchor after N delta checkpoints
  • compression: lightweight delta compression mode
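
The reanchor logic can be pictured as a simple counter-based decision: the first checkpoint must be an anchor, and after `reanchor_interval` deltas a fresh anchor is forced so that resume never has to replay an unbounded delta chain. This is an illustrative sketch, not the engine's actual decision code.

```python
def checkpoint_type(deltas_since_anchor, reanchor_interval):
    """Decide anchor vs delta. None for deltas_since_anchor means no
    anchor exists yet; None for reanchor_interval means never force one."""
    if deltas_since_anchor is None:
        return "anchor"
    if reanchor_interval is not None and deltas_since_anchor >= reanchor_interval:
        return "anchor"
    return "delta"

assert checkpoint_type(None, 5) == "anchor"    # first commit
assert checkpoint_type(3, 5) == "delta"        # chain still short
assert checkpoint_type(5, 5) == "anchor"       # interval reached
assert checkpoint_type(100, None) == "delta"   # reanchoring disabled
```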

Manifest Format

manifest.json is created in each repo and updated on every commit:

{
  "repo_name": "my_model",
  "checkpoints": [
    {
      "step": 10,
      "branch": "main",
      "file": "ckpt_main_s10.pt",
      "type": "delta"
    }
  ]
}
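
Because the manifest is plain JSON, dashboards and scripts can consume it directly. The sketch below reads a manifest shaped like the example above and finds the newest checkpoint on a branch; `latest` is a hypothetical helper, and only the fields shown in the example are assumed to exist.

```python
import json

manifest = json.loads("""{
  "repo_name": "my_model",
  "checkpoints": [
    {"step": 10,  "branch": "main", "file": "ckpt_main_s10.pt",  "type": "delta"},
    {"step": 100, "branch": "main", "file": "ckpt_main_s100.pt", "type": "anchor"}
  ]
}""")

def latest(manifest, branch="main"):
    """Newest checkpoint entry on a branch, by step (None if branch is empty)."""
    entries = [c for c in manifest["checkpoints"] if c["branch"] == branch]
    return max(entries, key=lambda c: c["step"], default=None)

assert latest(manifest)["file"] == "ckpt_main_s100.pt"
assert latest(manifest, "experiment") is None
```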

Project details


Download files

Download the file for your platform.

Source Distribution

gradient_desc-0.1.15.tar.gz (31.4 kB)

Uploaded Source

Built Distribution


gradient_desc-0.1.15-py3-none-any.whl (34.0 kB)

Uploaded Python 3

File details

Details for the file gradient_desc-0.1.15.tar.gz.

File metadata

  • Download URL: gradient_desc-0.1.15.tar.gz
  • Upload date:
  • Size: 31.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for gradient_desc-0.1.15.tar.gz
Algorithm Hash digest
SHA256 4d0c2138fac34ee3c9539d04cd5cc0b21d219e852bf9c0439da598c91653d2db
MD5 1cc1e0a88cd4cec4c8dd250c56eb6089
BLAKE2b-256 1e2d293395514d5a409ebb61a77b4f773d8628515427ce5c5fe6bfeac69784c6


Provenance

The following attestation bundles were made for gradient_desc-0.1.15.tar.gz:

Publisher: publish.yml on malhar2805/Gradient

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file gradient_desc-0.1.15-py3-none-any.whl.

File metadata

  • Download URL: gradient_desc-0.1.15-py3-none-any.whl
  • Upload date:
  • Size: 34.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for gradient_desc-0.1.15-py3-none-any.whl
Algorithm Hash digest
SHA256 803afe547cd3624e2fca8e9bec88ff9599b69b1a96c630b2fc1aa7317e3b94c0
MD5 6b700c97da3ec28a13bb05ec7dc7c6c1
BLAKE2b-256 67154d3786170311044d8c0c031c3d3342a9031c0151e93ae1edae517a463989


Provenance

The following attestation bundles were made for gradient_desc-0.1.15-py3-none-any.whl:

Publisher: publish.yml on malhar2805/Gradient

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
