
Local-first ML training visualization with the Learning Timeline: watch how your model learns, sample by sample.


G R A D I A

Next-Generation Local-First ML Training Visualization


From observing training to understanding learning.



🚀 What's New in v2.0.0

Gradia v2.0.0 introduces the Learning Timeline: a real-time, sample-centric view of how your models learn over time. This release turns Gradia from a metrics dashboard into a learning-behavior explorer.

✨ Flagship Feature: Learning Timeline

The Learning Timeline answers questions that aggregate metrics cannot:

  • 🎯 When did this sample become correctly classified?
  • 🔄 Which samples keep flipping predictions?
  • 📈 Is the model memorizing or stabilizing?
  • ⚠️ Which data points drive learning instability?



📖 Overview

Gradia is a local-first monitoring tool for machine learning workflows. Unlike cloud-hosted platforms, Gradia runs directly alongside your training loop on your own machine, which keeps tracking low-latency and your data private.

Built on FastAPI and a reactive web UI, Gradia gives granular visibility into your model's training dynamics, its system resource usage and, new in v2.0, the learning behavior of individual samples.


⚡ Key Features

| Feature | Description |
|---|---|
| 🔬 Learning Timeline | Track how individual samples evolve during training, with real-time visualization |
| 📊 Real-Time Telemetry | Nanosecond-precision tracking of loss, accuracy, and custom metrics |
| 🧠 Intelligent Auto-Discovery | Automatic task type inference (classification vs. regression) and model suggestions |
| 💻 System Profiling | CPU and RAM monitoring during training epochs |
| 📁 Artifact Management | Automated checkpointing and structured logging (events.jsonl) |
| 📋 Comprehensive Reporting | One-click PDF/JSON reports with full training history |
| 🔄 Backward Compatible | Full support for v1.x runs, with automatic migration |
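The table lists automatic task type inference but does not describe how Gradia decides. As a rough illustration only, the hypothetical helper below applies a common heuristic: non-numeric targets, or integer-valued targets with few distinct values, are treated as classification. The function name and the 20-class threshold are assumptions, not Gradia's implementation.

```python
from typing import Sequence, Union

Value = Union[int, float, str, bool]


def infer_task_type(target: Sequence[Value], max_classes: int = 20) -> str:
    """Hypothetical heuristic: guess 'classification' or 'regression' for a target column.

    Assumption: an illustrative rule, not Gradia's ScenarioInferrer logic.
    """
    values = list(target)
    if not values:
        raise ValueError("target column is empty")

    # Any non-numeric label (strings, booleans) implies a categorical target.
    if any(isinstance(v, (str, bool)) for v in values):
        return "classification"

    # Numeric targets: integer-valued columns with few distinct values look like class ids.
    all_integral = all(float(v).is_integer() for v in values)
    if all_integral and len(set(values)) <= max_classes:
        return "classification"
    return "regression"
```

For example, `infer_task_type([0, 1, 1, 0])` returns `"classification"`, while a column of 100 distinct integers falls through to `"regression"`.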

🔬 Learning Timeline Deep Dive

How It Works

The Learning Timeline tracks a bounded subset of samples (default: 100) throughout training, capturing:

  • Prediction: what the model predicts for each sample
  • Confidence: the model's certainty in its prediction
  • Correctness: whether the prediction matches the true label
  • Flip events: when a sample's prediction changes between epochs
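The per-epoch record described above can be sketched as a small dataclass plus a flip counter. `SampleRecord` and `count_flips` are hypothetical names; they are not the `SampleState` model shipped in `gradia.events`, whose fields may differ. A flip is assumed here to mean that the predicted label changed between consecutive epochs.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SampleRecord:
    """Hypothetical snapshot of one tracked sample at one epoch."""
    sample_id: int
    epoch: int
    prediction: int      # predicted class label
    confidence: float    # probability assigned to the predicted class
    true_label: int

    @property
    def correct(self) -> bool:
        # Correctness: does the prediction match the ground-truth label?
        return self.prediction == self.true_label


def count_flips(history: List[SampleRecord]) -> int:
    """Count epochs where the predicted label differs from the previous epoch."""
    ordered = sorted(history, key=lambda r: r.epoch)
    return sum(
        1 for prev, cur in zip(ordered, ordered[1:]) if prev.prediction != cur.prediction
    )
```

A sample whose predictions go 0, 1, 0, 1 over four epochs has three flips.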

Sample Classification

Gradia automatically classifies tracked samples into categories:

| Category | Description | Visual |
|---|---|---|
| Stable Correct | Consistently correct predictions | 🟢 Green |
| Late Learner | Became correct after epoch N | 🟡 Yellow |
| Unstable | Predictions flip frequently | 🟠 Orange |
| Persistent Error | Never correctly classified | 🔴 Red |
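The table does not spell out the exact rules, so the hypothetical `categorize` function below uses assumed ones: "Persistent Error" if a sample is never correct, "Stable Correct" if it is correct at every epoch, "Unstable" if its correctness changes more than `max_flips` times (default 2, an assumption), "Late Learner" if it ends correct with few flips, and "Unstable" otherwise. It counts flips in correctness rather than in predicted labels, a simplification for multi-class problems.

```python
from typing import Sequence


def categorize(correct: Sequence[bool], max_flips: int = 2) -> str:
    """Assign a timeline category from a sample's per-epoch correctness history.

    Hypothetical rules with an assumed `max_flips` threshold; Gradia's real
    criteria may differ.
    """
    history = list(correct)
    if not history:
        raise ValueError("correctness history is empty")
    if not any(history):
        return "Persistent Error"
    if all(history):
        return "Stable Correct"
    # A "flip" here is a change in correctness between consecutive epochs.
    flips = sum(prev != cur for prev, cur in zip(history, history[1:]))
    if flips > max_flips:
        return "Unstable"
    if history[-1]:
        # Wrong at some point, correct at the end, and few flips: learned late.
        return "Late Learner"
    # Was correct earlier but ended wrong: treated as unstable (an assumption).
    return "Unstable"
```

For instance, `[False, False, True, True, True]` is a late learner, while `[True, False, True, False, True]` exceeds the flip budget and is unstable.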

UI Blocks

The Timeline interface is organized into focused blocks:

  1. Block A, Timeline Overview: a high-level view of learning stability across all tracked samples
  2. Block B, Sample Inspector: a deep dive into individual sample trajectories, with confidence curves
  3. Block C, Instability Panel: top flipping samples, late learners, and persistent errors
  4. Block D, Training Context: current epoch, status, and tracking metadata

๐Ÿ› ๏ธ Architecture

Gradia employs a Producer-Consumer architecture:

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│  Trainer Thread │───▶│   Event Queue   │───▶│   FastAPI UI    │
│   (Producer)    │    │ (Thread-Safe)   │    │   (Consumer)    │
└─────────────────┘    └─────────────────┘    └─────────────────┘
        │                                              │
        ▼                                              ▼
┌─────────────────┐                          ┌─────────────────┐
│ Sample Tracker  │                          │ Timeline Logger │
│   (v2.0 New)    │                          │   (v2.0 New)    │
└─────────────────┘                          └─────────────────┘
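The producer-consumer flow in the diagram can be approximated with Python's standard `queue.Queue`, which is thread-safe. This is a sketch, not Gradia's code: the event dictionaries, the `_STOP` sentinel and the function names are assumptions, and the consumer simply collects events where the real FastAPI UI would serve them. Timestamps come from `time.time_ns()`, which returns integer nanoseconds (the actual clock resolution depends on the platform).

```python
import queue
import threading
import time
from typing import Any, Dict, List

# Sentinel placed on the queue to tell the consumer that training has finished.
_STOP = object()


def trainer_thread(events: queue.Queue, epochs: int) -> None:
    """Producer: emits one metrics event per epoch (values are placeholders)."""
    for epoch in range(1, epochs + 1):
        loss = 1.0 / epoch  # stand-in for a real training step
        events.put({"epoch": epoch, "loss": loss, "ts_ns": time.time_ns()})
    events.put(_STOP)


def ui_consumer(events: queue.Queue, received: List[Dict[str, Any]]) -> None:
    """Consumer: drains events as they arrive (a real UI would push them to the browser)."""
    while True:
        item = events.get()  # blocks until the producer puts something
        if item is _STOP:
            break
        received.append(item)


def run_demo(epochs: int = 5) -> List[Dict[str, Any]]:
    events: queue.Queue = queue.Queue()
    received: List[Dict[str, Any]] = []
    producer = threading.Thread(target=trainer_thread, args=(events, epochs))
    consumer = threading.Thread(target=ui_consumer, args=(events, received))
    consumer.start()
    producer.start()
    producer.join()
    consumer.join()
    return received
```

Decoupling the two sides through a queue means a slow UI never blocks the training loop; it only lets events accumulate until they are drained.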

New v2.0 Components

  • gradia.events: event model with LearningEvent, SampleState, and EpochSummary
  • SampleTracker: boundary-aware sample selection and tracking
  • TimelineLogger: structured persistence of timeline events
  • SchemaMigrator: automatic migration of v1.x configs to the v2.0 schema
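"Boundary-aware" selection is not defined in detail here. One plausible interpretation, sketched below as an assumption, is to track the samples closest to the decision boundary, i.e. those with the smallest margin between their top two class probabilities, capped at the default budget of 100 samples. `select_boundary_samples` is a hypothetical name, not the `SampleTracker` API.

```python
from typing import List, Sequence, Tuple


def select_boundary_samples(probas: Sequence[Sequence[float]], max_samples: int = 100) -> List[int]:
    """Return indices of up to `max_samples` rows with the smallest top-2 probability margin.

    Assumption: `probas[i]` holds class probabilities for sample i (e.g. the output
    of an sklearn classifier's predict_proba) and has at least two entries.
    """
    margins: List[Tuple[float, int]] = []
    for idx, row in enumerate(probas):
        top_two = sorted(row, reverse=True)[:2]
        margins.append((top_two[0] - top_two[1], idx))
    # Smallest margin first; ties are broken by sample index for determinism.
    margins.sort()
    return [idx for _, idx in margins[:max_samples]]
```

Samples near the boundary are the ones most likely to flip, which is why they are a natural choice when only a bounded subset can be tracked.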

📦 Installation

pip install gradia --upgrade

From Source

git clone https://github.com/STiFLeR7/gradia.git
cd gradia
pip install -e ".[dev]"

💻 Quick Start

Basic Usage

# Auto-detect datasets and start the dashboard
gradia run .

Advanced CLI

# Specify target column and port
gradia run . --target "label" --port 8080

Python API

from gradia.trainer.engine import Trainer
from gradia.core.scenario import ScenarioInferrer
from gradia.core.config import ConfigManager

# Infer scenario from dataset
inferrer = ScenarioInferrer()
scenario = inferrer.infer("data.csv", target_override="label")

# Configure training with timeline enabled
config_mgr = ConfigManager("./runs")
config = config_mgr.load_or_create()
config['model']['type'] = 'random_forest'
config['training']['epochs'] = 20
config['timeline']['enabled'] = True
config['timeline']['max_samples'] = 100

# Run training
trainer = Trainer(scenario, config, "./runs")
trainer.run()

# Get timeline insights
insights = trainer.get_timeline_insights()
print(f"Stable samples: {insights['stable_correct']}")
print(f"Flipping samples: {insights['top_flippers']}")

📊 Dashboard

After running gradia run ., open the dashboard at http://localhost:8000 (or on the port passed with --port).

Pages

| Page | URL | Description |
|---|---|---|
| Configure | /configure | Select model, hyperparameters, and start training |
| Metrics | / | Real-time training metrics and system resources |
| Timeline | /timeline | Learning Timeline visualization (v2.0) |

Configuration Options

# Example gradia_config.yaml (auto-generated)
schema_version: "2.0"
project_name: "my-experiment"
save_model: true

model:
  type: "random_forest"
  params:
    n_estimators: 100
    max_depth: null

training:
  epochs: 20
  test_split: 0.2
  random_seed: 42

timeline:
  enabled: true
  max_samples: 100
  sampling_strategy: "boundary"

🔄 Migration from v1.x

Gradia v2.0 is fully backward compatible. When you run gradia run . on a v1.x project:

  1. Existing configs are automatically migrated to v2.0 schema
  2. Old runs remain accessible
  3. Timeline features are enabled by default

# Migration happens automatically
gradia run .
# Output: Config migrated: Added timeline config, Set schema_version to 2.0
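The migration output above names two changes: adding a timeline section and setting schema_version to 2.0. A minimal sketch of that transformation on a plain dictionary follows. `migrate_v1_config` is hypothetical, not the `SchemaMigrator` API, and its timeline defaults are copied from the example gradia_config.yaml in this README.

```python
import copy
from typing import Any, Dict, List, Tuple

# Defaults taken from the example gradia_config.yaml in this README.
DEFAULT_TIMELINE = {"enabled": True, "max_samples": 100, "sampling_strategy": "boundary"}


def migrate_v1_config(config: Dict[str, Any]) -> Tuple[Dict[str, Any], List[str]]:
    """Return a migrated copy of `config` plus human-readable change notes."""
    migrated = copy.deepcopy(config)  # never mutate the caller's config
    changes: List[str] = []
    if "timeline" not in migrated:
        migrated["timeline"] = dict(DEFAULT_TIMELINE)
        changes.append("Added timeline config")
    if migrated.get("schema_version") != "2.0":
        migrated["schema_version"] = "2.0"
        changes.append("Set schema_version to 2.0")
    return migrated, changes
```

Because each step checks before changing anything, running the migration on an already-migrated config returns an empty change list.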

🧪 Testing

# Run all tests
pytest tests/ -v

# Run with coverage
pytest tests/ --cov=gradia --cov-report=html

๐Ÿ“ Project Structure

gradia/
├── cli/              # Typer CLI application
├── core/             # Configuration, inspection, migration
├── events/           # v2.0 Event model and tracking
│   ├── models.py     # LearningEvent, SampleState, EpochSummary
│   ├── tracker.py    # SampleTracker with boundary sampling
│   └── logger.py     # TimelineLogger for event persistence
├── models/           # sklearn wrappers and model factory
├── trainer/          # Training engine with timeline integration
└── viz/              # FastAPI server and UI templates
    ├── templates/    # Jinja2 HTML templates
    └── static/       # CSS and JavaScript
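logger.py holds the TimelineLogger, and the feature table mentions structured logging to events.jsonl. The on-disk schema is not documented here, so the sketch below only illustrates the JSON Lines pattern (one JSON object per line, appended as events occur). The `JsonlEventLog` class and the event field names are hypothetical, not Gradia's implementation.

```python
import json
from pathlib import Path
from typing import Any, Dict, Iterator


class JsonlEventLog:
    """Hypothetical append-only event log in JSON Lines format."""

    def __init__(self, path: Path) -> None:
        self.path = Path(path)
        self.path.parent.mkdir(parents=True, exist_ok=True)

    def append(self, event: Dict[str, Any]) -> None:
        # One compact JSON object per line; appending leaves earlier events intact.
        with self.path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(event, separators=(",", ":")) + "\n")

    def read(self) -> Iterator[Dict[str, Any]]:
        # Skip blank lines so a stray trailing newline does not break parsing.
        with self.path.open("r", encoding="utf-8") as fh:
            for line in fh:
                if line.strip():
                    yield json.loads(line)
```

An append-only, line-oriented format means a crash mid-run loses at most the last partial line, and the file can be tailed while training is still in progress.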

๐Ÿ—บ๏ธ Roadmap

v2.1 (Planned)

  • WebSocket real-time updates
  • Dataset Intelligence Panel
  • Experiment Comparison (overlay 2-3 runs)
  • Export timeline to video/GIF

v2.2 (Future)

  • PyTorch integration
  • TensorFlow/Keras support
  • Remote monitoring mode

๐Ÿค Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines.

# Development setup
git clone https://github.com/STiFLeR7/gradia.git
cd gradia
pip install -e ".[dev]"

# Run tests
pytest tests/ -v

# Lint
flake8 gradia/

📄 License

Distributed under the MIT License. See LICENSE for more information.


🔗 Links


Built with ❤️ by STiFLeR for the ML Community.

Instagram • Hugging Face • PyPI
