
Build PyTorch-Based NN Projects Faster


LayerZero

A modular PyTorch training framework with automatic performance optimizations.

Features

Trainer

  • Model compilation via torch.compile() (PyTorch 2.0+)
  • Mixed precision training (AMP)
  • Automatic GPU augmentation integration
  • Asynchronous CUDA data transfers
  • Real-time TensorBoard logging
  • PyTorch Profiler integration (GPU/CPU/memory analysis)
  • Metric tracking and logging
  • Model checkpointing
  • Custom callbacks

ImageDataLoader

  • GPU-accelerated augmentation using Kornia
  • Configurable augmentation modes
  • Automatic worker detection
  • Torchvision dataset support
  • Pinned memory for GPU training

Helper

  • Training/validation metric tracking
  • Loss curve visualization
  • Experiment logging

Performance Optimizations

Applied automatically:

  • torch.compile() for model compilation
  • Mixed precision (FP16) training
  • Non-blocking CUDA transfers
  • GPU-based augmentation (Kornia)
  • Optimized DataLoader configuration

Monitoring & Analysis:

  • Real-time TensorBoard logging (loss, metrics, lr)
  • PyTorch Profiler integration (find bottlenecks)
  • GPU/CPU utilization tracking
  • Memory usage profiling

Installation

pip install torch torchvision matplotlib tensorboard torch-tb-profiler

# Optional: GPU augmentation
pip install kornia kornia-rs

Or install from PyPI:

pip install LayerZero

Note: torch-tb-profiler is required to view PyTorch Profiler traces in TensorBoard.


Usage

Basic Example

import torch
from torch import nn
from LayerZero import ImageDataLoader, ImageLoaderConfig, Trainer, TrainerConfig
from torchvision.datasets import CIFAR10

# Model
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(3*32*32, 128),
    nn.ReLU(),
    nn.Linear(128, 10)
)

# Data
config = ImageLoaderConfig(
    data_dir='./data',
    batch_size=128,
    use_gpu_augmentation='auto',  # Automatic GPU acceleration
)
loader = ImageDataLoader(CIFAR10, image_size=32, config=config)

train_loader, test_loader = loader.get_loaders()

# Training configuration
config = TrainerConfig(
    epochs=10,
    amp=True,
    compile_model='auto',
    use_tensorboard=True  # TensorBoard enabled by default!
)

# Train
trainer = Trainer(
    model=model,
    loss_fn=nn.CrossEntropyLoss(),
    optimizer=torch.optim.Adam(model.parameters()),
    config=config
)

results = trainer.fit(
    train_loader, 
    test_loader,
    data_loader=loader  # Auto-detects GPU augmentation!
)

View Training in Real-Time:

# Google Colab / Kaggle (inline in notebook - two separate commands):
%load_ext tensorboard
%tensorboard --logdir runs
# Local / Terminal:
tensorboard --logdir=runs
# Then open: http://localhost:6006

Configuration

Augmentation Modes

from LayerZero import ImageDataLoader, ImageLoaderConfig, AugmentationMode

config = ImageLoaderConfig(
    augmentation_mode=AugmentationMode.MINIMAL,  # Flip + Crop
)
loader = ImageDataLoader(CIFAR10, image_size=224, config=config)

GPU Augmentation

# Automatic integration with Trainer (Recommended)
loader_config = ImageLoaderConfig(
    use_gpu_augmentation='auto',  # Auto-detect GPU and Kornia
    auto_install_kornia=True      # Install if missing
)
loader = ImageDataLoader(CIFAR10, image_size=224, config=loader_config)

train_loader, test_loader = loader.get_loaders()

# GPU augmentation auto-detected when fit() is called!
trainer = Trainer(
    model=model,
    loss_fn=nn.CrossEntropyLoss(),
    optimizer=torch.optim.Adam(model.parameters()),
    config=TrainerConfig()
)

trainer.fit(
    train_loader,
    test_loader,
    data_loader=loader  # ← Pass loader here, Trainer auto-detects GPU aug!
)

# Manual usage in custom training loops
device = 'cuda'
gpu_aug = loader.get_gpu_augmentation(device=device)

for X, y in train_loader:
    X = X.to(device)
    X = gpu_aug(X)
    # ... training code ...

Mixed Precision

config = TrainerConfig(
    amp=True,   # Enable (default)
    # amp=False  # Disable for debugging
)
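Under the hood, `amp=True` wraps the forward pass in an autocast context so eligible ops run in reduced precision. A minimal sketch of the same idea in a hand-written loop (using bfloat16 so it also runs on CPU; the Trainer handles all of this for you):

```python
import torch
from torch import nn

model = nn.Linear(8, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

X = torch.randn(4, 8)
y = torch.tensor([0, 1, 0, 1])

# Forward pass under autocast: eligible ops run in reduced precision.
device_type = "cuda" if torch.cuda.is_available() else "cpu"
with torch.autocast(device_type=device_type, dtype=torch.bfloat16):
    loss = loss_fn(model(X), y)

loss.backward()   # gradients are kept in full precision
optimizer.step()
```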

Model Compilation

config = TrainerConfig(
    compile_model='auto',        # Auto-detect PyTorch 2.0+
    compile_mode='default',      # Compilation mode
    # compile_mode='reduce-overhead'
    # compile_mode='max-autotune'
)
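`torch.compile()` returns a drop-in callable with the same weights and outputs as the original model. A quick sketch (the `"eager"` backend is used here only so the example runs without a working C++ toolchain; the Trainer would use the default inductor backend):

```python
import torch
from torch import nn

# A tiny model; torch.compile returns a drop-in callable with identical outputs.
model = nn.Sequential(nn.Linear(4, 4), nn.ReLU(), nn.Linear(4, 2))
compiled = torch.compile(model, backend="eager")

x = torch.randn(3, 4)
with torch.no_grad():
    out_eager = model(x)
    out_compiled = compiled(x)
```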

TensorBoard (Real-Time Monitoring) 📊

Enabled by default! Works seamlessly in Google Colab, Kaggle, and local environments.

🎯 Google Colab / Kaggle Usage (Recommended)

# Step 1: Load TensorBoard extension (run once at top of notebook)
%load_ext tensorboard

# Step 2: Train your model (TensorBoard logs automatically)
trainer = Trainer(model, loss_fn, optimizer, config=TrainerConfig(epochs=10))
trainer.fit(train_loader, val_loader)

# Step 3: View TensorBoard inline in your notebook
%tensorboard --logdir runs

That's it! TensorBoard will display directly in your Colab/Kaggle notebook with real-time updates.

💻 Local / Terminal Usage

# Terminal 1: Start training
python train.py

# Terminal 2: Start TensorBoard
tensorboard --logdir=runs

# Open browser to: http://localhost:6006

⚙️ Configuration

config = TrainerConfig(
    use_tensorboard=True,              # Enable/disable (default: True)
    tensorboard_log_dir="runs",        # Log directory
    tensorboard_comment="experiment1", # Experiment name/tag
    tensorboard_log_graph=True,        # Log model graph
    tensorboard_log_gradients=False,   # Log gradient histograms (slower)
)

📈 What Gets Logged

  • ✅ Train & validation losses (real-time, per epoch)
  • ✅ All custom metrics (accuracy, F1, etc.)
  • ✅ Learning rate changes over time
  • ✅ Model graph visualization (optional)
  • ✅ Gradient & weight histograms (optional)
  • PyTorch Profiler (optional - GPU/CPU utilization, memory, bottlenecks)

🔧 Advanced Options

Disable TensorBoard:

config = TrainerConfig(
    use_tensorboard=False  # Turn off TensorBoard logging
)

Multiple experiments with names:

# Experiment 1
config1 = TrainerConfig(tensorboard_comment="resnet50_lr0.001")
trainer1.fit(train_loader, val_loader)

# Experiment 2
config2 = TrainerConfig(tensorboard_comment="resnet50_lr0.01")
trainer2.fit(train_loader, val_loader)

# View both: %tensorboard --logdir runs

Manual callback control:

from LayerZero import Trainer, TensorBoardCallback

tb_callback = TensorBoardCallback(
    log_dir="my_experiments",
    comment="custom_experiment",
    log_gradients=True  # Enable gradient logging
)

trainer = Trainer(
    model=model,
    loss_fn=loss_fn,
    optimizer=optimizer,
    config=TrainerConfig(use_tensorboard=False),  # Disable auto-init
    callbacks=[tb_callback]  # Add manually
)

📱 Colab/Kaggle Quick Start

# Complete Colab/Kaggle example
%load_ext tensorboard

import torch
from torch import nn
from LayerZero import ImageDataLoader, Trainer, TrainerConfig
from torchvision.datasets import CIFAR10

# Setup model and data
model = nn.Sequential(...)
loader = ImageDataLoader(CIFAR10, root='./data', batch_size=128)
train_loader, val_loader = loader.get_loaders()

# Train with TensorBoard (automatic)
trainer = Trainer(
    model=model,
    loss_fn=nn.CrossEntropyLoss(),
    optimizer=torch.optim.Adam(model.parameters()),
    config=TrainerConfig(epochs=10)  # TensorBoard enabled by default!
)

trainer.fit(train_loader, val_loader, data_loader=loader)

# View results inline
%tensorboard --logdir runs

🔬 PyTorch Profiler Integration (Performance Analysis)

NEW! Analyze GPU/CPU utilization and memory usage, and identify bottlenecks, all in TensorBoard!

# Enable profiler with TensorBoard
config = TrainerConfig(
    epochs=10,
    use_tensorboard=True,
    use_profiler=True,  # Enable PyTorch Profiler
)

trainer = Trainer(model, loss_fn, optimizer, config=config)
trainer.fit(train_loader, val_loader)

# View profiler traces in TensorBoard
%tensorboard --logdir runs
# Look for the "PYTORCH_PROFILER" or "PROFILE" tab (requires torch-tb-profiler)

What you'll see:

  • 📊 GPU/CPU utilization timeline
  • 💾 Memory usage over time (allocated/reserved)
  • ⚡ Operation timing breakdown
  • 🔍 Bottleneck identification (slow ops highlighted)
  • 📈 Kernel execution trace

Requirements:

  • torch-tb-profiler must be installed: pip install torch-tb-profiler
  • The profiler tab will appear after profiling data is generated

Fine-tune profiler schedule:

config = TrainerConfig(
    use_profiler=True,
    profiler_schedule_wait=1,      # Skip first N batches
    profiler_schedule_warmup=1,    # Warmup for N batches
    profiler_schedule_active=3,    # Profile for N batches
    profiler_schedule_repeat=2,    # Repeat cycle N times
)
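These four knobs appear to map onto `torch.profiler.schedule`, which decides what the profiler does on each batch. A small sketch of what `wait=1, warmup=1, active=3` means step by step:

```python
import torch.profiler as tp

# Same knobs as the TrainerConfig above, expressed directly in torch.profiler.
sched = tp.schedule(wait=1, warmup=1, active=3, repeat=2)

# Map each of the first few steps to its profiler action:
# step 0 is skipped, step 1 warms up, steps 2-4 are recorded.
actions = [sched(step) for step in range(5)]
```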

Why use the profiler?

  • Find GPU idle time (data loading bottlenecks)
  • Identify slow operations
  • Optimize memory usage
  • Compare different model architectures
  • Debug performance issues

⚠️ Performance Note:

  • TensorBoard (default): < 1% overhead ✅
  • Gradient logging: ~5-10% overhead (disabled by default)
  • Profiler: ~10-15% overhead (disabled by default)
  • Logging happens once per epoch, not per batch
  • Safe to keep TensorBoard enabled for all training

Example: Optimizing based on profiler insights

# Profiler showed data loading was the bottleneck
# Solution: increase num_workers and move augmentation to the GPU

loader = ImageDataLoader(
    CIFAR10,
    batch_size=128,
    num_workers=4,  # Increased from default
    use_gpu_augmentation='auto'  # Move augmentation to GPU
)
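If you prefer to pick `num_workers` yourself, a common heuristic (not necessarily the one LayerZero's auto-detection uses) is to cap the worker count at the available CPU count:

```python
import os

def suggest_num_workers(max_workers: int = 8) -> int:
    """Cap DataLoader workers at the available CPU count."""
    cpus = os.cpu_count() or 1
    return max(1, min(cpus, max_workers))

workers = suggest_num_workers()
```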

Custom Metrics

def accuracy_fn(y_pred, y_true):
    return (y_pred.argmax(1) == y_true).float().mean().item() * 100

config = TrainerConfig(
    metrics={'accuracy': accuracy_fn}
)
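The metric signature is just `(y_pred, y_true) -> float`, so a metric can be sanity-checked directly on a tiny batch before handing it to the Trainer:

```python
import torch

def accuracy_fn(y_pred, y_true):
    return (y_pred.argmax(1) == y_true).float().mean().item() * 100

# Four predictions, three of which pick the correct class.
y_pred = torch.tensor([[0.9, 0.1], [0.2, 0.8], [0.7, 0.3], [0.4, 0.6]])
y_true = torch.tensor([0, 1, 1, 1])
score = accuracy_fn(y_pred, y_true)
```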

Callbacks

def save_checkpoint(model, epoch, metrics):
    torch.save(model.state_dict(), f'model_epoch_{epoch}.pt')

config = TrainerConfig(
    callbacks={'on_epoch_end': save_checkpoint}
)
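The Trainer presumably looks up each hook by event name and calls it at the right point in the loop. A minimal pure-Python sketch of that dispatch pattern (the details are an assumption, not LayerZero's actual implementation):

```python
# Hypothetical dispatch: the real Trainer's internals may differ.
calls = []

def save_checkpoint(model, epoch, metrics):
    calls.append((epoch, metrics["loss"]))   # stand-in for torch.save(...)

callbacks = {"on_epoch_end": save_checkpoint}

def fire(event, **kwargs):
    """Invoke the callback registered under an event name, if any."""
    cb = callbacks.get(event)
    if cb is not None:
        cb(**kwargs)

for epoch, loss in enumerate([0.5, 0.4]):
    fire("on_epoch_end", model=None, epoch=epoch, metrics={"loss": loss})
```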

API Reference

ImageDataLoader

ImageDataLoader(
    dataset_cls,                          # Torchvision dataset class
    root='./data',                        # Data directory
    image_size=224,                       # Image size
    batch_size=64,                        # Batch size
    augmentation_mode=AugmentationMode.BASIC,
    use_gpu_augmentation='auto',
    auto_install_kornia=True,
    num_workers=None,                     # Auto-detect
    download=False,
)

TrainerConfig

TrainerConfig(
    epochs=10,
    amp=True,                          # Mixed precision
    compile_model='auto',              # torch.compile()
    compile_mode='default',
    device='auto',
    save_dir='./checkpoints',
    # TensorBoard settings
    use_tensorboard=True,              # Enable TensorBoard (default: True)
    tensorboard_log_dir='runs',        # TensorBoard log directory
    tensorboard_comment='',            # Experiment name/comment
    tensorboard_log_graph=True,        # Log model graph
    tensorboard_log_gradients=False,   # Log gradient histograms
    # PyTorch Profiler settings (integrates with TensorBoard)
    use_profiler=False,                # Enable PyTorch Profiler (default: False)
    profiler_schedule_wait=1,          # Batches to skip before profiling
    profiler_schedule_warmup=1,        # Warmup batches
    profiler_schedule_active=3,        # Active profiling batches
    profiler_schedule_repeat=2,        # Number of profiling cycles
)

Trainer

Trainer(
    model,
    loss_fn,
    optimizer,
    config,
    metrics=None,
    callbacks=None,
)

# Run training with optional GPU augmentation auto-detection
trainer.fit(
    train_loader, 
    val_loader,
    epochs=None,        # Optional: Override config.epochs
    data_loader=None    # Optional: ImageDataLoader for GPU aug auto-detection
)

trainer.evaluate(dataloader)  # Evaluate on data
trainer.predict(dataloader)   # Get predictions

KorniaHelper

from LayerZero import (
    is_kornia_available,
    install_kornia,
    ensure_kornia,
    get_kornia_version,
)

if ensure_kornia(auto_install=True):
    # Kornia available
    pass

Architecture

LayerZero/
├── Trainer.py              # Training loop
├── ImageDataLoader.py      # Data loading
├── GPUAugmentation.py      # Kornia augmentation
├── AugmentationMode.py     # Augmentation enums
├── KorniaHelper.py         # Kornia management
└── Helper.py               # Metrics tracking

Troubleshooting

Kornia installation fails

pip install kornia kornia-rs

torch.compile not available

Requires PyTorch 2.0+:

pip install --upgrade torch torchvision

Out of memory

Reduce batch size or enable mixed precision:

config = TrainerConfig(amp=True)
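Another common remedy when a batch doesn't fit is gradient accumulation: run several smaller micro-batches and take one optimizer step, keeping the effective batch size. A sketch of the general PyTorch technique (this is not a built-in LayerZero feature):

```python
import torch
from torch import nn

torch.manual_seed(0)
model = nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

X = torch.randn(8, 4)              # one "full" batch of 8
y = torch.randint(0, 2, (8,))
micro, accum_steps = 4, 2          # two micro-batches of 4

optimizer.zero_grad()
for i in range(accum_steps):
    xb = X[i * micro:(i + 1) * micro]
    yb = y[i * micro:(i + 1) * micro]
    # Scale the loss so accumulated gradients match the full-batch average.
    loss = loss_fn(model(xb), yb) / accum_steps
    loss.backward()                # grads accumulate across micro-batches
optimizer.step()                   # one optimizer step per effective batch
```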

Slow on CPU

Use minimal augmentation:

config = ImageLoaderConfig(augmentation_mode=AugmentationMode.MINIMAL)
loader = ImageDataLoader(CIFAR10, image_size=224, config=config)

Releasing New Versions

# Bump version (bug fixes: 0.1.3 → 0.1.4)
make bump-patch

# Push to trigger PyPI release
make release

See RELEASE_WORKFLOW.md for complete guide.


License

MIT
