
Build PyTorch-Based NN Projects Faster


LayerZero

A modular PyTorch training framework with automatic performance optimizations.

Features

Trainer

  • Model compilation via torch.compile() (PyTorch 2.0+)
  • Mixed precision training (AMP)
  • Automatic GPU augmentation integration
  • Asynchronous CUDA data transfers
  • Real-time TensorBoard logging
  • PyTorch Profiler integration (GPU/CPU/memory analysis)
  • Metric tracking and logging
  • Model checkpointing
  • Custom callbacks

ImageDataLoader

  • GPU-accelerated augmentation using Kornia
  • Configurable augmentation modes
  • Automatic worker detection
  • Torchvision dataset support
  • Pinned memory for GPU training

Helper

  • Training/validation metric tracking
  • Loss curve visualization
  • Experiment logging

Performance Optimizations

Applied automatically:

  • torch.compile() for model compilation
  • Mixed precision (FP16) training
  • Non-blocking CUDA transfers
  • GPU-based augmentation (Kornia)
  • Optimized DataLoader configuration
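
For context, the transfer-related items above boil down to standard PyTorch patterns. Here is a minimal sketch of pinned memory plus non-blocking transfers (illustrative only; LayerZero applies this for you):

import torch
from torch.utils.data import DataLoader, TensorDataset

# pin_memory=True puts host batches in page-locked memory,
# which is what makes truly asynchronous host-to-device copies possible
dataset = TensorDataset(torch.randn(256, 3, 32, 32), torch.randint(0, 10, (256,)))
loader = DataLoader(dataset, batch_size=64, pin_memory=True, num_workers=2)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
for X, y in loader:
    # non_blocking=True lets the copy overlap with GPU compute
    X = X.to(device, non_blocking=True)
    y = y.to(device, non_blocking=True)
    # ... forward/backward as usual ...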

Monitoring & Analysis:

  • Real-time TensorBoard logging (loss, metrics, lr)
  • PyTorch Profiler integration (find bottlenecks)
  • GPU/CPU utilization tracking
  • Memory usage profiling

Installation

pip install torch torchvision matplotlib tensorboard torch-tb-profiler

# Optional: GPU augmentation
pip install kornia kornia-rs

Or install from PyPI:

pip install LayerZero

Note: torch-tb-profiler is required to view PyTorch Profiler traces in TensorBoard.


Usage

Basic Example

import torch
from torch import nn
from LayerZero import ImageDataLoader, ImageLoaderConfig, Trainer, TrainerConfig
from torchvision.datasets import CIFAR10

# Model
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(3*32*32, 128),
    nn.ReLU(),
    nn.Linear(128, 10)
)

# Data
config = ImageLoaderConfig(
    data_dir='./data',
    batch_size=128,
    use_gpu_augmentation='auto',  # Automatic GPU acceleration
)
loader = ImageDataLoader(CIFAR10, image_size=32, config=config)

train_loader, test_loader = loader.get_loaders()

# Training configuration
config = TrainerConfig(
    epochs=10,
    amp=True,
    compile_model='auto',
    use_tensorboard=True  # TensorBoard enabled by default!
)

# Train
trainer = Trainer(
    model=model,
    loss_fn=nn.CrossEntropyLoss(),
    optimizer=torch.optim.Adam(model.parameters()),
    config=config
)

results = trainer.fit(
    train_loader, 
    test_loader,
    data_loader=loader  # Auto-detects GPU augmentation!
)

View Training in Real-Time:

# Google Colab / Kaggle (inline in notebook - two separate commands):
%load_ext tensorboard
%tensorboard --logdir runs

# Local / Terminal:
tensorboard --logdir=runs
# Then open: http://localhost:6006

Configuration

Augmentation Modes

from LayerZero import ImageDataLoader, ImageLoaderConfig, AugmentationMode

config = ImageLoaderConfig(
    augmentation_mode=AugmentationMode.MINIMAL,  # Flip + Crop
)
loader = ImageDataLoader(CIFAR10, image_size=224, config=config)

GPU Augmentation

# Automatic integration with Trainer (Recommended)
loader_config = ImageLoaderConfig(
    use_gpu_augmentation='auto',  # Auto-detect GPU and Kornia
    auto_install_kornia=True      # Install if missing
)
loader = ImageDataLoader(CIFAR10, image_size=224, config=loader_config)

train_loader, test_loader = loader.get_loaders()

# GPU augmentation auto-detected when fit() is called!
trainer = Trainer(
    model=model,
    loss_fn=nn.CrossEntropyLoss(),
    optimizer=torch.optim.Adam(model.parameters()),
    config=TrainerConfig(epochs=10)  # Trainer takes a TrainerConfig, not the loader config
)

trainer.fit(
    train_loader,
    test_loader,
    data_loader=loader  # ← Pass loader here, Trainer auto-detects GPU aug!
)

# Manual usage in custom training loops
gpu_aug = loader.get_gpu_augmentation(device='cuda')

for X, y in train_loader:
    X = X.to('cuda')
    X = gpu_aug(X)
    # ... training code ...

Mixed Precision

config = TrainerConfig(
    amp=True,   # Enable (default)
    # amp=False  # Disable for debugging
)
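
For reference, amp=True corresponds to PyTorch's standard autocast/GradScaler pattern. A rough sketch of that generic pattern (requires a CUDA device; not LayerZero's exact internals):

import torch
from torch import nn

model = nn.Linear(128, 10).cuda()   # placeholder model
optimizer = torch.optim.Adam(model.parameters())
loss_fn = nn.CrossEntropyLoss()
scaler = torch.cuda.amp.GradScaler()

X = torch.randn(32, 128, device='cuda')
y = torch.randint(0, 10, (32,), device='cuda')

optimizer.zero_grad()
with torch.autocast(device_type='cuda'):  # forward pass in FP16 where safe
    loss = loss_fn(model(X), y)
scaler.scale(loss).backward()             # scale loss to avoid FP16 gradient underflow
scaler.step(optimizer)                    # unscales gradients, then steps
scaler.update()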

Model Compilation

config = TrainerConfig(
    compile_model='auto',        # Auto-detect PyTorch 2.0+
    compile_mode='default',      # Compilation mode
    # compile_mode='reduce-overhead'
    # compile_mode='max-autotune'
)
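
These modes map directly onto torch.compile()'s own mode argument. A quick standalone sketch (assuming PyTorch 2.0+):

import torch
from torch import nn

model = nn.Linear(128, 10)

# 'default' balances compile time and speedup;
# 'reduce-overhead' uses CUDA graphs to cut per-call overhead (helps small batches);
# 'max-autotune' searches longer for the fastest kernels
compiled = torch.compile(model, mode='default')
out = compiled(torch.randn(8, 128))  # first call triggers compilation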

TensorBoard (Real-Time Monitoring) 📊

Enabled by default! Works seamlessly in Google Colab, Kaggle, and local environments.

🎯 Google Colab / Kaggle Usage (Recommended)

# Step 1: Load TensorBoard extension (run once at top of notebook)
%load_ext tensorboard

# Step 2: Train your model (TensorBoard logs automatically)
trainer = Trainer(model, loss_fn, optimizer, config=TrainerConfig(epochs=10))
trainer.fit(train_loader, val_loader)

# Step 3: View TensorBoard inline in your notebook
%tensorboard --logdir runs

That's it! TensorBoard will display directly in your Colab/Kaggle notebook with real-time updates.

💻 Local / Terminal Usage

# Terminal 1: Start training
python train.py

# Terminal 2: Start TensorBoard
tensorboard --logdir=runs

# Open browser to: http://localhost:6006

⚙️ Configuration

config = TrainerConfig(
    use_tensorboard=True,              # Enable/disable (default: True)
    tensorboard_log_dir="runs",        # Log directory
    tensorboard_comment="experiment1", # Experiment name/tag
    tensorboard_log_graph=True,        # Log model graph
    tensorboard_log_gradients=False,   # Log gradient histograms (slower)
)

📈 What Gets Logged

  • ✅ Train & validation losses (real-time, per epoch)
  • ✅ All custom metrics (accuracy, F1, etc.)
  • ✅ Learning rate changes over time
  • ✅ Model graph visualization (optional)
  • ✅ Gradient & weight histograms (optional)
  • ✅ PyTorch Profiler (optional: GPU/CPU utilization, memory, bottlenecks)
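
In raw TensorBoard terms, the per-epoch logging above amounts to SummaryWriter.add_scalar calls, roughly like this (illustrative sketch with placeholder values; the actual tag names LayerZero uses may differ):

from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(log_dir='runs/experiment1')
for epoch in range(10):
    train_loss, val_loss, acc, lr = 0.5, 0.4, 91.2, 1e-3  # placeholder values
    writer.add_scalar('Loss/train', train_loss, epoch)
    writer.add_scalar('Loss/val', val_loss, epoch)
    writer.add_scalar('Metrics/accuracy', acc, epoch)
    writer.add_scalar('LearningRate', lr, epoch)
writer.close()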

🔧 Advanced Options

Disable TensorBoard:

config = TrainerConfig(
    use_tensorboard=False  # Turn off TensorBoard logging
)

Multiple experiments with names:

# Experiment 1
config1 = TrainerConfig(tensorboard_comment="resnet50_lr0.001")
trainer1 = Trainer(model, loss_fn, optimizer, config=config1)
trainer1.fit(train_loader, val_loader)

# Experiment 2
config2 = TrainerConfig(tensorboard_comment="resnet50_lr0.01")
trainer2 = Trainer(model, loss_fn, optimizer, config=config2)
trainer2.fit(train_loader, val_loader)

# View both: %tensorboard --logdir runs

Manual callback control:

from LayerZero import Trainer, TensorBoardCallback

tb_callback = TensorBoardCallback(
    log_dir="my_experiments",
    comment="custom_experiment",
    log_gradients=True  # Enable gradient logging
)

trainer = Trainer(
    model=model,
    loss_fn=loss_fn,
    optimizer=optimizer,
    config=TrainerConfig(use_tensorboard=False),  # Disable auto-init
    callbacks=[tb_callback]  # Add manually
)

📱 Colab/Kaggle Quick Start

# Complete Colab/Kaggle example
%load_ext tensorboard

import torch
import torch.nn as nn
from LayerZero import ImageDataLoader, Trainer, TrainerConfig
from torchvision.datasets import CIFAR10

# Setup model and data
model = nn.Sequential(...)
loader = ImageDataLoader(CIFAR10, root='./data', batch_size=128)
train_loader, val_loader = loader.get_loaders()

# Train with TensorBoard (automatic)
trainer = Trainer(
    model=model,
    loss_fn=nn.CrossEntropyLoss(),
    optimizer=torch.optim.Adam(model.parameters()),
    config=TrainerConfig(epochs=10)  # TensorBoard enabled by default!
)

trainer.fit(train_loader, val_loader, data_loader=loader)

# View results inline
%tensorboard --logdir runs

🔬 PyTorch Profiler Integration (Performance Analysis)

NEW! Analyze GPU/CPU utilization and memory usage, and identify bottlenecks, all in TensorBoard!

# Enable profiler with TensorBoard
config = TrainerConfig(
    epochs=10,
    use_tensorboard=True,
    use_profiler=True,  # Enable PyTorch Profiler
)

trainer = Trainer(model, loss_fn, optimizer, config=config)
trainer.fit(train_loader, val_loader)

# View profiler traces in TensorBoard
%tensorboard --logdir runs
# Look for the "PYTORCH_PROFILER" or "PROFILE" tab (requires torch-tb-profiler)

What you'll see:

  • 📊 GPU/CPU utilization timeline
  • 💾 Memory usage over time (allocated/reserved)
  • ⚡ Operation timing breakdown
  • 🔍 Bottleneck identification (slow ops highlighted)
  • 📈 Kernel execution trace

Requirements:

  • torch-tb-profiler must be installed: pip install torch-tb-profiler
  • The profiler tab will appear after profiling data is generated

Fine-tune profiler schedule:

config = TrainerConfig(
    use_profiler=True,
    profiler_schedule_wait=1,      # Skip first N batches
    profiler_schedule_warmup=1,    # Warmup for N batches
    profiler_schedule_active=3,    # Profile for N batches
    profiler_schedule_repeat=2,    # Repeat cycle N times
)
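
These four settings mirror the parameters of torch.profiler.schedule. The equivalent raw-PyTorch setup looks roughly like this (a sketch of the standard pattern, assuming a CUDA machine; not LayerZero's exact internals):

import torch
from torch.profiler import (
    profile, schedule, tensorboard_trace_handler, ProfilerActivity
)

with profile(
    activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
    schedule=schedule(wait=1, warmup=1, active=3, repeat=2),
    on_trace_ready=tensorboard_trace_handler('runs'),
    profile_memory=True,
) as prof:
    for step in range(10):   # stand-in for batches from a DataLoader
        # ... one training step ...
        prof.step()          # advances the wait/warmup/active cycle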

Why use the profiler?

  • Find GPU idle time (data loading bottlenecks)
  • Identify slow operations
  • Optimize memory usage
  • Compare different model architectures
  • Debug performance issues

⚠️ Performance Note:

  • TensorBoard (default): < 1% overhead ✅
  • Gradient logging: ~5-10% overhead (disabled by default)
  • Profiler: ~10-15% overhead (disabled by default)
  • Logging happens once per epoch, not per batch
  • Safe to keep TensorBoard enabled for all training

Example: Optimizing based on profiler insights

# Profiling showed data loading is slow (GPU idle between batches)
# Solution: Increase num_workers and move augmentation to the GPU

loader = ImageDataLoader(
    CIFAR10,
    batch_size=128,
    num_workers=4,  # Increased from default
    use_gpu_augmentation='auto'  # Move augmentation to GPU
)

Custom Metrics

def accuracy_fn(y_pred, y_true):
    return (y_pred.argmax(1) == y_true).float().mean().item() * 100

config = TrainerConfig(
    metrics={'accuracy': accuracy_fn}
)
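
The same (y_pred, y_true) -> float contract works for any metric. For example, a hypothetical top-3 accuracy alongside the one above:

def top3_accuracy_fn(y_pred, y_true):
    # y_pred: (batch, classes) logits; y_true: (batch,) integer labels
    top3 = y_pred.topk(3, dim=1).indices
    return (top3 == y_true.unsqueeze(1)).any(dim=1).float().mean().item() * 100

config = TrainerConfig(
    metrics={'accuracy': accuracy_fn, 'top3_accuracy': top3_accuracy_fn}
)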

Callbacks

def save_checkpoint(model, epoch, metrics):
    torch.save(model.state_dict(), f'model_epoch_{epoch}.pt')

config = TrainerConfig(
    callbacks={'on_epoch_end': save_checkpoint}
)

API Reference

ImageDataLoader

ImageDataLoader(
    dataset_cls,                          # Torchvision dataset class
    root='./data',                        # Data directory
    image_size=224,                       # Image size
    batch_size=64,                        # Batch size
    augmentation_mode=AugmentationMode.BASIC,
    use_gpu_augmentation='auto',
    auto_install_kornia=True,
    num_workers=None,                     # Auto-detect
    download=False,
)

TrainerConfig

TrainerConfig(
    epochs=10,
    amp=True,                          # Mixed precision
    compile_model='auto',              # torch.compile()
    compile_mode='default',
    device='auto',
    save_dir='./checkpoints',
    # TensorBoard settings
    use_tensorboard=True,              # Enable TensorBoard (default: True)
    tensorboard_log_dir='runs',        # TensorBoard log directory
    tensorboard_comment='',            # Experiment name/comment
    tensorboard_log_graph=True,        # Log model graph
    tensorboard_log_gradients=False,   # Log gradient histograms
    # PyTorch Profiler settings (integrates with TensorBoard)
    use_profiler=False,                # Enable PyTorch Profiler (default: False)
    profiler_schedule_wait=1,          # Batches to skip before profiling
    profiler_schedule_warmup=1,        # Warmup batches
    profiler_schedule_active=3,        # Active profiling batches
    profiler_schedule_repeat=2,        # Number of profiling cycles
)

Trainer

Trainer(
    model,
    loss_fn,
    optimizer,
    config,
    metrics=None,
    callbacks=None,
)

# Run training with optional GPU augmentation auto-detection
trainer.fit(
    train_loader, 
    val_loader,
    epochs=None,        # Optional: Override config.epochs
    data_loader=None    # Optional: ImageDataLoader for GPU aug auto-detection
)

trainer.evaluate(dataloader)  # Evaluate on data
trainer.predict(dataloader)   # Get predictions

KorniaHelper

from LayerZero import (
    is_kornia_available,
    install_kornia,
    ensure_kornia,
    get_kornia_version,
)

if ensure_kornia(auto_install=True):
    # Kornia available
    pass

Architecture

LayerZero/
├── Trainer.py              # Training loop
├── ImageDataLoader.py      # Data loading
├── GPUAugmentation.py      # Kornia augmentation
├── AugmentationMode.py     # Augmentation enums
├── KorniaHelper.py         # Kornia management
└── Helper.py               # Metrics tracking

Troubleshooting

Kornia installation fails

pip install kornia kornia-rs

torch.compile not available

Requires PyTorch 2.0+:

pip install --upgrade torch torchvision

Out of memory

Reduce batch size or enable mixed precision:

config = TrainerConfig(amp=True)

Slow on CPU

Use minimal augmentation:

config = ImageLoaderConfig(augmentation_mode=AugmentationMode.MINIMAL)
loader = ImageDataLoader(CIFAR10, image_size=224, config=config)

Releasing New Versions

# Bump version (bug fixes: 0.1.3 → 0.1.4)
make bump-patch

# Push to trigger PyPI release
make release

See RELEASE_WORKFLOW.md for the complete guide.


License

MIT
