Build PyTorch-Based NN Projects Faster
LayerZero
A modular PyTorch training framework with automatic performance optimizations.
Features
Trainer
- Model compilation via torch.compile() (PyTorch 2.0+)
- Mixed precision training (AMP)
- Automatic GPU augmentation integration
- Asynchronous CUDA data transfers
- Real-time TensorBoard logging
- PyTorch Profiler integration (GPU/CPU/memory analysis)
- Metric tracking and logging
- Model checkpointing
- Custom callbacks
ImageDataLoader
- GPU-accelerated augmentation using Kornia
- Configurable augmentation modes
- Automatic worker detection
- Torchvision dataset support
- Pinned memory for GPU training
Helper
- Training/validation metric tracking
- Loss curve visualization
- Experiment logging
Performance Optimizations
Applied automatically:
- torch.compile() for model compilation
- Mixed precision (FP16) training
- Non-blocking CUDA transfers (see the sketch after these lists)
- GPU-based augmentation (Kornia)
- Optimized DataLoader configuration
Monitoring & Analysis:
- Real-time TensorBoard logging (loss, metrics, lr)
- PyTorch Profiler integration (find bottlenecks)
- GPU/CPU utilization tracking
- Memory usage profiling
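For reference, the transfer-related optimizations correspond to the standard PyTorch idiom sketched below; this is the pattern the Trainer automates, not LayerZero's exact internals (the toy dataset is for illustration only).
import torch
from torch.utils.data import DataLoader, TensorDataset
# Toy dataset; pinned (page-locked) host memory enables asynchronous
# host-to-device copies.
dataset = TensorDataset(torch.randn(1024, 3, 32, 32), torch.randint(0, 10, (1024,)))
loader = DataLoader(dataset, batch_size=128, pin_memory=True, num_workers=2)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
for X, y in loader:
    # non_blocking=True overlaps the copy with GPU compute; it only takes
    # effect when the source tensor lives in pinned memory.
    X = X.to(device, non_blocking=True)
    y = y.to(device, non_blocking=True)
    # ... forward/backward ...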
Installation
pip install torch torchvision matplotlib tensorboard torch-tb-profiler
# Optional: GPU augmentation
pip install kornia kornia-rs
Or install from PyPI:
pip install LayerZero
Note: torch-tb-profiler is required to view PyTorch Profiler traces in TensorBoard.
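To verify the environment supports the optional features (torch.compile needs PyTorch 2.0+; AMP speedups and GPU augmentation need CUDA), a quick sanity check:
import torch
print("PyTorch version:", torch.__version__)         # 2.0+ enables torch.compile
print("CUDA available:", torch.cuda.is_available())  # needed for AMP speedups and Kornia GPU augmentation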
Usage
Basic Example
import torch
from torch import nn
from LayerZero import ImageDataLoader, ImageLoaderConfig, Trainer, TrainerConfig
from torchvision.datasets import CIFAR10
# Model
model = nn.Sequential(
nn.Flatten(),
nn.Linear(3*32*32, 128),
nn.ReLU(),
nn.Linear(128, 10)
)
# Data
config = ImageLoaderConfig(
data_dir='./data',
batch_size=128,
use_gpu_augmentation='auto', # Automatic GPU acceleration
)
loader = ImageDataLoader(CIFAR10, image_size=32, config=config)
train_loader, test_loader = loader.get_loaders()
# Training configuration
config = TrainerConfig(
epochs=10,
amp=True,
compile_model='auto',
use_tensorboard=True # TensorBoard enabled by default!
)
# Train
trainer = Trainer(
model=model,
loss_fn=nn.CrossEntropyLoss(),
optimizer=torch.optim.Adam(model.parameters()),
config=config
)
results = trainer.fit(
train_loader,
test_loader,
data_loader=loader # Auto-detects GPU augmentation!
)
View Training in Real-Time:
# Google Colab / Kaggle (inline in notebook - two separate commands):
%load_ext tensorboard
%tensorboard --logdir runs
# Local / Terminal:
tensorboard --logdir=runs
# Then open: http://localhost:6006
Configuration
Augmentation Modes
from LayerZero import ImageDataLoader, ImageLoaderConfig, AugmentationMode
config = ImageLoaderConfig(
augmentation_mode=AugmentationMode.MINIMAL, # Flip + Crop
)
loader = ImageDataLoader(CIFAR10, image_size=224, config=config)
GPU Augmentation
# Automatic integration with Trainer (Recommended)
loader_config = ImageLoaderConfig(
    use_gpu_augmentation='auto',  # Auto-detect GPU and Kornia
    auto_install_kornia=True      # Install if missing
)
loader = ImageDataLoader(CIFAR10, image_size=224, config=loader_config)
train_loader, test_loader = loader.get_loaders()
# GPU augmentation auto-detected when fit() is called!
trainer = Trainer(
    model=model,
    loss_fn=nn.CrossEntropyLoss(),
    optimizer=torch.optim.Adam(model.parameters()),
    config=TrainerConfig()  # Trainer takes a TrainerConfig, not the loader config
)
trainer.fit(
train_loader,
test_loader,
data_loader=loader # ← Pass loader here, Trainer auto-detects GPU aug!
)
# Manual usage in custom training loops
device = 'cuda'
gpu_aug = loader.get_gpu_augmentation(device=device)
for X, y in train_loader:
    X = X.to(device)
    X = gpu_aug(X)
    # ... training code ...
Mixed Precision
config = TrainerConfig(
amp=True, # Enable (default)
# amp=False # Disable for debugging
)
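Under the hood, amp=True corresponds to the standard PyTorch AMP recipe. A minimal sketch of that recipe (not LayerZero's exact code), reusing model, loss_fn, optimizer, and train_loader from the basic example:
import torch
scaler = torch.cuda.amp.GradScaler()
for X, y in train_loader:
    X, y = X.to('cuda'), y.to('cuda')
    optimizer.zero_grad()
    # Run the forward pass in FP16 where numerically safe.
    with torch.autocast(device_type='cuda', dtype=torch.float16):
        loss = loss_fn(model(X), y)
    # Scale the loss to avoid FP16 gradient underflow, then unscale and step.
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()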
Model Compilation
config = TrainerConfig(
compile_model='auto', # Auto-detect PyTorch 2.0+
compile_mode='default', # Compilation mode
# compile_mode='reduce-overhead'
# compile_mode='max-autotune'
)
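compile_model='auto' wraps the standard torch.compile API; the modes trade longer compilation for faster execution. A minimal sketch of the underlying call:
import torch
# 'default' balances compile time and speedup; 'reduce-overhead' uses CUDA
# graphs to help small-batch workloads; 'max-autotune' searches longest for
# the fastest kernels.
compiled_model = torch.compile(model, mode='reduce-overhead')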
TensorBoard (Real-Time Monitoring) 📊
Enabled by default! Works seamlessly in Google Colab, Kaggle, and local environments.
🎯 Google Colab / Kaggle Usage (Recommended)
# Step 1: Load TensorBoard extension (run once at top of notebook)
%load_ext tensorboard
# Step 2: Train your model (TensorBoard logs automatically)
trainer = Trainer(model, loss_fn, optimizer, config=TrainerConfig(epochs=10))
trainer.fit(train_loader, val_loader)
# Step 3: View TensorBoard inline in your notebook
%tensorboard --logdir runs
That's it! TensorBoard will display directly in your Colab/Kaggle notebook with real-time updates.
💻 Local / Terminal Usage
# Terminal 1: Start training
python train.py
# Terminal 2: Start TensorBoard
tensorboard --logdir=runs
# Open browser to: http://localhost:6006
⚙️ Configuration
config = TrainerConfig(
use_tensorboard=True, # Enable/disable (default: True)
tensorboard_log_dir="runs", # Log directory
tensorboard_comment="experiment1", # Experiment name/tag
tensorboard_log_graph=True, # Log model graph
tensorboard_log_gradients=False, # Log gradient histograms (slower)
)
📈 What Gets Logged
- ✅ Train & validation losses (real-time, per epoch)
- ✅ All custom metrics (accuracy, F1, etc.)
- ✅ Learning rate changes over time
- ✅ Model graph visualization (optional)
- ✅ Gradient & weight histograms (optional)
- ✅ PyTorch Profiler (optional - GPU/CPU utilization, memory, bottlenecks)
🔧 Advanced Options
Disable TensorBoard:
config = TrainerConfig(
use_tensorboard=False # Turn off TensorBoard logging
)
Multiple experiments with names:
# Experiment 1
config1 = TrainerConfig(tensorboard_comment="resnet50_lr0.001")
trainer1 = Trainer(model, loss_fn, optimizer, config=config1)
trainer1.fit(train_loader, val_loader)
# Experiment 2
config2 = TrainerConfig(tensorboard_comment="resnet50_lr0.01")
trainer2 = Trainer(model, loss_fn, optimizer, config=config2)
trainer2.fit(train_loader, val_loader)
# View both: %tensorboard --logdir runs
Manual callback control:
from LayerZero import Trainer, TensorBoardCallback
tb_callback = TensorBoardCallback(
log_dir="my_experiments",
comment="custom_experiment",
log_gradients=True # Enable gradient logging
)
trainer = Trainer(
model=model,
loss_fn=loss_fn,
optimizer=optimizer,
config=TrainerConfig(use_tensorboard=False), # Disable auto-init
callbacks=[tb_callback] # Add manually
)
📱 Colab/Kaggle Quick Start
# Complete Colab/Kaggle example
%load_ext tensorboard
from LayerZero import ImageDataLoader, Trainer, TrainerConfig
from torchvision.datasets import CIFAR10
import torch.nn as nn
# Setup model and data
model = nn.Sequential(...)
loader = ImageDataLoader(CIFAR10, root='./data', batch_size=128)
train_loader, val_loader = loader.get_loaders()
# Train with TensorBoard (automatic)
trainer = Trainer(
model=model,
loss_fn=nn.CrossEntropyLoss(),
optimizer=torch.optim.Adam(model.parameters()),
config=TrainerConfig(epochs=10) # TensorBoard enabled by default!
)
trainer.fit(train_loader, val_loader, data_loader=loader)
# View results inline
%tensorboard --logdir runs
🔬 PyTorch Profiler Integration (Performance Analysis)
NEW! Analyze GPU/CPU utilization and memory usage, and identify bottlenecks, all in TensorBoard!
# Enable profiler with TensorBoard
config = TrainerConfig(
epochs=10,
use_tensorboard=True,
use_profiler=True, # Enable PyTorch Profiler
)
trainer = Trainer(model, loss_fn, optimizer, config=config)
trainer.fit(train_loader, val_loader)
# View profiler traces in TensorBoard
%tensorboard --logdir runs
# Look for the "PYTORCH_PROFILER" or "PROFILE" tab (requires torch-tb-profiler)
What you'll see:
- 📊 GPU/CPU utilization timeline
- 💾 Memory usage over time (allocated/reserved)
- ⚡ Operation timing breakdown
- 🔍 Bottleneck identification (slow ops highlighted)
- 📈 Kernel execution trace
Requirements:
- torch-tb-profiler must be installed: pip install torch-tb-profiler
- The profiler tab will appear after profiling data is generated
Fine-tune profiler schedule:
config = TrainerConfig(
use_profiler=True,
profiler_schedule_wait=1, # Skip first N batches
profiler_schedule_warmup=1, # Warmup for N batches
profiler_schedule_active=3, # Profile for N batches
profiler_schedule_repeat=2, # Repeat cycle N times
)
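These four values map onto torch.profiler.schedule: each cycle spans wait + warmup + active = 5 batches, only the active batches are recorded, and repeat=2 stops after two cycles, so batches 2-4 and 7-9 (0-indexed) end up in the trace. The equivalent raw-PyTorch schedule:
from torch.profiler import schedule
# wait=1: skip 1 batch; warmup=1: run the profiler but discard 1 batch;
# active=3: record 3 batches; repeat=2: run the cycle twice, then stop.
profiler_schedule = schedule(wait=1, warmup=1, active=3, repeat=2)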
Why use the profiler?
- Find GPU idle time (data loading bottlenecks)
- Identify slow operations
- Optimize memory usage
- Compare different model architectures
- Debug performance issues
⚠️ Performance Note:
- TensorBoard (default): < 1% overhead ✅
- Gradient logging: ~5-10% overhead (disabled by default)
- Profiler: ~10-15% overhead (disabled by default)
- Logging happens once per epoch, not per batch
- Safe to keep TensorBoard enabled for all training
Example: Optimizing based on profiler insights
# Before profiling: Found data loading is slow
# Solution: Increase num_workers
loader = ImageDataLoader(
CIFAR10,
batch_size=128,
num_workers=4, # Increased from default
use_gpu_augmentation='auto' # Move augmentation to GPU
)
Custom Metrics
def accuracy_fn(y_pred, y_true):
    return (y_pred.argmax(1) == y_true).float().mean().item() * 100
config = TrainerConfig(
metrics={'accuracy': accuracy_fn}
)
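Metrics receive raw predictions and targets, so anything computable from those two tensors works. For instance, a top-5 accuracy sketch using the same (y_pred, y_true) signature (assuming y_pred holds logits over at least five classes):
def top5_accuracy_fn(y_pred, y_true):
    # y_pred: (N, num_classes) logits; y_true: (N,) class indices
    top5 = y_pred.topk(5, dim=1).indices
    return (top5 == y_true.unsqueeze(1)).any(dim=1).float().mean().item() * 100
config = TrainerConfig(
    metrics={'accuracy': accuracy_fn, 'top5_accuracy': top5_accuracy_fn}
)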
Callbacks
def save_checkpoint(model, epoch, metrics):
    torch.save(model.state_dict(), f'model_epoch_{epoch}.pt')
config = TrainerConfig(
callbacks={'on_epoch_end': save_checkpoint}
)
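The same on_epoch_end hook can drive best-model checkpointing. A sketch, assuming the metrics dict exposes validation loss under a 'val_loss' key (the actual key name may differ):
import torch
best_val = float('inf')
def save_best(model, epoch, metrics):
    global best_val
    val_loss = metrics.get('val_loss')  # assumed key name
    if val_loss is not None and val_loss < best_val:
        best_val = val_loss
        torch.save(model.state_dict(), 'best_model.pt')
config = TrainerConfig(
    callbacks={'on_epoch_end': save_best}
)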
API Reference
ImageDataLoader
ImageDataLoader(
dataset_cls, # Torchvision dataset class
root='./data', # Data directory
image_size=224, # Image size
batch_size=64, # Batch size
augmentation_mode=AugmentationMode.BASIC,
use_gpu_augmentation='auto',
auto_install_kornia=True,
num_workers=None, # Auto-detect
download=False,
)
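The num_workers=None auto-detection presumably derives from the CPU count; a rough sketch of that common heuristic (an assumption, not LayerZero's actual logic):
import os
def default_num_workers(cap: int = 8) -> int:
    # One worker per CPU core, capped to avoid oversubscribing I/O and RAM.
    return min(os.cpu_count() or 2, cap)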
TrainerConfig
TrainerConfig(
epochs=10,
amp=True, # Mixed precision
compile_model='auto', # torch.compile()
compile_mode='default',
device='auto',
save_dir='./checkpoints',
# TensorBoard settings
use_tensorboard=True, # Enable TensorBoard (default: True)
tensorboard_log_dir='runs', # TensorBoard log directory
tensorboard_comment='', # Experiment name/comment
tensorboard_log_graph=True, # Log model graph
tensorboard_log_gradients=False, # Log gradient histograms
# PyTorch Profiler settings (integrates with TensorBoard)
use_profiler=False, # Enable PyTorch Profiler (default: False)
profiler_schedule_wait=1, # Batches to skip before profiling
profiler_schedule_warmup=1, # Warmup batches
profiler_schedule_active=3, # Active profiling batches
profiler_schedule_repeat=2, # Number of profiling cycles
)
Trainer
Trainer(
model,
loss_fn,
optimizer,
config,
metrics=None,
callbacks=None,
)
# Run training with optional GPU augmentation auto-detection
trainer.fit(
train_loader,
val_loader,
epochs=None, # Optional: Override config.epochs
data_loader=None # Optional: ImageDataLoader for GPU aug auto-detection
)
trainer.evaluate(dataloader) # Evaluate on data
trainer.predict(dataloader) # Get predictions
KorniaHelper
from LayerZero import (
is_kornia_available,
install_kornia,
ensure_kornia,
get_kornia_version,
)
if ensure_kornia(auto_install=True):
# Kornia available
pass
Architecture
LayerZero/
├── Trainer.py # Training loop
├── ImageDataLoader.py # Data loading
├── GPUAugmentation.py # Kornia augmentation
├── AugmentationMode.py # Augmentation enums
├── KorniaHelper.py # Kornia management
└── Helper.py # Metrics tracking
Troubleshooting
Kornia installation fails
pip install kornia kornia-rs
torch.compile not available
Requires PyTorch 2.0+:
pip install --upgrade torch torchvision
Out of memory
Reduce batch size or enable mixed precision:
config = TrainerConfig(amp=True)
Slow on CPU
Use minimal augmentation:
config = ImageLoaderConfig(augmentation_mode=AugmentationMode.MINIMAL)
loader = ImageDataLoader(CIFAR10, image_size=224, config=config)
Releasing New Versions
# Bump version (bug fixes: 0.1.3 → 0.1.4)
make bump-patch
# Push to trigger PyPI release
make release
See RELEASE_WORKFLOW.md for the complete guide.
License
MIT