An enhanced toolkit for deep learning model analysis, profiling, and training optimization.
TrainSense: Analyze, Profile, and Optimize your PyTorch Training Workflow
TrainSense is a Python toolkit designed to provide deep insights into your PyTorch model training environment and performance. It helps you understand your system's capabilities, analyze your model's architecture, evaluate hyperparameter choices, profile execution bottlenecks (including full training steps), diagnose gradient issues, and ultimately optimize your deep learning workflow.
Whether you're debugging slow training, trying to maximize GPU utilization, investigating vanishing/exploding gradients, or simply want a clearer picture of your setup, TrainSense offers a suite of tools to assist you.
(GitHub Repo link coming very soon!)
Key Features
- System Analysis:
  - SystemConfig: Detects hardware (CPU, RAM, GPU), OS, Python, PyTorch, CUDA, and cuDNN versions.
  - SystemDiagnostics: Monitors real-time system resource usage (CPU, memory, disk, network).
- Model Architecture Insight:
  - ArchitectureAnalyzer: Counts parameters (total/trainable) and layers, analyzes layer types, estimates the input shape, infers the architecture type (CNN, RNN, Transformer, ...), and provides a complexity assessment and recommendations.
- Hyperparameter Sanity Checks:
  - TrainingAnalyzer: Evaluates batch size, learning rate, and epochs based on system resources and model complexity. Provides recommendations and suggests automatic adjustments.
- Advanced Performance Profiling:
  - ModelProfiler:
    - Measures inference speed (latency, throughput).
    - (New!) Profiles a full training step (data loading, forward, loss, backward, optimizer step) to identify bottlenecks specific to training.
    - Integrates torch.profiler for detailed operator-level CPU/GPU time and memory usage analysis.
- Gradient Diagnostics (New!):
  - GradientAnalyzer: Inspects gradient statistics (norms, mean, std, NaN/Inf counts) per parameter after a backward pass to help diagnose vanishing/exploding gradients or other training stability issues.
- GPU Monitoring:
  - GPUMonitor: Provides real-time, detailed GPU status including load, memory utilization (used/total), and temperature (requires GPUtil).
- Training Optimization Guidance:
  - OptimizerHelper: Suggests suitable optimizers (Adam, AdamW, SGD) and learning rate schedulers based on model characteristics, and recommends initial learning rates.
  - UltraOptimizer: Generates a full set of heuristic hyperparameters (batch size, LR, epochs, optimizer, scheduler) as a starting point, based on system, model, and basic data statistics.
- Consolidated Reporting:
  - DeepAnalyzer: Orchestrates most analysis modules (system, architecture, inference profiling, hyperparameters) to generate a comprehensive report with aggregated insights and recommendations. (Note: Currently does not automatically integrate training step profiling or gradient analysis results.)
- Flexible Logging:
  - TrainLogger: Configurable logging to console and rotating files.
What's New (Recent Enhancements)
- Training Step Profiling (ModelProfiler.profile_training_step): Go beyond inference! Profile the entire forward-backward-optimizer sequence to understand where time is really spent during training, including data loading overhead.
- Gradient Analysis (GradientAnalyzer): Directly inspect the health of your gradients after a backward pass. Calculate norms, check for NaN/Inf values, and get summaries to quickly spot potential training instabilities like vanishing or exploding gradients.
Installation
It's highly recommended to use a virtual environment.
1. Create and activate a virtual environment:

```shell
python -m venv venv
# On Linux/macOS:
source venv/bin/activate
# On Windows:
# venv\Scripts\activate
```

2. Install PyTorch: TrainSense depends on PyTorch. Install the version suitable for your system (especially the CUDA version) by following the official instructions: https://pytorch.org/get-started/locally/

3. Install dependencies & TrainSense (assuming you have the code locally; replace with pip install trainsense if published on PyPI):

```shell
# Ensure requirements.txt lists psutil and GPUtil
pip install -r requirements.txt
pip install .
# Or for development (recommended):
# pip install -e .
```
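After installation, a quick sanity check (plain PyTorch, nothing TrainSense-specific) confirms that PyTorch imports correctly and reports whether CUDA is available:

```python
import torch

# Print the installed PyTorch version and CUDA availability.
print("PyTorch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
```

If CUDA shows as unavailable on a GPU machine, re-check that the installed PyTorch build matches your CUDA version.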
Core Concepts
TrainSense aims to provide a holistic view by examining different facets of your training setup:
- System Context (SystemConfig, SystemDiagnostics, GPUMonitor): Understand the environment (hardware/software). Can my GPU handle this batch size? Is my CPU bottlenecking data loading?
- Model Introspection (ArchitectureAnalyzer): Look inside the model. How complex is it? What layers are used? Might this architecture benefit from AdamW?
- Hyperparameter Evaluation (TrainingAnalyzer, OptimizerHelper, UltraOptimizer): Assess training parameters. Is my learning rate too high? Are enough epochs planned? Is SGD appropriate here?
- Performance Measurement (ModelProfiler): Measure execution. How fast is inference? Where is time spent during a training step (forward, backward, data)? How much memory is needed?
- Training Stability (GradientAnalyzer): Check the learning process itself. Are my gradients vanishing? Are they exploding?
- Synthesis (DeepAnalyzer): Combine insights (currently focused on system, architecture, inference, and hyperparameters) into actionable recommendations.
Getting Started: Quick Example
```python
import torch
import torch.nn as nn
import logging
from torch.optim import Adam
from torch.utils.data import DataLoader, TensorDataset

# --- Basic Logging Setup ---
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')

# --- Import Key TrainSense Components ---
# Assuming TrainSense is installed or in PYTHONPATH
from TrainSense import (SystemConfig, ArchitectureAnalyzer, ModelProfiler,
                        DeepAnalyzer, TrainingAnalyzer, SystemDiagnostics,
                        GradientAnalyzer, OptimizerHelper, GPUMonitor, print_section)

# --- Define Your Model & Setup ---
model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))  # Example model
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
batch_size, lr, epochs = 32, 0.001, 10

# IMPORTANT: Define the correct input shape for your model for one batch!
input_shape = (batch_size, 128)
criterion = nn.CrossEntropyLoss()
optimizer = Adam(model.parameters(), lr=lr)

# Create dummy data for the profiling/gradient analysis examples
dummy_X = torch.randn(*input_shape, device='cpu')
dummy_y = torch.randint(0, 10, (input_shape[0],), device='cpu', dtype=torch.long)
dummy_dataset = TensorDataset(dummy_X, dummy_y)
dummy_loader = DataLoader(dummy_dataset, batch_size=batch_size, num_workers=0)

# --- Instantiate TrainSense Components ---
print_section("Initializing TrainSense Components")
try:
    sys_config = SystemConfig()
    sys_diag = SystemDiagnostics()
    arch_analyzer = ArchitectureAnalyzer(model)
    arch_info = arch_analyzer.analyze()  # Analyze first for context
    model_profiler = ModelProfiler(model, device=device)
    # Note: Pass the SystemConfig *object*, not the summary dict here
    training_analyzer = TrainingAnalyzer(batch_size, lr, epochs, system_config=sys_config, arch_info=arch_info)
    grad_analyzer = GradientAnalyzer(model)  # Needs the model
    # DeepAnalyzer combines *some* of the above (sys, arch, inference profile, hyperparams)
    deep_analyzer = DeepAnalyzer(training_analyzer, arch_analyzer, model_profiler, sys_diag)
    print("TrainSense Components Initialized.")

    # --- Run Inference Profiling ---
    print_section("Running Inference Profiling")
    inf_profile = model_profiler.profile_model(input_shape=input_shape, use_torch_profiler=True)
    print(f"- Avg Inf Time: {inf_profile.get('avg_total_time_ms', 0):.2f} ms")
    print(f"- Max Memory (Inf): {inf_profile.get('max_memory_allocated_formatted', 'N/A')}")

    # --- Run Training Step Profiling ---
    print_section("Running Training Step Profiling")
    train_profile = model_profiler.profile_training_step(dummy_loader, criterion, optimizer, use_torch_profiler=True)
    print(f"- Avg Step Time: {train_profile.get('avg_step_time_ms', 0):.2f} ms")
    print(f"- Breakdown (%): Data={train_profile.get('percent_time_data_loading', 0):.1f}, "
          f"Fwd={train_profile.get('percent_time_forward', 0):.1f}, "
          f"Loss={train_profile.get('percent_time_loss', 0):.1f}, "
          f"Bwd={train_profile.get('percent_time_backward', 0):.1f}, "
          f"Opt={train_profile.get('percent_time_optimizer', 0):.1f}")
    print(f"- Max Memory (Train): {train_profile.get('max_memory_allocated_formatted', 'N/A')}")

    # --- Run Gradient Analysis ---
    print_section("Running Gradient Analysis")
    # Need to run a backward pass first!
    model.train()
    optimizer.zero_grad()
    inputs, targets = next(iter(dummy_loader))  # Get one batch
    outputs = model(inputs.to(device))
    loss = criterion(outputs, targets.to(device))
    loss.backward()
    print(f"Ran one backward pass (Loss: {loss.item():.3f})")

    # Now analyze
    grad_summary = grad_analyzer.summary()
    print(f"- Global Grad Norm L2: {grad_summary.get('global_grad_norm_L2', 0):.2e}")
    print(f"- NaN/Inf Grads Found: {grad_summary.get('num_params_nan_grad', 0)} / {grad_summary.get('num_params_inf_grad', 0)}")
    model.eval()  # Set back to eval if needed

    # --- Get Consolidated Report (from DeepAnalyzer) ---
    print_section("Consolidated Report (DeepAnalyzer - Inference Focus)")
    # Note: Uses the inference profile data stored in the model_profiler instance by default
    report = deep_analyzer.comprehensive_report(profile_input_shape=input_shape)
    print("Overall Recommendations:")
    for rec in report.get("overall_recommendations", []):
        print(f"- {rec}")
except Exception as e:
    logging.exception("Error during TrainSense example run")
    print(f"\nERROR: {e}")
```
Detailed Usage Examples
Here's how to use individual components for specific tasks:
1. Checking System Configuration
Get a snapshot of your hardware and software setup.
```python
from TrainSense import SystemConfig, print_section

sys_config = SystemConfig()
summary = sys_config.get_summary()  # Get a concise summary

print_section("System Summary")
for key, value in summary.items():
    print(f"- {key.replace('_', ' ').title()}: {value}")

# You can also get the full detailed config
# full_config = sys_config.get_config()
# print("\nFull Config:", full_config)
```
2. Analyzing Your Model's Architecture
Understand the structure and complexity of your nn.Module.
```python
import torch.nn as nn
from TrainSense import ArchitectureAnalyzer, print_section

# Define or load your model
model = nn.Sequential(
    nn.Linear(784, 128),
    nn.ReLU(),
    nn.Linear(128, 10)
)

arch_analyzer = ArchitectureAnalyzer(model)
analysis = arch_analyzer.analyze()  # Performs the analysis

print_section("Architecture Analysis")
print(f"- Total Parameters: {analysis.get('total_parameters', 0):,}")
print(f"- Trainable Parameters: {analysis.get('trainable_parameters', 0):,}")
print(f"- Layer Count: {analysis.get('layer_count', 'N/A')}")
print(f"- Primary Architecture Type: {analysis.get('primary_architecture_type', 'N/A')}")
print(f"- Complexity Category: {analysis.get('complexity_category', 'N/A')}")
print(f"- Estimated Input Shape: {analysis.get('estimated_input_shape', 'N/A')}")  # Useful for the profiler!
print(f"- Recommendation: {analysis.get('recommendation', 'N/A')}")
print("\n- Layer Types:")
for layer_type, count in analysis.get('layer_types_summary', {}).items():
    print(f"  - {layer_type}: {count}")
```
3. Getting Hyperparameter Recommendations
Check if your initial batch size, learning rate, and epochs make sense.
```python
import torch.nn as nn  # For the dummy model
from TrainSense import TrainingAnalyzer, SystemConfig, ArchitectureAnalyzer, print_section

# --- Get Context (System & Model) ---
model = nn.Linear(10, 2)  # Simple dummy model
sys_config = SystemConfig()
arch_analyzer = ArchitectureAnalyzer(model)
arch_info = arch_analyzer.analyze()

# --- Define Current Hyperparameters ---
current_batch_size = 512
current_lr = 0.1
current_epochs = 5

analyzer = TrainingAnalyzer(
    batch_size=current_batch_size,
    learning_rate=current_lr,
    epochs=current_epochs,
    system_config=sys_config,  # Provide the SystemConfig *object*
    arch_info=arch_info        # Provide model context
)

print_section("Hyperparameter Checks")
recommendations = analyzer.check_hyperparameters()
print("Recommendations:")
for r in recommendations:
    print(f"- {r}")

print("\nSuggested Adjustments (Heuristic):")
adjustments = analyzer.auto_adjust()
for k, v in adjustments.items():
    original_val = getattr(analyzer, k)  # Get the original value from the analyzer instance
    if v != original_val:
        print(f"- Adjust {k}: from {original_val} to {v}")
    else:
        print(f"- Keep {k}: {v} (unchanged)")
```
4. Profiling Model Inference Performance
Measure speed and resource usage for inference. Requires a correct input_shape!
```python
import torch
import torch.nn as nn
from TrainSense import ModelProfiler, print_section, format_bytes

# --- Define Model and Device ---
model = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 10))
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

# --- Define Input Shape (Crucial!) ---
# Must match your model's expected input for a single batch
batch_size_for_profiling = 32
input_features = 64
input_shape = (batch_size_for_profiling, input_features)

profiler = ModelProfiler(model, device=device)
print_section("Model Inference Profiling")
print(f"Profiling on {device} with input shape: {input_shape}")

try:
    profile_results = profiler.profile_model(
        input_shape=input_shape,
        iterations=100,          # Number of inference runs for timing
        warmup=20,               # Warmup runs (ignored for timing)
        use_torch_profiler=True  # Enable detailed profiling
    )

    if "error" in profile_results:
        print(f"\n!! Profiling Error: {profile_results['error']}")
    else:
        print("\n--- Performance ---")
        print(f"- Avg. Inference Time: {profile_results.get('avg_total_time_ms', 0):.3f} ms")
        print(f"- Throughput: {profile_results.get('throughput_samples_per_sec', 0):.1f} samples/sec")

        print("\n--- Memory ---")
        print(f"- Peak Memory Allocated: {profile_results.get('max_memory_allocated_formatted', 'N/A')}")
        if device.type == 'cuda':
            print(f"- Peak Memory Reserved (CUDA): {profile_results.get('max_memory_reserved_formatted', 'N/A')}")

        if profile_results.get('use_torch_profiler'):
            print("\n--- Detailed Profiler Stats (Averages) ---")
            cpu_perc = profile_results.get('avg_cpu_time_percent', 0)
            gpu_perc = profile_results.get('avg_gpu_time_percent', 0)
            print(f"- Device Utilization: CPU {cpu_perc:.1f}% | GPU {gpu_perc:.1f}%")
            print(f"- Avg CPU Time Total: {profile_results.get('avg_cpu_time_total_ms', 0):.3f} ms")
            if device.type == 'cuda':
                print(f"- Avg CUDA Time Total: {profile_results.get('avg_cuda_time_total_ms', 0):.3f} ms")
            # Optionally print the detailed operator table
            # print("\n--- Top Operators by Self CPU Time ---")
            # print(profile_results.get('profiler_top_ops_summary', 'Table not available.'))
except ValueError as ve:
    print(f"\n!! Input Shape Error: {ve}")
except Exception as e:
    print(f"\n!! An unexpected error occurred during profiling: {e}")
```
5. Profiling a Full Training Step (New!)
Analyze the time spent in data loading, forward, backward, and optimizer steps.
```python
import torch
import torch.nn as nn
from torch.optim import Adam
from torch.utils.data import DataLoader, TensorDataset
from TrainSense import ModelProfiler, print_section

# --- Define Model, Device, Criterion, Optimizer, Loader ---
# (Assume model, device, criterion, optimizer, dummy_loader are defined as in the Quick Start)

model_profiler = ModelProfiler(model, device=device)
print_section("Training Step Profiling")
print(f"Profiling training step on {device}...")

try:
    train_profile_results = model_profiler.profile_training_step(
        data_loader=dummy_loader,  # Your DataLoader or data iterator
        criterion=criterion,       # Your loss function instance
        optimizer=optimizer,       # Your optimizer instance
        iterations=20,             # How many steps to average over
        warmup=5,                  # Warmup steps
        use_torch_profiler=True    # Enable detailed breakdown
    )

    if "error" in train_profile_results:
        print(f"!! Training Profiling Error: {train_profile_results['error']}")
    else:
        print("\n--- Basic Timing Breakdown (Avg ms per step) ---")
        total_t = train_profile_results.get('avg_step_time_ms', 0)
        print(f"- Total Step Time: {total_t:.2f} ms")
        # Print the breakdown for data, forward, loss, backward, optimizer
        for key, name in [
                ('avg_data_load_time_ms', 'Data Loading'), ('avg_forward_time_ms', 'Forward Pass'),
                ('avg_loss_time_ms', 'Loss Calculation'), ('avg_backward_time_ms', 'Backward Pass'),
                ('avg_optimizer_time_ms', 'Optimizer Step')]:
            t = train_profile_results.get(key, 0)
            perc = (t / total_t * 100) if total_t > 0 else 0
            print(f"- {name}: {t:.2f} ms ({perc:.1f}%)")

        print("\n--- Memory (Training Step) ---")
        print(f"- Max Memory Allocated: {train_profile_results.get('max_memory_allocated_formatted', 'N/A')}")
        if device.type == 'cuda':
            print(f"- Max Memory Reserved: {train_profile_results.get('max_memory_reserved_formatted', 'N/A')}")

        # You can also access detailed torch.profiler results if enabled
        # if train_profile_results.get('use_torch_profiler'):
        #     print("\n--- Top Operators (Training) ---")
        #     print(train_profile_results.get('profiler_top_ops_summary', 'N/A'))
except Exception as e:
    print(f"!! Error during training profiling: {e}")
```
6. Analyzing Gradients (New!)
Check gradient health after a backward pass.
```python
import torch
import torch.nn as nn
from torch.optim import Adam
from torch.utils.data import DataLoader, TensorDataset
from TrainSense import GradientAnalyzer, print_section

# --- Define Model, Device, Criterion, Optimizer, Loader ---
# (Assume model, device, criterion, optimizer, dummy_loader are defined as in the Quick Start)

grad_analyzer = GradientAnalyzer(model)  # Initialize with the model
print_section("Gradient Analysis")

try:
    # --- CRITICAL: Run a backward pass first! ---
    model.train()                                  # Set model to train mode
    optimizer.zero_grad()                          # Clear previous gradients
    inputs, targets = next(iter(dummy_loader))     # Get a batch
    outputs = model(inputs.to(device))             # Forward pass
    loss = criterion(outputs, targets.to(device))  # Calculate loss
    loss.backward()                                # Calculate gradients
    print(f"Performed one backward pass (Loss: {loss.item():.4f}). Analyzing gradients...")

    # Analyze gradients (default L2 norm)
    # grad_stats_detailed = grad_analyzer.analyze_gradients()  # Per-parameter stats
    grad_summary = grad_analyzer.summary()  # Aggregated stats

    print("\n--- Gradient Summary ---")
    if "error" in grad_summary:
        print(f"Error: {grad_summary['error']}")
    else:
        print(f"- Num Params w/ Grads: {grad_summary.get('num_params_with_grads', 'N/A')}")
        print(f"- Global Grad Norm (L2): {grad_summary.get('global_grad_norm_L2', 0):.3e}")  # Scientific notation
        print(f"- Avg/Max Grad Norm: {grad_summary.get('avg_grad_norm', 0):.3e} / {grad_summary.get('max_grad_norm', 0):.3e}")
        print(f"- Layer w/ Max Norm: {grad_summary.get('layer_with_max_grad_norm', 'N/A')}")
        print(f"- NaN Grads Found: {grad_summary.get('num_params_nan_grad', 0)}")
        print(f"- Inf Grads Found: {grad_summary.get('num_params_inf_grad', 0)}")
        # The ratio can be noisy; use with caution
        # print(f"- Avg Grad/Param Norm Ratio: {grad_summary.get('avg_grad_param_norm_ratio', 'N/A')}")
    # You can also iterate through grad_stats_detailed for per-layer info if needed
except StopIteration:
    print("!! Could not get data from the loader to run a backward pass for gradient analysis.")
except Exception as e:
    print(f"!! Error during gradient analysis: {e}")
finally:
    model.eval()  # Set the model back to eval mode
```
7. Monitoring GPU Status
Get real-time stats for your NVIDIA GPU(s). Requires GPUtil to be installed and functional.
```python
from TrainSense import GPUMonitor, print_section

try:
    gpu_monitor = GPUMonitor()
    print_section("GPU Status")
    if gpu_monitor.is_available():
        status_list = gpu_monitor.get_gpu_status()
        if status_list:
            for gpu_status in status_list:
                print(f"GPU ID: {gpu_status.get('id', 'N/A')}")
                print(f"  Name: {gpu_status.get('name', 'N/A')}")
                print(f"  Load: {gpu_status.get('load', 0):.1f}%")
                print(f"  Memory Util: {gpu_status.get('memory_utilization_percent', 0):.1f}% "
                      f"({gpu_status.get('memory_used_mb', 0):.0f}/{gpu_status.get('memory_total_mb', 0):.0f} MB)")
                print(f"  Temperature: {gpu_status.get('temperature_celsius', 'N/A')} C")
                print("-" * 10)

            # Get a summary across all GPUs
            summary = gpu_monitor.get_status_summary()
            if summary and summary.get('count', 0) > 1:
                print("\nOverall GPU Summary:")
                print(f"- Average Load: {summary.get('avg_load_percent', 0):.1f}%")
                print(f"- Average Memory Util: {summary.get('avg_memory_utilization_percent', 0):.1f}%")
                print(f"- Max Temperature: {summary.get('max_temperature_celsius', 'N/A')} C")
        else:
            print("- GPUtil is available, but no GPUs were detected or status could not be retrieved.")
    else:
        print("- GPUtil library not installed or failed to initialize.")
except Exception as e:
    print(f"An error occurred while monitoring GPUs: {e}")
```
8. Getting Optimizer and Scheduler Suggestions
Leverage heuristics based on model size and type.
```python
import torch.nn as nn
from TrainSense import OptimizerHelper, ArchitectureAnalyzer, print_section

# --- Get Model Context ---
model = nn.LSTM(input_size=10, hidden_size=20, num_layers=2, batch_first=True)  # Example RNN
arch_analyzer = ArchitectureAnalyzer(model)
arch_info = arch_analyzer.analyze()
model_params = arch_info.get('total_parameters', 0)
model_arch_type = arch_info.get('primary_architecture_type', 'Unknown')
model_layers = arch_info.get('layer_count', 0)

print_section("Optimizer/Scheduler Suggestions")
print(f"Model Type: {model_arch_type}, Params: {model_params:,}, Layers: {model_layers}")

suggested_optimizer = OptimizerHelper.suggest_optimizer(model_params, model_layers, model_arch_type)
print(f"\nSuggested Optimizer: {suggested_optimizer}")

# Get the base name (e.g., "AdamW") for the scheduler suggestion
base_optimizer_name = suggested_optimizer.split(" ")[0]
suggested_scheduler = OptimizerHelper.suggest_learning_rate_scheduler(base_optimizer_name)
print(f"Suggested Scheduler type for {base_optimizer_name}: {suggested_scheduler}")

suggested_initial_lr = OptimizerHelper.suggest_initial_learning_rate(model_arch_type, model_params)
print(f"Suggested Initial Learning Rate: {suggested_initial_lr:.1e}")  # Scientific notation
```
9. Generating Heuristic Hyperparameters (UltraOptimizer)
Get a full starting set of parameters based on system, model, and basic data info.
```python
import torch.nn as nn
from TrainSense import UltraOptimizer, SystemConfig, ArchitectureAnalyzer, print_section

# --- Get Context ---
model = nn.Sequential(nn.Linear(512, 1024), nn.ReLU(), nn.Linear(1024, 10))  # Moderate MLP
sys_config = SystemConfig()
config_summary = sys_config.get_summary()  # UltraOptimizer uses the summary dict
arch_analyzer = ArchitectureAnalyzer(model)
arch_info = arch_analyzer.analyze()
# Provide some basic stats about your training data
data_stats = {"data_size": 150000, "num_classes": 10}

ultra_optimizer = UltraOptimizer(
    training_data_stats=data_stats,
    model_arch_stats=arch_info,
    system_config_summary=config_summary
)

print_section("Heuristic Parameter Set (UltraOptimizer)")
heuristic_result = ultra_optimizer.compute_heuristic_hyperparams()
params = heuristic_result.get("hyperparameters", {})
reasoning = heuristic_result.get("reasoning", {})

print("Generated Hyperparameters:")
for key, value in params.items():
    print(f"- {key}: {value}")

print("\nReasoning:")
for key, reason in reasoning.items():
    print(f"- {key}: {reason}")
```
10. Using the Logger
Configure logging to file and/or console.
```python
import logging
import os

# Use the logger configured via TrainLogger or standard logging
# from TrainSense.logger import TrainLogger, get_trainsense_logger

# --- Example using standard logging setup (run this early) ---
log_dir = "my_training_logs"
os.makedirs(log_dir, exist_ok=True)
log_file_path = os.path.join(log_dir, "trainsense_run.log")

logging.basicConfig(
    level=logging.DEBUG,  # Log DEBUG and higher messages
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler(log_file_path, mode='w'),  # Log to file
        logging.StreamHandler()                        # Log to console
    ]
)

# Get a logger instance (can be any name)
logger = logging.getLogger("MyTrainingScript")

# --- Example Logging Calls ---
logger.debug("This is a detailed debug message.")
logger.info("Starting data preprocessing.")
logger.warning("Learning rate seems high, consider lowering.")
try:
    x = 1 / 0
except ZeroDivisionError:
    # Log exception info automatically using exc_info=True
    logger.error("Calculation failed due to division by zero.", exc_info=True)
logger.info("Example finished.")
```
Interpreting the Output
- Training Step Profiling:
  - High % Data Loading: Your bottleneck is likely I/O or data preprocessing. Increase num_workers in the DataLoader, optimize transforms, check disk speed, or pre-fetch data.
  - High % Backward Pass: Expected for complex models, but very high values might indicate inefficient layers or large activation memory. Consider activation checkpointing for very large models.
  - High % Optimizer Step: Can happen with complex optimizers (like AdamW with many parameters) or if using techniques like gradient clipping extensively. Usually less of a bottleneck than the backward pass or data loading.
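When data loading dominates the step time, raising num_workers (and, on GPU systems, pin_memory) in the standard PyTorch DataLoader is usually the first lever to try. A minimal sketch with a stand-in dataset (substitute your own Dataset):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in dataset: 1024 samples of 128 features each.
dataset = TensorDataset(torch.randn(1024, 128), torch.randint(0, 10, (1024,)))

# num_workers > 0 loads batches in background worker processes;
# pin_memory speeds up host-to-GPU transfers.
loader = DataLoader(dataset, batch_size=32, num_workers=2,
                    pin_memory=torch.cuda.is_available())

x, y = next(iter(loader))
print(x.shape)  # torch.Size([32, 128])
```

The right num_workers value depends on CPU core count and per-sample preprocessing cost, so it is worth re-profiling after each change.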
- Gradient Analysis:
  - High Global Grad Norm / Max Grad Norm: Potential for exploding gradients. Consider gradient clipping (torch.nn.utils.clip_grad_norm_).
  - Very low Global Grad Norm / Avg Grad Norm (approaching zero): Potential for vanishing gradients, especially in deep networks or RNNs. Check initialization, consider different activation functions (ReLU variants), use normalization layers (BatchNorm, LayerNorm), or architectures like ResNets/LSTMs/GRUs.
  - NaN/Inf Grads Found > 0: Serious problem! Training will likely diverge. Common causes: learning rate too high, numerical instability (e.g., log(0), division by zero), issues with mixed precision (amp), or bad data. Reduce the learning rate significantly, check data pipelines, and enable anomaly detection (torch.autograd.set_detect_anomaly(True), which slows training!).
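Gradient clipping slots in between loss.backward() and optimizer.step(). A minimal sketch with standard PyTorch (the toy model and max_norm=1.0 are illustrative, not a recommendation):

```python
import torch
import torch.nn as nn

model = nn.Linear(16, 4)  # Toy model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(8, 16)
loss = model(x).pow(2).mean()  # Dummy loss for demonstration

optimizer.zero_grad()
loss.backward()
# Rescale gradients so their global L2 norm is at most max_norm;
# returns the norm measured *before* clipping.
total_norm = nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
print(f"Grad norm before clipping: {total_norm:.3f}")
```

Comparing the returned pre-clipping norm against your chosen max_norm over many steps shows how often clipping actually fires.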
- Other Common Patterns:
  - High CPU usage / low GPU utilization (general): Often points to data loading issues (see the training step profiler), but could also be excessive Python logic between GPU calls.
  - High GPU memory usage (max_memory_allocated): Your model or batch size might be too large for the GPU VRAM. Consider reducing the batch size, using gradient accumulation, mixed-precision training (torch.cuda.amp), or model optimization techniques (pruning, quantization).
  - Inference profiler bottlenecks: Look at the profiler_top_ops_summary. If specific operations take a disproportionate share of time, investigate optimizing them.
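Gradient accumulation, one of the memory remedies above, trades one large batch for several small backward passes before each optimizer update. A minimal sketch (the toy model and accum_steps=4 are illustrative):

```python
import torch
import torch.nn as nn

model = nn.Linear(32, 2)  # Toy model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
accum_steps = 4  # Effective batch = accum_steps * micro-batch size

optimizer.zero_grad()
for step in range(8):
    x = torch.randn(8, 32)           # Micro-batch of 8 samples
    loss = model(x).pow(2).mean()    # Dummy loss
    (loss / accum_steps).backward()  # Scale so accumulated grads average out
    if (step + 1) % accum_steps == 0:
        optimizer.step()             # Update once per accum_steps micro-batches
        optimizer.zero_grad()
```

With accum_steps=4 and micro-batches of 8, the updates behave like batch-size-32 training while only ever holding one micro-batch's activations in memory.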
Contributing
Contributions are welcome! Please feel free to open an issue to discuss potential features or bug fixes, or submit a pull request. (Consider adding more specific guidelines later, e.g., code style, testing requirements).
License
This project is licensed under the MIT License. See the LICENSE file for details.