
A comprehensive GPU memory profiler for PyTorch and TensorFlow with CLI, visualization, and analytics


Stormlog

CI PyPI Version License: MIT Python 3.10+ PyTorch TensorFlow Textual TUI

Stormlog overview tab
Current Overview tab from the shipped Textual interface.

Stormlog is a memory-profiling toolkit for day-to-day PyTorch and TensorFlow work. It combines Python APIs, CLI commands, and a Textual TUI so you can move from "what is using memory?" to saved artifacts and shareable diagnostics without switching tools.

Long-running track and diagnose flows are session-aware, so one capture can be reconstructed deterministically across sink segments, diagnose bundles, and OOM flight-recorder artifacts.

For task-oriented operational guidance, use the Production Cookbook. The highest-signal entry points are the Always-on Tracking recipe, Incident Playbooks, and CI and Release Qualification.

Why use this tool

  • Catch memory growth before it becomes an OOM.
  • Compare allocated vs reserved usage during training and inference.
  • Export telemetry and diagnose bundles for CI or release triage.
  • Load the same artifacts into a terminal UI for faster debugging.
  • Keep workflows available on CPU-only and MPS systems, not just CUDA boxes.
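
The first two bullets above (catching growth early, comparing allocated vs reserved usage) can be illustrated with a framework-agnostic sketch. The snapshot dictionaries and function names below are hypothetical, not Stormlog's API; they only show the kind of check the profiler automates:

```python
# Hypothetical snapshot records: {"allocated": bytes, "reserved": bytes}.
# This is an illustrative sketch, not Stormlog's internal logic.

def fragmentation_ratio(snapshot: dict) -> float:
    """Fraction of reserved memory that is not actually allocated."""
    reserved = snapshot["reserved"]
    if reserved == 0:
        return 0.0
    return 1.0 - snapshot["allocated"] / reserved

def detect_growth(snapshots: list[dict], window: int = 3) -> bool:
    """Flag strictly monotonic growth in allocated memory over the last `window` samples."""
    tail = [s["allocated"] for s in snapshots[-window:]]
    return len(tail) == window and all(a < b for a, b in zip(tail, tail[1:]))

snaps = [
    {"allocated": 100, "reserved": 400},
    {"allocated": 200, "reserved": 400},
    {"allocated": 300, "reserved": 400},
]
print(fragmentation_ratio(snaps[-1]))  # 0.25 of reserved memory is idle
print(detect_growth(snaps))            # True: allocated keeps climbing
```

A sustained high fragmentation ratio points at allocator churn rather than true demand, while a monotonic allocated trend is the classic precursor to an OOM.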

Installation

From PyPI

pip install stormlog
pip install "stormlog[viz]"
pip install "stormlog[tui,torch]"
pip install "stormlog[torch]"
pip install "stormlog[tf]"
pip install "stormlog[all]"

stormlog[all] installs every runtime extra: visualization, TUI, PyTorch, and TensorFlow dependencies.

Package and import names

stormlog is the distribution name on PyPI and the primary Python import root. TensorFlow-specific APIs live under stormlog.tensorflow.

Task                     Use
Install the package      pip install stormlog
Launch the TUI           stormlog
Import PyTorch APIs      from stormlog import GPUMemoryProfiler, MemoryTracker
Import TensorFlow APIs   from stormlog.tensorflow import TFMemoryProfiler
Run CLI automation       gpumemprof or tfmemprof

From source

git clone https://github.com/Silas-Asamoah/stormlog.git
cd stormlog
pip install -e .
pip install -e ".[viz,tui,torch]"

If you want both framework extras in a development checkout:

pip install -e ".[dev,test,all]"

The examples/ package and Markdown test guides are source-checkout only. A plain pip install stormlog does not include them.

Quick start

CLI-first workflow

This is the fastest path to verify an environment and produce an artifact you can inspect later:

gpumemprof info
gpumemprof track --duration 2 --interval 0.5 --output /tmp/gpumemprof_track.json --format json
gpumemprof analyze /tmp/gpumemprof_track.json --format txt --output /tmp/gpumemprof_analysis.txt
gpumemprof diagnose --duration 0 --output /tmp/gpumemprof_diag

tfmemprof info
tfmemprof diagnose --duration 0 --output /tmp/tf_diag

If you reuse a sink directory across multiple runs, Stormlog separates those captures by session_id and defaults analysis to the newest clean session.

PyTorch API workflow

GPUMemoryProfiler currently targets PyTorch runtimes exposed through torch.cuda, which covers NVIDIA CUDA and ROCm-backed builds. On Apple MPS or CPU-only systems, use MemoryTracker, the CLI, or CPUMemoryProfiler instead.

import torch
from stormlog import GPUMemoryProfiler

profiler = GPUMemoryProfiler()
device = profiler.device
model = torch.nn.Linear(1024, 128).to(device)

def train_step() -> torch.Tensor:
    x = torch.randn(64, 1024, device=device)
    y = model(x)
    return y.sum()

profile = profiler.profile_function(train_step)
summary = profiler.get_summary()

print(profile.function_name)
print(f"Peak memory: {summary['peak_memory_usage'] / (1024**3):.2f} GB")
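
The summary values above are raw byte counts, so the print line divides by 1024**3 by hand. A small helper (not part of Stormlog) generalizes that conversion to any magnitude:

```python
def format_bytes(n: float) -> str:
    """Render a byte count with a binary-unit suffix, as in the summary print above."""
    for unit in ("B", "KiB", "MiB", "GiB", "TiB"):
        if n < 1024 or unit == "TiB":
            return f"{n:.2f} {unit}"
        n /= 1024

print(format_bytes(3 * 1024**3))  # 3.00 GiB
```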

TensorFlow API workflow

TFMemoryProfiler works on GPU- or CPU-backed TensorFlow runtimes.

from stormlog.tensorflow import TFMemoryProfiler

profiler = TFMemoryProfiler(enable_tensor_tracking=True)

with profiler.profile_context("training"):
    model.fit(x_train, y_train, epochs=1, batch_size=32)

results = profiler.get_results()
print(f"Peak memory: {results.peak_memory_mb:.2f} MB")
print(f"Snapshots captured: {len(results.snapshots)}")

Daily workflows

ML engineer

  • instrument a training step with GPUMemoryProfiler or TFMemoryProfiler
  • switch to track when you need telemetry over time
  • export plots or analyze saved telemetry later

Researcher debugging regressions

  • capture track output or a diagnose bundle
  • open the same artifacts in the TUI diagnostics and visualizations tabs
  • compare growth, gaps, and per-rank behavior before changing model code
  • switch between discovered sessions instead of merging multiple captures from the same sink

CI or release owner

These example-module commands require a source checkout and an editable install (pip install -e .).

  • run python -m examples.cli.quickstart
  • run python -m examples.cli.capability_matrix --mode smoke --target both --oom-mode simulated
  • archive the emitted artifacts for later triage in the TUI

Terminal UI

Install the optional TUI dependencies and launch:

pip install "stormlog[tui,torch]"
stormlog

The current TUI startup path imports PyTorch immediately, so stormlog[tui] alone is not enough yet.

The current TUI tabs are:

  • Overview
  • PyTorch
  • TensorFlow
  • Monitoring
  • Visualizations
  • Diagnostics
  • CLI & Actions

Stormlog overview tab
Overview tab with current system summary and navigation guidance.

Stormlog diagnostics tab
Diagnostics tab with the current artifact loader, rank table, and timeline panes.

Use the Monitoring tab to start live tracking, export CSV or JSON events to ./exports, and tune warning or critical thresholds. In Visualizations, refresh the live timeline and save PNG or HTML exports under ./visualizations. In Diagnostics, load live telemetry or artifact paths and rebuild rank-level diagnostics without leaving the terminal.

For screen-by-screen details, see the TUI Guide.

Examples and walkthroughs

Launch QA scenarios

These commands require a source checkout:

python -m examples.cli.quickstart
python -m examples.cli.capability_matrix --mode smoke --target both --oom-mode simulated
python -m examples.scenarios.cpu_telemetry_scenario
python -m examples.scenarios.oom_flight_recorder_scenario --mode simulated

CPU-only and laptop workflows

If CUDA is not available, Stormlog still supports:

  • gpumemprof info
  • gpumemprof monitor
  • gpumemprof track
  • CPUMemoryProfiler
  • CPUMemoryTracker
  • the TUI overview, monitoring, diagnostics, and CLI tabs

See the CPU Compatibility Guide for the CPU-only path.
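
When a script has to run on CUDA boxes, Apple-silicon laptops, and CPU-only CI alike, a small guard can choose the entry point at runtime. The fallback ordering here is an assumption for illustration, not Stormlog policy; it uses only standard PyTorch capability checks:

```python
def best_device() -> str:
    """Return "cuda", "mps", or "cpu" depending on what this host's PyTorch exposes."""
    try:
        import torch
    except ImportError:
        return "cpu"  # no PyTorch at all: the CPU-only CLI and profilers still work
    if torch.cuda.is_available():
        return "cuda"
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"

print(best_device())
```

On "cuda" you can reach for GPUMemoryProfiler; on "mps" or "cpu", the README above recommends MemoryTracker, the CLI, or CPUMemoryProfiler instead.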

Contributing

See the Contributing Guide and Code of Conduct.

License

MIT License

