Skip to main content

A comprehensive GPU memory profiler for JAX, PyTorch and TensorFlow with CLI, visualization, and analytics

Project description

Stormlog

CI PyPI Version License: MIT Python 3.10+ PyTorch TensorFlow Textual TUI

Stormlog overview tab
Current Overview tab from the shipped Textual interface.

Stormlog is a memory-profiling toolkit for day-to-day PyTorch and TensorFlow work. It combines Python APIs, CLI commands, and a Textual TUI so you can move from "what is using memory?" to saved artifacts and shareable diagnostics without switching tools.

Long-running track and diagnose flows are session-aware, so one capture can be reconstructed deterministically across sink segments, diagnose bundles, and OOM flight-recorder artifacts.

For task-oriented operational guidance, use the Production Cookbook. The highest-signal entry points are the Always-on Tracking recipe, Incident Playbooks, and CI and Release Qualification.

Why use this tool

  • Catch memory growth before it becomes an OOM.
  • Compare allocated vs reserved usage during training and inference.
  • Export telemetry and diagnose bundles for CI or release triage.
  • Load the same artifacts into a terminal UI for faster debugging.
  • Keep workflows available on CPU-only and MPS systems, not just CUDA boxes.

Installation

From PyPI

pip install stormlog
pip install "stormlog[viz]"
pip install "stormlog[tui,torch]"
pip install "stormlog[torch]"
pip install "stormlog[tf]"
pip install "stormlog[all]"

stormlog[all] installs every runtime extra: visualization, TUI, PyTorch, and TensorFlow dependencies.

Package and import names

stormlog is the distribution name on PyPI and the primary Python import root. TensorFlow-specific APIs live under stormlog.tensorflow.

Task Use
Install the package pip install stormlog
Launch the TUI stormlog
Import PyTorch APIs from stormlog import GPUMemoryProfiler, MemoryTracker
Import TensorFlow APIs from stormlog.tensorflow import TFMemoryProfiler
Run CLI automation gpumemprof or tfmemprof

From source

git clone https://github.com/Silas-Asamoah/stormlog.git
cd stormlog
pip install -e .
pip install -e ".[viz,tui,torch]"

If you want both framework extras in a development checkout:

pip install -e ".[dev,test,all]"

The examples/ package and Markdown test guides are source-checkout only. A plain pip install stormlog does not include them.

Quick start

CLI-first workflow

This is the fastest path to verify an environment and produce an artifact you can inspect later:

gpumemprof info
gpumemprof track --duration 2 --interval 0.5 --output /tmp/gpumemprof_track.json --format json
gpumemprof analyze /tmp/gpumemprof_track.json --format txt --output /tmp/gpumemprof_analysis.txt
gpumemprof diagnose --duration 0 --output /tmp/gpumemprof_diag

tfmemprof info
tfmemprof diagnose --duration 0 --output /tmp/tf_diag

If you reuse a sink directory across multiple runs, Stormlog separates those captures by session_id and defaults analysis to the newest clean session.

PyTorch API workflow

GPUMemoryProfiler currently targets PyTorch runtimes exposed through torch.cuda, which covers NVIDIA CUDA and ROCm-backed builds. On Apple MPS or CPU-only systems, use MemoryTracker, the CLI, or CPUMemoryProfiler instead.

import torch
from stormlog import GPUMemoryProfiler

profiler = GPUMemoryProfiler()
device = profiler.device
model = torch.nn.Linear(1024, 128).to(device)

def train_step() -> torch.Tensor:
    x = torch.randn(64, 1024, device=device)
    y = model(x)
    return y.sum()

profile = profiler.profile_function(train_step)
summary = profiler.get_summary()

print(profile.function_name)
print(f"Peak memory: {summary['peak_memory_usage'] / (1024**3):.2f} GB")

TensorFlow API workflow

TFMemoryProfiler works on GPU or CPU-backed TensorFlow runtimes.

from stormlog.tensorflow import TFMemoryProfiler

profiler = TFMemoryProfiler(enable_tensor_tracking=True)

with profiler.profile_context("training"):
    model.fit(x_train, y_train, epochs=1, batch_size=32)

results = profiler.get_results()
print(f"Peak memory: {results.peak_memory_mb:.2f} MB")
print(f"Snapshots captured: {len(results.snapshots)}")

Daily workflows

ML engineer

  • instrument a training step with GPUMemoryProfiler or TFMemoryProfiler
  • switch to track when you need telemetry over time
  • export plots or analyze saved telemetry later

Researcher debugging regressions

  • capture track output or a diagnose bundle
  • open the same artifacts in the TUI diagnostics and visualizations tabs
  • compare growth, gaps, and per-rank behavior before changing model code
  • switch between discovered sessions instead of merging multiple captures from the same sink

CI or release owner

These example-module commands require a source checkout plus pip install -e ..

  • run python -m examples.cli.quickstart
  • run python -m examples.cli.capability_matrix --mode smoke --target both --oom-mode simulated
  • archive the emitted artifacts for later triage in the TUI

Terminal UI

Install the optional TUI dependencies and launch:

pip install "stormlog[tui,torch]"
stormlog

The current TUI startup path imports PyTorch immediately, so stormlog[tui] alone is not enough yet.

The current TUI tabs are:

  • Overview
  • PyTorch
  • TensorFlow
  • Monitoring
  • Visualizations
  • Diagnostics
  • CLI & Actions

Stormlog overview tab
Overview tab with current system summary and navigation guidance.

Stormlog diagnostics tab
Diagnostics tab with the current artifact loader, rank table, and timeline panes.

Use the Monitoring tab to start live tracking, export CSV or JSON events to ./exports, and tune warning or critical thresholds. In Visualizations, refresh the live timeline and save PNG or HTML exports under ./visualizations. In Diagnostics, load live telemetry or artifact paths and rebuild rank-level diagnostics without leaving the terminal.

For screen-by-screen details, see the TUI Guide.

Examples and walkthroughs

Launch QA scenarios

These commands require a source checkout:

python -m examples.cli.quickstart
python -m examples.cli.capability_matrix --mode smoke --target both --oom-mode simulated
python -m examples.scenarios.cpu_telemetry_scenario
python -m examples.scenarios.oom_flight_recorder_scenario --mode simulated

CPU-only and laptop workflows

If CUDA is not available, Stormlog still supports:

  • gpumemprof info
  • gpumemprof monitor
  • gpumemprof track
  • CPUMemoryProfiler
  • CPUMemoryTracker
  • the TUI overview, monitoring, diagnostics, and CLI tabs

See the CPU Compatibility Guide for the CPU-only path.

Contributing

See the Contributing Guide and Code of Conduct.

License

MIT License

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

stormlog-0.3.5.tar.gz (6.5 MB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

stormlog-0.3.5-py3-none-any.whl (297.8 kB view details)

Uploaded Python 3

File details

Details for the file stormlog-0.3.5.tar.gz.

File metadata

  • Download URL: stormlog-0.3.5.tar.gz
  • Upload date:
  • Size: 6.5 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.13

File hashes

Hashes for stormlog-0.3.5.tar.gz
Algorithm Hash digest
SHA256 f1d4d14706ebadfda5d5e5816ebda13ec2cea7e7c7a68a15fa6af7e10ded4a3f
MD5 35ce5eba2bdca16e10bbec75e840a068
BLAKE2b-256 abef75dabfdf43e30ec85a610b84b3dc56f8b053764c4c3f75a748f1626b5a6b

See more details on using hashes here.

Provenance

The following attestation bundles were made for stormlog-0.3.5.tar.gz:

Publisher: release.yml on Silas-Asamoah/stormlog

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file stormlog-0.3.5-py3-none-any.whl.

File metadata

  • Download URL: stormlog-0.3.5-py3-none-any.whl
  • Upload date:
  • Size: 297.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.13

File hashes

Hashes for stormlog-0.3.5-py3-none-any.whl
Algorithm Hash digest
SHA256 55cdffcef7190b59eb472cc7bd2f51e09057322d3fbb86f51de5badeb61bd110
MD5 b12afd739cb0da41ec9be7947ed9182b
BLAKE2b-256 ecbde4901e67485672f819cab91025e910034cb9cb45d965beeda150055c6da4

See more details on using hashes here.

Provenance

The following attestation bundles were made for stormlog-0.3.5-py3-none-any.whl:

Publisher: release.yml on Silas-Asamoah/stormlog

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page