A lightweight PyTorch model profiler for latency, parameters, memory, and layer-wise bottlenecks.

TinyTorchProfiler

TinyTorchProfiler is a lightweight Python library for profiling PyTorch models.

It helps users understand model performance beyond accuracy, including latency, parameter count, model size, activation memory, layer-wise runtime, and likely bottlenecks.

The goal of v0.1.0 is to stay small, readable, and practical for ML systems workflows where model performance matters as much as model quality.

Why This Library Matters

A model with good accuracy can still be too slow, too large, or too memory-heavy for production use.

TinyTorchProfiler helps answer practical deployment questions:

  • How many parameters does this model have?
  • How many parameters are trainable?
  • How large is the model in memory?
  • What is the average forward-pass latency?
  • Which layers are the slowest?
  • Which layers create the largest activations?
  • Does this model look suitable for deployment constraints?

Installation

From the project root, install the package in editable mode:

pip install -e .

For development, install the optional test dependencies:

pip install -e ".[dev]"

You can also install dependencies directly:

pip install -r requirements.txt

Quickstart

from torch import nn

from tinytorchprofiler import profile_model


model = nn.Sequential(
    nn.Linear(128, 64),
    nn.ReLU(),
    nn.Linear(64, 10),
)

report = profile_model(
    model,
    input_shape=(1, 128),
    device="cpu",
    warmup=5,
    runs=20,
)

report.summary()
report.to_csv("profile.csv")

Core API

from tinytorchprofiler import profile_model

# `model` is assumed to be a torch.nn.Module that accepts a (1, 3, 224, 224) input.
report = profile_model(
    model,
    input_shape=(1, 3, 224, 224),
    device="cpu",
    warmup=5,
    runs=20,
)

report.summary()
report.to_dict()
report.to_csv("profile.csv")

Supported Metrics

TinyTorchProfiler v0.1.0 supports:

  • Total parameter count
  • Trainable parameter count
  • Estimated model size in MB
  • Average forward-pass latency in milliseconds
  • CPU profiling
  • CUDA profiling when available
  • Layer-wise names
  • Layer-wise module types
  • Layer-wise input and output shapes
  • Layer-wise parameter counts
  • Approximate activation size in MB
  • Approximate layer-wise forward latency
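For readers who want to sanity-check the parameter and size numbers, the sketch below shows one common way to compute them with plain PyTorch. It is not TinyTorchProfiler's implementation: the helper names `count_parameters` and `estimate_model_size_mb` are hypothetical, and treating 1 MB as 1024**2 bytes and counting buffers toward model size are assumptions made here.

```python
from torch import nn


def count_parameters(model: nn.Module) -> tuple[int, int]:
    """Return (total, trainable) parameter counts. Hypothetical helper."""
    total = sum(p.numel() for p in model.parameters())
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    return total, trainable


def estimate_model_size_mb(model: nn.Module) -> float:
    """Estimate storage for parameters and buffers, assuming 1 MB = 1024**2 bytes."""
    param_bytes = sum(p.numel() * p.element_size() for p in model.parameters())
    buffer_bytes = sum(b.numel() * b.element_size() for b in model.buffers())
    return (param_bytes + buffer_bytes) / (1024 ** 2)


model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))
# Freeze one weight tensor so total and trainable counts differ.
model[0].weight.requires_grad_(False)

total, trainable = count_parameters(model)
size_mb = estimate_model_size_mb(model)
print(f"total={total} trainable={trainable} size={size_mb:.4f} MB")
```

For this model the first layer holds 128 * 64 + 64 = 8256 parameters and the second 64 * 10 + 10 = 650, so the total is 8906; freezing the 8192-element weight leaves 714 trainable.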

Device Support

CPU profiling is supported by default:

report = profile_model(model, input_shape=(1, 3, 224, 224), device="cpu")

CUDA profiling is supported when PyTorch detects an available CUDA device:

report = profile_model(model, input_shape=(1, 3, 224, 224), device="cuda")

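CUDA kernels launch asynchronously, so a host-side timer can stop before the GPU has finished its work. For CUDA timing, TinyTorchProfiler therefore synchronizes the device around measured regions with torch.cuda.synchronize().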
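The synchronization pattern can be sketched as follows. This is an illustration under stated assumptions, not the library's code: `measure_latency_ms` is a hypothetical helper, and the `warmup` and `runs` arguments only mirror the names used by `profile_model`. The script falls back to CPU when no CUDA device is available.

```python
import time

import torch
from torch import nn


def measure_latency_ms(model, example, warmup=5, runs=20):
    """Average forward latency in ms. Hypothetical helper, not the library API."""
    on_cuda = example.is_cuda
    model.eval()
    with torch.no_grad():
        for _ in range(warmup):
            model(example)  # warmup passes are not timed
        if on_cuda:
            torch.cuda.synchronize()  # drain queued warmup kernels before timing
        start = time.perf_counter()
        for _ in range(runs):
            model(example)
        if on_cuda:
            torch.cuda.synchronize()  # wait until the timed kernels have finished
        elapsed = time.perf_counter() - start
    return elapsed * 1000.0 / runs


device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10)).to(device)
example = torch.randn(1, 128, device=device)

latency_ms = measure_latency_ms(model, example)
print(f"{device}: {latency_ms:.4f} ms per forward pass")
```

Without the second synchronize call, the timer on a CUDA device would mostly measure kernel launch overhead rather than execution time.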

Examples

Profile a small CNN:

python examples/01_profile_simple_cnn.py

This prints a summary and writes:

simple_cnn_profile.csv

Profile a torchvision model if torchvision is installed:

python examples/02_profile_torchvision_model.py

If torchvision is missing, the example exits gracefully with a short message.

Testing

Run the test suite with:

pytest

The v0.1.0 tests cover:

  • Parameter counting
  • Trainable parameter counting
  • Model size estimation
  • Tensor size estimation
  • Report serialization with to_dict()
  • Basic end-to-end profiling on a tiny model

Notes on Profiling

TinyTorchProfiler uses PyTorch forward passes and hooks.

Layer-wise latency is approximate. It is useful for identifying likely bottlenecks, but exact timing can vary depending on hardware, backend libraries, CPU load, CUDA synchronization, and model structure.
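A hook-based layer timer can be sketched with PyTorch's `register_forward_pre_hook` and `register_forward_hook`. The function name `time_leaf_modules` is hypothetical, and the choice to time only leaf modules on CPU in a single pass is an assumption of this sketch, not a description of TinyTorchProfiler's internals. On CUDA, each hook would also need a synchronize call to give meaningful numbers.

```python
import time

import torch
from torch import nn


def time_leaf_modules(model, example):
    """Return {module_name: elapsed_ms} for one forward pass (CPU sketch)."""
    timings, starts, handles = {}, {}, []

    def make_hooks(name):
        def pre_hook(module, inputs):
            starts[name] = time.perf_counter()  # runs just before module.forward

        def post_hook(module, inputs, output):
            timings[name] = (time.perf_counter() - starts[name]) * 1000.0

        return pre_hook, post_hook

    for name, module in model.named_modules():
        if len(list(module.children())) == 0:  # skip containers such as nn.Sequential
            pre, post = make_hooks(name)
            handles.append(module.register_forward_pre_hook(pre))
            handles.append(module.register_forward_hook(post))

    try:
        with torch.no_grad():
            model(example)
    finally:
        for handle in handles:
            handle.remove()  # always detach hooks so the model is left unchanged
    return timings


model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))
layer_ms = time_leaf_modules(model, torch.randn(1, 128))
slowest = max(layer_ms, key=layer_ms.get)
print(layer_ms, "slowest:", slowest)
```

The hook calls themselves add overhead, which is one reason per-layer timings are best read as relative rankings rather than exact costs.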

Activation memory is estimated from forward outputs. It should be treated as a helpful approximation, not as a replacement for full memory tracing.
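Estimating activation size from forward outputs might look like the sketch below. The helper names are hypothetical, 1 MB is assumed to be 1024**2 bytes, and non-tensor outputs (tuples, dicts) are ignored for brevity, so this is a simplified illustration rather than the library's method.

```python
import torch
from torch import nn


def tensor_size_mb(tensor):
    """Size of one tensor's data in MB, assuming 1 MB = 1024**2 bytes."""
    return tensor.numel() * tensor.element_size() / (1024 ** 2)


def activation_sizes_mb(model, example):
    """Return {module_name: output_size_mb} for leaf modules. Hypothetical helper."""
    sizes, handles = {}, []

    def make_hook(name):
        def hook(module, inputs, output):
            if isinstance(output, torch.Tensor):  # this sketch skips tuple/dict outputs
                sizes[name] = tensor_size_mb(output)

        return hook

    for name, module in model.named_modules():
        if len(list(module.children())) == 0:
            handles.append(module.register_forward_hook(make_hook(name)))

    try:
        with torch.no_grad():
            model(example)
    finally:
        for handle in handles:
            handle.remove()
    return sizes


model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))
sizes = activation_sizes_mb(model, torch.randn(1, 128))
print(sizes)
```

Here the first two layers each emit 64 float32 values (256 bytes) and the last emits 10 (40 bytes). An estimate like this counts each output once and does not account for in-place operations, allocator behavior, or tensors saved for the backward pass.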

Roadmap

Planned future improvements:

  • Batch-size scaling analysis
  • Memory peak tracking
  • Visualization
  • ONNX export profiling
  • ViT and DINOv2 examples

Version

Current version: 0.1.0
