
A lightweight PyTorch model profiler for latency, parameters, memory, and layer-wise bottlenecks.


TinyTorchProfiler

TinyTorchProfiler is a lightweight Python library for profiling PyTorch models.

It reports metrics that accuracy alone does not capture: latency, parameter count, model size, activation memory, layer-wise runtime, and likely bottlenecks.

The goal of the 0.1.x releases is to stay small, readable, and practical for ML systems workflows where model performance matters as much as model quality.

Why This Library Matters

A model with good accuracy can still be too slow, too large, or too memory-heavy for production use.

TinyTorchProfiler helps answer practical deployment questions:

  • How many parameters does this model have?
  • How many parameters are trainable?
  • How large is the model in memory?
  • What is the average forward-pass latency?
  • Which layers are the slowest?
  • Which layers create the largest activations?
  • Does this model look suitable for deployment constraints?

Installation

From the project root, install the package in editable mode:

pip install -e .

For development, install the optional test dependencies:

pip install -e ".[dev]"

You can also install dependencies directly:

pip install -r requirements.txt

Quickstart

from torch import nn

from tinytorchprofiler import profile_model


model = nn.Sequential(
    nn.Linear(128, 64),
    nn.ReLU(),
    nn.Linear(64, 10),
)

report = profile_model(
    model,
    input_shape=(1, 128),
    device="cpu",
    warmup=5,
    runs=20,
)

report.summary()
report.to_csv("profile.csv")

Show package metadata:

import tinytorchprofiler as tinyprofiler

tinyprofiler.show()

Check whether the model fits a deployment budget:

budget = report.check_budget(
    max_latency_ms=30,
    max_model_size_mb=25,
    max_parameters=5_000_000,
)

print(budget["passed"])
print(budget["checks"])

Score the model for a built-in deployment target:

score = report.deployment_score("edge_cpu")

print(score["score"])
print(score["bottlenecks"])

Use strict custom deployment budgets:

score = report.deployment_score(
    target="custom",
    max_latency_ms=0.1,
    max_model_size_mb=0.001,
)

print(score["passed"])
print(score["score"])

Core API

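from tinytorchprofiler import profile_model

# `model` is any torch.nn.Module that accepts a tensor of shape `input_shape`.
report = profile_model(
    model,
    input_shape=(1, 3, 224, 224),
    device="cpu",
    warmup=5,
    runs=20,
)

report.summary()              # print the profile summary
report.to_dict()              # serialize the report to a dict
report.to_csv("profile.csv")  # write the report to a CSV file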

Supported Metrics

TinyTorchProfiler v0.1.2 supports:

  • Total parameter count
  • Trainable parameter count
  • Estimated model size in MB
  • Average forward-pass latency in milliseconds
  • CPU profiling
  • CUDA profiling when available
  • Layer-wise names
  • Layer-wise module types
  • Layer-wise input and output shapes
  • Layer-wise parameter counts
  • Approximate activation size in MB
  • Approximate layer-wise forward latency
  • Deployment budget checks
  • Deployment readiness score for built-in and custom targets
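
For orientation, the first three metrics can be reproduced with plain PyTorch. The sketch below is not TinyTorchProfiler's implementation; the helper names `count_parameters` and `estimate_model_size_mb` are hypothetical, and the size estimate assumes 1 MB = 1024**2 bytes and counts both parameters and buffers.

```python
from torch import nn


def count_parameters(model: nn.Module) -> tuple[int, int]:
    """Return (total, trainable) parameter counts."""
    total = sum(p.numel() for p in model.parameters())
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    return total, trainable


def estimate_model_size_mb(model: nn.Module) -> float:
    """Estimate the size of parameters plus buffers in MB (assumed 1 MB = 1024**2 bytes)."""
    param_bytes = sum(p.numel() * p.element_size() for p in model.parameters())
    buffer_bytes = sum(b.numel() * b.element_size() for b in model.buffers())
    return (param_bytes + buffer_bytes) / (1024 ** 2)


model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))

# Freeze the first layer so total and trainable counts differ.
for p in model[0].parameters():
    p.requires_grad = False

total, trainable = count_parameters(model)
size_mb = estimate_model_size_mb(model)
print(total, trainable)  # 8906 650
```

The first linear layer holds 128 * 64 + 64 = 8256 parameters and the second holds 64 * 10 + 10 = 650, which gives the 8906 total. Values reported by the library may differ if it uses a different size convention.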

Deployment Readiness

TinyTorchProfiler can check model metrics against production-style deployment budgets:

result = report.check_budget(
    max_latency_ms=30,
    max_model_size_mb=25,
    max_parameters=5_000_000,
    max_activation_size_mb=64,
)

It can also calculate a simple 0-100 deployment readiness score:

score = report.deployment_score("edge_cpu")

You can override any built-in target budget:

score = report.deployment_score(
    "edge_cpu",
    max_latency_ms=10,
)

You can also use a fully custom target:

score = report.deployment_score(
    target="custom",
    max_latency_ms=0.1,
    max_model_size_mb=0.001,
    max_parameters=1_000,
    max_activation_size_mb=1,
)

Built-in targets:

  • edge_cpu
  • mobile
  • server_cpu
  • realtime_webcam
  • custom

The score is a lightweight heuristic based on latency, model size, parameter count, and activation memory. It is intended as a fast first-pass signal, not a replacement for production benchmarking on real hardware.
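
To make the idea of a budget check and a heuristic score concrete, here is a minimal pure-Python sketch. It is an illustration only: the scoring formula, the function names `check_budget` and `readiness_score`, and the inner layout of the `checks` dictionary are assumptions, not the library's actual logic.

```python
def check_budget(metrics: dict, **limits: float) -> dict:
    """Compare measured metrics against max_<metric> limits (hypothetical helper)."""
    checks = {}
    for limit_name, limit in limits.items():
        metric_name = limit_name.removeprefix("max_")  # Python 3.9+
        value = metrics[metric_name]
        checks[metric_name] = {
            "value": value,
            "limit": limit,
            "passed": value <= limit,
        }
    return {"passed": all(c["passed"] for c in checks.values()), "checks": checks}


def readiness_score(result: dict) -> float:
    """Assumed formula: average of min(1, limit / value) over all checks, scaled to 0-100."""
    ratios = []
    for check in result["checks"].values():
        value, limit = check["value"], check["limit"]
        ratios.append(1.0 if value <= limit else limit / value)
    return round(100 * sum(ratios) / len(ratios), 1) if ratios else 100.0


# Made-up measurements for illustration, not real profiling results.
metrics = {"latency_ms": 45.0, "model_size_mb": 12.0, "parameters": 3_000_000}

result = check_budget(
    metrics,
    max_latency_ms=30,
    max_model_size_mb=25,
    max_parameters=5_000_000,
)
score = readiness_score(result)

print(result["passed"])  # False: latency exceeds its budget
print(score)             # 88.9
```

In this sketch only latency misses its budget, so it contributes 30 / 45 to the average while the other two checks contribute 1.0 each.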

Device Support

CPU profiling is supported by default:

report = profile_model(model, input_shape=(1, 3, 224, 224), device="cpu")

CUDA profiling is supported when PyTorch detects an available CUDA device:

report = profile_model(model, input_shape=(1, 3, 224, 224), device="cuda")

For CUDA timing, TinyTorchProfiler synchronizes the device around measured regions with torch.cuda.synchronize().
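
The warm-up, repeat, and synchronize pattern can be sketched as a small generic harness. This is an assumption about how such timing is commonly structured, not the library's code; `measure_latency_ms` and its `sync` parameter are hypothetical names.

```python
import time
from statistics import mean
from typing import Callable, Optional


def measure_latency_ms(
    fn: Callable[[], object],
    warmup: int = 5,
    runs: int = 20,
    sync: Optional[Callable[[], None]] = None,
) -> float:
    """Return the average wall-clock latency of fn() in milliseconds."""
    # Warm-up calls are executed but not timed.
    for _ in range(warmup):
        fn()

    timings = []
    for _ in range(runs):
        if sync is not None:
            sync()  # wait for previously queued device work
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # wait for the work launched by this call
        timings.append((time.perf_counter() - start) * 1000.0)
    return mean(timings)


# CPU-only demonstration with a trivial callable.
calls = []
latency = measure_latency_ms(lambda: calls.append(1), warmup=2, runs=3)
print(len(calls))  # 5: two warm-up calls plus three timed runs
```

With PyTorch on a CUDA device, the same harness could be called as `measure_latency_ms(lambda: model(x), sync=torch.cuda.synchronize)` inside a `torch.no_grad()` block. Without the synchronization calls, the timer would mostly measure kernel launch time, because CUDA kernels run asynchronously.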

Examples

Profile a small CNN:

python examples/01_profile_simple_cnn.py

This prints a summary and writes:

simple_cnn_profile.csv

Profile a torchvision model if torchvision is installed:

python examples/02_profile_torchvision_model.py

If torchvision is missing, the example exits gracefully with a short message.

Testing

Run the test suite with:

pytest

The tests cover:

  • Parameter counting
  • Trainable parameter counting
  • Model size estimation
  • Tensor size estimation
  • Report serialization with to_dict()
  • Basic end-to-end profiling on a tiny model

Notes on Profiling

TinyTorchProfiler uses PyTorch forward passes and hooks.

Layer-wise latency is approximate. It is useful for identifying likely bottlenecks, but exact timing can vary depending on hardware, backend libraries, CPU load, CUDA synchronization, and model structure.

Activation memory is estimated from forward outputs. It should be treated as a helpful approximation, not as a replacement for full memory tracing.
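
As an illustration of the hook-based approach, the sketch below registers forward hooks on leaf modules and estimates each output's size. It is a minimal example under stated assumptions, not the library's implementation: `collect_activation_stats` and the row keys are hypothetical, only tensor outputs are measured, and 1 MB is taken as 1024**2 bytes.

```python
import torch
from torch import nn


def collect_activation_stats(model: nn.Module, example_input: torch.Tensor) -> list[dict]:
    """Run one forward pass and record the output shape and size of each leaf module."""
    rows = []
    handles = []

    def make_hook(name: str):
        def hook(module, inputs, output):
            # Modules that return tuples or dicts are skipped in this sketch.
            if isinstance(output, torch.Tensor):
                rows.append({
                    "name": name,
                    "type": type(module).__name__,
                    "output_shape": tuple(output.shape),
                    "activation_mb": output.numel() * output.element_size() / (1024 ** 2),
                })
        return hook

    for name, module in model.named_modules():
        if len(list(module.children())) == 0:  # leaf modules only
            handles.append(module.register_forward_hook(make_hook(name)))

    model.eval()  # note: switches the model to evaluation mode
    try:
        with torch.no_grad():
            model(example_input)
    finally:
        # Always remove hooks so the model is left unmodified.
        for handle in handles:
            handle.remove()
    return rows


model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))
rows = collect_activation_stats(model, torch.randn(1, 128))

for row in rows:
    print(row["name"], row["type"], row["output_shape"])
```

Because the estimate counts only each module's output tensor, it ignores temporary buffers and memory that PyTorch frees or reuses during the forward pass, which is why it should be read as an approximation.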

Roadmap

Planned future improvements:

  • Batch-size scaling analysis
  • Memory peak tracking
  • Visualization
  • ONNX export profiling
  • ViT and DINOv2 examples

Version

Current version: 0.1.2
