A lightweight PyTorch model profiler for latency, parameters, memory, and layer-wise bottlenecks.

TinyTorchProfiler

TinyTorchProfiler is a lightweight Python library for profiling PyTorch models.

It helps users understand model performance beyond accuracy, including latency, parameter count, model size, activation memory, layer-wise runtime, and likely bottlenecks.

The goal of the 0.1.x releases is to stay small, readable, and practical for ML systems workflows where model performance matters as much as model quality.

Why This Library Matters

A model with good accuracy can still be too slow, too large, or too memory-heavy for production use.

TinyTorchProfiler helps answer practical deployment questions:

How many parameters does this model have?
How many parameters are trainable?
How large is the model in memory?
What is the average forward-pass latency?
Which layers are the slowest?
Which layers create the largest activations?
Does this model look suitable for deployment constraints?
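
The first three questions can be answered with plain PyTorch. The following is a minimal sketch, not TinyTorchProfiler's own implementation: the helper names `count_parameters` and `estimate_model_size_mb` are hypothetical, and the size estimate assumes 1 MB = 1024 * 1024 bytes and counts both parameters and buffers.

```python
from torch import nn


def count_parameters(model: nn.Module) -> tuple[int, int]:
    """Return (total, trainable) parameter counts."""
    total = sum(p.numel() for p in model.parameters())
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    return total, trainable


def estimate_model_size_mb(model: nn.Module) -> float:
    """Estimate in-memory size of parameters and buffers (assumes 1 MB = 1024**2 bytes)."""
    n_bytes = sum(p.numel() * p.element_size() for p in model.parameters())
    n_bytes += sum(b.numel() * b.element_size() for b in model.buffers())
    return n_bytes / (1024 ** 2)


model = nn.Sequential(
    nn.Linear(128, 64),
    nn.ReLU(),
    nn.Linear(64, 10),
)

total, trainable = count_parameters(model)
size_mb = estimate_model_size_mb(model)
print(f"total={total} trainable={trainable} size={size_mb:.4f} MB")
```

For this model the total is 128 * 64 + 64 + 64 * 10 + 10 = 8906 parameters, all trainable.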

Installation

From the project root, install the package in editable mode:

pip install -e .

For development, install the optional test dependencies:

pip install -e ".[dev]"

You can also install dependencies directly:

pip install -r requirements.txt

Quickstart

from torch import nn

from tinytorchprofiler import profile_model


model = nn.Sequential(
    nn.Linear(128, 64),
    nn.ReLU(),
    nn.Linear(64, 10),
)

report = profile_model(
    model,
    input_shape=(1, 128),
    device="cpu",
    warmup=5,
    runs=20,
)

report.summary()
report.to_csv("profile.csv")

Show package metadata:

import tinytorchprofiler as tinyprofiler

tinyprofiler.show()

Check whether the model fits a deployment budget:

budget = report.check_budget(
    max_latency_ms=30,
    max_model_size_mb=25,
    max_parameters=5_000_000,
)

print(budget["passed"])
print(budget["checks"])

Score the model for a built-in deployment target:

score = report.deployment_score("edge_cpu")

print(score["score"])
print(score["bottlenecks"])

Use strict custom deployment budgets:

score = report.deployment_score(
    target="custom",
    max_latency_ms=0.1,
    max_model_size_mb=0.001,
)

print(score["passed"])
print(score["score"])

Core API

from tinytorchprofiler import profile_model

report = profile_model(
    model,
    input_shape=(1, 3, 224, 224),
    device="cpu",
    warmup=5,
    runs=20,
)

report.summary()
report.to_dict()
report.to_csv("profile.csv")

Supported Metrics

TinyTorchProfiler v0.1.2 supports:

Total parameter count
Trainable parameter count
Estimated model size in MB
Average forward-pass latency in milliseconds
CPU profiling
CUDA profiling when available
Layer-wise names
Layer-wise module types
Layer-wise input and output shapes
Layer-wise parameter counts
Approximate activation size in MB
Approximate layer-wise forward latency
Deployment budget checks
Deployment readiness score for built-in and custom targets

Deployment Readiness

TinyTorchProfiler can check model metrics against production-style deployment budgets:

result = report.check_budget(
    max_latency_ms=30,
    max_model_size_mb=25,
    max_parameters=5_000_000,
    max_activation_size_mb=64,
)

It can also calculate a simple 0-100 deployment readiness score:

score = report.deployment_score("edge_cpu")

You can override any built-in target budget:

score = report.deployment_score(
    "edge_cpu",
    max_latency_ms=10,
)

You can also use a fully custom target:

score = report.deployment_score(
    target="custom",
    max_latency_ms=0.1,
    max_model_size_mb=0.001,
    max_parameters=1_000,
    max_activation_size_mb=1,
)

Available targets:

edge_cpu
mobile
server_cpu
realtime_webcam
custom

The score is a lightweight heuristic based on latency, model size, parameter count, and activation memory. It is intended as a fast first-pass signal, not a replacement for production benchmarking on real hardware.
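
To make the idea of a budget check and a heuristic score concrete, here is a small pure-Python sketch. It is an assumption for illustration only: the function names, the layout of the `checks` dictionary, and the scoring rule (100 for a metric within budget, `100 * limit / value` when over, averaged across metrics) are hypothetical and are not taken from TinyTorchProfiler's source.

```python
def check_budget_sketch(metrics: dict, **limits: float) -> dict:
    """Compare each metric with its max_* limit (hypothetical result layout)."""
    checks = {}
    for limit_name, limit in limits.items():
        # "max_latency_ms" -> "latency_ms"
        metric_name = limit_name.removeprefix("max_")
        value = metrics[metric_name]
        checks[metric_name] = {"value": value, "limit": limit, "passed": value <= limit}
    return {"passed": all(c["passed"] for c in checks.values()), "checks": checks}


def deployment_score_sketch(metrics: dict, **limits: float) -> dict:
    """Average per-metric scores on a 0-100 scale (assumed scoring rule)."""
    budget = check_budget_sketch(metrics, **limits)
    if not budget["checks"]:
        raise ValueError("at least one max_* limit is required")
    per_metric = {
        name: 100.0 if c["passed"] else 100.0 * c["limit"] / c["value"]
        for name, c in budget["checks"].items()
    }
    score = sum(per_metric.values()) / len(per_metric)
    bottlenecks = [name for name, c in budget["checks"].items() if not c["passed"]]
    return {"score": round(score, 1), "passed": budget["passed"], "bottlenecks": bottlenecks}


# Made-up metrics: latency is twice its budget, the other two are within budget.
metrics = {"latency_ms": 60.0, "model_size_mb": 10.0, "parameters": 3_000_000}
result = deployment_score_sketch(
    metrics,
    max_latency_ms=30,
    max_model_size_mb=25,
    max_parameters=5_000_000,
)
print(result["score"], result["passed"], result["bottlenecks"])
```

In this sketch a metric that exceeds its limit lowers the average in proportion to how far over budget it is, and it is also reported as a bottleneck.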

Device Support

CPU profiling is supported by default:

report = profile_model(model, input_shape=(1, 3, 224, 224), device="cpu")

CUDA profiling is supported when PyTorch detects an available CUDA device:

report = profile_model(model, input_shape=(1, 3, 224, 224), device="cuda")

For CUDA timing, TinyTorchProfiler synchronizes the device around measured regions with torch.cuda.synchronize().
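
The synchronization matters because CUDA kernels launch asynchronously, so a host-side timer can stop before the GPU has finished. A minimal sketch of the pattern, assuming a hypothetical helper `measure_latency_ms` (not the library's internal function) that mirrors the `warmup` and `runs` arguments of `profile_model`:

```python
import time

import torch
from torch import nn


def measure_latency_ms(model: nn.Module, example: torch.Tensor,
                       warmup: int = 5, runs: int = 20) -> float:
    """Average forward-pass latency in milliseconds."""
    is_cuda = example.device.type == "cuda"
    model.eval()
    with torch.no_grad():
        # Warmup runs are excluded from the measurement.
        for _ in range(warmup):
            model(example)
        if is_cuda:
            torch.cuda.synchronize(example.device)  # finish pending work before timing
        start = time.perf_counter()
        for _ in range(runs):
            model(example)
        if is_cuda:
            torch.cuda.synchronize(example.device)  # wait for the timed kernels
        elapsed = time.perf_counter() - start
    return elapsed * 1000.0 / runs


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10)).to(device)
example = torch.randn(1, 128, device=device)

latency_ms = measure_latency_ms(model, example)
print(f"{latency_ms:.3f} ms per forward pass on {device}")
```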

Examples

Profile a small CNN:

python examples/01_profile_simple_cnn.py

This prints a summary and writes:

simple_cnn_profile.csv

Profile a torchvision model if torchvision is installed:

python examples/02_profile_torchvision_model.py

If torchvision is missing, the example exits gracefully with a short message.

Testing

Run the test suite with:

pytest

The tests cover:

Parameter counting
Trainable parameter counting
Model size estimation
Tensor size estimation
Report serialization with to_dict()
Basic end-to-end profiling on a tiny model

Notes on Profiling

TinyTorchProfiler uses PyTorch forward passes and hooks.

Layer-wise latency is approximate. It is useful for identifying likely bottlenecks, but exact timing can vary depending on hardware, backend libraries, CPU load, CUDA synchronization, and model structure.
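
One common way to obtain approximate per-layer timings is to pair a forward pre-hook with a forward hook on each leaf module. The sketch below illustrates that technique under stated assumptions: `time_layers` is a hypothetical helper, it times a single CPU forward pass, and it is not necessarily how TinyTorchProfiler implements its hooks. On CUDA, each hook would also need a `torch.cuda.synchronize()` call for the numbers to be meaningful.

```python
import time
from collections import defaultdict

import torch
from torch import nn


def time_layers(model: nn.Module, example: torch.Tensor) -> dict:
    """Return approximate forward time in ms for each leaf module, keyed by name."""
    starts = {}
    totals = defaultdict(float)
    handles = []

    def make_hooks(name):
        def pre_hook(module, inputs):
            starts[name] = time.perf_counter()

        def post_hook(module, inputs, output):
            totals[name] += (time.perf_counter() - starts[name]) * 1000.0

        return pre_hook, post_hook

    for name, module in model.named_modules():
        if len(list(module.children())) == 0:  # leaf modules only
            pre_hook, post_hook = make_hooks(name)
            handles.append(module.register_forward_pre_hook(pre_hook))
            handles.append(module.register_forward_hook(post_hook))

    try:
        model.eval()
        with torch.no_grad():
            model(example)
    finally:
        # Always remove hooks so the model is left unchanged.
        for handle in handles:
            handle.remove()
    return dict(totals)


model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))
layer_ms = time_layers(model, torch.randn(1, 128))
for name, ms in sorted(layer_ms.items(), key=lambda item: item[1], reverse=True):
    print(f"layer {name}: {ms:.4f} ms")
```

The hooks themselves add overhead, which is one reason per-layer numbers should be read as relative indicators.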

Activation memory is estimated from forward outputs. It should be treated as a helpful approximation, not as a replacement for full memory tracing.
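
Estimating activation size from forward outputs can be sketched with a forward hook that multiplies the element count of each output tensor by its element size. This is an illustrative assumption rather than the library's code: `activation_sizes_mb` is a hypothetical helper, it only handles modules whose output is a single tensor, and it uses 1 MB = 1024 * 1024 bytes.

```python
import torch
from torch import nn


def activation_sizes_mb(model: nn.Module, example: torch.Tensor) -> dict:
    """Return the output tensor size in MB for each leaf module, keyed by name."""
    sizes = {}
    handles = []

    def make_hook(name):
        def hook(module, inputs, output):
            # Only single-tensor outputs are handled in this sketch.
            if isinstance(output, torch.Tensor):
                sizes[name] = output.numel() * output.element_size() / (1024 ** 2)

        return hook

    for name, module in model.named_modules():
        if len(list(module.children())) == 0:  # leaf modules only
            handles.append(module.register_forward_hook(make_hook(name)))

    try:
        model.eval()
        with torch.no_grad():
            model(example)
    finally:
        for handle in handles:
            handle.remove()
    return sizes


model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))
sizes = activation_sizes_mb(model, torch.randn(4, 128))
for name, mb in sizes.items():
    print(f"layer {name}: {mb:.6f} MB")
```

With a batch of 4 float32 inputs, the first layer's output holds 4 * 64 values of 4 bytes each, or 1024 bytes. The estimate ignores intermediate buffers, in-place operations, and allocator behavior, so it is a lower-level approximation than peak memory tracking.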

Roadmap

Planned future improvements:

Batch-size scaling analysis
Memory peak tracking
Visualization
ONNX export profiling
ViT and DINOv2 examples

Version

Current version: 0.1.2

Release history

0.1.2 (May 20, 2026), this version
0.1.0 (May 20, 2026)
