A lightweight PyTorch model profiler for latency, parameters, memory, and layer-wise bottlenecks.
TinyTorchProfiler
TinyTorchProfiler is a lightweight Python library for profiling PyTorch models.
It helps users understand model performance beyond accuracy, including latency, parameter count, model size, activation memory, layer-wise runtime, and likely bottlenecks.
The goal of v0.1.0 is to stay small, readable, and practical for ML systems workflows where model performance matters as much as model quality.
Why This Library Matters
A model with good accuracy can still be too slow, too large, or too memory-heavy for production use.
TinyTorchProfiler helps answer practical deployment questions:
- How many parameters does this model have?
- How many parameters are trainable?
- How large is the model in memory?
- What is the average forward-pass latency?
- Which layers are the slowest?
- Which layers create the largest activations?
- Does this model look suitable for deployment constraints?
Installation
From the project root, install the package in editable mode:
pip install -e .
For development, install the optional test dependencies:
pip install -e ".[dev]"
You can also install dependencies directly:
pip install -r requirements.txt
Quickstart
from torch import nn
from tinytorchprofiler import profile_model
model = nn.Sequential(
    nn.Linear(128, 64),
    nn.ReLU(),
    nn.Linear(64, 10),
)
report = profile_model(
    model,
    input_shape=(1, 128),
    device="cpu",
    warmup=5,
    runs=20,
)
report.summary()
report.to_csv("profile.csv")
Show package metadata:
import tinytorchprofiler as tinyprofiler
tinyprofiler.show()
Check whether the model fits a deployment budget:
budget = report.check_budget(
    max_latency_ms=30,
    max_model_size_mb=25,
    max_parameters=5_000_000,
)
print(budget["passed"])
print(budget["checks"])
Score the model for a built-in deployment target:
score = report.deployment_score("edge_cpu")
print(score["score"])
print(score["bottlenecks"])
Use strict custom deployment budgets:
score = report.deployment_score(
    target="custom",
    max_latency_ms=0.1,
    max_model_size_mb=0.001,
)
print(score["passed"])
print(score["score"])
Core API
from tinytorchprofiler import profile_model
report = profile_model(
    model,
    input_shape=(1, 3, 224, 224),
    device="cpu",
    warmup=5,
    runs=20,
)
report.summary()
report.to_dict()
report.to_csv("profile.csv")
Supported Metrics
TinyTorchProfiler v0.1.2 supports:
- Total parameter count
- Trainable parameter count
- Estimated model size in MB
- Average forward-pass latency in milliseconds
- CPU profiling
- CUDA profiling when available
- Layer-wise names
- Layer-wise module types
- Layer-wise input and output shapes
- Layer-wise parameter counts
- Approximate activation size in MB
- Approximate layer-wise forward latency
- Deployment budget checks
- Deployment readiness score for built-in and custom targets
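As a rough illustration of how the parameter count and model size metrics relate, the sketch below counts the parameters of the Quickstart model by hand and converts the total to megabytes. It assumes float32 weights (4 bytes each) and 1 MB = 1024 * 1024 bytes. The helper names are hypothetical, and the library may use different conventions.

```python
# Hypothetical helpers for illustration; not part of the TinyTorchProfiler API.


def linear_param_count(in_features: int, out_features: int, bias: bool = True) -> int:
    """Parameters in a fully connected layer: weight matrix plus optional bias."""
    return in_features * out_features + (out_features if bias else 0)


def size_in_mb(num_params: int, bytes_per_param: int = 4) -> float:
    """Estimated size, assuming float32 weights and 1 MB = 1024 * 1024 bytes."""
    return num_params * bytes_per_param / (1024 * 1024)


# The Quickstart model: Linear(128, 64) -> ReLU -> Linear(64, 10).
# ReLU has no parameters, so only the two Linear layers contribute.
total_params = linear_param_count(128, 64) + linear_param_count(64, 10)
print(total_params)                        # 8906
print(round(size_in_mb(total_params), 4))  # 0.034
```

A model this small sits far below any realistic size budget, which is why the strict custom budgets in the Quickstart use limits such as 0.001 MB to force a failure.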
Deployment Readiness
TinyTorchProfiler can check model metrics against production-style deployment budgets:
result = report.check_budget(
    max_latency_ms=30,
    max_model_size_mb=25,
    max_parameters=5_000_000,
    max_activation_size_mb=64,
)
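Conceptually, a budget check compares each measured metric with its limit and passes only if every comparison passes. The sketch below is a minimal stand-alone version of that idea. The function name, the layout of the `checks` entries, and the measured values are all assumptions made for illustration; only the `passed` and `checks` keys appear in the examples above.

```python
# Hypothetical sketch; the real check_budget() may structure its result differently.


def check_budget_sketch(metrics: dict, **limits: float) -> dict:
    """Compare each measured metric with its `max_<metric>` limit."""
    checks = {}
    for limit_name, limit in limits.items():
        metric_name = limit_name.removeprefix("max_")  # requires Python 3.9+
        value = metrics[metric_name]
        checks[metric_name] = {"value": value, "limit": limit, "passed": value <= limit}
    # The overall result passes only when every individual check passes.
    return {"passed": all(c["passed"] for c in checks.values()), "checks": checks}


# Made-up measurements, chosen so that one budget is exceeded.
measured = {"latency_ms": 12.5, "model_size_mb": 40.0, "parameters": 3_200_000}
outcome = check_budget_sketch(
    measured,
    max_latency_ms=30,
    max_model_size_mb=25,
    max_parameters=5_000_000,
)
print(outcome["passed"])                             # False
print(outcome["checks"]["model_size_mb"]["passed"])  # False
print(outcome["checks"]["latency_ms"]["passed"])     # True
```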
It can also calculate a simple 0-100 deployment readiness score:
score = report.deployment_score("edge_cpu")
You can override any built-in target budget:
score = report.deployment_score(
    "edge_cpu",
    max_latency_ms=10,
)
You can also use a fully custom target:
score = report.deployment_score(
    target="custom",
    max_latency_ms=0.1,
    max_model_size_mb=0.001,
    max_parameters=1_000,
    max_activation_size_mb=1,
)
Built-in targets:
- edge_cpu
- mobile
- server_cpu
- realtime_webcam
- custom
The score is a lightweight heuristic based on latency, model size, parameter count, and activation memory. It is intended as a fast first-pass signal, not a replacement for production benchmarking on real hardware.
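The scoring formula is not documented here, so the following is only one plausible way to build a 0-100 heuristic from budget ratios. It is not the library's algorithm. In this sketch, a metric within budget scores 100, a metric over budget scores 100 divided by how many times it exceeds the limit, and the overall score is the plain average. The function name and the sample numbers are hypothetical.

```python
# Hypothetical scoring heuristic for illustration only.


def deployment_score_sketch(metrics: dict, budget: dict) -> dict:
    """Average per-metric scores, where over-budget metrics are scaled down."""
    per_metric = {}
    for name, limit in budget.items():
        ratio = metrics[name] / limit
        # Within budget: full marks. Twice over budget: half marks, and so on.
        per_metric[name] = 100.0 if ratio <= 1.0 else 100.0 / ratio
    bottlenecks = [name for name, s in per_metric.items() if s < 100.0]
    score = sum(per_metric.values()) / len(per_metric)
    return {"score": round(score, 1), "passed": not bottlenecks, "bottlenecks": bottlenecks}


# Made-up metrics: latency is twice the budget, model size is well within it.
metrics = {"latency_ms": 60.0, "model_size_mb": 10.0}
budget = {"latency_ms": 30.0, "model_size_mb": 25.0}
scored = deployment_score_sketch(metrics, budget)
print(scored["score"])        # 75.0
print(scored["passed"])       # False
print(scored["bottlenecks"])  # ['latency_ms']
```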
Device Support
CPU profiling is supported by default:
report = profile_model(model, input_shape=(1, 3, 224, 224), device="cpu")
CUDA profiling is supported when PyTorch detects an available CUDA device:
report = profile_model(model, input_shape=(1, 3, 224, 224), device="cuda")
For CUDA timing, TinyTorchProfiler synchronizes the device around measured regions with torch.cuda.synchronize().
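Synchronization matters because CUDA kernels launch asynchronously: without it, the timer can stop before the GPU has finished its work. The sketch below shows the general warmup-then-measure pattern with an optional synchronization callback. It uses only the standard library so it runs anywhere, and `measure_latency_ms` is a hypothetical helper, not the library's implementation.

```python
import time
from typing import Callable, Optional


def measure_latency_ms(
    fn: Callable[[], object],
    warmup: int = 5,
    runs: int = 20,
    sync: Optional[Callable[[], None]] = None,
) -> float:
    """Average wall-clock time of fn() in milliseconds over `runs` calls."""
    for _ in range(warmup):
        fn()  # warmup calls are not timed
    if sync is not None:
        sync()  # drain pending asynchronous work before starting the clock
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    if sync is not None:
        sync()  # wait for queued work to finish before stopping the clock
    elapsed = time.perf_counter() - start
    return elapsed * 1000.0 / runs


# CPU-only stand-in workload. For a model on a GPU, one would pass
# sync=torch.cuda.synchronize and a callable that runs the forward pass.
avg_ms = measure_latency_ms(lambda: sum(range(10_000)), warmup=2, runs=10)
print(f"{avg_ms:.3f} ms per call")
```

On CPU the `sync` argument can stay `None`, since the forward pass has already completed when the call returns.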
Examples
Profile a small CNN:
python examples/01_profile_simple_cnn.py
This prints a summary and writes:
simple_cnn_profile.csv
Profile a torchvision model if torchvision is installed:
python examples/02_profile_torchvision_model.py
If torchvision is missing, the example exits gracefully with a short message.
Testing
Run the test suite with:
pytest
The v0.1.0 tests cover:
- Parameter counting
- Trainable parameter counting
- Model size estimation
- Tensor size estimation
- Report serialization with to_dict()
- Basic end-to-end profiling on a tiny model
Notes on Profiling
TinyTorchProfiler uses PyTorch forward passes and hooks.
Layer-wise latency is approximate. It is useful for identifying likely bottlenecks, but exact timing can vary depending on hardware, backend libraries, CPU load, CUDA synchronization, and model structure.
Activation memory is estimated from forward outputs. It should be treated as a helpful approximation, not as a replacement for full memory tracing.
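To make the activation estimate concrete, the size of a single layer output can be approximated from its shape alone: the number of elements times the bytes per element. The sketch below assumes float32 outputs (4 bytes per element) and 1 MB = 1024 * 1024 bytes. The helper name and the example shape are hypothetical.

```python
import math
from typing import Sequence


def activation_size_mb(output_shape: Sequence[int], bytes_per_element: int = 4) -> float:
    """Approximate memory of one output tensor, assuming float32 by default."""
    num_elements = math.prod(output_shape)
    return num_elements * bytes_per_element / (1024 * 1024)


# Example: a float32 feature map with batch 1, 64 channels, 112 x 112 spatial size.
print(activation_size_mb((1, 64, 112, 112)))  # 3.0625
```

An estimate like this covers only the tensors a layer returns. Temporary buffers, gradients, and allocator overhead are not counted, which is one reason the figure is an approximation.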
Roadmap
Planned future improvements:
- Batch-size scaling analysis
- Memory peak tracking
- Visualization
- ONNX export profiling
- ViT and DINOv2 examples
Version
Current version: 0.1.2
File details
Details for the file tinytorchprofiler-0.1.2.tar.gz.
File metadata
- Download URL: tinytorchprofiler-0.1.2.tar.gz
- Upload date:
- Size: 11.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | dc63486d1cd550e139f215786814dd7dbe6aa2a2edb3f5263b3318d9db24b1f9 |
| MD5 | c4180feb1bd4f9db00ef7a3f17b8fd8d |
| BLAKE2b-256 | 2bb560f16a50521c0935a14ff4b381bd59fbb3d06b20fee87a0e5fda06bbf1e6 |
File details
Details for the file tinytorchprofiler-0.1.2-py3-none-any.whl.
File metadata
- Download URL: tinytorchprofiler-0.1.2-py3-none-any.whl
- Upload date:
- Size: 10.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f41a8fe228d7415a4aec4457195e074373db61e83aa0fc36623249d0dabade58 |
| MD5 | 5cbe31b40258846680b2cba11becbabd |
| BLAKE2b-256 | 27c7a3b850df9db5d05a34ab4e4195f162c0d6a5ef94e034ae32e4b33a3efcf8 |