Nodify Tensorboard Plugin
This is a TensorBoard plugin by Trainy that supplements the existing PyTorch Profiler. It adds visualizations for characterizing traces from runs that span multiple GPUs. The plugin expects all traces to be collected with torch.profiler and stored in the same folder.
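Because every trace has to sit in one folder, it can help to sanity-check a log directory before launching TensorBoard. The sketch below uses two hypothetical helpers (find_profiler_traces, traces_per_worker) that are not part of nodify-plugin. It assumes the traces follow the file naming used by torch.profiler.tensorboard_trace_handler, which writes files named <worker_name>.<timestamp>.pt.trace.json (optionally gzipped), and it counts files per worker so that a rank whose traces are missing shows up as a missing key. The plugin's own discovery logic may differ.

```python
from collections import Counter
from pathlib import Path

# Assumed suffixes, matching torch.profiler.tensorboard_trace_handler output.
TRACE_SUFFIXES = (".pt.trace.json.gz", ".pt.trace.json")


def find_profiler_traces(log_dir):
    """Hypothetical helper: list trace files directly inside log_dir."""
    return sorted(
        p for p in Path(log_dir).iterdir()
        if p.is_file() and p.name.endswith(TRACE_SUFFIXES)
    )


def traces_per_worker(log_dir):
    """Hypothetical helper: count trace files per worker name."""
    counts = Counter()
    for path in find_profiler_traces(log_dir):
        # Strip the trace suffix, then drop the trailing ".<timestamp>" part.
        # rsplit keeps dots that belong to the worker name (e.g. an FQDN host).
        suffix = next(s for s in TRACE_SUFFIXES if path.name.endswith(s))
        stem = path.name[: -len(suffix)]
        worker = stem.rsplit(".", 1)[0]
        counts[worker] += 1
    return dict(counts)
```

For example, traces_per_worker("log/resnet18") returns one entry per worker whose traces are present in that folder.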
Installation & Quickstart
Install TensorBoard and the plugin:
pip install tensorboard
pip install nodify-plugin
Generate PyTorch Profiler traces as shown here, then launch TensorBoard on the directory that contains your traces. A set of example logs is provided in this repo under log/resnet18:
tensorboard --logdir log/resnet18/
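As a rough illustration of producing traces that TensorBoard can read, here is a minimal sketch using torch.profiler with tensorboard_trace_handler. The toy model, the function name profile_toy_training, and the schedule values are assumptions chosen for illustration; substitute your own training loop. In a multi-GPU run, each process would execute the same code with the same log_dir, so all traces end up in one folder. The handler's default worker name is derived from the host name and process ID, and it also accepts a worker_name argument (for example, one name per rank).

```python
import os

import torch
from torch.profiler import ProfilerActivity, profile, schedule, tensorboard_trace_handler


def profile_toy_training(log_dir: str, num_steps: int = 6) -> list:
    """Profile a few steps of a hypothetical toy model and write traces into log_dir."""
    use_cuda = torch.cuda.is_available()
    device = torch.device("cuda" if use_cuda else "cpu")
    activities = [ProfilerActivity.CPU]
    if use_cuda:
        activities.append(ProfilerActivity.CUDA)

    # Hypothetical stand-in model; replace with your own training setup.
    model = torch.nn.Sequential(
        torch.nn.Linear(64, 64), torch.nn.ReLU(), torch.nn.Linear(64, 10)
    ).to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = torch.nn.CrossEntropyLoss()

    with profile(
        activities=activities,
        # Skip 1 step, warm up for 1, then record 3 steps, once.
        schedule=schedule(wait=1, warmup=1, active=3, repeat=1),
        # Writes <worker_name>.<timestamp>.pt.trace.json files into log_dir.
        on_trace_ready=tensorboard_trace_handler(log_dir),
    ) as prof:
        for _ in range(num_steps):
            inputs = torch.randn(32, 64, device=device)
            targets = torch.randint(0, 10, (32,), device=device)
            optimizer.zero_grad()
            loss = loss_fn(model(inputs), targets)
            loss.backward()
            optimizer.step()
            prof.step()  # advance the profiler schedule once per training step

    return sorted(f for f in os.listdir(log_dir) if ".pt.trace.json" in f)
```

Running profile_toy_training("log/toy") and then tensorboard --logdir log/toy/ should show the new run.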
Take a look at our quickstart guide to learn about the different graph views and how you can use them to debug your multinode training.
Development
To view the plugin for development, create a virtual environment, install the requirements, and install the plugin.
python -m venv venv
. venv/bin/activate
pip install -e .
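After the editable install, TensorBoard still has to discover the plugin. TensorBoard loads third-party plugins through the tensorboard_plugins entry-point group; assuming nodify-plugin registers itself in that group (check its packaging configuration to confirm), one quick check is to list the group from inside the same virtual environment. The helper name below is hypothetical.

```python
from importlib.metadata import entry_points


def tensorboard_plugin_entry_points():
    """Hypothetical helper: list (name, target) pairs in the tensorboard_plugins group."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+: filter by group with .select().
        selected = eps.select(group="tensorboard_plugins")
    else:
        # Python 3.8/3.9: entry_points() returns a dict of group -> entry points.
        selected = eps.get("tensorboard_plugins", [])
    return sorted((ep.name, ep.value) for ep in selected)
```

If the plugin is registered this way, its entry should appear in the printed list alongside any other installed TensorBoard plugins.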
Feature roadmap
Many of the features on the roadmap build on Meta's Dynolog, Kineto, and Holistic Trace Analysis (HTA).
- On-demand tracing and metrics through Dynolog
- Recommendations for fixing multinode bottlenecks
- Reading logs stored in cloud object stores (e.g., Amazon S3, Azure Blob Storage)
Contributing
For feature requests, bug reports, or fixes, feel free to open a GitHub issue or a pull request. We'd love to connect with interested developers; we check our Discord every day to discuss the direction of our projects. Reach us through the text channels or by DM.
Built Distribution
File details
Details for the file nodify_plugin-0.1.2-py3-none-any.whl.
File metadata
- Download URL: nodify_plugin-0.1.2-py3-none-any.whl
- Upload date:
- Size: 122.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.11
File hashes
Algorithm | Hash digest
---|---
SHA256 | 59a0d6e98c52ffe9b91371bf47a27f4d5606cf706f4d626e30f92832f808b13f
MD5 | e797fd6a295993d1d6f4341706bcb757
BLAKE2b-256 | 979f11af0332a826d587797df93a181824c285f6eb5f205059bed2d99328d993
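The published digests can be used to check a downloaded wheel before installing it. The following is a minimal sketch using Python's standard hashlib module; the function names and the local file path are assumptions for illustration.

```python
import hashlib

# SHA256 digest published above for nodify_plugin-0.1.2-py3-none-any.whl.
EXPECTED_SHA256 = "59a0d6e98c52ffe9b91371bf47a27f4d5606cf706f4d626e30f92832f808b13f"


def sha256_of(path, chunk_size=1 << 20):
    """Stream the file in chunks so large files are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 equals the published digest."""
    return sha256_of(path) == expected.strip().lower()
```

For example, matches_published_hash("nodify_plugin-0.1.2-py3-none-any.whl") should return True for an intact download. pip can also enforce this itself through hash-checking mode, e.g. a requirements line nodify-plugin==0.1.2 --hash=sha256:59a0d6e98c52ffe9b91371bf47a27f4d5606cf706f4d626e30f92832f808b13f, although in that mode every dependency needs a pinned hash as well.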