Habana's Lightning-specific optimized plugins
Habana Lightning Plugins
Habana Lightning Plugins is a suite of plugins that aid and accelerate model training with the Lightning framework on HPU. The plugins act as an extension to the Lightning framework, adding support for HPU-specific features.
Currently, the following plugins are available:
- HPUDataModule
- HPUProfiler
Installation
To install Habana Lightning Plugins, run the following command:
python -um pip install habana-lightning-plugins
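As a quick sanity check after installation, the package should import cleanly:
python -c "import habana_lightning_plugins"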
HPUDataModule
HPUDataModule is an extension of the LightningDataModule class that uses Habana's dataloader to load and pre-process the input data. Using HPUDataModule offloads the data preprocessing overhead to the HPU and in turn improves training performance. The wrapper also aids in switching between the hardware- and software-based preprocessors depending on the specific Gaudi device in use.
Visit Habana Dataloader for more information.
Usage
The following shows an example of how to use the HPUDataModule:
- Import Habana Datamodule:
from habana_lightning_plugins.datamodule import HPUDataModule
- Create and initialize HPUDataModule object with the dataset and the configuration required to preprocess the data:
from torchvision import transforms

train_dir = "./path/to/train/data"
val_dir = "./path/to/val/data"

normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
train_transforms = [
transforms.RandomResizedCrop(224),
transforms.RandomHorizontalFlip(),
transforms.ToTensor(),
normalize,
]
val_transforms = [
transforms.Resize(256),
transforms.CenterCrop(224),
transforms.ToTensor(),
normalize,
]
data_module = HPUDataModule(
train_dir,
val_dir,
train_transforms=train_transforms,
val_transforms=val_transforms,
num_workers=8,
batch_size=32,
shuffle=False,
pin_memory=True,
drop_last=True,
)
- Create an object of Lightning trainer and model:
import pytorch_lightning as pl

trainer = pl.Trainer(devices=1, accelerator="hpu", max_epochs=1, max_steps=2)
model = RN50Module()  # Or any other user-defined model; see the sketch after this list
- Pass the datamodule object as an argument to trainer to execute train/val/test loops:
trainer.fit(model, datamodule=data_module)
trainer.validate(model, datamodule=data_module)
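RN50Module above is user-supplied, not part of the plugins. As a minimal sketch of what such a module might look like (the class name, model, and hyperparameters here are illustrative assumptions):

import torch
import torch.nn.functional as F
import pytorch_lightning as pl
from torchvision import models

class RN50Module(pl.LightningModule):
    def __init__(self):
        super().__init__()
        # Hypothetical example: wrap torchvision's ResNet-50 in a LightningModule
        self.model = models.resnet50()

    def forward(self, x):
        return self.model(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return F.cross_entropy(self(x), y)

    def validation_step(self, batch, batch_idx):
        x, y = batch
        self.log("val_loss", F.cross_entropy(self(x), y))

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)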
Examples
- A sample script can be found at examples/hpu_datamodule_sample.py:
python examples/hpu_datamodule_sample.py --data-path <path to Imagenet dataset - ILSVRC2012>
A reference model using HPUDataModule can be found in the ResNet50 Model Reference.
Limitations
- HPUDataModule supports the Imagenet dataset only.
- HPUDataModule supports only 8 parallel data loader workers.
HPUProfiler
HPUProfiler is a Lightning implementation of the PyTorch profiler for HPU devices. It aids in obtaining a profiling summary of PyTorch functions. It subclasses PyTorch Lightning's PyTorchProfiler.
Default Profiling
For auto profiling, create an HPUProfiler instance and pass it to the trainer.
At the end of trainer.fit(), it will generate a JSON trace for the run.
If accelerator="hpu" is not used with HPUProfiler, it will dump only CPU traces, similar to PyTorchProfiler.
# Import profiler
from habana_lightning_plugins.profiler import HPUProfiler
from pytorch_lightning import Trainer

# Create profiler object
profiler = HPUProfiler()
accelerator = "hpu"

# Pass profiler to the trainer
trainer = Trainer(
    profiler=profiler,
    accelerator=accelerator,
)
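Running fit then produces the JSON trace; a minimal continuation, assuming model and data_module are defined as in the HPUDataModule example above:

# The trace is written once fit() completes
trainer.fit(model, datamodule=data_module)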
Distributed Profiling
To profile a distributed model, use HPUProfiler with the filename argument, which will save a report per rank:
from habana_lightning_plugins.profiler import HPUProfiler
from pytorch_lightning import Trainer

profiler = HPUProfiler(filename="perf-logs")
trainer = Trainer(profiler=profiler, accelerator="hpu")
Custom Profiling
To profile custom actions of interest, reference a profiler in the LightningModule:
from habana_lightning_plugins.profiler import HPUProfiler
from pytorch_lightning import LightningModule, Trainer
from pytorch_lightning.profilers import PassThroughProfiler

# Reference profiler in LightningModule
class MyModel(LightningModule):
    def __init__(self, profiler=None):
        super().__init__()
        # Fall back to a no-op profiler so the model also runs without one
        self.profiler = profiler or PassThroughProfiler()

    # To profile any part of your code, use the self.profiler.profile() context manager
    def custom_processing_step_basic(self, data):
        with self.profiler.profile("my_custom_action"):
            ...
        return data

    # Alternatively, use the self.profiler.start("my_custom_action")
    # and self.profiler.stop("my_custom_action") functions
    # to enclose the part of the code to be profiled.
    def custom_processing_step_granular(self, data):
        self.profiler.start("my_custom_action")
        ...
        self.profiler.stop("my_custom_action")
        return data

# Pass profiler instance to LightningModule
profiler = HPUProfiler()
model = MyModel(profiler)
trainer = Trainer(profiler=profiler, accelerator="hpu")
For more details on the profiler, refer to PyTorchProfiler.
Visualize Profiled Operations
The profiler dumps traces in JSON format. The traces can be visualized in two ways:
Using PyTorch TensorBoard Profiler
For further instructions, see https://github.com/pytorch/kineto/tree/master/tb_plugin.
# Install tensorboard
python -um pip install tensorboard torch-tb-profiler
# Start the TensorBoard server (default at port 6006):
tensorboard --logdir ./tensorboard --port 6006
# Now open the following url on your browser
http://localhost:6006/#profile
Using Chrome
1. Open Chrome and copy/paste this URL: `chrome://tracing/`.
2. Once tracing opens, click on `Load` at the top-right and load one of the generated traces.
Limitations
- When using the HPUProfiler, wall clock time will not be representative of the true wall clock time. This is because profiled operations are forced to be measured synchronously, while many HPU ops happen asynchronously. It is recommended to use this profiler to find bottlenecks and breakdowns; for end-to-end wall clock time, use the SimpleProfiler instead.
- HPUProfiler.summary() is not supported.
- Passing the profiler name as the string "hpu" to the trainer is not supported.
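For end-to-end wall clock time, a minimal sketch using PyTorch Lightning's SimpleProfiler (standard Lightning API, not part of these plugins; the filename is illustrative):

from pytorch_lightning import Trainer
from pytorch_lightning.profilers import SimpleProfiler

# SimpleProfiler reports real wall clock durations per training hook
profiler = SimpleProfiler(filename="perf-summary")
trainer = Trainer(profiler=profiler, accelerator="hpu")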
Supported Configurations
| Validated on | SynapseAI Version | PyTorch Version | PyTorch Lightning Version |
| --- | --- | --- | --- |
| Gaudi | 1.9.0 | 1.13.1 | 1.9.4 |
| Gaudi2 | 1.9.0 | 1.13.1 | 1.9.4 |
Changelog
- habana-lightning-plugins introduced with support for datamodule and profiler plugins
Contributing
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
Please make sure to update tests as appropriate.