Lightning support for Intel Habana accelerators
Lightning ⚡ Intel Habana
Intel® Gaudi® AI Processors (HPUs) are training processors built on a heterogeneous architecture that combines a cluster of fully programmable Tensor Processing Cores (TPC) with a configurable Matrix Math engine, along with associated development tools and libraries.
The TPC core is a VLIW SIMD processor with an instruction set and hardware tailored to serve training workloads efficiently. The Gaudi memory architecture includes on-die SRAM and local memories in each TPC, and Gaudi is the first DL training processor with integrated RDMA over Converged Ethernet (RoCE v2) engines on-chip.
On the software side, the PyTorch Habana bridge interfaces between the framework and the SynapseAI software stack to enable the execution of deep learning models on Habana Gaudi devices.
Gaudi offers a substantial cost advantage, letting you run more deep learning training while keeping expenses down.
For more information, check out Gaudi Architecture and Gaudi Developer Docs.
Installing Lightning Habana
To install Lightning Habana, run the following command:
pip install -U lightning lightning-habana
NOTE
Use either lightning or pytorch-lightning when working with the plugin; mixing strategies, plugins, etc. from both packages is not yet validated.
Using PyTorch Lightning with HPU
To run PyTorch Lightning on HPUs, pass accelerator=HPUAccelerator() to the Trainer class.
from lightning import Trainer
from lightning_habana.pytorch.accelerator import HPUAccelerator
# Run on one HPU.
trainer = Trainer(accelerator=HPUAccelerator(), devices=1)
# Run on multiple HPUs.
trainer = Trainer(accelerator=HPUAccelerator(), devices=8)
# Choose the number of devices automatically.
trainer = Trainer(accelerator=HPUAccelerator(), devices="auto")
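Putting it together, a minimal training run might look like the following sketch. ToyModel and its random data are placeholders standing in for your own LightningModule, not part of lightning-habana:

import torch
from torch.utils.data import DataLoader, TensorDataset
from lightning import LightningModule, Trainer
from lightning_habana.pytorch.accelerator import HPUAccelerator

class ToyModel(LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return torch.nn.functional.cross_entropy(self.layer(x), y)

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)

# Random placeholder data; swap in your own DataLoader.
train_loader = DataLoader(
    TensorDataset(torch.randn(64, 32), torch.randint(0, 2, (64,))), batch_size=16
)
trainer = Trainer(accelerator=HPUAccelerator(), devices=1, max_epochs=1)
trainer.fit(ToyModel(), train_loader)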
Passing devices=1 enables the Habana accelerator for single-card training using SingleHPUStrategy.
Passing devices>1 enables the Habana accelerator for distributed training. It uses HPUDDPStrategy, which is based on the DDP strategy and integrates Habana's collective communication library (HCCL) to support scale-up within a node and scale-out across multiple nodes.
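If you prefer to select the strategy explicitly rather than relying on the device count, both strategies can be passed directly. A minimal sketch, assuming the strategies live under lightning_habana.pytorch.strategies as in recent releases:

from lightning import Trainer
from lightning_habana.pytorch.accelerator import HPUAccelerator
from lightning_habana.pytorch.strategies import HPUDDPStrategy, SingleHPUStrategy

# Equivalent to devices=1: single-card training.
trainer = Trainer(accelerator=HPUAccelerator(), strategy=SingleHPUStrategy(), devices=1)

# Equivalent to devices=8: DDP over HCCL across eight cards.
trainer = Trainer(accelerator=HPUAccelerator(), strategy=HPUDDPStrategy(), devices=8)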
Support Matrix
Component | Version
---|---
SynapseAI | 1.16.0
PyTorch | 2.2.2
(PyTorch) Lightning* | 2.3.x
Lightning Habana | 1.6.0
DeepSpeed** | Forked from v0.14.0 of the official DeepSpeed repo.

* covers both the lightning and pytorch-lightning packages
For more information, check out the HPU Support Matrix.
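The DeepSpeed row above refers to the plugin's HPUDeepSpeedStrategy. A minimal sketch of enabling it, assuming the import path and the upstream-style zero_optimization/stage parameters carry over from Lightning's DeepSpeedStrategy (check your installed version for the exact signature):

from lightning import Trainer
from lightning_habana.pytorch.accelerator import HPUAccelerator
from lightning_habana.pytorch.strategies import HPUDeepSpeedStrategy

# ZeRO stage and related settings mirror upstream DeepSpeed options.
trainer = Trainer(
    accelerator=HPUAccelerator(),
    strategy=HPUDeepSpeedStrategy(zero_optimization=True, stage=2),
    devices=8,
)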