Lightning support for GraphCore accelerators

Lightning ⚡ GraphCore

Audience: Users looking to save money and run large models faster using single or multiple IPU devices.

What is an IPU?

The Graphcore Intelligence Processing Unit (IPU), built for Artificial Intelligence and Machine Learning, consists of many individual cores, called tiles, allowing highly parallel computation. Due to the high bandwidth between tiles, IPUs facilitate machine learning workloads where parallelization is essential. Because computation is heavily parallelized, IPUs operate differently from conventional accelerators such as CPUs and GPUs: they do not require large batch sizes for maximum parallelization, can apply optimizations across the compiled graph, and rely on model parallelism to fully utilize tiles for larger models.

IPUs are used to build IPU-PODs, rack-based systems of IPU-Machines for larger workloads. See the IPU Architecture for more information.

See the Graphcore Glossary for the definitions of other IPU-specific terminology.


Installation

pip install -U lightning-graphcore[lightning]
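After installing, it can be handy to confirm that the distribution is visible to the interpreter you plan to train with. The snippet below is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and not part of this package.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent.

    Hypothetical helper for illustration; it only queries package metadata.
    """
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Prints a version string if the package is installed, otherwise None.
    print(installed_version("lightning-graphcore"))
```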

Run on IPU

To enable PyTorch Lightning to use the IPU accelerator, pass accelerator="ipu" to the Trainer class.

To use multiple IPUs, set devices to a number that is a power of 2 (e.g. 2, 4, 8, 16, ...).

from lightning import Trainer
# run on as many IPUs as available by default
trainer = Trainer(accelerator="auto", devices="auto", strategy="auto")
# equivalent to
trainer = Trainer()

# run on one IPU
trainer = Trainer(accelerator="ipu", devices=1)
# run on multiple IPUs
trainer = Trainer(accelerator="ipu", devices=8)
# choose the number of devices automatically
trainer = Trainer(accelerator="ipu", devices="auto")
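Since the device count has to be a power of 2, a small pre-flight check can catch an invalid value before a Trainer is constructed. This is a sketch only: `is_power_of_two` and `largest_valid_device_count` are hypothetical helpers, not part of Lightning or this package.

```python
def is_power_of_two(n: int) -> bool:
    """True for 1, 2, 4, 8, ...; False for zero, negatives and everything else."""
    return isinstance(n, int) and n > 0 and (n & (n - 1)) == 0


def largest_valid_device_count(available: int) -> int:
    """Largest power of two that does not exceed the number of available IPUs."""
    if available < 1:
        raise ValueError("at least one device is required")
    return 1 << (available.bit_length() - 1)


if __name__ == "__main__":
    requested = 6  # illustrative value
    if not is_power_of_two(requested):
        requested = largest_valid_device_count(requested)
    print(requested)
```

The resulting value could then be passed as `devices=requested` alongside `accelerator="ipu"`.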

How to access IPUs

To use IPUs, you must have access to a system with IPU devices. To get access, see the Get Started page.

You must ensure that the IPU system has enabled the PopART and Poplar packages from the SDK. Instructions are in the Get Started guide for your IPU system, on the Graphcore documents portal.
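One rough way to check whether the SDK has been enabled in the current environment is to see if its Python modules can be located. The sketch below assumes the SDK exposes importable modules named `popart` and `poptorch`; those names are assumptions here, so follow the Get Started guide for your system if they differ. The helper `missing_modules` is hypothetical.

```python
from importlib.util import find_spec
from typing import List, Sequence


def missing_modules(names: Sequence[str] = ("popart", "poptorch")) -> List[str]:
    """Return the module names that cannot be found on the import path.

    The default names are assumed, not confirmed by this README.
    Nothing is imported; only the import machinery is queried.
    """
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Not found (is the SDK enabled?):", ", ".join(missing))
    else:
        print("SDK Python modules found.")
```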

Known limitations

There are currently some known limitations. They are being addressed so that moving between different devices becomes seamless.

Please see the MNIST example, which demonstrates most of the limitations and how to work around them until they are resolved.

  • self.log is not supported in the training_step, validation_step, test_step or predict_step. This is due to the step function being traced and sent to the IPU devices.
  • Since the step functions are traced, branching logic and primitive values (such as Python numbers and booleans) are baked into the trace as constants. Be mindful of this, as it can lead to errors in your custom code.
  • Clipping gradients is not supported.
  • It is not possible to use BatchSampler in your dataloaders if you are using multiple IPUs.
  • IPUs handle the data transfer to the device on the host, hence the hooks ModelHooks.transfer_batch_to_device and ModelHooks.on_after_batch_transfer do not apply here and if you have overridden any of them, an exception will be raised.
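The tracing limitation above is the easiest one to trip over, so here is a toy illustration in plain Python. It is not how the IPU toolchain traces a model; `trace`, `Proxy` and `step` are hypothetical names for a minimal recorder that runs a function once and replays the recorded operations. It shows how a branch taken during tracing becomes fixed for all later inputs.

```python
def trace(fn, example):
    """Run fn once on a proxy, record its arithmetic, and return a replay function."""
    ops = []

    class Proxy:
        def __init__(self, value):
            self.value = value

        def __mul__(self, k):
            ops.append(("mul", k))
            return Proxy(self.value * k)

        def __add__(self, k):
            ops.append(("add", k))
            return Proxy(self.value + k)

        def __gt__(self, k):
            # Returns a plain bool, so the branch is decided once, at trace time.
            return self.value > k

    fn(Proxy(example))

    def replay(x):
        # Only the recorded operations run; the original `if` is gone.
        for op, k in ops:
            x = x * k if op == "mul" else x + k
        return x

    return replay


def step(x):
    if x > 0:  # data-dependent branch
        return x * 2
    return x + 100


traced_step = trace(step, 1.0)  # traced with a positive example input

print(step(-3.0))         # eager execution takes the else branch
print(traced_step(-3.0))  # replay still applies the branch recorded at trace time
```

Eager execution and the replayed trace disagree for negative inputs, which is the kind of silent error the limitation warns about. Keeping data-dependent control flow out of the step functions avoids it.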
