Project description

Fabric is the fast and lightweight way to scale PyTorch models without boilerplate


Lightning Fabric: Expert control.

Run on any device at any scale with expert-level control over the PyTorch training loop and scaling strategy. You can even write your own Trainer.

Fabric is designed for the most complex models and workloads, such as foundation model scaling, LLMs, diffusion models, transformers, reinforcement learning, and active learning, at any size.


What to change

```diff
+ import lightning as L
  import torch
  import torchvision as tv

  dataset = tv.datasets.CIFAR10("data", download=True,
                                train=True,
                                transform=tv.transforms.ToTensor())

+ fabric = L.Fabric()
+ fabric.launch()

  model = tv.models.resnet18()
  optimizer = torch.optim.SGD(model.parameters(), lr=0.001)
- device = "cuda" if torch.cuda.is_available() else "cpu"
- model.to(device)
+ model, optimizer = fabric.setup(model, optimizer)

  dataloader = torch.utils.data.DataLoader(dataset, batch_size=8)
+ dataloader = fabric.setup_dataloaders(dataloader)

  model.train()
  num_epochs = 10
  for epoch in range(num_epochs):
      for batch in dataloader:
          inputs, labels = batch
-         inputs, labels = inputs.to(device), labels.to(device)
          optimizer.zero_grad()
          outputs = model(inputs)
          loss = torch.nn.functional.cross_entropy(outputs, labels)
-         loss.backward()
+         fabric.backward(loss)
          optimizer.step()
```

Resulting Fabric Code (copy me!)

```python
import lightning as L
import torch
import torchvision as tv

dataset = tv.datasets.CIFAR10("data", download=True,
                              train=True,
                              transform=tv.transforms.ToTensor())

fabric = L.Fabric()
fabric.launch()

model = tv.models.resnet18()
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)
model, optimizer = fabric.setup(model, optimizer)

dataloader = torch.utils.data.DataLoader(dataset, batch_size=8)
dataloader = fabric.setup_dataloaders(dataloader)

model.train()
num_epochs = 10
for epoch in range(num_epochs):
    for batch in dataloader:
        inputs, labels = batch
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = torch.nn.functional.cross_entropy(outputs, labels)
        fabric.backward(loss)
        optimizer.step()
```

Key features

Easily switch from running on CPU to GPU (Apple Silicon, CUDA, …), TPU, multi-GPU or even multi-node training

```python
from lightning.fabric import Fabric

# Each line below is an alternative configuration.

# Use your available hardware
# no code changes needed
fabric = Fabric()

# Run on GPUs (CUDA or MPS)
fabric = Fabric(accelerator="gpu")

# 8 GPUs
fabric = Fabric(accelerator="gpu", devices=8)

# 256 GPUs, multi-node
fabric = Fabric(accelerator="gpu", devices=8, num_nodes=32)

# Run on TPUs
fabric = Fabric(accelerator="tpu")
```
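In a multi-node setup the total number of processes is `devices × num_nodes`, so the 256-GPU example above corresponds to 8 × 32 processes. Under DDP, where `setup_dataloaders()` by default adds a distributed sampler so each process reads a different shard, the effective batch per optimizer step is the per-process `batch_size` multiplied by that number. The helpers below are hypothetical and only illustrate this arithmetic; at runtime Fabric reports the process count as `fabric.world_size`.

```python
def world_size(devices: int, num_nodes: int = 1) -> int:
    """Total number of processes for `devices` per node across `num_nodes` nodes."""
    return devices * num_nodes


def global_batch_size(per_device_batch_size: int, devices: int, num_nodes: int = 1) -> int:
    """Samples contributing to one optimizer step under data parallelism such as DDP."""
    return per_device_batch_size * world_size(devices, num_nodes)


print(world_size(devices=8, num_nodes=32))  # 256
print(global_batch_size(8, devices=8, num_nodes=32))  # 2048
```

This is why the learning rate or batch size is often revisited when scaling out: the same `batch_size=8` means 2048 samples per step across 256 processes.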

Use state-of-the-art distributed training strategies (DDP, FSDP, DeepSpeed) and mixed precision out of the box

```python
from lightning.fabric import Fabric

# Use state-of-the-art distributed training techniques
fabric = Fabric(strategy="ddp")
fabric = Fabric(strategy="deepspeed")
fabric = Fabric(strategy="fsdp")

# Switch the precision
fabric = Fabric(precision="16-mixed")
fabric = Fabric(precision="64")
```

All the device logic boilerplate is handled for you

```diff
  # no more of this!
- model.to(device)
- batch.to(device)
```

Build your own custom Trainer using Fabric primitives for training, checkpointing, logging, and more

```python
import lightning as L
import torch


class MyCustomTrainer:
    def __init__(self, accelerator="auto", strategy="auto", devices="auto", precision="32-true"):
        self.fabric = L.Fabric(accelerator=accelerator, strategy=strategy, devices=devices, precision=precision)

    def fit(self, model, optimizer, dataloader, max_epochs, loss_fn=torch.nn.functional.cross_entropy):
        self.fabric.launch()

        # Fabric handles device placement, precision, and the distributed strategy
        model, optimizer = self.fabric.setup(model, optimizer)
        dataloader = self.fabric.setup_dataloaders(dataloader)
        model.train()

        for epoch in range(max_epochs):
            for batch in dataloader:
                inputs, target = batch
                optimizer.zero_grad()
                output = model(inputs)
                loss = loss_fn(output, target)
                self.fabric.backward(loss)
                optimizer.step()
```

You can find a more extensive example in our examples

Read the Lightning Fabric docs

Continuous Integration

Lightning is rigorously tested across multiple CPUs and GPUs and against major Python and PyTorch versions.

*Codecov coverage is above 90%, but build delays may temporarily show a lower number.

Current build statuses

System / PyTorch ver.	1.12	1.13	2.0	2.1
Linux py3.9 [GPUs]
Linux (multiple Python versions)
OSX (multiple Python versions)
Windows (multiple Python versions)

Getting started

Install Lightning

Prerequisites

TIP: We strongly recommend creating a virtual environment first. Don’t know what this is? Follow our beginner guide here.

Python 3.8.x or later (3.8.x, 3.9.x, 3.10.x, ...)

```bash
pip install -U lightning
```

Convert your PyTorch code to Fabric

Create the Fabric object at the beginning of your training code.
```python
import lightning as L

fabric = L.Fabric()
```

Call setup() on each model and optimizer pair and setup_dataloaders() on all your data loaders.

```python
model, optimizer = fabric.setup(model, optimizer)
dataloader = fabric.setup_dataloaders(dataloader)
```

Remove all `.to()` and `.cuda()` calls; Fabric takes care of device placement.
```diff
- model.to(device)
- batch.to(device)
```
Replace `loss.backward()` with `fabric.backward(loss)`.
```diff
- loss.backward()
+ fabric.backward(loss)
```
Run the script from the terminal with:
```bash
lightning run model path/to/train.py
```

or use the launch() method in a notebook. Learn more about launching distributed training.

FAQ

When to use Fabric?

Minimum code changes: You want to scale your PyTorch model to multiple GPUs or use advanced strategies like DeepSpeed without having to refactor. You don’t care about structuring your code; you just want to scale it as fast as possible.
Maximum control: Write your own training and/or inference logic down to the individual optimizer calls. You aren’t forced to conform to a standardized epoch-based training loop like the one in the Lightning Trainer. You can do flexible iteration-based training, meta-learning, cross-validation, and other types of optimization algorithms without digging into framework internals. This also makes it easy to adopt Fabric in existing PyTorch projects to speed up and scale your models without large refactors. Just remember: with great power comes great responsibility.
Maximum flexibility: You want full control over your entire training. In Fabric all features are opt-in, and it provides a toolbox of primitives so you can build your own Trainer.

When to use the Lightning Trainer?

You want all the engineering boilerplate handled for you: dozens of features like checkpointing, logging, and early stopping out of the box. Less hassle, fewer errors, and it is easy to try different techniques and features.
You want good defaults chosen for you, so you have a better starting point.
You want your code to be modular, readable, and well structured, making it easy to share between projects and with collaborators.

Can I use Fabric with my LightningModule or Lightning Callback?

Yes :) Fabric works with PyTorch LightningModules and Callbacks, so you can choose how to structure your code and reuse existing models and callbacks as you wish. Read more here.

Examples

Asking for help

If you have any questions please:


Release history

Version	Release date	Notes
2.2.4	May 1, 2024	This version
2.2.3	Apr 23, 2024
2.2.2	Apr 12, 2024
2.2.1	Mar 4, 2024
2.2.0.post0	Feb 12, 2024
2.2.0	Feb 8, 2024
2.2.0rc0	Feb 1, 2024	Pre-release
2.1.4	Feb 1, 2024
2.1.3	Dec 21, 2023
2.1.2	Nov 15, 2023
2.1.1	Nov 8, 2023
2.1.0	Oct 12, 2023
2.1.0rc1	Oct 10, 2023	Pre-release
2.1.0rc0	Aug 16, 2023	Pre-release
2.0.9.post0	Sep 28, 2023	Yanked: no change since 2.0.9
2.0.9	Sep 14, 2023
2.0.8	Aug 30, 2023
2.0.7	Aug 16, 2023
2.0.6	Jul 25, 2023
2.0.5	Jul 10, 2023
2.0.4	Jun 22, 2023
2.0.3	Jun 7, 2023
2.0.2	Apr 24, 2023
2.0.1.post0	Apr 11, 2023
2.0.1	Mar 30, 2023
2.0.0	Mar 15, 2023
2.0.0rc0	Feb 23, 2023	Pre-release
1.9.5	Apr 12, 2023
1.9.4	Mar 2, 2023
1.9.3	Feb 21, 2023
1.9.2	Feb 15, 2023
1.9.1	Feb 10, 2023
1.9.0	Jan 18, 2023
1.9.0rc0	Jan 6, 2023	Pre-release

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

lightning-fabric-2.2.4.tar.gz (189.5 kB)

Uploaded May 1, 2024 (Source)

Built Distribution

lightning_fabric-2.2.4-py3-none-any.whl (243.4 kB)

Uploaded May 1, 2024 (Python 3)

Hashes for lightning-fabric-2.2.4.tar.gz

Algorithm	Hash digest
SHA256	`18883f628063f325b6e09adf4d5804ddd3216a88967ccd3277a65595f1cd46c9`
MD5	`190c4a470d4186ee479d521fafab12df`
BLAKE2b-256	`d076c1dc2da5f2e3d90ffa623ac42470326f35a6c73ca4ce56d1c73d71a473db`

Hashes for lightning_fabric-2.2.4-py3-none-any.whl

Algorithm	Hash digest
SHA256	`aaa7fc039f1bf730a04361580bf3fa51867ce362e95ad30d1af6335155a23e1f`
MD5	`427c5a26b94d6f275dda082b4080f7dd`
BLAKE2b-256	`d8bacf6f4cf0bc43907ff304218b7f0618e0c7936ff298c5e94e7efac77b6e7e`