A library for persisting PyTorch program state

torchsnapshot

This library is currently in Alpha and does not yet have a stable release. The API may change in ways that are not backward compatible. If you have suggestions for improvements, please open a GitHub issue. We'd love to hear your feedback.

A lightweight library for adding fault tolerance to large-scale PyTorch distributed training workloads.

Install

Requires Python >= 3.7 and PyTorch >= 1.11

From pip:

pip install --pre torchsnapshot-nightly

From source:

git clone https://github.com/facebookresearch/torchsnapshot
cd torchsnapshot
pip install -r requirements.txt
python setup.py install

Concepts

  • Stateful object - an object whose state can be obtained via .state_dict() and restored via .load_state_dict(). Most PyTorch components (e.g. Module, Optimizer, LRScheduler) already implement this protocol.
  • App state - the application state described using multiple stateful objects.
  • Snapshot - the persisted app state.
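
The stateful object protocol described above is not limited to PyTorch classes. As a minimal sketch, any plain Python object that exposes the two methods should qualify. The class name `TrainingProgress` and its fields are hypothetical, chosen only for illustration:

```python
from typing import Any, Dict


class TrainingProgress:
    """Hypothetical stateful object that tracks training progress."""

    def __init__(self) -> None:
        self.epoch = 0
        self.steps = 0

    def state_dict(self) -> Dict[str, Any]:
        # Return everything needed to rebuild this object later.
        return {"epoch": self.epoch, "steps": self.steps}

    def load_state_dict(self, state_dict: Dict[str, Any]) -> None:
        # Restore the object in place from a previously captured state.
        self.epoch = state_dict["epoch"]
        self.steps = state_dict["steps"]


# Capture the state of one instance and load it into a fresh one.
progress = TrainingProgress()
progress.epoch, progress.steps = 3, 1200
saved = progress.state_dict()

restored = TrainingProgress()
restored.load_state_dict(saved)
```

An object like this could then sit in the app state alongside the model and optimizer, e.g. under a key such as "progress".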

Basic Usage

Describing the application state with multiple stateful objects:

app_state = {"model": model, "optimizer": optimizer}

Taking a snapshot of the application state:

from torchsnapshot import Snapshot

# File System
snapshot = Snapshot.take(path="/foo/bar/baz", app_state=app_state)

# S3
snapshot = Snapshot.take(path="s3://foo/bar", app_state=app_state)

# Google Cloud Storage
snapshot = Snapshot.take(path="gcs://foo/bar", app_state=app_state)

Referencing an existing snapshot:

snapshot = Snapshot(path="foo/bar/baz")

Restoring the application state from a snapshot:

snapshot.restore(app_state=app_state)

See the example directory for more examples.

License

torchsnapshot is BSD licensed, as found in the LICENSE file.
