
A library for persisting PyTorch program state


torchsnapshot

This library is currently in Alpha and does not yet have a stable release. The API may change and may not be backward compatible. If you have suggestions for improvements, please open a GitHub issue. We'd love to hear your feedback.

A light-weight library for adding fault tolerance to large-scale PyTorch distributed training workloads.

Install

Requires Python >= 3.7 and PyTorch >= 1.11

From pip:

pip install torchsnapshot

From source:

git clone https://github.com/facebookresearch/torchsnapshot
cd torchsnapshot
pip install -r requirements.txt
python setup.py install

Concepts

  • Stateful object - an object whose state can be obtained via .state_dict() and restored via .load_state_dict(). Most PyTorch components (e.g. Module, Optimizer, LRScheduler) already implement this protocol.
  • App state - the application state described using multiple stateful objects.
  • Snapshot - the persisted app state.
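
As an illustration of the stateful object protocol described above, here is a minimal sketch of a custom class that implements .state_dict() and .load_state_dict() in plain Python. The class name Progress and its fields are hypothetical and are not part of torchsnapshot.

```python
from typing import Any, Dict


class Progress:
    """Hypothetical stateful object that tracks training progress."""

    def __init__(self) -> None:
        self.epoch = 0
        self.steps = 0

    def state_dict(self) -> Dict[str, Any]:
        # Return everything needed to reconstruct this object's state.
        return {"epoch": self.epoch, "steps": self.steps}

    def load_state_dict(self, state_dict: Dict[str, Any]) -> None:
        # Restore the state in place from a previously obtained state dict.
        self.epoch = state_dict["epoch"]
        self.steps = state_dict["steps"]


# Round trip: copy the state of one object into a fresh one.
source = Progress()
source.epoch, source.steps = 3, 1200

restored = Progress()
restored.load_state_dict(source.state_dict())
```

Because it follows the same protocol as Module and Optimizer, an object like this could presumably be listed in the app state next to them, for example under a "progress" key.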

Basic Usage

Describing the application state with multiple stateful objects:

app_state = {"model": model, "optimizer": optimizer}

Taking a snapshot of the application state:

from torchsnapshot import Snapshot

# File System
snapshot = Snapshot.take(path="/foo/bar/baz", app_state=app_state)

# S3
snapshot = Snapshot.take(path="s3://foo/bar", app_state=app_state)

# Google Cloud Storage
snapshot = Snapshot.take(path="gcs://foo/bar", app_state=app_state)

Referencing an existing snapshot:

snapshot = Snapshot(path="/foo/bar/baz")

Restoring the application state from a snapshot:

snapshot.restore(app_state=app_state)
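
The take and restore calls above can be combined into a resume-on-restart pattern. The following is a rough sketch under stated assumptions, not a recommended implementation: the epoch_<n> directory naming, the root directory, and the helper names pick_latest, resume_or_start and save_epoch are all hypothetical. It only handles local file system paths and does not check whether a snapshot directory was written completely.

```python
import os
import re
from typing import Any, Dict, Iterable, Optional

# Hypothetical naming convention: one snapshot directory per epoch.
_EPOCH_DIR = re.compile(r"^epoch_(\d+)$")


def pick_latest(names: Iterable[str]) -> Optional[str]:
    """Return the epoch_<n> name with the largest n, or None if there is none."""
    best_epoch, best_name = -1, None
    for name in names:
        match = _EPOCH_DIR.match(name)
        if match and int(match.group(1)) > best_epoch:
            best_epoch, best_name = int(match.group(1)), name
    return best_name


def resume_or_start(app_state: Dict[str, Any], root: str) -> bool:
    """Restore app_state from the newest snapshot under root, if one exists."""
    # Imported here so that pick_latest can be used without torchsnapshot installed.
    from torchsnapshot import Snapshot

    latest = pick_latest(os.listdir(root)) if os.path.isdir(root) else None
    if latest is None:
        return False  # nothing to resume from; start from scratch
    Snapshot(path=os.path.join(root, latest)).restore(app_state=app_state)
    return True


def save_epoch(app_state: Dict[str, Any], root: str, epoch: int):
    """Take a snapshot of app_state into root/epoch_<epoch>."""
    from torchsnapshot import Snapshot

    return Snapshot.take(path=os.path.join(root, f"epoch_{epoch}"), app_state=app_state)
```

A training script would call resume_or_start(app_state, root) once before the loop and save_epoch(app_state, root, epoch) at the end of each epoch. The directory names are compared numerically rather than as strings so that epoch_10 is considered newer than epoch_2.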

See the example directory for more examples.

License

torchsnapshot is BSD licensed, as found in the LICENSE file.


