
A library for persisting PyTorch program state


torchsnapshot

This library is in Alpha and does not yet have a stable release. The API may change in backward-incompatible ways. If you have suggestions for improvements, please open a GitHub issue. We'd love to hear your feedback.

A light-weight library for adding fault tolerance to large-scale PyTorch distributed training workloads.

Install

Requires Python >= 3.7 and PyTorch >= 1.11

From pip:

pip install torchsnapshot

From source:

git clone https://github.com/facebookresearch/torchsnapshot
cd torchsnapshot
pip install -r requirements.txt
python setup.py install

Concepts

  • Stateful object - an object whose state can be obtained via .state_dict() and restored via .load_state_dict(). Most PyTorch components (e.g. Module, Optimizer, LRScheduler) already implement this protocol.
  • App state - the application state described using multiple stateful objects.
  • Snapshot - the persisted app state.
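
The stateful-object protocol described above can also be implemented by your own classes. Below is a minimal sketch, assuming only the two methods named in the list; the `TrainingProgress` class and its fields are hypothetical and are not part of torchsnapshot or PyTorch.

```python
from typing import Any, Dict


class TrainingProgress:
    """Hypothetical stateful object that tracks how far training has advanced."""

    def __init__(self) -> None:
        self.epoch = 0
        self.step = 0

    def state_dict(self) -> Dict[str, Any]:
        # Return a plain dict that fully describes the current state.
        return {"epoch": self.epoch, "step": self.step}

    def load_state_dict(self, state_dict: Dict[str, Any]) -> None:
        # Overwrite the current state with previously saved values.
        self.epoch = state_dict["epoch"]
        self.step = state_dict["step"]


# Round trip: capture the state of one object and load it into a fresh one.
progress = TrainingProgress()
progress.epoch, progress.step = 3, 1200
saved = progress.state_dict()

restored = TrainingProgress()
restored.load_state_dict(saved)
```

Because it implements both methods, an object like this could presumably be placed in the app state next to the model and optimizer, for example under a key such as "progress".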

Basic Usage

Describing the application state with multiple stateful objects:

app_state = {"model": model, "optimizer": optimizer}

Taking a snapshot of the application state:

from torchsnapshot import Snapshot

# File System
snapshot = Snapshot.take(path="/foo/bar/baz", app_state=app_state)

# S3
snapshot = Snapshot.take(path="s3://foo/bar", app_state=app_state)

# Google Cloud Storage
snapshot = Snapshot.take(path="gcs://foo/bar", app_state=app_state)

Referencing an existing snapshot:

snapshot = Snapshot(path="/foo/bar/baz")

Restoring the application state from a snapshot:

snapshot.restore(app_state=app_state)
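
To make the take/restore flow more concrete, here is a conceptual sketch in plain Python. It is not how torchsnapshot is implemented and uses none of its API: the `Counter` class and the `take_snapshot` and `restore_snapshot` helpers are hypothetical, and JSON files in a temporary directory stand in for the real storage backends. It only illustrates the idea that taking a snapshot persists each object's state_dict() under its app-state key, and that restoring loads that state back into the objects in place.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict


class Counter:
    """Hypothetical stateful object used only for this illustration."""

    def __init__(self, value: int = 0) -> None:
        self.value = value

    def state_dict(self) -> Dict[str, Any]:
        return {"value": self.value}

    def load_state_dict(self, state_dict: Dict[str, Any]) -> None:
        self.value = state_dict["value"]


def take_snapshot(path: Path, app_state: Dict[str, Any]) -> None:
    # Persist each object's state in a file named after its app-state key.
    path.mkdir(parents=True, exist_ok=True)
    for key, obj in app_state.items():
        (path / f"{key}.json").write_text(json.dumps(obj.state_dict()))


def restore_snapshot(path: Path, app_state: Dict[str, Any]) -> None:
    # Load each persisted state into the object registered under the same key.
    for key, obj in app_state.items():
        obj.load_state_dict(json.loads((path / f"{key}.json").read_text()))


with tempfile.TemporaryDirectory() as tmp:
    snapshot_dir = Path(tmp) / "snapshot"
    take_snapshot(snapshot_dir, {"counter": Counter(value=42)})

    # A fresh object starts at 0 and is updated in place by the restore.
    fresh = Counter()
    restore_snapshot(snapshot_dir, {"counter": fresh})
    restored_value = fresh.value
```

Note that the restore step mutates the objects passed in rather than returning new ones, which mirrors the shape of the snapshot.restore(app_state=app_state) call above.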

See the example directory for more examples.

License

torchsnapshot is BSD licensed, as found in the LICENSE file.


