Internal S3 client implementation for s3torchconnector

Project description

Amazon S3 Connector for PyTorch

The Amazon S3 Connector for PyTorch delivers high throughput for PyTorch training jobs that access or store data in Amazon S3. Using the S3 Connector for PyTorch automatically optimizes performance when downloading training data from and writing checkpoints to Amazon S3, eliminating the need to write your own code to list S3 buckets and manage concurrent requests.

Amazon S3 Connector for PyTorch provides implementations of PyTorch's dataset primitives that you can use to load training data from Amazon S3. It supports both map-style datasets for random data access patterns and iterable-style datasets for streaming sequential data access patterns. The S3 Connector for PyTorch also includes a checkpointing interface to save and load checkpoints directly to Amazon S3, without first saving to local storage.

Getting Started

Prerequisites

  • Python 3.8 or later (note: Python 3.12+ is not recommended, as PyTorch does not support it).
  • PyTorch 2.0 or later (compatibility with PyTorch 1.x has not been verified).

Installation

pip install s3torchconnector

Amazon S3 Connector for PyTorch currently supports installation via pip on Linux only. For other platforms, see DEVELOPMENT for build instructions.

Configuration

To use s3torchconnector, AWS credentials must be provided through one of the following methods:

  • If you are using this library on an EC2 instance, specify an IAM role and then give the EC2 instance access to that role.
  • Install the AWS CLI (awscli) and run aws configure.
  • Set credentials in the AWS credentials profile file on the local system, located at ~/.aws/credentials on Unix or macOS.
  • Set the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables.
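
As a quick sanity check before constructing a dataset, the sketch below reports which of the credential sources listed above appear to be present on the local machine. It is only an illustration: the helper name detect_credential_sources is hypothetical and not part of s3torchconnector, it does not validate the credentials, and it cannot detect an IAM role attached to an EC2 instance.

```python
import os
from pathlib import Path


def detect_credential_sources(environ=None, home=None):
    """Return the names of credential sources that appear to be configured.

    Hypothetical helper for illustration only; it checks for presence,
    not for validity, and ignores EC2 instance roles.
    """
    environ = os.environ if environ is None else environ
    home = Path.home() if home is None else Path(home)

    sources = []
    # Both variables must be set for environment-based credentials.
    if environ.get("AWS_ACCESS_KEY_ID") and environ.get("AWS_SECRET_ACCESS_KEY"):
        sources.append("environment variables")
    # The shared credentials file mentioned in the list above.
    if (home / ".aws" / "credentials").is_file():
        sources.append("credentials file")
    return sources


if __name__ == "__main__":
    found = detect_credential_sources()
    print("Credential sources found:", ", ".join(found) or "none")
```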

Examples

The API docs describe the public components. End-to-end examples of how to use s3torchconnector can be found in the examples directory.

Sample Examples

The simplest way to use the S3 Connector for PyTorch is to construct a dataset, either a map-style or iterable-style dataset, by specifying an S3 URI (a bucket and optional prefix) and the region the bucket is located in:

from s3torchconnector import S3MapDataset, S3IterableDataset

# Replace <BUCKET> and <PREFIX> with your own values.
DATASET_URI = "s3://<BUCKET>/<PREFIX>"
REGION = "us-east-1"

iterable_dataset = S3IterableDataset.from_prefix(DATASET_URI, region=REGION)

# Iterable-style datasets can be looped over directly.
for item in iterable_dataset:
    print(item.key)

# S3MapDataset lists all the objects under the given prefix to support random access.
# It builds the list on the first access to its elements or on the first call
# to get the number of elements, whichever happens first.
# This can take some time, during which the program may appear unresponsive.
map_dataset = S3MapDataset.from_prefix(DATASET_URI, region=REGION)

# Access an item in map_dataset by index.
item = map_dataset[0]

# Learn about bucket, key, and content of the object
bucket = item.bucket
key = item.key
content = item.read()
len(content)
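
The S3 URI passed to from_prefix above combines a bucket and an optional prefix. The following sketch shows how such a URI decomposes; split_s3_uri is a hypothetical helper written for this illustration and is not part of s3torchconnector, which parses the URI itself.

```python
from typing import Tuple


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split 's3://bucket/prefix' into (bucket, prefix); the prefix may be empty."""
    scheme = "s3://"
    if not uri.startswith(scheme):
        raise ValueError(f"Not an S3 URI: {uri!r}")
    # Everything up to the first '/' is the bucket; the rest is the prefix.
    bucket, _, prefix = uri[len(scheme):].partition("/")
    if not bucket:
        raise ValueError(f"S3 URI has no bucket: {uri!r}")
    return bucket, prefix


print(split_s3_uri("s3://my-bucket/train/images/"))  # ('my-bucket', 'train/images/')
print(split_s3_uri("s3://my-bucket"))                # ('my-bucket', '')
```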

In addition to data loading primitives, the S3 Connector for PyTorch also provides an interface for saving and loading model checkpoints directly to and from an S3 bucket.

from s3torchconnector import S3Checkpoint

import torchvision
import torch

CHECKPOINT_URI = "s3://<BUCKET>/<KEY>/"
REGION = "us-east-1"
checkpoint = S3Checkpoint(region=REGION)

model = torchvision.models.resnet18()

# Save checkpoint to S3
with checkpoint.writer(CHECKPOINT_URI + "epoch0.ckpt") as writer:
    torch.save(model.state_dict(), writer)

# Load checkpoint from S3
with checkpoint.reader(CHECKPOINT_URI + "epoch0.ckpt") as reader:
    state_dict = torch.load(reader)

model.load_state_dict(state_dict)
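
When a training run saves one checkpoint per epoch, the object names have to be built and later searched. The sketch below is one possible naming scheme that follows the epoch0.ckpt pattern used above; checkpoint_uri and latest_epoch are hypothetical helpers, not functions provided by s3torchconnector.

```python
import re
from typing import Iterable, Optional

# Matches names such as "epoch12.ckpt" at the end of a key.
_EPOCH_PATTERN = re.compile(r"epoch(\d+)\.ckpt$")


def checkpoint_uri(base_uri: str, epoch: int) -> str:
    """Build the URI for a given epoch under base_uri."""
    return f"{base_uri.rstrip('/')}/epoch{epoch}.ckpt"


def latest_epoch(keys: Iterable[str]) -> Optional[int]:
    """Return the highest epoch number found in keys, or None if there is none."""
    epochs = []
    for key in keys:
        match = _EPOCH_PATTERN.search(key)
        if match:
            epochs.append(int(match.group(1)))
    return max(epochs) if epochs else None


print(checkpoint_uri("s3://<BUCKET>/<KEY>/", 3))
print(latest_epoch(["run/epoch0.ckpt", "run/epoch10.ckpt", "run/epoch2.ckpt"]))
```

The resulting URI can be passed to checkpoint.writer or checkpoint.reader in the same way as the hard-coded name in the example above. Epochs are compared as integers so that epoch10 sorts after epoch2.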

To use datasets or checkpoints with Amazon S3 Express One Zone directory buckets, you only need to update the URI so that it follows the base-name--azid--x-s3 bucket name format. For example, for the directory bucket my-test-bucket--usw2-az1--x-s3 with the Availability Zone ID usw2-az1, the URI is s3://my-test-bucket--usw2-az1--x-s3/<PREFIX>, paired with region us-west-2. Note that the prefix for Amazon S3 Express One Zone should end with '/'.
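
The naming rules for directory buckets can be checked before a URI is handed to the connector. The sketch below is an assumption-laden illustration: parse_directory_bucket and express_uri are hypothetical helpers, and the regular expression assumes Availability Zone IDs look like usw2-az1, which matches the example above but is not taken from an official grammar.

```python
import re

# base-name--azid--x-s3, for example my-test-bucket--usw2-az1--x-s3.
# The AZ ID pattern is an assumption based on the example in the text.
_DIRECTORY_BUCKET = re.compile(r"^(?P<base>.+)--(?P<az_id>[a-z0-9]+-az\d+)--x-s3$")


def parse_directory_bucket(bucket: str):
    """Return (base_name, az_id) for a directory bucket name, or None otherwise."""
    match = _DIRECTORY_BUCKET.match(bucket)
    if match is None:
        return None
    return match.group("base"), match.group("az_id")


def express_uri(bucket: str, prefix: str = "") -> str:
    """Build an S3 URI for a directory bucket, ensuring the prefix ends with '/'."""
    if parse_directory_bucket(bucket) is None:
        raise ValueError(f"Not a directory bucket name: {bucket!r}")
    if prefix and not prefix.endswith("/"):
        prefix += "/"
    return f"s3://{bucket}/{prefix}"


print(parse_directory_bucket("my-test-bucket--usw2-az1--x-s3"))
print(express_uri("my-test-bucket--usw2-az1--x-s3", "training-data"))
```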

Lightning Integration

Amazon S3 Connector for PyTorch includes an integration for PyTorch Lightning, featuring S3LightningCheckpoint, an implementation of Lightning's CheckpointIO. This allows users to use the S3 checkpointing functionality of Amazon S3 Connector for PyTorch with PyTorch Lightning.

Getting Started

Installation

pip install s3torchconnector[lightning]

Examples

End-to-end examples for the PyTorch Lightning integration can be found in the examples/lightning directory.

from lightning import Trainer
from s3torchconnector.lightning import S3LightningCheckpoint

...

s3_checkpoint_io = S3LightningCheckpoint("us-east-1")
trainer = Trainer(
    plugins=[s3_checkpoint_io],
    default_root_dir="s3://bucket_name/key_prefix/"
)
trainer.fit(model)

Using S3 Versioning to Manage Checkpoints

When working with model checkpoints, you can use the S3 Versioning feature to preserve, retrieve, and restore every version of your checkpoint objects. With versioning, you can recover more easily from unintended overwrites or deletions of existing checkpoint files due to incorrect configuration or multiple hosts accessing the same storage path.

When versioning is enabled on an S3 bucket, deletions insert a delete marker instead of removing the object permanently. The delete marker becomes the current object version. If you overwrite an object, it results in a new object version in the bucket. You can always restore the previous version. See Deleting object versions from a versioning-enabled bucket for more details on managing object versions.
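
The behavior described above can be pictured with a small in-memory model. This is a toy sketch only: ToyVersionedBucket is a hypothetical class that imitates the semantics of overwrites and delete markers, and it is neither an S3 client nor a description of how S3 is implemented.

```python
from collections import defaultdict

DELETE_MARKER = object()  # sentinel standing in for an S3 delete marker


class ToyVersionedBucket:
    """In-memory model of versioning semantics (illustration only)."""

    def __init__(self):
        # key -> list of versions, oldest first; the last entry is current.
        self._versions = defaultdict(list)

    def put(self, key, data):
        # Overwriting keeps earlier data: it appends a new current version.
        self._versions[key].append(data)

    def delete(self, key):
        # A delete inserts a marker that becomes the current version.
        self._versions[key].append(DELETE_MARKER)

    def get(self, key):
        versions = self._versions.get(key)
        if not versions or versions[-1] is DELETE_MARKER:
            raise KeyError(key)
        return versions[-1]

    def restore_previous(self, key):
        # Dropping the current version (data or marker) exposes the one before it.
        versions = self._versions.get(key)
        if not versions or len(versions) < 2:
            raise KeyError(key)
        versions.pop()


bucket = ToyVersionedBucket()
bucket.put("epoch0.ckpt", b"good weights")
bucket.put("epoch0.ckpt", b"accidental overwrite")
bucket.restore_previous("epoch0.ckpt")
print(bucket.get("epoch0.ckpt"))  # b'good weights'
```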

To enable versioning on an S3 bucket, see Enabling versioning on buckets. Normal Amazon S3 rates apply for every version of an object stored and transferred. To customize your data retention approach and control storage costs for earlier versions of objects, use object versioning with S3 Lifecycle.
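
As a rough sketch of how versioning and S3 Lifecycle can be combined for checkpoints, the helper below builds a lifecycle configuration that expires noncurrent versions under a prefix. The function name, rule ID, and default values are hypothetical choices for this example; the dictionary keys follow the shape accepted by boto3's put_bucket_lifecycle_configuration, and the boto3 calls are shown only in comments because they require the boto3 package and valid credentials.

```python
def checkpoint_lifecycle_config(prefix: str, days: int, keep_newest: int = 0) -> dict:
    """Build a lifecycle configuration that expires old checkpoint versions.

    prefix: key prefix the rule applies to, for example "checkpoints/".
    days: number of days after which a noncurrent version expires.
    keep_newest: optional number of newer noncurrent versions to retain.
    """
    if days <= 0:
        raise ValueError("days must be positive")
    expiration = {"NoncurrentDays": days}
    if keep_newest > 0:
        expiration["NewerNoncurrentVersions"] = keep_newest
    rule = {
        "ID": f"expire-noncurrent-after-{days}d",  # hypothetical rule name
        "Status": "Enabled",
        "Filter": {"Prefix": prefix},
        "NoncurrentVersionExpiration": expiration,
    }
    return {"Rules": [rule]}


config = checkpoint_lifecycle_config("checkpoints/", days=30, keep_newest=3)
print(config)

# With boto3 installed and credentials configured, this could be applied as:
#   import boto3
#   s3 = boto3.client("s3")
#   s3.put_bucket_versioning(
#       Bucket="<BUCKET>", VersioningConfiguration={"Status": "Enabled"})
#   s3.put_bucket_lifecycle_configuration(
#       Bucket="<BUCKET>", LifecycleConfiguration=config)
```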

S3 Versioning and S3 Lifecycle are not supported by S3 Express One Zone.

Contributing

We welcome contributions to Amazon S3 Connector for PyTorch. Please see CONTRIBUTING for more information on how to report bugs or submit pull requests.

Development

See DEVELOPMENT for information about code style, development process, and guidelines.

Compatibility with other storage services

S3 Connector for PyTorch delivers high throughput for PyTorch training jobs that access or store data in Amazon S3. While it may work with other storage services that use S3-like APIs, that compatibility may inadvertently break when we make changes to better support Amazon S3. We welcome contributions of minor compatibility fixes or performance improvements for these services if the changes can be tested against Amazon S3.

Security issue notifications

If you discover a potential security issue in this project, we ask that you notify AWS Security via our vulnerability reporting page.

Code of conduct

This project has adopted the Amazon Open Source Code of Conduct. See CODE_OF_CONDUCT.md for more details.

License

Amazon S3 Connector for PyTorch has a BSD 3-Clause License, as found in the LICENSE file.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

  • s3torchconnectorclient-1.2.7.tar.gz (57.5 kB): source

Built Distributions

  • s3torchconnectorclient-1.2.7-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.9 MB): CPython 3.12, manylinux: glibc 2.17+ x86-64
  • s3torchconnectorclient-1.2.7-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (3.7 MB): CPython 3.12, manylinux: glibc 2.17+ ARM64
  • s3torchconnectorclient-1.2.7-cp312-cp312-macosx_11_0_arm64.whl (2.1 MB): CPython 3.12, macOS 11.0+ ARM64
  • s3torchconnectorclient-1.2.7-cp312-cp312-macosx_10_13_x86_64.whl (2.2 MB): CPython 3.12, macOS 10.13+ x86-64
  • s3torchconnectorclient-1.2.7-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.9 MB): CPython 3.11, manylinux: glibc 2.17+ x86-64
  • s3torchconnectorclient-1.2.7-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (3.7 MB): CPython 3.11, manylinux: glibc 2.17+ ARM64
  • s3torchconnectorclient-1.2.7-cp311-cp311-macosx_11_0_arm64.whl (2.1 MB): CPython 3.11, macOS 11.0+ ARM64
  • s3torchconnectorclient-1.2.7-cp311-cp311-macosx_10_9_x86_64.whl (2.2 MB): CPython 3.11, macOS 10.9+ x86-64
  • s3torchconnectorclient-1.2.7-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.9 MB): CPython 3.10, manylinux: glibc 2.17+ x86-64
  • s3torchconnectorclient-1.2.7-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (3.7 MB): CPython 3.10, manylinux: glibc 2.17+ ARM64
  • s3torchconnectorclient-1.2.7-cp310-cp310-macosx_11_0_arm64.whl (2.1 MB): CPython 3.10, macOS 11.0+ ARM64
  • s3torchconnectorclient-1.2.7-cp310-cp310-macosx_10_9_x86_64.whl (2.2 MB): CPython 3.10, macOS 10.9+ x86-64
  • s3torchconnectorclient-1.2.7-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.9 MB): CPython 3.9, manylinux: glibc 2.17+ x86-64
  • s3torchconnectorclient-1.2.7-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (3.7 MB): CPython 3.9, manylinux: glibc 2.17+ ARM64
  • s3torchconnectorclient-1.2.7-cp39-cp39-macosx_11_0_arm64.whl (2.1 MB): CPython 3.9, macOS 11.0+ ARM64
  • s3torchconnectorclient-1.2.7-cp39-cp39-macosx_10_9_x86_64.whl (2.2 MB): CPython 3.9, macOS 10.9+ x86-64
  • s3torchconnectorclient-1.2.7-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.9 MB): CPython 3.8, manylinux: glibc 2.17+ x86-64
  • s3torchconnectorclient-1.2.7-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (3.7 MB): CPython 3.8, manylinux: glibc 2.17+ ARM64
  • s3torchconnectorclient-1.2.7-cp38-cp38-macosx_11_0_arm64.whl (2.1 MB): CPython 3.8, macOS 11.0+ ARM64
  • s3torchconnectorclient-1.2.7-cp38-cp38-macosx_10_9_x86_64.whl (2.2 MB): CPython 3.8, macOS 10.9+ x86-64

File details

Details for the file s3torchconnectorclient-1.2.7.tar.gz.

File hashes

Hashes for s3torchconnectorclient-1.2.7.tar.gz
Algorithm Hash digest
SHA256 8a957706b22a8be862d17cf5c6973163a98721ef4d5608258a600a0e1be2bf5b
MD5 b0ac40fd75db80ca8f4d0cfd565d198f
BLAKE2b-256 4bf0b8439eac3ab060ccea9da21ae52f891e380d80ef458aac82285ecda2b56d

See more details on using hashes here.
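
A downloaded file can be checked against a published SHA256 digest with Python's standard hashlib module. The sketch below is a minimal illustration; the helper names sha256_of_file and matches_published_hash are hypothetical, and the local file path in the commented usage is an assumption about where the archive was saved.

```python
import hashlib
import hmac


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large wheels need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return hmac.compare_digest(sha256_of_file(path), expected_hex.strip().lower())


# Example usage, assuming the source archive was saved in the current directory:
#   expected = "8a957706b22a8be862d17cf5c6973163a98721ef4d5608258a600a0e1be2bf5b"
#   print(matches_published_hash("s3torchconnectorclient-1.2.7.tar.gz", expected))
```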

File details

Details for the file s3torchconnectorclient-1.2.7-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for s3torchconnectorclient-1.2.7-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 d30a940a807d0de4376d58a000ff80f43454b15afa549665869afb39bd2ec102
MD5 207035f2d9fa0f5cb07499e5c694dfcb
BLAKE2b-256 ff5e3b257ba2db5d95757779fe6e1b63da520b7a6b9ed55b4290a26aac6bb316

File details

Details for the file s3torchconnectorclient-1.2.7-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File hashes

Hashes for s3torchconnectorclient-1.2.7-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 42fee2d43e55d3d6a9d9003eabfd1ce7ab27fcb3d1eb72445d49a87963053979
MD5 c8426b0a0cdecaf36a49c24a6300735e
BLAKE2b-256 d5155140d63eaff47fcb24caa86f13ef952ab3061fdbf504bd0618fedba343fa

File details

Details for the file s3torchconnectorclient-1.2.7-cp312-cp312-macosx_11_0_arm64.whl.

File hashes

Hashes for s3torchconnectorclient-1.2.7-cp312-cp312-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 0f8e292f845b368d9df07c0c93902cbd1c58ca8beab218c8905cf383058276e0
MD5 2e06bce5a67848da713b5d8753b6df58
BLAKE2b-256 dfa7d3b893fd049b4aa25f3fe97a231e567ec1c105bdb17c4ac98b096e44711a

File details

Details for the file s3torchconnectorclient-1.2.7-cp312-cp312-macosx_10_13_x86_64.whl.

File hashes

Hashes for s3torchconnectorclient-1.2.7-cp312-cp312-macosx_10_13_x86_64.whl
Algorithm Hash digest
SHA256 95ce3d8c5a769c5b4d06595e14e13d0d223e1d5a4a0930b043039daf2a0db31f
MD5 e44a9b411a2ca133df27f0925e5726e7
BLAKE2b-256 13a3dfcaf10d9bc984c2b088e2e3876a0170f1c4eebeb930882bc893e72ec7da

File details

Details for the file s3torchconnectorclient-1.2.7-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for s3torchconnectorclient-1.2.7-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 c60c7f730e60e0fe896a0f6c9bd51415358a2473c24471c858aafefd53831fce
MD5 db24ebf434164cd3cac7842fb42ada1a
BLAKE2b-256 04ceb41b72e4dafcb49c49aa50bf284c49e78e7030f0e2cef79de672507ac112

File details

Details for the file s3torchconnectorclient-1.2.7-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File hashes

Hashes for s3torchconnectorclient-1.2.7-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 1fab6d2f9c8dc79555d4bd8aaf47404260390472f03c8f81b4dc920cee830c13
MD5 ecd668862caf522326045d146247c77e
BLAKE2b-256 4ca16734a70d192aa4915eb61fa2019954a7ce0674de6d6aaf6fef80aa0e4c9b

File details

Details for the file s3torchconnectorclient-1.2.7-cp311-cp311-macosx_11_0_arm64.whl.

File hashes

Hashes for s3torchconnectorclient-1.2.7-cp311-cp311-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 f1d78b3bcd518bb00bf8088d473a8f58d21b8f648c42f4e67b65a47144cc524d
MD5 ee357a1de696a29ce170737637742020
BLAKE2b-256 1c4cea08c076864bb9d7eda67ee536a98f270975f9ed6f2e582f849f952ab9c4

File details

Details for the file s3torchconnectorclient-1.2.7-cp311-cp311-macosx_10_9_x86_64.whl.

File hashes

Hashes for s3torchconnectorclient-1.2.7-cp311-cp311-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 39a948223d2e9729702eae517c34b16ac801a317cd7ed5bbaf98d824f1fe6bea
MD5 16f532397f635f3d1c3c15fe1a216284
BLAKE2b-256 938e14723cfea80cef743fb82f21e02fd54ef324c123c4fe8f94f9528104ec6e

File details

Details for the file s3torchconnectorclient-1.2.7-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for s3torchconnectorclient-1.2.7-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 0e158247cbc02b06f5abff16aa759e1a65202670feda88ff3f5506badd2b1eaf
MD5 f23271b94dd3058d47c43b4395f30de6
BLAKE2b-256 1520c188aec50d3a340356f1ad7435ef6ed14133eacd56a9263e6b8bc9e70d59

File details

Details for the file s3torchconnectorclient-1.2.7-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File hashes

Hashes for s3torchconnectorclient-1.2.7-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 6d9146226d6ecd5050cef1ca61665b5b51f05db359f7950e7495685bf81bd606
MD5 0cec54b682719760c45998e5e889f347
BLAKE2b-256 db4458224df6e13f349dd25caca58dd665c93eda6a14813b0ebf0cb25a1c5920

File details

Details for the file s3torchconnectorclient-1.2.7-cp310-cp310-macosx_11_0_arm64.whl.

File hashes

Hashes for s3torchconnectorclient-1.2.7-cp310-cp310-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 8e897db3f4619787a3f0dbf9ec01343004f1aca5cfbdc5a3d717690406a82179
MD5 fa3b67940d1cdf52abcbd492d186faee
BLAKE2b-256 1acdfd553b073e1ca4a1836eb8a4d4c1f8f24a663ad3a872ee2d17f1cef21102

File details

Details for the file s3torchconnectorclient-1.2.7-cp310-cp310-macosx_10_9_x86_64.whl.

File hashes

Hashes for s3torchconnectorclient-1.2.7-cp310-cp310-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 44f3bba16472da9eeb1ac384ee362d61b5343387a20a1bcc48a0d17bfa3f3ca1
MD5 e000c82693da2354e883fe2c5526d0cd
BLAKE2b-256 69b397f38a4ae877171ca629f5945706cddb32e0693f986ebba2e80b3d1905f0

File details

Details for the file s3torchconnectorclient-1.2.7-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for s3torchconnectorclient-1.2.7-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 585177744df1187b018a618d48ac71cddcd8b8bec769502b53bbc8b2a821add2
MD5 ee6e3a145334b3e246b8031068a0e307
BLAKE2b-256 1439269c01998804a92ff5e4084a88f1200aab1ce48d33e42148123dc38893ec

File details

Details for the file s3torchconnectorclient-1.2.7-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File hashes

Hashes for s3torchconnectorclient-1.2.7-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 1d3453674e32145e8d3311b1c0f8083ebdcda7dc02022e4988cbfcd210f7b965
MD5 6d97ac89c9dac8ffd339572fcf6a8aab
BLAKE2b-256 f2b3e83654cfd52fad53d72a430f7cae15e3823d404a90506ccf84ac1861c0f8

File details

Details for the file s3torchconnectorclient-1.2.7-cp39-cp39-macosx_11_0_arm64.whl.

File hashes

Hashes for s3torchconnectorclient-1.2.7-cp39-cp39-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 b96204a14ab670e3d1bc1980e7452bea90064a8b640de9d1fbc0a7e50f7d9d38
MD5 c81b562c383081706e263feb7b79ef26
BLAKE2b-256 b2818cf4d093f0a08c40840d7dba9543a023416bff4db554f57d521cec8dc2de

File details

Details for the file s3torchconnectorclient-1.2.7-cp39-cp39-macosx_10_9_x86_64.whl.

File hashes

Hashes for s3torchconnectorclient-1.2.7-cp39-cp39-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 723b12e8958050a023919ebb9336da4657c9239c2b189235b702d291dfbd1419
MD5 f12a82dfcb22d544cd6f9eb9b038839d
BLAKE2b-256 1888fc0515ae3210a604ac0adf0bc6a6af3c5e9ca478dadb20c6e0c4ff2024ca

File details

Details for the file s3torchconnectorclient-1.2.7-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for s3torchconnectorclient-1.2.7-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 45782cafd1318137b5c5d4a6159521cc45808090f58170928eda2bd50d5853b3
MD5 f79e542f2c4ced4160d9aae9bf368830
BLAKE2b-256 a7a8dae2f49ccd2884e050d920032dbb0909d228f02fd72d52088536a73b4d72

File details

Details for the file s3torchconnectorclient-1.2.7-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File hashes

Hashes for s3torchconnectorclient-1.2.7-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 581fff24771edd57d38093f91aabbe1bb84d10254c4ffc3c118f8e1dc8107e25
MD5 9fa33e94a1bcc7a7414ed5f993334a4e
BLAKE2b-256 d32b81b13a763729e16439fc1aec0b558d3e77236153467f0816c276b048454e

File details

Details for the file s3torchconnectorclient-1.2.7-cp38-cp38-macosx_11_0_arm64.whl.

File hashes

Hashes for s3torchconnectorclient-1.2.7-cp38-cp38-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 9ed30b9586fdce6773f36b1c42bd53bdfa3a6fe0d9c771f0fce18e210e292dc8
MD5 82ea5c8a5cdb7d94eda87245acb47f05
BLAKE2b-256 fe7a1c588918a31a4cb965b1a31d83afbdb14874b56882392ce432b01a4657c6

File details

Details for the file s3torchconnectorclient-1.2.7-cp38-cp38-macosx_10_9_x86_64.whl.

File hashes

Hashes for s3torchconnectorclient-1.2.7-cp38-cp38-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 1ef6f036e373c7ca37884f555b92083d80b9b7456ca0b27fbabf35e5779ef67c
MD5 154b4ae80e297e05ee742ce1c90f2b6b
BLAKE2b-256 a332f09dbffa2a6078e9cb33de5784419d25b2c2f2661f0625b2c983d4352d5a
