Internal S3 client implementation for s3torchconnector

Amazon S3 Connector for PyTorch

The Amazon S3 Connector for PyTorch delivers high throughput for PyTorch training jobs that access or store data in Amazon S3. Using the S3 Connector for PyTorch automatically optimizes performance when downloading training data from and writing checkpoints to Amazon S3, eliminating the need to write your own code to list S3 buckets and manage concurrent requests.

Amazon S3 Connector for PyTorch provides implementations of PyTorch's dataset primitives that you can use to load training data from Amazon S3. It supports both map-style datasets for random data access patterns and iterable-style datasets for streaming sequential data access patterns. The S3 Connector for PyTorch also includes a checkpointing interface to save and load checkpoints directly to Amazon S3, without first saving to local storage.

Getting Started

Prerequisites

  • Python 3.8 or greater is installed. (Note: Python 3.12+ is not recommended because PyTorch does not support it.)
  • PyTorch >= 2.0 (TODO: Check with PyTorch 1.x)

Installation

pip install s3torchconnector

For now, Amazon S3 Connector for PyTorch supports only Linux when installed via pip. For other platforms, see DEVELOPMENT for build instructions.

Configuration

To use s3torchconnector, AWS credentials must be provided through one of the following methods:

  • If you are using this library on an EC2 instance, specify an IAM role and then give the EC2 instance access to that role.
  • Install and configure awscli and run aws configure.
  • Set credentials in the AWS credentials profile file on the local system, located at: ~/.aws/credentials on Unix or macOS.
  • Set the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables.
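
As a quick sanity check for the last method, the sketch below reports whether both environment variables are set. It is only an illustration: the helper name env_credentials_present is hypothetical, it is not part of s3torchconnector, and it does not inspect IAM roles or the credentials file.

```python
import os
from typing import Mapping, Optional


def env_credentials_present(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if both AWS credential environment variables are non-empty.

    Hypothetical helper for illustration; it only covers the
    environment-variable method, not IAM roles or ~/.aws/credentials.
    """
    env = os.environ if env is None else env
    required = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")
    return all(env.get(name) for name in required)


if __name__ == "__main__":
    if env_credentials_present():
        print("Environment credentials found.")
    else:
        print("Environment credentials not set; another method must supply them.")
```

A False result does not mean access will fail, since any of the other methods listed above may still provide credentials.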

Examples

The API docs cover the public components. End-to-end examples of how to use s3torchconnector can be found under the examples directory.

Sample Examples

The simplest way to use the S3 Connector for PyTorch is to construct a dataset, either a map-style or iterable-style dataset, by specifying an S3 URI (a bucket and optional prefix) and the region the bucket is located in:

from s3torchconnector import S3MapDataset, S3IterableDataset

# You need to update <BUCKET> and <PREFIX>
DATASET_URI="s3://<BUCKET>/<PREFIX>"
REGION = "us-east-1"

iterable_dataset = S3IterableDataset.from_prefix(DATASET_URI, region=REGION)

# Datasets are also iterators. 
for item in iterable_dataset:
  print(item.key)

# S3MapDataset lists all the objects under the given prefix
# to support random access.
# It builds this list on the first access to its elements or on the
# first call to get the number of elements, whichever happens first.
# Listing can take some time, during which the program may appear unresponsive.
map_dataset = S3MapDataset.from_prefix(DATASET_URI, region=REGION)

# Randomly access an item in map_dataset.
item = map_dataset[0]

# Learn about bucket, key, and content of the object
bucket = item.bucket
key = item.key
content = item.read()
len(content)
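
The listing behaviour described in the comments above (nothing is listed at construction, and the full key list is built on first use) can be pictured with a toy model. The sketch below is not the connector's implementation: LazyListedDataset and fake_listing are hypothetical names, and an in-memory function stands in for the S3 listing call.

```python
from typing import Callable, List, Optional


class LazyListedDataset:
    """Toy map-style dataset that lists its keys on first use (hypothetical)."""

    def __init__(self, list_keys: Callable[[], List[str]]) -> None:
        self._list_keys = list_keys  # stand-in for an S3 listing call
        self._keys: Optional[List[str]] = None
        self.listing_calls = 0  # counts how many times a listing ran

    def _ensure_listed(self) -> List[str]:
        # Build the full key list once, on the first access that needs it.
        if self._keys is None:
            self._keys = self._list_keys()
            self.listing_calls += 1
        return self._keys

    def __len__(self) -> int:
        return len(self._ensure_listed())

    def __getitem__(self, index: int) -> str:
        return self._ensure_listed()[index]


def fake_listing() -> List[str]:
    # Assumed example keys; a real prefix could hold millions of objects.
    return ["train/0001.jpg", "train/0002.jpg", "train/0003.jpg"]


dataset = LazyListedDataset(fake_listing)
print(dataset.listing_calls)  # 0: constructing the dataset lists nothing
print(dataset[0])  # the first access triggers the one-time listing
print(len(dataset), dataset.listing_calls)  # later calls reuse the cached list
```

In this model the cost of listing is paid once, which is why the first len() or index access on a large prefix can appear to hang while later accesses return quickly.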

In addition to data loading primitives, the S3 Connector for PyTorch also provides an interface for saving and loading model checkpoints directly to and from an S3 bucket.

from s3torchconnector import S3Checkpoint

import torchvision
import torch

CHECKPOINT_URI="s3://<BUCKET>/<KEY>/"
REGION = "us-east-1"
checkpoint = S3Checkpoint(region=REGION)

model = torchvision.models.resnet18()

# Save checkpoint to S3
with checkpoint.writer(CHECKPOINT_URI + "epoch0.ckpt") as writer:
    torch.save(model.state_dict(), writer)

# Load checkpoint from S3
with checkpoint.reader(CHECKPOINT_URI + "epoch0.ckpt") as reader:
    state_dict = torch.load(reader)

model.load_state_dict(state_dict)
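
When a training loop writes one checkpoint per epoch, it can help to build the object URIs in one place. The helper below is a sketch under assumptions: the name epoch_checkpoint_uri and the epoch<N>.ckpt naming scheme simply extend the epoch0.ckpt example above and are not prescribed by the library.

```python
def epoch_checkpoint_uri(base_uri: str, epoch: int) -> str:
    """Build the checkpoint URI for an epoch under base_uri (hypothetical helper)."""
    if not base_uri.startswith("s3://"):
        raise ValueError(f"expected an s3:// URI, got {base_uri!r}")
    if epoch < 0:
        raise ValueError("epoch must be non-negative")
    # Add the trailing slash only if it is missing, so keys never contain '//'.
    if not base_uri.endswith("/"):
        base_uri += "/"
    return f"{base_uri}epoch{epoch}.ckpt"


print(epoch_checkpoint_uri("s3://<BUCKET>/<KEY>/", 0))
```

The returned string can be passed to checkpoint.writer(...) or checkpoint.reader(...) in place of the concatenation used above.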

Using datasets or checkpoints with Amazon S3 Express One Zone directory buckets requires only updating the URI to follow the base-name--azid--x-s3 bucket name format. For example, for a directory bucket named my-test-bucket--usw2-az1--x-s3 with the Availability Zone ID usw2-az1, the URI is s3://my-test-bucket--usw2-az1--x-s3/<PREFIX>, paired with region us-west-2. Note that the prefix for Amazon S3 Express One Zone should end with '/'.
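
The naming convention can be checked before a job starts. The following sketch assumes a simplified pattern for the base-name--azid--x-s3 format and does not enforce every S3 bucket-naming rule; parse_directory_bucket and express_uri are hypothetical helper names, not part of the connector.

```python
import re
from typing import NamedTuple

# Assumed, simplified pattern for base-name--azid--x-s3 (e.g. AZ ID "usw2-az1").
_DIRECTORY_BUCKET = re.compile(r"^(?P<base>.+)--(?P<az_id>[a-z0-9]+-az\d+)--x-s3$")


class DirectoryBucket(NamedTuple):
    base_name: str
    az_id: str


def parse_directory_bucket(bucket: str) -> DirectoryBucket:
    """Split a directory bucket name into its base name and Availability Zone ID."""
    match = _DIRECTORY_BUCKET.match(bucket)
    if match is None:
        raise ValueError(f"not a directory bucket name: {bucket!r}")
    return DirectoryBucket(match.group("base"), match.group("az_id"))


def express_uri(bucket: str, prefix: str) -> str:
    """Build an s3:// URI for a directory bucket, ending the prefix with '/'."""
    parse_directory_bucket(bucket)  # raises ValueError if the format is wrong
    if prefix and not prefix.endswith("/"):
        prefix += "/"
    return f"s3://{bucket}/{prefix}"


print(parse_directory_bucket("my-test-bucket--usw2-az1--x-s3"))
print(express_uri("my-test-bucket--usw2-az1--x-s3", "images"))
```

The region (us-west-2 in the example) still has to be supplied separately; this sketch does not try to derive it from the Availability Zone ID.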

Lightning Integration

Amazon S3 Connector for PyTorch includes an integration for PyTorch Lightning, featuring S3LightningCheckpoint, an implementation of Lightning's CheckpointIO. This allows users to make use of Amazon S3 Connector for PyTorch's S3 checkpointing functionality with PyTorch Lightning.

Getting Started

Installation

pip install s3torchconnector[lightning]

Examples

End-to-end examples for the PyTorch Lightning integration can be found in the examples/lightning directory.

from lightning import Trainer
from s3torchconnector.lightning import S3LightningCheckpoint

...

s3_checkpoint_io = S3LightningCheckpoint("us-east-1")
trainer = Trainer(
    plugins=[s3_checkpoint_io],
    default_root_dir="s3://bucket_name/key_prefix/"
)
trainer.fit(model)

Using S3 Versioning to Manage Checkpoints

When working with model checkpoints, you can use the S3 Versioning feature to preserve, retrieve, and restore every version of your checkpoint objects. With versioning, you can recover more easily from unintended overwrites or deletions of existing checkpoint files due to incorrect configuration or multiple hosts accessing the same storage path.

When versioning is enabled on an S3 bucket, deletions insert a delete marker instead of removing the object permanently. The delete marker becomes the current object version. If you overwrite an object, it results in a new object version in the bucket. You can always restore the previous version. See Deleting object versions from a versioning-enabled bucket for more details on managing object versions.
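
The overwrite and delete semantics described above can be illustrated with a small in-memory model. This is a toy sketch, not the S3 API: ToyVersionedBucket and its methods are hypothetical, and restore_previous stands in for removing the current version (or delete marker) so that the earlier one becomes current again.

```python
from typing import Any, Dict, List, Optional

DELETE_MARKER = object()  # sentinel standing in for an S3 delete marker


class ToyVersionedBucket:
    """In-memory model of versioned overwrite/delete semantics (not the S3 API)."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[Any]] = {}

    def put(self, key: str, body: bytes) -> None:
        # An overwrite appends a new version; older versions are kept.
        self._versions.setdefault(key, []).append(body)

    def delete(self, key: str) -> None:
        # A delete adds a marker, which becomes the current version.
        self._versions.setdefault(key, []).append(DELETE_MARKER)

    def get(self, key: str) -> Optional[bytes]:
        # Return the current version, or None if missing or delete-marked.
        versions = self._versions.get(key, [])
        if not versions or versions[-1] is DELETE_MARKER:
            return None
        return versions[-1]

    def restore_previous(self, key: str) -> None:
        # Drop the current version so the one before it becomes current.
        versions = self._versions.get(key, [])
        if len(versions) < 2:
            raise KeyError(f"no earlier version of {key!r}")
        versions.pop()


bucket = ToyVersionedBucket()
bucket.put("epoch0.ckpt", b"good weights")
bucket.put("epoch0.ckpt", b"accidental overwrite")
bucket.restore_previous("epoch0.ckpt")
print(bucket.get("epoch0.ckpt"))  # b'good weights'
```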

To enable versioning on an S3 bucket, see Enabling versioning on buckets. Normal Amazon S3 rates apply for every version of an object stored and transferred. To customize your data retention approach and control storage costs for earlier versions of objects, use object versioning with S3 Lifecycle.

S3 Versioning and S3 Lifecycle are not supported by S3 Express One Zone.

Contributing

We welcome contributions to Amazon S3 Connector for PyTorch. Please see CONTRIBUTING for more information on how to report bugs or submit pull requests.

Development

See DEVELOPMENT for information about code style, development process, and guidelines.

Compatibility with other storage services

S3 Connector for PyTorch delivers high throughput for PyTorch training jobs that access or store data in Amazon S3. While it may work against other storage services that use S3-like APIs, that compatibility may inadvertently break when we make changes to better support Amazon S3. We welcome contributions of minor compatibility fixes or performance improvements for these services if the changes can be tested against Amazon S3.

Security issue notifications

If you discover a potential security issue in this project, we ask that you notify AWS Security via our vulnerability reporting page.

Code of conduct

This project has adopted the Amazon Open Source Code of Conduct. See CODE_OF_CONDUCT.md for more details.

License

Amazon S3 Connector for PyTorch has a BSD 3-Clause License, as found in the LICENSE file.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

s3torchconnectorclient-1.2.4.tar.gz (57.7 kB view details)

Uploaded Source

Built Distributions

s3torchconnectorclient-1.2.4-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.8 MB view details)

Uploaded CPython 3.12 manylinux: glibc 2.17+ x86-64

s3torchconnectorclient-1.2.4-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (3.6 MB view details)

Uploaded CPython 3.12 manylinux: glibc 2.17+ ARM64

s3torchconnectorclient-1.2.4-cp312-cp312-macosx_11_0_arm64.whl (2.0 MB view details)

Uploaded CPython 3.12 macOS 11.0+ ARM64

s3torchconnectorclient-1.2.4-cp312-cp312-macosx_10_9_x86_64.whl (2.1 MB view details)

Uploaded CPython 3.12 macOS 10.9+ x86-64

s3torchconnectorclient-1.2.4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.8 MB view details)

Uploaded CPython 3.11 manylinux: glibc 2.17+ x86-64

s3torchconnectorclient-1.2.4-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (3.6 MB view details)

Uploaded CPython 3.11 manylinux: glibc 2.17+ ARM64

s3torchconnectorclient-1.2.4-cp311-cp311-macosx_11_0_arm64.whl (2.0 MB view details)

Uploaded CPython 3.11 macOS 11.0+ ARM64

s3torchconnectorclient-1.2.4-cp311-cp311-macosx_10_9_x86_64.whl (2.1 MB view details)

Uploaded CPython 3.11 macOS 10.9+ x86-64

s3torchconnectorclient-1.2.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.8 MB view details)

Uploaded CPython 3.10 manylinux: glibc 2.17+ x86-64

s3torchconnectorclient-1.2.4-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (3.6 MB view details)

Uploaded CPython 3.10 manylinux: glibc 2.17+ ARM64

s3torchconnectorclient-1.2.4-cp310-cp310-macosx_11_0_arm64.whl (2.0 MB view details)

Uploaded CPython 3.10 macOS 11.0+ ARM64

s3torchconnectorclient-1.2.4-cp310-cp310-macosx_10_9_x86_64.whl (2.1 MB view details)

Uploaded CPython 3.10 macOS 10.9+ x86-64

s3torchconnectorclient-1.2.4-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.8 MB view details)

Uploaded CPython 3.9 manylinux: glibc 2.17+ x86-64

s3torchconnectorclient-1.2.4-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (3.6 MB view details)

Uploaded CPython 3.9 manylinux: glibc 2.17+ ARM64

s3torchconnectorclient-1.2.4-cp39-cp39-macosx_11_0_arm64.whl (2.0 MB view details)

Uploaded CPython 3.9 macOS 11.0+ ARM64

s3torchconnectorclient-1.2.4-cp39-cp39-macosx_10_9_x86_64.whl (2.1 MB view details)

Uploaded CPython 3.9 macOS 10.9+ x86-64

s3torchconnectorclient-1.2.4-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.8 MB view details)

Uploaded CPython 3.8 manylinux: glibc 2.17+ x86-64

s3torchconnectorclient-1.2.4-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (3.6 MB view details)

Uploaded CPython 3.8 manylinux: glibc 2.17+ ARM64

s3torchconnectorclient-1.2.4-cp38-cp38-macosx_11_0_arm64.whl (2.0 MB view details)

Uploaded CPython 3.8 macOS 11.0+ ARM64

s3torchconnectorclient-1.2.4-cp38-cp38-macosx_10_9_x86_64.whl (2.1 MB view details)

Uploaded CPython 3.8 macOS 10.9+ x86-64

File details

Details for the file s3torchconnectorclient-1.2.4.tar.gz.

File hashes

Hashes for s3torchconnectorclient-1.2.4.tar.gz
Algorithm Hash digest
SHA256 1c99ae463fee65331acbc748bb0839f1fa7435a7b25088a7cddf15de77100767
MD5 bea36c10146bc015f07f36ee3f4c8eea
BLAKE2b-256 bb8b3dbb849dd605a6c6df35f1351656ffd0f7eb46d1b678b489c177580b4fe6

See more details on using hashes here.
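
A downloaded file can be checked against its published SHA256 digest with Python's standard hashlib module. The sketch below is a minimal example; the function names sha256_of and matches_published_hash are hypothetical, and the path to the downloaded file is whatever location you saved it to.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # iter() with a sentinel stops when read() returns b"" at end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: Path, expected_sha256: str) -> bool:
    """Compare a file's digest with a published value, ignoring case and whitespace."""
    return sha256_of(path) == expected_sha256.strip().lower()
```

For example, calling matches_published_hash with the path of a downloaded s3torchconnectorclient-1.2.4.tar.gz and the SHA256 value listed above should return True for an intact download.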

File details

Details for the file s3torchconnectorclient-1.2.4-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for s3torchconnectorclient-1.2.4-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 571859f72a080f6d6f47fb9e4735ea034ca82a5eb1cb1620ba5d82645a169cb9
MD5 71be26cf28e50ebd007bfb2217b5968f
BLAKE2b-256 9399ea0a14e716d718710a4bd7090f0c715debc3d7d87c4375ebb54ed1e48f68

File details

Details for the file s3torchconnectorclient-1.2.4-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File hashes

Hashes for s3torchconnectorclient-1.2.4-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 5a6fa8ceebd797b52036956056a310e8d00ebb64f593659c3959410da96740e0
MD5 ab95440529cb1c4f8ce666219028c0a4
BLAKE2b-256 2c6510cd28b75ca7ddbc6d192367684df43177219e277121fe37c0283ddcfffe

File details

Details for the file s3torchconnectorclient-1.2.4-cp312-cp312-macosx_11_0_arm64.whl.

File hashes

Hashes for s3torchconnectorclient-1.2.4-cp312-cp312-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 b390d13a8016479e24715f0e153f63d0fcd778f60bcbe25613e66cea7bb1c64c
MD5 fabdeaa4792af37806aa820e65694aa8
BLAKE2b-256 62e9655bdb771cd38dbbecc218e48a544225bb6d16bc6477922b16f682fe0d59

File details

Details for the file s3torchconnectorclient-1.2.4-cp312-cp312-macosx_10_9_x86_64.whl.

File hashes

Hashes for s3torchconnectorclient-1.2.4-cp312-cp312-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 802eba2bfce723f7e9a800a65359721a7b4f269d8dda4ad61540cf93e209419d
MD5 e2cd541f48c408a5d17861c86dcf8c3e
BLAKE2b-256 049504634d0b245000f70ce3681b7aa3b5abcb93eeda2e70607b9ad86fabfb3a

File details

Details for the file s3torchconnectorclient-1.2.4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for s3torchconnectorclient-1.2.4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 5faeb6c0fa58f971e96c70667cc6df004390733e743a8e98d979b595d46cb53e
MD5 bbd875cd4adf855d46bcc1728d96dde0
BLAKE2b-256 af8a09d8adf210bb73bea41b8a922f8ffff49413bde26827b850ad4ad4f23d6e

File details

Details for the file s3torchconnectorclient-1.2.4-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File hashes

Hashes for s3torchconnectorclient-1.2.4-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 d21b772699dd895a8bc86e61bdd6085189919377c5db28630a731a2f967a1d64
MD5 d9089dbde04f05d184f183d43485cccb
BLAKE2b-256 89a99c299dfe0c9628af83f97c89c8807bc7f14c0e28a4e955daa84e63277eb1

File details

Details for the file s3torchconnectorclient-1.2.4-cp311-cp311-macosx_11_0_arm64.whl.

File hashes

Hashes for s3torchconnectorclient-1.2.4-cp311-cp311-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 9bcaa743e4b350df7ddb7a7ae01d07577e1acf14fddf0d638ca8f6ac3e10409b
MD5 4de55c51705d1eecc6aeae2f070a330b
BLAKE2b-256 c2d6fae8c3796bf192f68e1987ede83b2d55c5191a0cb7406622cfdd187a2381

File details

Details for the file s3torchconnectorclient-1.2.4-cp311-cp311-macosx_10_9_x86_64.whl.

File hashes

Hashes for s3torchconnectorclient-1.2.4-cp311-cp311-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 12448556c041fe1ed031babd9edc639fb756c544d15b8ebc476636d16f5ceb19
MD5 1adb1990f121b8ec4e612999cf687db5
BLAKE2b-256 0204d0cd7e82fdc0de08f65213bb1db4d704dcd4f7588c4020d5c8d92de7c3b9

File details

Details for the file s3torchconnectorclient-1.2.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for s3torchconnectorclient-1.2.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 6a5bf8eb87e794e75cc2fc82de2a2fe12c150324d825657a122edf034a03def2
MD5 4f5fa52a347f0b2f6691e2001fa9d7fb
BLAKE2b-256 bf794c97a9aa4b6b5b55b8fc017d094ebaa7e0fba204064fdf27ea46169e0c77

File details

Details for the file s3torchconnectorclient-1.2.4-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File hashes

Hashes for s3torchconnectorclient-1.2.4-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 de3971a7bf8085925b15345085323b2c35a127da4339ce00d9067e64f627169a
MD5 4b83c88658322784d9c1a7966ef35383
BLAKE2b-256 15e3046a13f878e0b8ad6f845d490f09850bf48c8a3d814d87b1a74cf1289b32

File details

Details for the file s3torchconnectorclient-1.2.4-cp310-cp310-macosx_11_0_arm64.whl.

File hashes

Hashes for s3torchconnectorclient-1.2.4-cp310-cp310-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 39c51b856f1e0d5f8e6dc87448f1922baa4bbfea344162a58c2cf4890591219c
MD5 0d78e987b116c8f55ccdecd9a7d65790
BLAKE2b-256 71e5711e4b99d86f69faf91197f68174edf9d3b8e1a241f82d3a4a22af2d0696

File details

Details for the file s3torchconnectorclient-1.2.4-cp310-cp310-macosx_10_9_x86_64.whl.

File hashes

Hashes for s3torchconnectorclient-1.2.4-cp310-cp310-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 e9600e74f4b2d1a5330cd16e2342867ae59fef13c351c34c0191f7f55162ba57
MD5 74e1f24a04c8042c10ef76726e337a2e
BLAKE2b-256 b54e629be24b69a21f6eab57e38b3f304a2cd2ffe2027afc32d7b04569f2fc5a

File details

Details for the file s3torchconnectorclient-1.2.4-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for s3torchconnectorclient-1.2.4-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 5baa74cfa73bab303fedb9fbbff58974c0d850b22836515a79dce3536f9c13b3
MD5 ca43c055393813fe2eef4c03d7589481
BLAKE2b-256 32692e8c784e165d3e3c60cb6bdf44a59cd8c7443043845850bab48a1daf5eca

File details

Details for the file s3torchconnectorclient-1.2.4-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File hashes

Hashes for s3torchconnectorclient-1.2.4-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 235282ffb3c6642209fd0bba77a842e09b1e7d5ed9e01e1bf1c7468f18c3bdcd
MD5 535c32d2230c69a34ee6afd85f7ddf5a
BLAKE2b-256 02b26efaadf3d8e336f26b7e1aaf2d335045fbf6f2b91aea708b53b9e4a74f17

File details

Details for the file s3torchconnectorclient-1.2.4-cp39-cp39-macosx_11_0_arm64.whl.

File hashes

Hashes for s3torchconnectorclient-1.2.4-cp39-cp39-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 447f51b3ac0460d4740449e4060f6ccda47449bbbb1eda9a501b6d95b69f637f
MD5 2ff844e5833bf06f9bcdf6f85b27d8bb
BLAKE2b-256 490c554c8840e7e63856699eb5bf34cb2a464ab55a995ac352f038c02b97109c

File details

Details for the file s3torchconnectorclient-1.2.4-cp39-cp39-macosx_10_9_x86_64.whl.

File hashes

Hashes for s3torchconnectorclient-1.2.4-cp39-cp39-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 0108fb22d80087282a6e3cb2964d86d611c44a48ed809033ace5ba6e96427e8e
MD5 87c340fd9211e80d6f60e8caee7ce0a8
BLAKE2b-256 2302197f856204ced3ea09d997276f4f31fb637baef56b5fa7772b6325523a19

File details

Details for the file s3torchconnectorclient-1.2.4-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for s3torchconnectorclient-1.2.4-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 46d8ebdafa9aa581ffb7bf6bac3ca7545ca9256dc08a2f105d3841d619972bb7
MD5 285925b87de5c3ea765df2e8f862c823
BLAKE2b-256 9b8031d33471963d99e4e975c181b2dca1a48fff2b9ed07cb84109448f1f080e

File details

Details for the file s3torchconnectorclient-1.2.4-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File hashes

Hashes for s3torchconnectorclient-1.2.4-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 67837b865ea7502b9a8fabb70e22ef767b4de76b477656ba41723842307244fc
MD5 4ced2b53773751ba36e4b85be0a92603
BLAKE2b-256 8513c3fbd3868f0a23f412b26814a5459f09d50095b3e4cce41a10287997c04f

File details

Details for the file s3torchconnectorclient-1.2.4-cp38-cp38-macosx_11_0_arm64.whl.

File hashes

Hashes for s3torchconnectorclient-1.2.4-cp38-cp38-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 ae60a16120e8ae16e70cb157ae92815aff9eae2a9066ea3ff3467a8ce158a9af
MD5 a3bb8ce49f7fe0c34886b64ab7967095
BLAKE2b-256 dfe0b390893ab12e9590bba331d43dfa261c15caca2455556a711f0c117ad4c3

File details

Details for the file s3torchconnectorclient-1.2.4-cp38-cp38-macosx_10_9_x86_64.whl.

File hashes

Hashes for s3torchconnectorclient-1.2.4-cp38-cp38-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 5eae8b6a80073160c6dbf5c051e7cb7a7c3a55010f2485201af6307abb4d8b37
MD5 5307799bc860c489849bda63f849bf50
BLAKE2b-256 2d1e81fc7ab30c3aefe9d3305cc240c81f4ab7a56d48841e95b29fa078abc7c9
