Internal S3 client implementation for s3torchconnector

Project description

Amazon S3 Connector for PyTorch

The Amazon S3 Connector for PyTorch delivers high throughput for PyTorch training jobs that access or store data in Amazon S3. Using the S3 Connector for PyTorch automatically optimizes performance when downloading training data from and writing checkpoints to Amazon S3, eliminating the need to write your own code to list S3 buckets and manage concurrent requests.

Amazon S3 Connector for PyTorch provides implementations of PyTorch's dataset primitives that you can use to load training data from Amazon S3. It supports both map-style datasets for random data access patterns and iterable-style datasets for streaming sequential data access patterns. The S3 Connector for PyTorch also includes a checkpointing interface to save and load checkpoints directly to Amazon S3, without first saving to local storage.

Getting Started

Prerequisites

  • Python 3.8 or greater is installed (note: Python 3.12+ is not recommended because PyTorch does not support it).
  • PyTorch >= 2.0 (compatibility with PyTorch 1.x has not been verified)

Installation

pip install s3torchconnector

Amazon S3 Connector for PyTorch supports only Linux via pip for now. For other platforms, see DEVELOPMENT for build instructions.

Configuration

To use s3torchconnector, AWS credentials must be provided through one of the following methods:

  • If you are using this library on an EC2 instance, specify an IAM role and then give the EC2 instance access to that role.
  • Install the AWS CLI (awscli) and run aws configure.
  • Set credentials in the AWS credentials profile file on the local system, located at: ~/.aws/credentials on Unix or macOS.
  • Set the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables.
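As a rough local check of which static credential source above is set up, the sketch below looks at the two environment variables and at the shared credentials file that aws configure writes. The helper name find_static_credentials is hypothetical. It does not detect EC2 instance roles and does not reproduce the exact lookup order of the AWS credential provider chain.

```python
import configparser
import os
from pathlib import Path
from typing import Optional


def find_static_credentials(credentials_file: Optional[str] = None) -> Optional[str]:
    """Hypothetical helper: report which static credential source looks configured.

    Checks the AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY environment variables
    first, then the profile named by AWS_PROFILE (default: "default") in the
    shared credentials file. EC2 instance roles are NOT detected.
    """
    if os.environ.get("AWS_ACCESS_KEY_ID") and os.environ.get("AWS_SECRET_ACCESS_KEY"):
        return "environment variables"

    path = Path(credentials_file or "~/.aws/credentials").expanduser()
    profile = os.environ.get("AWS_PROFILE", "default")

    parser = configparser.ConfigParser()
    parser.read(path)  # a missing file is silently ignored
    # has_option returns False when the profile section does not exist
    if parser.has_option(profile, "aws_access_key_id"):
        return f"credentials file {path} (profile {profile!r})"
    return None
```

A None result only means no static credentials were found; on EC2 an attached IAM role may still supply them.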

Examples

The API docs describe the public components. End-to-end examples of how to use s3torchconnector can be found in the examples directory.

Sample Examples

The simplest way to use the S3 Connector for PyTorch is to construct a dataset, either a map-style or iterable-style dataset, by specifying an S3 URI (a bucket and optional prefix) and the region the bucket is located in:

from s3torchconnector import S3MapDataset, S3IterableDataset

# Replace <BUCKET> and <PREFIX> with your own values.
DATASET_URI = "s3://<BUCKET>/<PREFIX>"
REGION = "us-east-1"

iterable_dataset = S3IterableDataset.from_prefix(DATASET_URI, region=REGION)

# Datasets are also iterables.
for item in iterable_dataset:
    print(item.key)

# S3MapDataset eagerly lists all the objects under the given prefix
# to support random access.
# It builds this list the first time an element is accessed or the number
# of elements is requested, whichever happens first. For large prefixes this
# can take some time and may make the program appear unresponsive.
map_dataset = S3MapDataset.from_prefix(DATASET_URI, region=REGION)

# Access an item in map_dataset by index (random access).
item = map_dataset[0]

# Inspect the bucket, key, and content of the object.
bucket = item.bucket
key = item.key
content = item.read()
print(len(content))
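The listing behavior described in the S3MapDataset comments (one eager listing, triggered by whichever of indexing or len() comes first) can be sketched in plain Python. LazyListedDataset and the list_keys callable are hypothetical stand-ins for illustration; this is not the connector's implementation, which lists real objects in S3.

```python
from typing import Callable, List, Optional


class LazyListedDataset:
    """Hypothetical map-style dataset that lists its keys once, on first use.

    `list_keys` stands in for an S3 listing loop; it is called at most once,
    the first time either __len__ or __getitem__ is used.
    """

    def __init__(self, list_keys: Callable[[], List[str]]):
        self._list_keys = list_keys
        self._keys: Optional[List[str]] = None  # not listed yet

    def _ensure_listed(self) -> List[str]:
        if self._keys is None:
            # The potentially slow, eager listing happens here, exactly once.
            self._keys = list(self._list_keys())
        return self._keys

    def __len__(self) -> int:
        return len(self._ensure_listed())

    def __getitem__(self, index: int) -> str:
        return self._ensure_listed()[index]
```

Constructing the object is cheap; the cost moves to the first access, which is why a large prefix can make that first call appear to hang.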

In addition to data loading primitives, the S3 Connector for PyTorch also provides an interface for saving and loading model checkpoints directly to and from an S3 bucket.

from s3torchconnector import S3Checkpoint

import torchvision
import torch

CHECKPOINT_URI = "s3://<BUCKET>/<KEY>/"
REGION = "us-east-1"
checkpoint = S3Checkpoint(region=REGION)

model = torchvision.models.resnet18()

# Save checkpoint to S3
with checkpoint.writer(CHECKPOINT_URI + "epoch0.ckpt") as writer:
    torch.save(model.state_dict(), writer)

# Load checkpoint from S3
with checkpoint.reader(CHECKPOINT_URI + "epoch0.ckpt") as reader:
    state_dict = torch.load(reader)

model.load_state_dict(state_dict)

Using datasets or checkpoints with Amazon S3 Express One Zone directory buckets requires only updating the URI to point at the directory bucket, whose name follows the base-name--azid--x-s3 format. For example, for the directory bucket my-test-bucket--usw2-az1--x-s3 in Availability Zone ID usw2-az1, the URI would be s3://my-test-bucket--usw2-az1--x-s3/<PREFIX>, paired with the region us-west-2. Note that the prefix for Amazon S3 Express One Zone should end with '/'.
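A small helper can catch two common mistakes with directory buckets: a bucket name that does not follow the base-name--azid--x-s3 format, and a prefix missing its trailing '/'. The names express_uri and availability_zone_id are hypothetical, and the sketch does not derive the region from the Availability Zone ID, so the region still has to be passed separately.

```python
import re

# Directory bucket names follow base-name--azid--x-s3, e.g. my-test-bucket--usw2-az1--x-s3
DIRECTORY_BUCKET = re.compile(r"^(?P<base>.+)--(?P<azid>[a-z0-9]+-az[0-9]+)--x-s3$")


def availability_zone_id(bucket: str) -> str:
    """Return the Availability Zone ID embedded in a directory bucket name."""
    match = DIRECTORY_BUCKET.match(bucket)
    if match is None:
        raise ValueError(f"not an S3 Express One Zone directory bucket name: {bucket!r}")
    return match.group("azid")


def express_uri(bucket: str, prefix: str) -> str:
    """Build an s3:// URI for a directory bucket, ensuring the prefix ends with '/'."""
    availability_zone_id(bucket)  # raises ValueError if the name format is wrong
    if prefix and not prefix.endswith("/"):
        prefix += "/"
    return f"s3://{bucket}/{prefix}"
```

For example, express_uri("my-test-bucket--usw2-az1--x-s3", "train") returns "s3://my-test-bucket--usw2-az1--x-s3/train/", which can then be passed to from_prefix together with region="us-west-2".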

Lightning Integration

Amazon S3 Connector for PyTorch includes an integration for PyTorch Lightning, featuring S3LightningCheckpoint, an implementation of Lightning's CheckpointIO. This lets you use the S3 Connector for PyTorch's checkpointing functionality with PyTorch Lightning.

Getting Started

Installation

pip install s3torchconnector[lightning]

Examples

End-to-end examples for the PyTorch Lightning integration can be found in the examples/lightning directory.

from lightning import Trainer
from s3torchconnector.lightning import S3LightningCheckpoint

...

s3_checkpoint_io = S3LightningCheckpoint("us-east-1")
trainer = Trainer(
    plugins=[s3_checkpoint_io],
    default_root_dir="s3://bucket_name/key_prefix/"
)
trainer.fit(model)

Using S3 Versioning to Manage Checkpoints

When working with model checkpoints, you can use the S3 Versioning feature to preserve, retrieve, and restore every version of your checkpoint objects. With versioning, you can recover more easily from unintended overwrites or deletions of existing checkpoint files due to incorrect configuration or multiple hosts accessing the same storage path.

When versioning is enabled on an S3 bucket, deletions insert a delete marker instead of removing the object permanently. The delete marker becomes the current object version. If you overwrite an object, it results in a new object version in the bucket. You can always restore the previous version. See Deleting object versions from a versioning-enabled bucket for more details on managing object versions.
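To make these semantics concrete, here is a toy in-memory model of a versioning-enabled bucket: every put appends a version, delete appends a delete marker that becomes the current version, and a previous version is restored by copying it back as the new current version. VersionedBucket is a hypothetical illustration only; it does not call S3 or mirror the S3 API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Version:
    version_id: int
    data: Optional[bytes]  # None marks a delete marker


@dataclass
class VersionedBucket:
    """Hypothetical toy model of a versioning-enabled bucket (not the S3 API)."""

    versions: Dict[str, List[Version]] = field(default_factory=dict)
    next_id: int = 0

    def _append(self, key: str, data: Optional[bytes]) -> int:
        self.next_id += 1
        self.versions.setdefault(key, []).append(Version(self.next_id, data))
        return self.next_id

    def put(self, key: str, data: bytes) -> int:
        # Overwriting never destroys old bytes; it adds a new current version.
        return self._append(key, data)

    def delete(self, key: str) -> int:
        # Deleting inserts a delete marker, which becomes the current version.
        return self._append(key, None)

    def get(self, key: str, version_id: Optional[int] = None) -> bytes:
        history = self.versions.get(key, [])
        if version_id is None:
            if not history or history[-1].data is None:
                raise KeyError(key)  # no current version, or a delete marker
            return history[-1].data
        for version in history:
            if version.version_id == version_id and version.data is not None:
                return version.data
        raise KeyError((key, version_id))

    def restore(self, key: str, version_id: int) -> int:
        # Copy an older version back on top, making it current again.
        return self.put(key, self.get(key, version_id))

    def history(self, key: str) -> List[Optional[bytes]]:
        return [version.data for version in self.versions.get(key, [])]
```

Replaying the accidental-overwrite scenario: put a good checkpoint, overwrite it, delete it, then restore the first version ID; the current object is the good checkpoint again and all four versions remain in the history.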

To enable versioning on an S3 bucket, see Enabling versioning on buckets. Normal Amazon S3 rates apply for every version of an object stored and transferred. To customize your data retention approach and control storage costs for earlier versions of objects, use object versioning with S3 Lifecycle.
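Assuming boto3 is used for bucket administration, enabling versioning and adding a lifecycle rule that expires noncurrent checkpoint versions might look like the sketch below. put_bucket_versioning and put_bucket_lifecycle_configuration are boto3 S3 client methods; the helper names, the rule ID, the checkpoints/ prefix, and the 30-day default are assumptions for illustration.

```python
def lifecycle_for_checkpoints(prefix: str, noncurrent_days: int) -> dict:
    """Hypothetical lifecycle rule: expire noncurrent versions under `prefix`.

    The current version of each object is kept; a version is deleted
    `noncurrent_days` days after it stops being the current version.
    """
    return {
        "Rules": [
            {
                "ID": "expire-old-checkpoint-versions",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "NoncurrentVersionExpiration": {"NoncurrentDays": noncurrent_days},
            }
        ]
    }


def enable_checkpoint_versioning(s3_client, bucket: str, prefix: str, noncurrent_days: int = 30) -> None:
    """Turn on versioning, then attach the lifecycle rule.

    `s3_client` is assumed to be a boto3 S3 client, e.g. boto3.client("s3").
    """
    s3_client.put_bucket_versioning(
        Bucket=bucket,
        VersioningConfiguration={"Status": "Enabled"},
    )
    s3_client.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration=lifecycle_for_checkpoints(prefix, noncurrent_days),
    )
```

Note that put_bucket_lifecycle_configuration replaces the bucket's entire existing lifecycle configuration, so any existing rules need to be merged in first.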

S3 Versioning and S3 Lifecycle are not supported by S3 Express One Zone.

Contributing

We welcome contributions to Amazon S3 Connector for PyTorch. Please see CONTRIBUTING for more information on how to report bugs or submit pull requests.

Development

See DEVELOPMENT for information about code style, development process, and guidelines.

Compatibility with other storage services

S3 Connector for PyTorch delivers high throughput for PyTorch training jobs that access or store data in Amazon S3. While it may work with other storage services that use S3-like APIs, that compatibility may inadvertently break when we make changes to better support Amazon S3. We welcome contributions of minor compatibility fixes or performance improvements for these services, provided the changes can be tested against Amazon S3.

Security issue notifications

If you discover a potential security issue in this project, we ask that you notify AWS Security via our vulnerability reporting page.

Code of conduct

This project has adopted the Amazon Open Source Code of Conduct. See CODE_OF_CONDUCT.md for more details.

License

Amazon S3 Connector for PyTorch has a BSD 3-Clause License, as found in the LICENSE file.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
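To check a downloaded file against the SHA256 digests listed under File details, the sketch below hashes the file in chunks with Python's standard hashlib module. sha256_of and verify are hypothetical helper names.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected_sha256: str) -> bool:
    """Compare a file's digest with a published one (case-insensitive)."""
    return sha256_of(path) == expected_sha256.strip().lower()
```

pip can also enforce digests itself in hash-checking mode, for example with a requirements line such as s3torchconnectorclient==1.2.6 --hash=sha256:<digest> installed via pip install --require-hashes -r requirements.txt.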

Source Distribution

s3torchconnectorclient-1.2.6.tar.gz (57.4 kB)

Uploaded Source

Built Distributions

s3torchconnectorclient-1.2.6-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.8 MB)

Uploaded CPython 3.12 manylinux: glibc 2.17+ x86-64

s3torchconnectorclient-1.2.6-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (3.7 MB)

Uploaded CPython 3.12 manylinux: glibc 2.17+ ARM64

s3torchconnectorclient-1.2.6-cp312-cp312-macosx_11_0_arm64.whl (2.1 MB)

Uploaded CPython 3.12 macOS 11.0+ ARM64

s3torchconnectorclient-1.2.6-cp312-cp312-macosx_10_13_x86_64.whl (2.2 MB)

Uploaded CPython 3.12 macOS 10.13+ x86-64

s3torchconnectorclient-1.2.6-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.8 MB)

Uploaded CPython 3.11 manylinux: glibc 2.17+ x86-64

s3torchconnectorclient-1.2.6-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (3.7 MB)

Uploaded CPython 3.11 manylinux: glibc 2.17+ ARM64

s3torchconnectorclient-1.2.6-cp311-cp311-macosx_11_0_arm64.whl (2.1 MB)

Uploaded CPython 3.11 macOS 11.0+ ARM64

s3torchconnectorclient-1.2.6-cp311-cp311-macosx_10_9_x86_64.whl (2.2 MB)

Uploaded CPython 3.11 macOS 10.9+ x86-64

s3torchconnectorclient-1.2.6-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.8 MB)

Uploaded CPython 3.10 manylinux: glibc 2.17+ x86-64

s3torchconnectorclient-1.2.6-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (3.7 MB)

Uploaded CPython 3.10 manylinux: glibc 2.17+ ARM64

s3torchconnectorclient-1.2.6-cp310-cp310-macosx_11_0_arm64.whl (2.1 MB)

Uploaded CPython 3.10 macOS 11.0+ ARM64

s3torchconnectorclient-1.2.6-cp310-cp310-macosx_10_9_x86_64.whl (2.2 MB)

Uploaded CPython 3.10 macOS 10.9+ x86-64

s3torchconnectorclient-1.2.6-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.8 MB)

Uploaded CPython 3.9 manylinux: glibc 2.17+ x86-64

s3torchconnectorclient-1.2.6-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (3.7 MB)

Uploaded CPython 3.9 manylinux: glibc 2.17+ ARM64

s3torchconnectorclient-1.2.6-cp39-cp39-macosx_11_0_arm64.whl (2.1 MB)

Uploaded CPython 3.9 macOS 11.0+ ARM64

s3torchconnectorclient-1.2.6-cp39-cp39-macosx_10_9_x86_64.whl (2.2 MB)

Uploaded CPython 3.9 macOS 10.9+ x86-64

s3torchconnectorclient-1.2.6-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.8 MB)

Uploaded CPython 3.8 manylinux: glibc 2.17+ x86-64

s3torchconnectorclient-1.2.6-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (3.7 MB)

Uploaded CPython 3.8 manylinux: glibc 2.17+ ARM64

s3torchconnectorclient-1.2.6-cp38-cp38-macosx_11_0_arm64.whl (2.1 MB)

Uploaded CPython 3.8 macOS 11.0+ ARM64

s3torchconnectorclient-1.2.6-cp38-cp38-macosx_10_9_x86_64.whl (2.2 MB)

Uploaded CPython 3.8 macOS 10.9+ x86-64

File details

Details for the file s3torchconnectorclient-1.2.6.tar.gz.

File hashes

Hashes for s3torchconnectorclient-1.2.6.tar.gz
Algorithm Hash digest
SHA256 b4a2e473cb171965167f641c1d6230f6cb8f8d4e6485e0812665bb7dd75bf148
MD5 683d898cad83b045fe6fe45f1b9bca35
BLAKE2b-256 82e1ab05fbe5dd98169d8cc5ca4fb38bf533daf7dbb902386fed8ab4f0b2cc28

See more details on using hashes here.

File details

Details for the file s3torchconnectorclient-1.2.6-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for s3torchconnectorclient-1.2.6-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 0619a0344d209827f29f8c0da904ca8e34dc5e09f2048852f0a403591d03da9b
MD5 1c1abe60af1e49bb0fec3bbed526b942
BLAKE2b-256 769afa941e6cb7dfee324b72ee9ae811f5f11c336e35aab70e3ede57eed635eb

File details

Details for the file s3torchconnectorclient-1.2.6-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File hashes

Hashes for s3torchconnectorclient-1.2.6-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 527c469764d4ae427924a8417ec278ada211c8b439cb5548be5edd1965381000
MD5 3ed62667c3be0cece45b87c559221731
BLAKE2b-256 80b40acb8c780dc9159aeed550f0234c7aa23028c52326151f7cc0c2d9911dcc

File details

Details for the file s3torchconnectorclient-1.2.6-cp312-cp312-macosx_11_0_arm64.whl.

File hashes

Hashes for s3torchconnectorclient-1.2.6-cp312-cp312-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 6ed46687aedcd91e6558279bb9c08c07b1b022083e6bfe04813d993ddb242810
MD5 7ba0b8fd0b0528df00ddf674153b4d88
BLAKE2b-256 c299affbc7bd6bdbbb63bfb3fd81ab4890b5a41371854f4c653d7237eda0ef87

File details

Details for the file s3torchconnectorclient-1.2.6-cp312-cp312-macosx_10_13_x86_64.whl.

File hashes

Hashes for s3torchconnectorclient-1.2.6-cp312-cp312-macosx_10_13_x86_64.whl
Algorithm Hash digest
SHA256 09fcfa1ac0787f1e10b4ffcc56bd3cb3b86a66904a1f758eb7f0b8f816bb8720
MD5 3e2eaf2f372097cd5ff651dbae5447a0
BLAKE2b-256 02c7c61251a70d8a7964d3399b519141732f96f4a40406090e503d4e6c1736c6

File details

Details for the file s3torchconnectorclient-1.2.6-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for s3torchconnectorclient-1.2.6-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 571616a8b55837de364a8c76311924b30737c85e7e74e69903482d672150387e
MD5 c4442779d5b60b3fd585d4a2a61386e5
BLAKE2b-256 3c2369b8a89358a4f99f5b5b68c63648e9803d87ffc2da761da9fa7d711a4aa8

File details

Details for the file s3torchconnectorclient-1.2.6-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File hashes

Hashes for s3torchconnectorclient-1.2.6-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 fa41c43a7a978a177e8ecf1b273399896a6901b71e71f3e357f12b6aa989c010
MD5 bd481a2bb59aa685613bbd83512ba5dd
BLAKE2b-256 a71af813c95c553626a52c68f67c1729f38e702fd2cc84b8b632511805457828

File details

Details for the file s3torchconnectorclient-1.2.6-cp311-cp311-macosx_11_0_arm64.whl.

File hashes

Hashes for s3torchconnectorclient-1.2.6-cp311-cp311-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 8ff187a71b86343241785e869e7d471650f5563bdc6dbdc335b45d0ddbd8b3cd
MD5 d241efb500de8774c9e298ef7c1c782b
BLAKE2b-256 1cbc88b1c0c91509e677243e9b22ee7cc7e17f24c5a22bf3c9fb9ad6d3fba57a

File details

Details for the file s3torchconnectorclient-1.2.6-cp311-cp311-macosx_10_9_x86_64.whl.

File hashes

Hashes for s3torchconnectorclient-1.2.6-cp311-cp311-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 50257ac3494898535ef33115a4987f5fb727a5f88bc0967cc1b78d184123b2b4
MD5 f0abaf5532e4899d57801235ef5302e0
BLAKE2b-256 7143f39e4117dabb9ea57a972e6cf6ac4977f60135f7c075fdbfb0a560bd3f3a

File details

Details for the file s3torchconnectorclient-1.2.6-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for s3torchconnectorclient-1.2.6-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 d1e1146767b8ca737aed7bb3a72e6ef9ad3f5ac05992fce3b2cf6352a86c9fb3
MD5 ad3298af11c5a98e972888165feec628
BLAKE2b-256 0b5b73c5fe5e4bbdf5299b8d37fb074623bfe989c455d2d9866b79609585a9ba

File details

Details for the file s3torchconnectorclient-1.2.6-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File hashes

Hashes for s3torchconnectorclient-1.2.6-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 346cf8d449deeddcfb90e087babb5877039869675c59cf91787e76c386655a23
MD5 00ead426c82765a1308b9216fb799f48
BLAKE2b-256 956052e1ce7741fd5de53f0397ef9ca4d97887ddc072ebeb5558a9ad53f9b84f

File details

Details for the file s3torchconnectorclient-1.2.6-cp310-cp310-macosx_11_0_arm64.whl.

File hashes

Hashes for s3torchconnectorclient-1.2.6-cp310-cp310-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 3a88be506a893e53d20869b65e2b69b54e2d00524ee52a5c96f9cd6027a478d7
MD5 46de1fd000bf1eb0d7c9d8ef3b221f9f
BLAKE2b-256 897bc6e75bdb6032daa113f280bb5333c0a35d48217cc73f5058b2af077641f0

File details

Details for the file s3torchconnectorclient-1.2.6-cp310-cp310-macosx_10_9_x86_64.whl.

File hashes

Hashes for s3torchconnectorclient-1.2.6-cp310-cp310-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 89da0d9e63d63385629c158a18d44ef108a7267b709fb0e9ac82e82144f16a22
MD5 fd5b1d0e17ef3ec8e60b4609a468f1e4
BLAKE2b-256 03f39c5a523250be6ef311f0045bb02b8d73b5bfd17e8aba4dc1fd571d9cedcc

File details

Details for the file s3torchconnectorclient-1.2.6-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for s3torchconnectorclient-1.2.6-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 3ffc20b3c8bb686afe28134431174f4973361a58fc27fd961fa3d21f9fbf703c
MD5 1c18860de9ef8976a1d07c836a710415
BLAKE2b-256 0ba7bda5e21c7ed534108201584b270fdfdc5290e929b00784c43a69a9626bdc

File details

Details for the file s3torchconnectorclient-1.2.6-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File hashes

Hashes for s3torchconnectorclient-1.2.6-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 d13e3f373dc69ff6417455cfa245eb6f376f3bde7e010818d7159d271a8b34d4
MD5 dd7213bb9a9d930d4692efe029d7e3f0
BLAKE2b-256 410a2d4282c4867fe408518eb57f079cd80e513fd89988186fc3309c25ce27fa

File details

Details for the file s3torchconnectorclient-1.2.6-cp39-cp39-macosx_11_0_arm64.whl.

File hashes

Hashes for s3torchconnectorclient-1.2.6-cp39-cp39-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 3575fbd3b7c9fa8812c82f5a39d9f31e6aae7466c618820d7a853ce5e256a8b3
MD5 b422fa5a4d3263fb3152b1b7b62a96ea
BLAKE2b-256 e2b836a8b594866f197e9987da57157abfb3ab68e5c0d594b0b143c780439463

File details

Details for the file s3torchconnectorclient-1.2.6-cp39-cp39-macosx_10_9_x86_64.whl.

File hashes

Hashes for s3torchconnectorclient-1.2.6-cp39-cp39-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 8e14a66971df1d63e1afd8be82e8c4e2b8749623aca0d9e58678649e75622d6a
MD5 48a970e8014fec262cc95841cdc77efa
BLAKE2b-256 2a8102167a90e60101ea54e8a11424f6913c2736ac82705cadab747bff2a8502

File details

Details for the file s3torchconnectorclient-1.2.6-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for s3torchconnectorclient-1.2.6-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 e909a45ca99ef90dca565f0340a4a229fc2bbf1882b8802c7e74fbf90b5c7e5d
MD5 d34b9e1b6abf589999fccf8b3e30ac6f
BLAKE2b-256 8055520c72232cd80b5cc71d09061c17589928b167174a027a027dca05eab1d7

File details

Details for the file s3torchconnectorclient-1.2.6-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File hashes

Hashes for s3torchconnectorclient-1.2.6-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 0bdfcd9a321d9bbc2b590a31aff00abc7d214d88fbc159436412e3df221ed848
MD5 52c04d23e9602bed434f5d71b4cf63ff
BLAKE2b-256 8a40b393aa8b950ab21089b55543d2a50d7cfec50cfb909bcbb1bda8c89d9889

File details

Details for the file s3torchconnectorclient-1.2.6-cp38-cp38-macosx_11_0_arm64.whl.

File hashes

Hashes for s3torchconnectorclient-1.2.6-cp38-cp38-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 bb6652215bb2572c25bb204193cff2a7129c3d8c1703a09c5f808fdab8b1fd89
MD5 3669c5b76145c06cfce96f189617284b
BLAKE2b-256 0ddeb7d1c3955edc43ec3266cc690070bb7e15ad613db404235b408272eb9958

File details

Details for the file s3torchconnectorclient-1.2.6-cp38-cp38-macosx_10_9_x86_64.whl.

File hashes

Hashes for s3torchconnectorclient-1.2.6-cp38-cp38-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 39bc663611e9e34e9bfd54c777a9c09d5e5e3f35dd7722b57b187448c4b3ccc8
MD5 5f179be0e54dec9f7af0533a25bde9ab
BLAKE2b-256 3a8348db3a1a7c56ad72733eb9cc4887b766b92ca37a655b3280fa00028fa0b2
