Internal S3 client implementation for s3torchconnector

Amazon S3 Connector for PyTorch

The Amazon S3 Connector for PyTorch delivers high throughput for PyTorch training jobs that access or store data in Amazon S3. Using the S3 Connector for PyTorch automatically optimizes performance when downloading training data from and writing checkpoints to Amazon S3, eliminating the need to write your own code to list S3 buckets and manage concurrent requests.

Amazon S3 Connector for PyTorch provides implementations of PyTorch's dataset primitives that you can use to load training data from Amazon S3. It supports both map-style datasets for random data access patterns and iterable-style datasets for streaming sequential data access patterns. The S3 Connector for PyTorch also includes a checkpointing interface to save and load checkpoints directly to Amazon S3, without first saving to local storage.

Getting Started

Prerequisites

  • Python 3.8 or later (note: Python 3.12+ is not recommended because PyTorch does not support it).
  • PyTorch 2.0 or later (compatibility with PyTorch 1.x has not been verified).

Installation

pip install s3torchconnector

Amazon S3 Connector for PyTorch can currently be installed via pip on Linux only. For other platforms, see DEVELOPMENT for build instructions.

Configuration

To use s3torchconnector, AWS credentials must be provided through one of the following methods:

  • If you are using this library on an EC2 instance, specify an IAM role and then give the EC2 instance access to that role.
  • Install and configure awscli and run aws configure.
  • Set credentials in the AWS credentials profile file on the local system, located at: ~/.aws/credentials on Unix or macOS.
  • Set the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables.
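Before starting a training job, it can help to check which of these sources are present on the machine. The sketch below is a hypothetical helper, not part of s3torchconnector. It only looks for the two environment variables and the shared credentials file named above; it does not detect an IAM role attached to an EC2 instance and does not validate the credentials.

```python
import os
from pathlib import Path


def detect_credential_sources(env=None, home=None):
    """Return the names of credential sources that appear to be configured.

    Hypothetical helper: checks only for the presence of the environment
    variables and of ~/.aws/credentials. It does not contact AWS.
    """
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)

    sources = []
    # Both variables must be set for environment-based credentials.
    if env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"):
        sources.append("environment")
    # Shared credentials file written by hand or by `aws configure`.
    if (home / ".aws" / "credentials").is_file():
        sources.append("credentials_file")
    return sources
```

Calling detect_credential_sources() with no arguments inspects the current process environment and home directory. An empty result on an EC2 instance does not necessarily indicate a problem, since the instance role is not checked here.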

Examples

The API docs cover the public components. End-to-end examples of how to use s3torchconnector can be found in the examples directory.

Sample Examples

The simplest way to use the S3 Connector for PyTorch is to construct a dataset, either a map-style or iterable-style dataset, by specifying an S3 URI (a bucket and optional prefix) and the region the bucket is located in:

from s3torchconnector import S3MapDataset, S3IterableDataset

# Replace <BUCKET> and <PREFIX> with your own values.
DATASET_URI = "s3://<BUCKET>/<PREFIX>"
REGION = "us-east-1"

iterable_dataset = S3IterableDataset.from_prefix(DATASET_URI, region=REGION)

# Iterable-style datasets can be iterated over directly.
for item in iterable_dataset:
    print(item.key)

# S3MapDataset lists all the objects under the given prefix to support random access.
# The listing is built on the first access to an element or on the first call to get
# the number of elements, whichever happens first.
# This can take some time, during which the dataset may appear unresponsive.
map_dataset = S3MapDataset.from_prefix(DATASET_URI, region=REGION)

# Randomly access an item in map_dataset.
item = map_dataset[0]

# Inspect the bucket, key, and content of the object.
bucket = item.bucket
key = item.key
content = item.read()
len(content)
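
The comments above describe S3MapDataset building its object list on first element access or first length query. The sketch below illustrates that access pattern in plain Python. It is not the library's implementation: LazyKeyList and list_fn are hypothetical names, and list_fn stands in for whatever performs the S3 listing.

```python
class LazyKeyList:
    """Illustrative only: defers an expensive listing until first use."""

    def __init__(self, list_fn):
        self._list_fn = list_fn  # callable returning an iterable of keys
        self._keys = None        # filled in on first access

    def _ensure_listed(self):
        # Run the listing once, then reuse the cached result.
        if self._keys is None:
            self._keys = list(self._list_fn())
        return self._keys

    def __len__(self):
        return len(self._ensure_listed())

    def __getitem__(self, index):
        return self._ensure_listed()[index]


# Usage: the listing runs once, on whichever of len() or indexing comes first.
calls = []


def fake_listing():
    calls.append("listed")
    return ["a.jpg", "b.jpg", "c.jpg"]


keys = LazyKeyList(fake_listing)
first = keys[0]
count = len(keys)
```

With a large prefix, that first call is where the delay mentioned in the comments would show up; later accesses reuse the cached list.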

In addition to data loading primitives, the S3 Connector for PyTorch also provides an interface for saving and loading model checkpoints directly to and from an S3 bucket.

from s3torchconnector import S3Checkpoint

import torchvision
import torch

CHECKPOINT_URI = "s3://<BUCKET>/<KEY>/"
REGION = "us-east-1"
checkpoint = S3Checkpoint(region=REGION)

model = torchvision.models.resnet18()

# Save checkpoint to S3
with checkpoint.writer(CHECKPOINT_URI + "epoch0.ckpt") as writer:
    torch.save(model.state_dict(), writer)

# Load checkpoint from S3
with checkpoint.reader(CHECKPOINT_URI + "epoch0.ckpt") as reader:
    state_dict = torch.load(reader)

model.load_state_dict(state_dict)
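
The example above builds the object URI by string concatenation, which relies on CHECKPOINT_URI ending with '/'. A small helper can make that explicit. The function name and the epoch<N>.ckpt naming are assumptions for illustration and are not part of s3torchconnector.

```python
def checkpoint_uri(base_uri, epoch):
    """Return '<base_uri>/epoch<N>.ckpt', tolerating a missing trailing '/'.

    Hypothetical helper; the file naming scheme is an assumption.
    """
    scheme = "s3://"
    if not base_uri.startswith(scheme) or len(base_uri) <= len(scheme):
        raise ValueError(f"expected an S3 URI such as s3://bucket/key/, got {base_uri!r}")
    # Strip any trailing slashes, then add exactly one separator.
    return f"{base_uri.rstrip('/')}/epoch{epoch}.ckpt"
```

The result can be passed to checkpoint.writer(...) or checkpoint.reader(...) in place of the concatenated string.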

To use datasets or checkpoints with Amazon S3 Express One Zone directory buckets, you only need to update the URI so that the bucket name follows the base-name--azid--x-s3 format. For example, for the directory bucket my-test-bucket--usw2-az1--x-s3 with the Availability Zone ID usw2-az1, the URI is s3://my-test-bucket--usw2-az1--x-s3/<PREFIX>, paired with region us-west-2. Note that the prefix for Amazon S3 Express One Zone should end with '/'.
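
The naming rule can be checked before a job starts. The following is a minimal sketch with hypothetical helper names: it validates the base-name--azid--x-s3 shape, extracts the Availability Zone ID, and appends the trailing '/' to the prefix. It does not map the Availability Zone ID to a region, so the region still has to be supplied separately.

```python
def parse_directory_bucket(bucket_name):
    """Split 'base-name--azid--x-s3' into (base_name, az_id).

    Hypothetical helper; raises ValueError if the name does not match.
    """
    suffix = "--x-s3"
    if not bucket_name.endswith(suffix):
        raise ValueError(f"{bucket_name!r} does not end with {suffix!r}")
    # The AZ ID is the last '--'-separated component before the suffix.
    base, sep, az_id = bucket_name[: -len(suffix)].rpartition("--")
    if not sep or not base or not az_id:
        raise ValueError(f"{bucket_name!r} is not in base-name--azid--x-s3 format")
    return base, az_id


def directory_bucket_uri(bucket_name, prefix=""):
    """Build an s3:// URI for a directory bucket, ending the prefix with '/'."""
    parse_directory_bucket(bucket_name)  # validate the bucket name only
    if prefix and not prefix.endswith("/"):
        prefix += "/"
    return f"s3://{bucket_name}/{prefix}"
```

For the bucket in the example above, directory_bucket_uri("my-test-bucket--usw2-az1--x-s3", "train") produces a URI that can be passed to from_prefix together with region="us-west-2".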

Lightning Integration

Amazon S3 Connector for PyTorch includes an integration for PyTorch Lightning, featuring S3LightningCheckpoint, an implementation of Lightning's CheckpointIO. This allows users to use Amazon S3 Connector for PyTorch's S3 checkpointing functionality with PyTorch Lightning.

Getting Started

Installation

pip install s3torchconnector[lightning]

Examples

End-to-end examples for the PyTorch Lightning integration can be found in the examples/lightning directory.

from lightning import Trainer
from s3torchconnector.lightning import S3LightningCheckpoint

...

s3_checkpoint_io = S3LightningCheckpoint("us-east-1")
trainer = Trainer(
    plugins=[s3_checkpoint_io],
    default_root_dir="s3://bucket_name/key_prefix/"
)
trainer.fit(model)

Using S3 Versioning to Manage Checkpoints

When working with model checkpoints, you can use the S3 Versioning feature to preserve, retrieve, and restore every version of your checkpoint objects. With versioning, you can recover more easily from unintended overwrites or deletions of existing checkpoint files due to incorrect configuration or multiple hosts accessing the same storage path.

When versioning is enabled on an S3 bucket, deletions insert a delete marker instead of removing the object permanently. The delete marker becomes the current object version. If you overwrite an object, it results in a new object version in the bucket. You can always restore the previous version. See Deleting object versions from a versioning-enabled bucket for more details on managing object versions.
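
As an illustration of restoring an earlier checkpoint version, the sketch below uses the S3 API operations list_object_versions and copy_object as exposed by a boto3 S3 client. Using boto3 is an assumption here; it is not part of s3torchconnector, and the function names are hypothetical. The lookup reads only the first page of results, so keys with many versions would need pagination.

```python
def previous_version_id(s3_client, bucket, key):
    """Return the VersionId of the newest non-current version of key, or None.

    Assumes s3_client behaves like boto3.client("s3"). Only the first page
    of list_object_versions results is inspected.
    """
    response = s3_client.list_object_versions(Bucket=bucket, Prefix=key)
    candidates = [
        version
        for version in response.get("Versions", [])
        # Prefix matching can return other keys, so compare the key exactly.
        if version["Key"] == key and not version["IsLatest"]
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda v: v["LastModified"])["VersionId"]


def restore_previous_version(s3_client, bucket, key):
    """Copy the previous version over the current one; return its VersionId."""
    version_id = previous_version_id(s3_client, bucket, key)
    if version_id is None:
        return None
    # Copying a specific version onto the same key creates a new current version.
    s3_client.copy_object(
        Bucket=bucket,
        Key=key,
        CopySource={"Bucket": bucket, "Key": key, "VersionId": version_id},
    )
    return version_id
```

Because delete markers are reported separately from object versions, the same lookup also finds the last real version when the current version is a delete marker. With boto3 installed, a client can be created with boto3.client("s3") and passed as s3_client.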

To enable versioning on an S3 bucket, see Enabling versioning on buckets. Normal Amazon S3 rates apply for every version of an object stored and transferred. To customize your data retention approach and control storage costs for earlier versions of objects, use object versioning with S3 Lifecycle.
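
Versioning can also be enabled programmatically. The following sketch assumes a boto3 S3 client, which is not part of s3torchconnector, and uses its get_bucket_versioning and put_bucket_versioning operations. The helper name is hypothetical.

```python
def ensure_versioning_enabled(s3_client, bucket):
    """Enable versioning on bucket if needed; return True if a change was made.

    Assumes s3_client behaves like boto3.client("s3").
    """
    # 'Status' is absent when versioning has never been configured.
    status = s3_client.get_bucket_versioning(Bucket=bucket).get("Status")
    if status == "Enabled":
        return False
    s3_client.put_bucket_versioning(
        Bucket=bucket,
        VersioningConfiguration={"Status": "Enabled"},
    )
    return True
```

The caller needs permission to read and change the bucket's versioning configuration for this to succeed.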

S3 Versioning and S3 Lifecycle are not supported by S3 Express One Zone.

Contributing

We welcome contributions to Amazon S3 Connector for PyTorch. Please see CONTRIBUTING for more information on how to report bugs or submit pull requests.

Development

See DEVELOPMENT for information about code style, development process, and guidelines.

Compatibility with other storage services

S3 Connector for PyTorch delivers high throughput for PyTorch training jobs that access or store data in Amazon S3. While it may work with other storage services that use S3-like APIs, that compatibility may inadvertently break when we make changes to better support Amazon S3. We welcome contributions of minor compatibility fixes or performance improvements for these services if the changes can be tested against Amazon S3.

Security issue notifications

If you discover a potential security issue in this project, we ask that you notify AWS Security via our vulnerability reporting page.

Code of conduct

This project has adopted the Amazon Open Source Code of Conduct. See CODE_OF_CONDUCT.md for more details.

License

Amazon S3 Connector for PyTorch has a BSD 3-Clause License, as found in the LICENSE file.
