
Open source library for using TensorFlow to train models on Amazon SageMaker.

Project description

The SageMaker TensorFlow Training Toolkit is an open source library for making the TensorFlow framework run on Amazon SageMaker.

This repository also contains Dockerfiles which install this library, TensorFlow, and dependencies for building SageMaker TensorFlow images.

For information on running TensorFlow jobs on SageMaker:

Table of Contents

  1. Getting Started

  2. Building your Image

  3. Running the tests

Getting Started

Prerequisites

Make sure you have installed all of the following prerequisites on your development machine:

For Testing on GPU

Building your Image

Amazon SageMaker uses Docker containers to run all training jobs and inference endpoints.

The Docker images are built from the Dockerfiles specified in docker/.

The Dockerfiles are grouped based on TensorFlow version and separated based on Python version and processor type.

The Dockerfiles for TensorFlow 2.0+ are available in the tf-2 branch.

To build the images, first copy the files under docker/build_artifacts/ to the folder containing the Dockerfile you wish to build.

# Example for building a TF 2.1 image with Python 3
cp docker/build_artifacts/* docker/2.1.0/py3/.

After that, go to the directory containing the Dockerfile you wish to build, and run docker build to build the image.

# Example for building a TF 2.1 image for CPU with Python 3
cd docker/2.1.0/py3
docker build -t tensorflow-training:2.1.0-cpu-py3 -f Dockerfile.cpu .

Don’t forget the period at the end of the docker build command!
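The two steps above can be combined into a small helper. The following is a minimal sketch that only prints the commands rather than running them, so they can be reviewed first. The function name print_build_commands is hypothetical, and the layout docker/<version>/<python>/Dockerfile.<processor> is an assumption generalized from the single TF 2.1 CPU example above; check the docker/ directory for the names that actually exist.

```shell
#!/bin/sh
# Sketch only: prints the copy and build commands instead of executing them.
# Assumption: Dockerfiles live at docker/<version>/<python>/Dockerfile.<processor>,
# generalized from the docker/2.1.0/py3/Dockerfile.cpu example in this README.
print_build_commands() {
    version="$1"    # TensorFlow version, e.g. 2.1.0
    python="$2"     # Python version directory, e.g. py3
    processor="$3"  # processor type, e.g. cpu

    dir="docker/${version}/${python}"
    tag="tensorflow-training:${version}-${processor}-${python}"

    # Step 1: copy the shared build artifacts next to the Dockerfile.
    echo "cp docker/build_artifacts/* ${dir}/."
    # Step 2: build from inside that directory (note the trailing period).
    echo "cd ${dir}"
    echo "docker build -t ${tag} -f Dockerfile.${processor} ."
}

print_build_commands 2.1.0 py3 cpu
```

Called with 2.1.0 py3 cpu, the helper reproduces the commands shown above. Printing instead of executing keeps the sketch free of side effects; the printed lines can be pasted into a shell once they look right.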

Running the tests

Running the tests requires installation of the SageMaker TensorFlow Training Toolkit code and its test dependencies.

git clone https://github.com/aws/sagemaker-tensorflow-container.git
cd sagemaker-tensorflow-container
pip install -e .[test]

Tests are defined in test/ and include unit, integration and functional tests.

Unit Tests

If you want to run unit tests, then use:

# All test instructions should be run from the top level directory
pytest test/unit

Integration Tests

Running integration tests requires Docker and AWS credentials, as the integration tests make calls to a couple of AWS services. The integration and functional tests require configurations specified within their respective conftest.py. Make sure to update the account-id and region at a minimum.

Integration tests on GPU require Nvidia-Docker.
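A quick way to confirm the tooling is present before starting is to check that the required commands are on the PATH. The following is a minimal sketch; missing_tools is a hypothetical helper, not part of this repository, and it only checks that executables exist. It does not validate AWS credentials.

```shell
#!/bin/sh
# Sketch only: print each named command that is not found on PATH, one per line.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Docker is required for all integration tests.
missing="$(missing_tools docker)"
if [ -n "$missing" ]; then
    echo "Missing prerequisites: $missing" >&2
fi
```

For GPU runs, the same helper could be called with the Nvidia-Docker executable as an extra argument. The exact binary name depends on how Nvidia-Docker was installed, so treat any name you pass as an assumption to verify locally.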

Before running integration tests:

  1. Build your Docker image.

  2. Pass in the correct pytest arguments to run tests against your Docker image.

If you want to run local integration tests, then use:

# Required arguments for integration tests are found in test/integ/conftest.py
pytest test/integration --docker-base-name <your_docker_image> \
                        --tag <your_docker_image_tag> \
                        --framework-version <tensorflow_version> \
                        --processor <cpu_or_gpu>
# Example
pytest test/integration --docker-base-name preprod-tensorflow \
                        --tag 1.0 \
                        --framework-version 1.4.1 \
                        --processor cpu
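When the same image is tested repeatedly, it can help to assemble the invocation from arguments and validate the processor value first. The following is a minimal sketch: integ_test_command is a hypothetical helper that only prints the pytest command built from the four arguments shown above, and it rejects any processor other than cpu or gpu.

```shell
#!/bin/sh
# Sketch only: build and print the integration test command; nothing is executed.
integ_test_command() {
    base_name="$1"          # value for --docker-base-name
    tag="$2"                # value for --tag
    framework_version="$3"  # value for --framework-version
    processor="$4"          # value for --processor: cpu or gpu

    case "$processor" in
        cpu|gpu) ;;
        *)
            echo "processor must be cpu or gpu, got: $processor" >&2
            return 1
            ;;
    esac

    echo "pytest test/integration --docker-base-name ${base_name} --tag ${tag} --framework-version ${framework_version} --processor ${processor}"
}

integ_test_command preprod-tensorflow 1.0 1.4.1 cpu
```

With the example values above, the helper prints the same command as the example, on a single line.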

Functional Tests

Functional tests have been removed from the current branch; they are still available in the older r1.0 branch.

Contributing

Please read CONTRIBUTING.md for details on our code of conduct, and the process for submitting pull requests to us.

License

SageMaker TensorFlow Containers is licensed under the Apache 2.0 License. It is copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved. The license is available at: http://aws.amazon.com/apache2.0/


