Amazon SageMaker-specific TensorFlow extensions.

Project description

SageMaker-specific extensions to TensorFlow, for Python 2.7, 3.4-3.6 and TensorFlow versions 1.7, 1.8, and 1.9. This package includes the PipeModeDataset class, which allows SageMaker Pipe Mode channels to be read using TensorFlow Datasets.

Install

You can build SageMaker TensorFlow into your Docker images with the following command:

pip install sagemaker-tensorflow

You can also install sagemaker-tensorflow for a specific version of TensorFlow. The following command will install sagemaker-tensorflow for TensorFlow 1.7:

pip install "sagemaker-tensorflow>=1.7,<1.8"
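As a rough illustration of what the ">=1.7,<1.8" range accepts, the sketch below compares plain dotted version numbers as zero-padded integer tuples. It is a simplified stand-in for pip's real version matching, not pip's implementation; the helper names and the candidate versions listed are illustrative assumptions.

```python
def release_tuple(version, width=6):
    """Turn a dotted numeric version such as '1.7.0.1.0.3' into a zero-padded tuple."""
    parts = [int(p) for p in version.split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def matches_tf17_pin(version):
    """Return True if a plain numeric version falls inside '>=1.7,<1.8'."""
    v = release_tuple(version)
    return release_tuple("1.7") <= v < release_tuple("1.8")


# Illustrative candidates only; check the package index for real releases.
for candidate in ["1.7.0.1.0.3", "1.8.0.1.0.3", "1.9.0.1.0.3"]:
    print(candidate, matches_tf17_pin(candidate))
```

Only versions whose first two components are 1.7 pass the check, which is why the pin selects a build made for TensorFlow 1.7.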

Build from source

Building SageMaker TensorFlow requires cmake. The extensions are installed as a Python package named sagemaker_tensorflow.

If cmake is not already installed, install it with pip:

pip install cmake

To install this package, run the following command in this directory:

pip install .

To build in a SageMaker Docker image, you can use the following RUN command in your Dockerfile:

RUN git clone https://github.com/aws/sagemaker-tensorflow-extensions.git && \
    pip install cmake && \
    cd sagemaker-tensorflow-extensions && \
    pip install . && \
    cd .. && \
    rm -rf sagemaker-tensorflow-extensions

Building for a specific TensorFlow version

Release branches track different versions of TensorFlow. TensorFlow versions 1.7 and 1.8 are supported. To build for a specific release of TensorFlow, check out the matching release branch before running pip install. For example, to build for TensorFlow 1.7, you can run the following command in your Dockerfile:

RUN git clone https://github.com/aws/sagemaker-tensorflow-extensions.git && \
    pip install cmake && \
    cd sagemaker-tensorflow-extensions && \
    git checkout 1.7 && \
    pip install . && \
    cd .. && \
    rm -rf sagemaker-tensorflow-extensions

Requirements

SageMaker TensorFlow extensions build on Python 2.7 on Linux, with either TensorFlow 1.7 or 1.8. Make sure to check out the branch of sagemaker-tensorflow-extensions that matches your installed TensorFlow version.

SageMaker Pipe Mode

SageMaker Pipe Mode is a mechanism for providing S3 data to a training job via Linux FIFOs (named pipes). Training programs can read from the FIFO and get high-throughput data transfer from S3, without managing the S3 access in the program itself.

SageMaker Pipe Mode is enabled when a SageMaker Training Job is created. Multiple S3 datasets can be mapped to individual FIFOs, configured in the training request. Pipe Mode is covered in more detail in the SageMaker documentation: https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-training-algo.html#your-algorithms-training-algo-running-container-inputdataconfig
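To make the FIFO mechanism concrete, here is a minimal local sketch that uses only the Python standard library on Linux. A background thread stands in for the SageMaker side that streams data into the pipe, and the main thread plays the training program. The FIFO path, the helper names, and the newline-delimited records are assumptions for illustration; they are not the paths or encodings SageMaker itself uses.

```python
import os
import tempfile
import threading


def write_records(fifo_path, records):
    # Stand-in for the producer side, which would stream S3 data into the FIFO.
    with open(fifo_path, "wb") as fifo:
        for record in records:
            fifo.write(record + b"\n")


def read_records(fifo_path):
    # The consumer reads the FIFO like an ordinary file, front to back,
    # until the writer closes its end.
    with open(fifo_path, "rb") as fifo:
        return [line.rstrip(b"\n") for line in fifo]


workdir = tempfile.mkdtemp()
fifo_path = os.path.join(workdir, "demo_channel")  # hypothetical FIFO name
os.mkfifo(fifo_path)

writer = threading.Thread(
    target=write_records, args=(fifo_path, [b"rec-1", b"rec-2", b"rec-3"])
)
writer.start()
received = read_records(fifo_path)
writer.join()

# Clean up the temporary FIFO and directory.
os.remove(fifo_path)
os.rmdir(workdir)
print(received)
```

The reader never touches S3 or any network API; it only sees a stream of bytes, which is the property that lets a training program stay free of S3 access code.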

Using the PipeModeDataset

The PipeModeDataset is a TensorFlow Dataset for reading SageMaker Pipe Mode channels. After installing the SageMaker TensorFlow extensions package, the PipeModeDataset can be imported from a module named sagemaker_tensorflow.

To construct a PipeModeDataset that reads TFRecord encoded records from a “training” channel, do the following:

from sagemaker_tensorflow import PipeModeDataset

ds = PipeModeDataset(channel='training', record_format='TFRecord')

Create one PipeModeDataset per SageMaker Pipe Mode channel. Each channel corresponds to a single S3 dataset, configured when the training job is created. You can create multiple PipeModeDataset instances over different channels to read from multiple S3 datasets in the same training program.

A PipeModeDataset can read TFRecord, RecordIO, or text line records, selected with the record_format constructor argument. The record_format kwarg can be set to one of RecordIO, TFRecord, or TextLine to differentiate between the three encodings. RecordIO is the default.
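One way to organize several channels is a small helper that builds one dataset per channel and rejects unknown record formats early. The sketch below is an assumption-laden illustration: build_channel_datasets and StandInDataset are hypothetical names, the "validation" channel is made up, and the dataset class is passed in as a parameter so the snippet runs without the package installed.

```python
VALID_RECORD_FORMATS = ("RecordIO", "TFRecord", "TextLine")


def build_channel_datasets(dataset_cls, channel_formats):
    """Create one dataset per channel from a {channel: record_format} mapping."""
    datasets = {}
    for channel, record_format in channel_formats.items():
        if record_format not in VALID_RECORD_FORMATS:
            raise ValueError(
                "record_format for channel %r must be one of %s, got %r"
                % (channel, ", ".join(VALID_RECORD_FORMATS), record_format)
            )
        datasets[channel] = dataset_cls(channel=channel, record_format=record_format)
    return datasets


class StandInDataset(object):
    """Hypothetical placeholder that only mimics the constructor arguments."""

    def __init__(self, channel, record_format="RecordIO"):
        self.channel = channel
        self.record_format = record_format


datasets = build_channel_datasets(
    StandInDataset, {"training": "TFRecord", "validation": "TextLine"}
)
print(sorted(datasets))
```

In a training program, PipeModeDataset imported from sagemaker_tensorflow would be passed in place of StandInDataset, with channel names matching those configured for the training job.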

A PipeModeDataset is a regular TensorFlow Dataset and as such can be used in TensorFlow input processing pipelines, and in TensorFlow Estimator input_fn definitions. All Dataset operations are supported on PipeModeDataset. The following code snippet shows how to create a batching and parsing Dataset that reads data from a SageMaker Pipe Mode channel:

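import tensorflow as tf
from sagemaker_tensorflow import PipeModeDataset

# Schema of each serialized example stored in the channel.
features = {
    'data': tf.FixedLenFeature([], tf.string),
    'labels': tf.FixedLenFeature([], tf.int64),
}

def parse(record):
    # Decode one serialized example into a (features, label) pair.
    parsed = tf.parse_single_example(record, features)
    return ({
        'data': tf.decode_raw(parsed['data'], tf.float64)
    }, parsed['labels'])

ds = PipeModeDataset(channel='training', record_format='TFRecord')
num_epochs = 20
# Iterate over the channel num_epochs times.
ds = ds.repeat(num_epochs)
ds = ds.prefetch(10)
ds = ds.map(parse, num_parallel_calls=10)
ds = ds.batch(64)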
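Since the text mentions Estimator input_fn definitions, the pipeline above could be wrapped as follows. This is a sketch under stated assumptions, not the package's own API: make_input_fn and RecordingDataset are hypothetical names, and the dataset is supplied through a factory callable so the snippet can run without TensorFlow. Only the Dataset methods already used above (repeat, prefetch, map, batch) are called.

```python
def make_input_fn(dataset_factory, parse_fn, batch_size=64, num_epochs=20):
    """Return a zero-argument input_fn that builds the batching and parsing pipeline."""

    def input_fn():
        ds = dataset_factory()            # e.g. a PipeModeDataset over one channel
        ds = ds.repeat(num_epochs)
        ds = ds.prefetch(10)
        ds = ds.map(parse_fn, num_parallel_calls=10)
        ds = ds.batch(batch_size)
        return ds

    return input_fn


class RecordingDataset(object):
    """Hypothetical stand-in that records which Dataset methods were called."""

    def __init__(self):
        self.calls = []

    def repeat(self, count):
        self.calls.append(("repeat", count))
        return self

    def prefetch(self, buffer_size):
        self.calls.append(("prefetch", buffer_size))
        return self

    def map(self, fn, num_parallel_calls=None):
        self.calls.append(("map", num_parallel_calls))
        return self

    def batch(self, batch_size):
        self.calls.append(("batch", batch_size))
        return self


fake = RecordingDataset()
input_fn = make_input_fn(lambda: fake, parse_fn=lambda record: record)
result = input_fn()
print(fake.calls)
```

In real training code the factory would be something like lambda: PipeModeDataset(channel='training', record_format='TFRecord') and parse_fn would be the parse function shown earlier. Building the dataset inside input_fn, rather than outside it, keeps the dataset ops in the graph that the Estimator creates when it calls input_fn.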

License

SageMaker TensorFlow is licensed under the Apache 2.0 License. It is copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved. The license is available at: http://aws.amazon.com/apache2.0/
