Amazon SageMaker-specific TensorFlow extensions.

Project description

SageMaker-specific extensions to TensorFlow, for Python 2.7, 3.4-3.6 and TensorFlow versions 1.7-1.11. This package includes the PipeModeDataset class, which allows SageMaker Pipe Mode channels to be read using TensorFlow Datasets.

Install

You can build SageMaker TensorFlow into a docker image with the following command:

pip install sagemaker-tensorflow

You can also install sagemaker-tensorflow for a specific version of TensorFlow. The following command will install sagemaker-tensorflow for TensorFlow 1.7:

pip install "sagemaker-tensorflow>=1.7,<1.8"
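The version pin above can also be generated from a TensorFlow version string, for example when a requirements file is templated. The following is a minimal sketch; the helper name sagemaker_tensorflow_requirement is hypothetical, and it assumes the ">=X.Y,<X.(Y+1)" pattern shown for TensorFlow 1.7 carries over to the other supported minor versions.

```python
def sagemaker_tensorflow_requirement(tf_version):
    """Build a pip requirement string from a 'major.minor[.patch]' version.

    Hypothetical helper: it assumes sagemaker-tensorflow releases follow the
    TensorFlow major.minor numbering, as in the TensorFlow 1.7 example.
    """
    # Only the major and minor components matter for the pin.
    major, minor = (int(part) for part in tf_version.split(".")[:2])
    return "sagemaker-tensorflow>={0}.{1},<{0}.{2}".format(major, minor, minor + 1)


if __name__ == "__main__":
    # Prints the requirement to pass to pip for TensorFlow 1.7.
    print(sagemaker_tensorflow_requirement("1.7.0"))
```

The resulting string can be passed to pip in quotes, as in the command above.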

Build and install from source

The SageMaker TensorFlow build depends on the following:

  • cmake

  • tensorflow

  • curl-dev

To install cmake and tensorflow, run:

pip install cmake tensorflow

On Amazon Linux, curl-dev can be installed with:

yum install libcurl-devel

On Ubuntu, curl-dev can be installed with:

apt-get install libcurl4-openssl-dev

To build and install this package, run the following command in this directory:

pip install .

To build in a SageMaker docker image, you can use the following RUN command in your Dockerfile:

RUN git clone https://github.com/aws/sagemaker-tensorflow-extensions.git && \
    cd sagemaker-tensorflow-extensions && \
    pip install . && \
    cd .. && \
    rm -rf sagemaker-tensorflow-extensions

Building for a specific TensorFlow version

Release branches track the different versions of TensorFlow. To build for a specific release of TensorFlow, check out the matching release branch before running pip install. For example, to build for TensorFlow 1.7, you can use the following RUN command in your Dockerfile:

RUN git clone https://github.com/aws/sagemaker-tensorflow-extensions.git && \
    cd sagemaker-tensorflow-extensions && \
    git checkout 1.7 && \
    pip install . && \
    cd .. && \
    rm -rf sagemaker-tensorflow-extensions

Requirements

SageMaker TensorFlow extensions builds on Python 2.7, 3.4-3.6 on Linux with a TensorFlow version >= 1.7. Older versions of TensorFlow are not supported. Make sure to check out the branch of sagemaker-tensorflow-extensions that matches your TensorFlow version.

SageMaker Pipe Mode

SageMaker Pipe Mode is a mechanism for providing S3 data to a training job via Linux fifos. Training programs can read from the fifo and get high-throughput data transfer from S3, without managing the S3 access in the program itself.

SageMaker Pipe Mode is enabled when a SageMaker Training Job is created. Multiple S3 datasets can be mapped to individual fifos, configured in the training request. Pipe Mode is covered in more detail in the SageMaker documentation: https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-training-algo.html#your-algorithms-training-algo-running-container-inputdataconfig
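To make the FIFO mechanism concrete, the sketch below reads raw bytes from a named pipe until the writer closes it. It is only an illustration of the underlying idea, not how PipeModeDataset is implemented. The function names read_fifo and demo are hypothetical, and the demo creates a temporary local FIFO with a background writer thread standing in for SageMaker. In a real training job the FIFO would be created by SageMaker (the SageMaker documentation describes paths of the form /opt/ml/input/data/<channel>_<epoch>; treat that as an assumption here).

```python
import os
import tempfile
import threading


def read_fifo(path, chunk_size=65536):
    """Yield raw byte chunks from a FIFO until the writer closes it."""
    with open(path, "rb") as fifo:
        while True:
            chunk = fifo.read(chunk_size)
            if not chunk:  # empty read means the writer closed the pipe
                break
            yield chunk


def demo():
    """Create a local FIFO, feed it from a thread, and read it back."""
    tmpdir = tempfile.mkdtemp()
    path = os.path.join(tmpdir, "training_0")  # stand-in for a channel FIFO
    os.mkfifo(path)

    def writer():
        # Stand-in for the process that streams S3 data into the pipe.
        with open(path, "wb") as f:
            f.write(b"hello from s3")

    t = threading.Thread(target=writer)
    t.start()
    data = b"".join(read_fifo(path))
    t.join()
    os.remove(path)
    os.rmdir(tmpdir)
    return data


if __name__ == "__main__":
    print(demo())
```

The reader never touches S3 directly; it only consumes a byte stream, which is the property that Pipe Mode relies on.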

Using the PipeModeDataset

The PipeModeDataset is a TensorFlow Dataset for reading SageMaker Pipe Mode channels. After installing the SageMaker TensorFlow extensions package, the PipeModeDataset can be imported from a module named sagemaker_tensorflow.

To construct a PipeModeDataset that reads TFRecord encoded records from a “training” channel, do the following:

from sagemaker_tensorflow import PipeModeDataset

ds = PipeModeDataset(channel='training', record_format='TFRecord')

A PipeModeDataset should be created for each SageMaker Pipe Mode channel you want to read. Each channel corresponds to a single S3 dataset, configured when the training job is created. You can create multiple PipeModeDataset instances over different channels to read from multiple S3 datasets in the same training program.

A PipeModeDataset can read TFRecord, RecordIO, or text line records. The encoding is selected with the record_format constructor argument, which can be set to RecordIO, TFRecord, or TextLine. RecordIO is the default.

A PipeModeDataset is a regular TensorFlow Dataset and as such can be used in TensorFlow input processing pipelines, and in TensorFlow Estimator input_fn definitions. All Dataset operations are supported on PipeModeDataset. The following code snippet shows how to create a batching and parsing Dataset that reads data from a SageMaker Pipe Mode channel:

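import tensorflow as tf
from sagemaker_tensorflow import PipeModeDataset

# Schema of each serialized tf.Example in the channel.
features = {
    'data': tf.FixedLenFeature([], tf.string),
    'labels': tf.FixedLenFeature([], tf.int64),
}

def parse(record):
    # Decode one serialized example into a (features dict, label) pair.
    parsed = tf.parse_single_example(record, features)
    return ({
        'data': tf.decode_raw(parsed['data'], tf.float64)
    }, parsed['labels'])

ds = PipeModeDataset(channel='training', record_format='TFRecord')
num_epochs = 20
ds = ds.repeat(num_epochs)  # read the channel num_epochs times
ds = ds.prefetch(10)
ds = ds.map(parse, num_parallel_calls=10)
ds = ds.batch(64)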
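Since a PipeModeDataset can be used in Estimator input_fn definitions and one instance can be created per channel, the same pipeline can be wrapped in a small factory. This is a sketch only: make_input_fn is a hypothetical helper, the 'evaluation' channel is an assumed second channel that would have to be configured in the training job, and the record schema is copied from the snippet above. The TensorFlow and sagemaker_tensorflow imports are deferred until the input_fn is called so that the module can be loaded outside a training container.

```python
def make_input_fn(channel, batch_size=64, num_epochs=1):
    """Return an Estimator-style input_fn that reads one Pipe Mode channel."""

    def input_fn():
        # Imported lazily: both packages are only needed inside the job.
        import tensorflow as tf
        from sagemaker_tensorflow import PipeModeDataset

        # Same schema as the snippet above (TensorFlow 1.x API names).
        features = {
            'data': tf.FixedLenFeature([], tf.string),
            'labels': tf.FixedLenFeature([], tf.int64),
        }

        def parse(record):
            parsed = tf.parse_single_example(record, features)
            return ({'data': tf.decode_raw(parsed['data'], tf.float64)},
                    parsed['labels'])

        ds = PipeModeDataset(channel=channel, record_format='TFRecord')
        ds = ds.repeat(num_epochs)
        ds = ds.map(parse, num_parallel_calls=10)
        ds = ds.batch(batch_size)
        return ds

    return input_fn


# One input_fn per channel; 'evaluation' is a hypothetical channel name.
train_input_fn = make_input_fn('training', num_epochs=20)
eval_input_fn = make_input_fn('evaluation')
```

With an existing Estimator instance (not shown here), these would typically be passed as estimator.train(input_fn=train_input_fn) and estimator.evaluate(input_fn=eval_input_fn).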

Support

We’re here to help. Have a question? Please open a GitHub issue. We’d love to hear from you.

License

SageMaker TensorFlow is licensed under the Apache 2.0 License. It is copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved. The license is available at: http://aws.amazon.com/apache2.0/


