Amazon SageMaker-specific TensorFlow extensions.

Project description

===============================
SageMaker TensorFlow
===============================

.. role:: python(code)
   :language: python

SageMaker-specific extensions to TensorFlow, for Python 2.7 and 3.4-3.6, and TensorFlow versions 1.7-1.11. This package includes the :python:`PipeModeDataset` class, which allows SageMaker Pipe Mode channels to be read using TensorFlow Datasets.

Install
~~~~~~~
You can install SageMaker TensorFlow with the following command:

::

    pip install sagemaker-tensorflow


You can also install sagemaker-tensorflow for a specific version of TensorFlow. The following command will install sagemaker-tensorflow for TensorFlow 1.7:

::

pip install "sagemaker-tensorflow>=1.7,<1.8"

Build and install from source
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The SageMaker TensorFlow build depends on the following:

* cmake
* tensorflow
* curl-dev

To install cmake and tensorflow, run:

::

    pip install cmake tensorflow

On Amazon Linux, curl-dev can be installed with:

::

    yum install curl-dev

On Ubuntu, curl-dev can be installed with:

::

    apt-get install libcurl4-openssl-dev


To build and install this package, run the following command in this directory:

::

    pip install .
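
As a quick sanity check after installing (this step is not part of the project's documented instructions), you can confirm that the extension imports cleanly:

.. code:: python

    # If the native extension failed to build or install, this import raises an error.
    from sagemaker_tensorflow import PipeModeDataset
    print(PipeModeDataset)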

To build and install inside a SageMaker Docker image, you can use the following RUN command in your Dockerfile:

::

    RUN git clone https://github.com/aws/sagemaker-tensorflow-extensions.git && \
        cd sagemaker-tensorflow-extensions && \
        pip install . && \
        cd .. && \
        rm -rf sagemaker-tensorflow-extensions

Building for a specific TensorFlow version
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Release branching is used to track different versions of TensorFlow. To build for a specific release of TensorFlow, check out the corresponding release branch before running pip install. For example, to build for TensorFlow 1.7, you can run the following command in your Dockerfile:

::

    RUN git clone https://github.com/aws/sagemaker-tensorflow-extensions.git && \
        cd sagemaker-tensorflow-extensions && \
        git checkout 1.7 && \
        pip install . && \
        cd .. && \
        rm -rf sagemaker-tensorflow-extensions

Requirements
~~~~~~~~~~~~
SageMaker TensorFlow extensions builds with Python 2.7 and 3.4-3.6 on Linux, with a TensorFlow version >= 1.7. Older versions of TensorFlow are not supported. Please make sure to check out the branch of sagemaker-tensorflow-extensions that matches your TensorFlow version.

SageMaker Pipe Mode
~~~~~~~~~~~~~~~~~~~
SageMaker Pipe Mode is a mechanism for providing S3 data to a training job via Linux fifos. Training programs can read from the fifo and get high-throughput data transfer from S3, without managing the S3 access in the program itself.

SageMaker Pipe Mode is enabled when a SageMaker Training Job is created. Multiple S3 datasets can be mapped to individual fifos, configured in the training request. Pipe Mode is covered in more detail in the SageMaker documentation: https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-training-algo.html#your-algorithms-training-algo-running-container-inputdataconfig
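
For example, a training job created with the SageMaker Python SDK can enable Pipe Mode and map two S3 datasets to named channels. This is a minimal sketch using SDK 1.x parameter names; the entry point, role, instance type, and S3 paths are placeholders:

.. code:: python

    from sagemaker.tensorflow import TensorFlow

    estimator = TensorFlow(
        entry_point='train.py',  # placeholder training script
        role='arn:aws:iam::123456789012:role/SageMakerRole',  # placeholder role ARN
        train_instance_count=1,
        train_instance_type='ml.p3.2xlarge',
        framework_version='1.12.0',
        input_mode='Pipe',  # stream channels through fifos instead of downloading to disk
    )

    # Each key becomes a named Pipe Mode channel (a fifo) inside the training container.
    estimator.fit({
        'training': 's3://my-bucket/train/',
        'eval': 's3://my-bucket/eval/',
    })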

Using the PipeModeDataset
~~~~~~~~~~~~~~~~~~~~~~~~~
The :code:`PipeModeDataset` is a TensorFlow :code:`Dataset` for reading SageMaker Pipe Mode channels. After installing the sagemaker-tensorflow package, the :code:`PipeModeDataset` can be imported from a module named :code:`sagemaker_tensorflow`.

To construct a :code:`PipeModeDataset` that reads TFRecord encoded records from a "training" channel, do the following:

.. code:: python

    from sagemaker_tensorflow import PipeModeDataset

    ds = PipeModeDataset(channel='training', record_format='TFRecord')

A :python:`PipeModeDataset` should be created for a SageMaker PipeMode channel. Each channel corresponds to a single S3 dataset, configured when the training job is created. You can create multiple :python:`PipeModeDataset` instances over different channels to read from multiple S3 datasets in the same training program.
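
For example, a job configured with separate "training" and "eval" channels can construct one dataset per channel. The channel names below are illustrative and must match your training job's input configuration:

.. code:: python

    from sagemaker_tensorflow import PipeModeDataset

    # One dataset per Pipe Mode channel; each channel maps to its own S3 dataset.
    train_ds = PipeModeDataset(channel='training', record_format='TFRecord')
    eval_ds = PipeModeDataset(channel='eval', record_format='TFRecord')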

A :python:`PipeModeDataset` can read TFRecord, RecordIO, or text line records. The :code:`record_format` constructor argument selects the encoding and can be set to :code:`TFRecord`, :code:`RecordIO`, or :code:`TextLine`; :code:`RecordIO` is the default.
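
For example, to read newline-delimited text records (again with an illustrative channel name):

.. code:: python

    from sagemaker_tensorflow import PipeModeDataset

    # Each element of this dataset is a scalar string tensor holding one line of text.
    ds = PipeModeDataset(channel='training', record_format='TextLine')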

A :python:`PipeModeDataset` is a regular TensorFlow :python:`Dataset` and as such can be used in TensorFlow input processing pipelines, and in TensorFlow Estimator :code:`input_fn` definitions. All :python:`Dataset` operations are supported on :python:`PipeModeDataset`. The following code snippet shows how to create a batching and parsing :python:`Dataset` that reads data from a SageMaker Pipe Mode channel:

.. code:: python

    features = {
        'data': tf.FixedLenFeature([], tf.string),
        'labels': tf.FixedLenFeature([], tf.int64),
    }

    def parse(record):
        parsed = tf.parse_single_example(record, features)
        return ({
            'data': tf.decode_raw(parsed['data'], tf.float64)
        }, parsed['labels'])

    ds = PipeModeDataset(channel='training', record_format='TFRecord')
    num_epochs = 20
    ds = ds.repeat(num_epochs)
    ds = ds.prefetch(10)
    ds = ds.map(parse, num_parallel_calls=10)
    ds = ds.batch(64)
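
A pipeline like the one above can be returned directly from an Estimator :code:`input_fn`. The following is a minimal sketch, assuming a :code:`model_fn` you have defined elsewhere and the :code:`parse` function from the previous example:

.. code:: python

    import tensorflow as tf

    from sagemaker_tensorflow import PipeModeDataset


    def input_fn():
        """Stream TFRecord data from the 'training' Pipe Mode channel."""
        ds = PipeModeDataset(channel='training', record_format='TFRecord')
        ds = ds.repeat(20)
        ds = ds.prefetch(10)
        ds = ds.map(parse, num_parallel_calls=10)
        ds = ds.batch(64)
        return ds


    # model_fn is assumed to be defined elsewhere in your training script.
    estimator = tf.estimator.Estimator(model_fn=model_fn)
    estimator.train(input_fn=input_fn)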

Support
~~~~~~~
We're here to help. Have a question? Please open a `GitHub issue`__; we'd love to hear from you.

__ https://github.com/aws/sagemaker-tensorflow-extensions/issues/new

License
~~~~~~~

SageMaker TensorFlow is licensed under the Apache 2.0 License. It is copyright 2018
Amazon.com, Inc. or its affiliates. All Rights Reserved. The license is available at:
http://aws.amazon.com/apache2.0/



Download files


Source Distributions

No source distribution files are available for this release.

Built Distributions


File details

Details for the file sagemaker_tensorflow-1.12.0.1.0.0-cp36-cp36m-manylinux1_x86_64.whl.

File hashes:

* SHA256: d6b1bf1dae21e99200cd1214012dc650dcddcfa976ae47c7d1e82fd489b8686f
* MD5: 73f55d057b28b1c3a8d72abab808cce8
* BLAKE2b-256: 0277f911b7090f60a138402cd613435f4f7a55743d1cf27dce5eef53f983fc3d

File details

Details for the file sagemaker_tensorflow-1.12.0.1.0.0-cp35-cp35m-manylinux1_x86_64.whl.

File hashes:

* SHA256: 5e2493107917a8da663933b4865b49a64fd2a097d53ef5e974030fb23ecf53fd
* MD5: e9a419a28ccbb0e4667ad2434af979ee
* BLAKE2b-256: f53717992a52a8e0175f9b3cc4eb1391707cde1009d2a8f3591fc295f353e77a

File details

Details for the file sagemaker_tensorflow-1.12.0.1.0.0-cp34-cp34m-manylinux1_x86_64.whl.

File hashes:

* SHA256: 264a53c8f9d361c35b86a9bf47fb5edac36e2ec3cbc0fe4d4f5a8a73a6dfc026
* MD5: 9b8195d774af8cee4a10ed23b2be5637
* BLAKE2b-256: 5d53eda4749d7f929d706bc20e286a88642d57054d624735df03c7bfb221fcb8

File details

Details for the file sagemaker_tensorflow-1.12.0.1.0.0-cp27-cp27mu-manylinux1_x86_64.whl.

File hashes:

* SHA256: daff348594d52e4ea6368eea86ade550eedc0e9f9fa0caa4c3d5f46a8323c255
* MD5: 7c29e7d65e013495163adeb016b6df10
* BLAKE2b-256: f73838ceed9a159672e8f1b4e35fac47974402d49e2ffd9c01150ff11890bfad
