Amazon SageMaker-specific TensorFlow extensions.
Project description
SageMaker-specific extensions to TensorFlow, for Python 2.7 and 3.4-3.6, and TensorFlow versions 1.7, 1.8, and 1.9. This package includes the PipeModeDataset class, which allows SageMaker Pipe Mode channels to be read using TensorFlow Datasets.
Install
You can build SageMaker TensorFlow into your Docker images with the following command:
pip install sagemaker-tensorflow
You can also install sagemaker-tensorflow for a specific version of TensorFlow. The following command will install sagemaker-tensorflow for TensorFlow 1.7:
pip install "sagemaker-tensorflow>=1.7,<1.8"
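If the TensorFlow version is only known at image build time, the matching requirement string can be derived from it. The following is a minimal sketch that assumes the `>=X.Y,<X.(Y+1)` pattern shown above also applies to the other supported minor versions; `requirement_for` is a hypothetical helper name and is not part of this package.

```python
def requirement_for(tf_version):
    """Build a pip requirement that pins sagemaker-tensorflow to one TensorFlow minor version.

    Hypothetical helper: assumes the ">=X.Y,<X.(Y+1)" pattern holds for the given version.
    """
    # Keep only the major and minor components, e.g. "1.7.0" -> (1, 7).
    major, minor = (int(part) for part in tf_version.split(".")[:2])
    return "sagemaker-tensorflow>={0}.{1},<{0}.{2}".format(major, minor, minor + 1)


requirement = requirement_for("1.7.0")
```

The resulting string can be passed to pip in place of the hard-coded specifier.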
Build from source
Building SageMaker TensorFlow requires cmake. The SageMaker TensorFlow extensions are installed as a Python package named sagemaker_tensorflow.
First, make sure you have cmake installed. If not:
pip install cmake
To install this package, run the following in this directory:
pip install .
To build in a SageMaker Docker image, you can use the following RUN command in your Dockerfile:
RUN git clone https://github.com/aws/sagemaker-tensorflow-extensions.git && \
    pip install cmake && \
    cd sagemaker-tensorflow-extensions && \
    pip install . && \
    cd .. && \
    rm -rf sagemaker-tensorflow-extensions
Building for a specific TensorFlow version
Release branching is used to track different versions of TensorFlow. TensorFlow versions 1.7 and 1.8 are supported. To build for a specific release of TensorFlow, check out the release branch before running pip install. For example, to build for TensorFlow 1.7, you can run the following command in your Dockerfile:
RUN git clone https://github.com/aws/sagemaker-tensorflow-extensions.git && \
    pip install cmake && \
    cd sagemaker-tensorflow-extensions && \
    git checkout 1.7 && \
    pip install . && \
    cd .. && \
    rm -rf sagemaker-tensorflow-extensions
Requirements
SageMaker TensorFlow extensions builds on Python 2.7 in Linux, with either TensorFlow 1.7 or 1.8. Make sure to check out the branch of sagemaker-tensorflow-extensions that matches your installed TensorFlow version.
SageMaker Pipe Mode
SageMaker Pipe Mode is a mechanism for providing S3 data to a training job via Linux FIFOs (named pipes). Training programs can read from the FIFO and get high-throughput data transfer from S3, without managing the S3 access in the program itself.
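To illustrate the mechanism, the sketch below creates a local FIFO and reads from it as a training program would, with a background thread standing in for the process that streams S3 data. It runs on Linux only and does not involve SageMaker itself; the FIFO name and both helper functions are hypothetical.

```python
import os
import tempfile
import threading


def write_to_fifo(path, payload):
    # Stand-in for the process that streams S3 data into the FIFO.
    with open(path, "wb") as fifo:
        fifo.write(payload)


def read_from_fifo(path):
    # The training program reads the FIFO like a regular file, until the writer closes it.
    with open(path, "rb") as fifo:
        return fifo.read()


def demo(payload=b"record-1\nrecord-2\n"):
    workdir = tempfile.mkdtemp()
    fifo_path = os.path.join(workdir, "training_fifo")  # hypothetical name
    os.mkfifo(fifo_path)
    writer = threading.Thread(target=write_to_fifo, args=(fifo_path, payload))
    writer.start()
    try:
        data = read_from_fifo(fifo_path)
    finally:
        writer.join()
        os.remove(fifo_path)
        os.rmdir(workdir)
    return data


received = demo()
```

The reader never touches S3 or credentials; it only consumes a byte stream, which is the property Pipe Mode relies on.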
SageMaker Pipe Mode is enabled when a SageMaker Training Job is created. Multiple S3 datasets can be mapped to individual FIFOs, configured in the training request. Pipe Mode is covered in more detail in the SageMaker documentation: https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-training-algo.html#your-algorithms-training-algo-running-container-inputdataconfig
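As a rough illustration of mapping several S3 datasets to channels, the sketch below builds the input data portion of a training request as plain Python data. The field names follow the SageMaker CreateTrainingJob API as best understood here and should be checked against the documentation linked above; the bucket, prefixes, channel names, and the `pipe_channel` helper are all hypothetical.

```python
def pipe_channel(name, s3_uri):
    """Describe one Pipe Mode channel backed by an S3 prefix (hypothetical helper)."""
    return {
        "ChannelName": name,
        "InputMode": "Pipe",  # stream through a FIFO instead of copying files
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": s3_uri,
                "S3DataDistributionType": "FullyReplicated",
            }
        },
    }


# Each channel maps one S3 dataset to its own FIFO in the training container.
input_data_config = [
    pipe_channel("training", "s3://example-bucket/train/"),
    pipe_channel("evaluation", "s3://example-bucket/eval/"),
]
```

A list like this would be supplied as the input data configuration of the training request; the channel names are what the training program later refers to.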
Using the PipeModeDataset
The PipeModeDataset
is a TensorFlow Dataset
for reading SageMaker Pipe Mode channels. After installing the sagemaker tensorflow extensions package, the PipeModeDataset
can be imported from a moduled named sagemaker_tensorflow
.
To construct a PipeModeDataset
that reads TFRecord encoded records from a “training” channel, do the following:
from sagemaker_tensorflow import PipeModeDataset
ds = PipeModeDataset(channel='training', record_format='TFRecord')
A PipeModeDataset should be created for a SageMaker Pipe Mode channel. Each channel corresponds to a single S3 dataset, configured when the training job is created. You can create multiple PipeModeDataset instances over different channels to read from multiple S3 datasets in the same training program.
A PipeModeDataset can read TFRecord, RecordIO, or text line records. The encoding is selected with the record_format constructor argument, which can be set to RecordIO, TFRecord, or TextLine. RecordIO is the default.
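Because record_format is a plain string, a misspelled value is easy to miss. The following sketch checks the value against the three formats listed above before the dataset is constructed. Whether the library performs its own validation is not stated here, and `dataset_kwargs` is a hypothetical helper, not part of sagemaker_tensorflow.

```python
SUPPORTED_RECORD_FORMATS = ("RecordIO", "TFRecord", "TextLine")


def dataset_kwargs(channel, record_format="RecordIO"):
    """Return keyword arguments for PipeModeDataset after checking record_format.

    Hypothetical helper; RecordIO is used as the default to mirror the constructor.
    """
    if record_format not in SUPPORTED_RECORD_FORMATS:
        raise ValueError(
            "record_format must be one of {}, got {!r}".format(
                SUPPORTED_RECORD_FORMATS, record_format
            )
        )
    return {"channel": channel, "record_format": record_format}


text_kwargs = dataset_kwargs("training", record_format="TextLine")
default_kwargs = dataset_kwargs("training")
```

Inside a training container, the result could then be passed on as `PipeModeDataset(**text_kwargs)`.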
A PipeModeDataset is a regular TensorFlow Dataset and as such can be used in TensorFlow input processing pipelines, and in TensorFlow Estimator input_fn definitions. All Dataset operations are supported on PipeModeDataset. The following code snippet shows how to create a batching and parsing Dataset that reads data from a SageMaker Pipe Mode channel:
import tensorflow as tf
from sagemaker_tensorflow import PipeModeDataset

# Schema of each serialized tf.Example stored in the channel.
features = {
    'data': tf.FixedLenFeature([], tf.string),
    'labels': tf.FixedLenFeature([], tf.int64),
}

def parse(record):
    # Decode one serialized example into a (features, label) pair.
    parsed = tf.parse_single_example(record, features)
    return ({
        'data': tf.decode_raw(parsed['data'], tf.float64)
    }, parsed['labels'])

ds = PipeModeDataset(channel='training', record_format='TFRecord')
num_epochs = 20
ds = ds.repeat(num_epochs)
ds = ds.prefetch(10)
ds = ds.map(parse, num_parallel_calls=10)
ds = ds.batch(64)
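The same pipeline can be wrapped in an Estimator input_fn, with one function per channel when several S3 datasets are used. This is a sketch under assumptions: it targets the TensorFlow 1.x Dataset API, only does real work inside a SageMaker training container with this package installed, and `make_input_fn` and the "evaluation" channel name are hypothetical.

```python
def make_input_fn(channel, parse_fn, batch_size=64, num_epochs=1):
    """Return an Estimator-style input_fn that reads one Pipe Mode channel.

    Hypothetical helper; parse_fn should map a serialized record to (features, label).
    """
    def input_fn():
        # Imported lazily so this module can be loaded outside a training container.
        from sagemaker_tensorflow import PipeModeDataset

        ds = PipeModeDataset(channel=channel, record_format='TFRecord')
        ds = ds.repeat(num_epochs)
        ds = ds.map(parse_fn, num_parallel_calls=10)
        ds = ds.batch(batch_size)
        ds = ds.prefetch(1)
        # TensorFlow 1.x style: hand the Estimator a (features, labels) tensor pair.
        return ds.make_one_shot_iterator().get_next()

    return input_fn


# Example wiring, assuming an existing estimator and a parse function:
#   estimator.train(input_fn=make_input_fn('training', parse, num_epochs=20))
#   estimator.evaluate(input_fn=make_input_fn('evaluation', parse))
```

Building a fresh PipeModeDataset inside each input_fn call keeps every channel's reader tied to the graph the Estimator creates for that call.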
License
SageMaker TensorFlow is licensed under the Apache 2.0 License. It is copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved. The license is available at: http://aws.amazon.com/apache2.0/
Built Distributions
Hashes for sagemaker_tensorflow-1.9.0.1.0.3-cp36-cp36m-manylinux1_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 4cbeb42f186d2c019d565b2edcb95de7a59870e859aac3b377ff85b0df058508
MD5 | 4af5b7dc38b3fe0fc40d35af88597410
BLAKE2b-256 | f4a6c6d91f068c7c6b1984565359a202af6e33adef61d78b1d4a57d87c958d73
Hashes for sagemaker_tensorflow-1.9.0.1.0.3-cp35-cp35m-manylinux1_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | ff2de2acd9212925d4e835d7e1ad8d256a477a3a8932245e3accae619bec2538
MD5 | 88cef84d8e0928f2219a37d03c42dd40
BLAKE2b-256 | 94fdd7ac2f22d2c61701dbe7c1b8612ece74c10ccc98d7535752dea6e01df4bb
Hashes for sagemaker_tensorflow-1.9.0.1.0.3-cp34-cp34m-manylinux1_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 7297b6e8c3b2579fed595c0d01997d9b7d6db158b322cedf8bf0220dfce336bd
MD5 | 57856287397d3c6c4f24b0fba82fd9d4
BLAKE2b-256 | 317597206c309d52f804cc947059b2af8dc168178b68a883b7df30e4680b365e
Hashes for sagemaker_tensorflow-1.9.0.1.0.3-cp27-cp27mu-manylinux1_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 56278757393da4e16ac1f37d536bd1b6908d928f53e7af9a72e1f482e83da0d2
MD5 | d60c8a3878b106d22d8aec577446ddd9
BLAKE2b-256 | 60ae8f9e9ee75f9cbfb74a7fa61379855b4baebd0294812fdfc3916db1d22359