Skip to main content

TensorFlow IO

Project description




TensorFlow I/O

GitHub CI PyPI License Documentation

TensorFlow I/O is a collection of file systems and file formats that are not available in TensorFlow's built-in support. A full list of supported file systems and file formats by TensorFlow I/O can be found here.

The use of tensorflow-io is straightforward with keras. Below is an example to Get Started with TensorFlow with the data processing aspect replaced by tensorflow-io:

import tensorflow as tf
import tensorflow_io as tfio

# Read the MNIST data into the IODataset.
dataset_url = "https://storage.googleapis.com/cvdf-datasets/mnist/"
d_train = tfio.IODataset.from_mnist(
    dataset_url + "train-images-idx3-ubyte.gz",
    dataset_url + "train-labels-idx1-ubyte.gz",
)

# Shuffle the elements of the dataset.
d_train = d_train.shuffle(buffer_size=1024)

# By default image data is uint8, so convert to float32 using map().
d_train = d_train.map(lambda x, y: (tf.image.convert_image_dtype(x, tf.float32), y))

# prepare batches the data just like any other tf.data.Dataset
d_train = d_train.batch(32)

# Build the model.
model = tf.keras.models.Sequential(
    [
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(512, activation=tf.nn.relu),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(10, activation=tf.nn.softmax),
    ]
)

# Compile the model.
model.compile(
    optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"]
)

# Fit the model.
model.fit(d_train, epochs=5, steps_per_epoch=200)

In the above MNIST example, the URL's to access the dataset files are passed directly to the tfio.IODataset.from_mnist API call. This is due to the inherent support that tensorflow-io provides for HTTP/HTTPS file system, thus eliminating the need for downloading and saving datasets on a local directory.

NOTE: Since tensorflow-io is able to detect and uncompress the MNIST dataset automatically if needed, we can pass the URL's for the compressed files (gzip) to the API call as is.

Please check the official documentation for more detailed and interesting usages of the package.

Installation

Python Package

The tensorflow-io Python package can be installed with pip directly using:

$ pip install tensorflow-io

People who are a little more adventurous can also try our nightly binaries:

$ pip install tensorflow-io-nightly

Docker Images

In addition to the pip packages, the docker images can be used to quickly get started.

For stable builds:

$ docker pull tfsigio/tfio:latest
$ docker run -it --rm --name tfio-latest tfsigio/tfio:latest

For nightly builds:

$ docker pull tfsigio/tfio:nightly
$ docker run -it --rm --name tfio-nightly tfsigio/tfio:nightly

R Package

Once the tensorflow-io Python package has been successfully installed, you can install the development version of the R package from GitHub via the following:

if (!require("remotes")) install.packages("remotes")
remotes::install_github("tensorflow/io", subdir = "R-package")

TensorFlow Version Compatibility

To ensure compatibility with TensorFlow, it is recommended to install a matching version of TensorFlow I/O according to the table below. You can find the list of releases here.

TensorFlow I/O Version TensorFlow Compatibility Release Date
0.18.0 2.5.x May 13, 2021
0.17.1 2.4.x Apr 16, 2021
0.17.0 2.4.x Dec 14, 2020
0.16.0 2.3.x Oct 23, 2020
0.15.0 2.3.x Aug 03, 2020
0.14.0 2.2.x Jul 08, 2020
0.13.0 2.2.x May 10, 2020
0.12.0 2.1.x Feb 28, 2020
0.11.0 2.1.x Jan 10, 2020
0.10.0 2.0.x Dec 05, 2019
0.9.1 2.0.x Nov 15, 2019
0.9.0 2.0.x Oct 18, 2019
0.8.1 1.15.x Nov 15, 2019
0.8.0 1.15.x Oct 17, 2019
0.7.2 1.14.x Nov 15, 2019
0.7.1 1.14.x Oct 18, 2019
0.7.0 1.14.x Jul 14, 2019
0.6.0 1.13.x May 29, 2019
0.5.0 1.13.x Apr 12, 2019
0.4.0 1.13.x Mar 01, 2019
0.3.0 1.12.0 Feb 15, 2019
0.2.0 1.12.0 Jan 29, 2019
0.1.0 1.12.0 Dec 16, 2018

Performance Benchmarking

We use github-pages to document the results of API performance benchmarks. The benchmark job is triggered on every commit to master branch and facilitates tracking performance w.r.t commits.

Contributing

Tensorflow I/O is a community led open source project. As such, the project depends on public contributions, bug-fixes, and documentation. Please see:

Build Status and CI

Build Status
Linux CPU Python 2 Status
Linux CPU Python 3 Status
Linux GPU Python 2 Status
Linux GPU Python 3 Status

Because of manylinux2010 requirement, TensorFlow I/O is built with Ubuntu:16.04 + Developer Toolset 7 (GCC 7.3) on Linux. Configuration with Ubuntu 16.04 with Developer Toolset 7 is not exactly straightforward. If the system have docker installed, then the following command will automatically build manylinux2010 compatible whl package:

#!/usr/bin/env bash

ls dist/*
for f in dist/*.whl; do
  docker run -i --rm -v $PWD:/v -w /v --net=host quay.io/pypa/manylinux2010_x86_64 bash -x -e /v/tools/build/auditwheel repair --plat manylinux2010_x86_64 $f
done
sudo chown -R $(id -nu):$(id -ng) .
ls wheelhouse/*

It takes some time to build, but once complete, there will be python 3.5, 3.6, 3.7 compatible whl packages available in wheelhouse directory.

On macOS, the same command could be used. However, the script expects python in shell and will only generate a whl package that matches the version of python in shell. If you want to build a whl package for a specific python then you have to alias this version of python to python in shell. See .github/workflows/build.yml Auditwheel step for instructions how to do that.

Note the above command is also the command we use when releasing packages for Linux and macOS.

TensorFlow I/O uses both GitHub Workflows and Google CI (Kokoro) for continuous integration. GitHub Workflows is used for macOS build and test. Kokoro is used for Linux build and test. Again, because of the manylinux2010 requirement, on Linux whl packages are always built with Ubuntu 16.04 + Developer Toolset 7. Tests are done on a variatiy of systems with different python3 versions to ensure a good coverage:

Python Ubuntu 18.04 Ubuntu 20.04 macOS + osx9 Windows-2019
2.7 :heavy_check_mark: :heavy_check_mark: :heavy_check_mark: N/A
3.7 :heavy_check_mark: :heavy_check_mark: :heavy_check_mark: :heavy_check_mark:
3.8 :heavy_check_mark: :heavy_check_mark: :heavy_check_mark: :heavy_check_mark:

TensorFlow I/O has integrations with many systems and cloud vendors such as Prometheus, Apache Kafka, Apache Ignite, Google Cloud PubSub, AWS Kinesis, Microsoft Azure Storage, Alibaba Cloud OSS etc.

We tried our best to test against those systems in our continuous integration whenever possible. Some tests such as Prometheus, Kafka, and Ignite are done with live systems, meaning we install Prometheus/Kafka/Ignite on CI machine before the test is run. Some tests such as Kinesis, PubSub, and Azure Storage are done through official or non-official emulators. Offline tests are also performed whenever possible, though systems covered through offine tests may not have the same level of coverage as live systems or emulators.

Live System Emulator CI Integration Offline
Apache Kafka :heavy_check_mark: :heavy_check_mark:
Apache Ignite :heavy_check_mark: :heavy_check_mark:
Prometheus :heavy_check_mark: :heavy_check_mark:
Google PubSub :heavy_check_mark: :heavy_check_mark:
Azure Storage :heavy_check_mark: :heavy_check_mark:
AWS Kinesis :heavy_check_mark: :heavy_check_mark:
Alibaba Cloud OSS :heavy_check_mark:
Google BigTable/BigQuery to be added
Elasticsearch (experimental) :heavy_check_mark: :heavy_check_mark:
MongoDB (experimental) :heavy_check_mark: :heavy_check_mark:

References for emulators:

Community

Additional Information

License

Apache License 2.0

Release history Release notifications | RSS feed

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distributions

File details

Details for the file tensorflow_io_gcs_filesystem_nightly-0.18.0.dev20210519200847-cp39-cp39-win_amd64.whl.

File metadata

File hashes

Hashes for tensorflow_io_gcs_filesystem_nightly-0.18.0.dev20210519200847-cp39-cp39-win_amd64.whl
Algorithm Hash digest
SHA256 c2a01d40b844428bfababc4f9ed9f1f2e63eec6ff812b4165b752e3a11e4c36d
MD5 4db1b414f674f58c583b0713e90c7502
BLAKE2b-256 784751a56ef251396ec4e0b5d08a6ece76de7a64d6dadd33a12bb8f46471a5e7

See more details on using hashes here.

File details

Details for the file tensorflow_io_gcs_filesystem_nightly-0.18.0.dev20210519200847-cp39-cp39-manylinux_2_12_x86_64.manylinux2010_x86_64.whl.

File metadata

File hashes

Hashes for tensorflow_io_gcs_filesystem_nightly-0.18.0.dev20210519200847-cp39-cp39-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 749ce00c6495a14cd3e77fb45627174380530ad208452ce041bf34932a6321e3
MD5 927128bc5e119e36b669a01a7c179531
BLAKE2b-256 fbbb7865fd89e412c100c65ad67d82fc161a36beed32958794060d413e875ff4

See more details on using hashes here.

File details

Details for the file tensorflow_io_gcs_filesystem_nightly-0.18.0.dev20210519200847-cp39-cp39-macosx_10_14_x86_64.whl.

File metadata

File hashes

Hashes for tensorflow_io_gcs_filesystem_nightly-0.18.0.dev20210519200847-cp39-cp39-macosx_10_14_x86_64.whl
Algorithm Hash digest
SHA256 1416e4cc41fbb987b00e4d63eb17cc9439adc6a799dc237b4aa3fe9ba0cc97f5
MD5 b58ca00c095f5e8cad4c841136442b79
BLAKE2b-256 25680efb8bd81cf20bc8d8750daa289b7769602ec315944f97b92952f7e55931

See more details on using hashes here.

File details

Details for the file tensorflow_io_gcs_filesystem_nightly-0.18.0.dev20210519200847-cp38-cp38-win_amd64.whl.

File metadata

File hashes

Hashes for tensorflow_io_gcs_filesystem_nightly-0.18.0.dev20210519200847-cp38-cp38-win_amd64.whl
Algorithm Hash digest
SHA256 e87fd793aa5fc79e03ca03987a77ee0a04ccbfa3135ea02264132b85d345850c
MD5 dbcbeaa60a9c59bd4270b96c8989fd3c
BLAKE2b-256 807d5559f94009391022ecde32638cff5da3d20cd7334c4f270a479b0e54a26d

See more details on using hashes here.

File details

Details for the file tensorflow_io_gcs_filesystem_nightly-0.18.0.dev20210519200847-cp38-cp38-manylinux_2_12_x86_64.manylinux2010_x86_64.whl.

File metadata

File hashes

Hashes for tensorflow_io_gcs_filesystem_nightly-0.18.0.dev20210519200847-cp38-cp38-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 91b95ba55a874b1d695fb276e72f031a7dfebc979ab19114be96d14d88e2c16c
MD5 5ad51d4431357992627b984f5fa68891
BLAKE2b-256 85a930c8fe9b5c95b5007d52460aa8b0f906826199ebd388fedf54b200dd8109

See more details on using hashes here.

File details

Details for the file tensorflow_io_gcs_filesystem_nightly-0.18.0.dev20210519200847-cp38-cp38-macosx_10_14_x86_64.whl.

File metadata

File hashes

Hashes for tensorflow_io_gcs_filesystem_nightly-0.18.0.dev20210519200847-cp38-cp38-macosx_10_14_x86_64.whl
Algorithm Hash digest
SHA256 544535690c54c369c183e436a729cfac09dd0c430395b5745489a8ba3ee932f1
MD5 4277b61d54ba705d0d6292271d6229e9
BLAKE2b-256 3a44670d9ebca4389cb7b707f2df18a8940ddf6b09414e8a01850b0c0a036870

See more details on using hashes here.

File details

Details for the file tensorflow_io_gcs_filesystem_nightly-0.18.0.dev20210519200847-cp37-cp37m-win_amd64.whl.

File metadata

File hashes

Hashes for tensorflow_io_gcs_filesystem_nightly-0.18.0.dev20210519200847-cp37-cp37m-win_amd64.whl
Algorithm Hash digest
SHA256 7a1595e82976ff2fd4e55743d5d3a6d453aa1865c9234cedba49ef3e135e29e8
MD5 e41b98e4a0a49502c45ba9c3c7ea0488
BLAKE2b-256 372daf0095b15ff12a39d7ce9ca631359796548b724416760ed477b9a921559f

See more details on using hashes here.

File details

Details for the file tensorflow_io_gcs_filesystem_nightly-0.18.0.dev20210519200847-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl.

File metadata

File hashes

Hashes for tensorflow_io_gcs_filesystem_nightly-0.18.0.dev20210519200847-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 083bad2d597c1bdae94f021c2abd5684165055d1be69039205da31b6b9b7e3bb
MD5 99620fa734ba9933157143e6ea7e0d78
BLAKE2b-256 b8d2f433a5bc819cca74a64b8fab582f9e37f386038f1e171e76be69c156c566

See more details on using hashes here.

File details

Details for the file tensorflow_io_gcs_filesystem_nightly-0.18.0.dev20210519200847-cp37-cp37m-macosx_10_14_x86_64.whl.

File metadata

File hashes

Hashes for tensorflow_io_gcs_filesystem_nightly-0.18.0.dev20210519200847-cp37-cp37m-macosx_10_14_x86_64.whl
Algorithm Hash digest
SHA256 423a0e17a73195178ffcc4a884fae93955c926448b25c47bbe9bd70890914379
MD5 f9d12af0828520ddff2c525056274f26
BLAKE2b-256 9207b5e9c0d4d2d74abd9be28b9eae5fe049fe72da47857023db6a5a565d7ed3

See more details on using hashes here.

File details

Details for the file tensorflow_io_gcs_filesystem_nightly-0.18.0.dev20210519200847-cp36-cp36m-win_amd64.whl.

File metadata

File hashes

Hashes for tensorflow_io_gcs_filesystem_nightly-0.18.0.dev20210519200847-cp36-cp36m-win_amd64.whl
Algorithm Hash digest
SHA256 ef6bbb5bd3c9ae4fa6b908069aabd92cc34b04069d3742abe263b3e1cdf56534
MD5 c12f915daee3543c190f5d66db6875af
BLAKE2b-256 adc687acc51249e162c398ab4f964250d5139e9bf75f9bbeeb1df7140b6b0b97

See more details on using hashes here.

File details

Details for the file tensorflow_io_gcs_filesystem_nightly-0.18.0.dev20210519200847-cp36-cp36m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl.

File metadata

File hashes

Hashes for tensorflow_io_gcs_filesystem_nightly-0.18.0.dev20210519200847-cp36-cp36m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 3c070e2104e40d21b6a5ef21f762a4fb93b56b7493e1a2726cf6e17587bb6244
MD5 ca08d2defdd3cbdbc1d005d01441772a
BLAKE2b-256 889d45c4ad4b729636d98345f5fa438a36f269f5295153ad71d56f2ea4525748

See more details on using hashes here.

File details

Details for the file tensorflow_io_gcs_filesystem_nightly-0.18.0.dev20210519200847-cp36-cp36m-macosx_10_14_x86_64.whl.

File metadata

File hashes

Hashes for tensorflow_io_gcs_filesystem_nightly-0.18.0.dev20210519200847-cp36-cp36m-macosx_10_14_x86_64.whl
Algorithm Hash digest
SHA256 0535656ce4b2cf98fc7d163b9be0a570396615e544cc572b0ca11a2d72592b48
MD5 b6e2d4300cf50d80b71f617e3698f990
BLAKE2b-256 28dd929a5bc8d8e55ee321ce43e4606458d042be832ed7bb6fb83b1e1eed2ebf

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page