TensorFlow I/O

TensorFlow I/O is a collection of file systems and file formats that are not available in TensorFlow's built-in support. A full list of supported file systems and file formats by TensorFlow I/O can be found here.

Using tensorflow-io with Keras is straightforward. Below is the Get Started with TensorFlow example, with the data processing part replaced by tensorflow-io:

import tensorflow as tf
import tensorflow_io as tfio

# Read the MNIST data into the IODataset.
dataset_url = "https://storage.googleapis.com/cvdf-datasets/mnist/"
d_train = tfio.IODataset.from_mnist(
    dataset_url + "train-images-idx3-ubyte.gz",
    dataset_url + "train-labels-idx1-ubyte.gz",
)

# Shuffle the elements of the dataset.
d_train = d_train.shuffle(buffer_size=1024)

# By default image data is uint8, so convert to float32 using map().
d_train = d_train.map(lambda x, y: (tf.image.convert_image_dtype(x, tf.float32), y))

# Batch the data just like any other tf.data.Dataset.
d_train = d_train.batch(32)

# Build the model.
model = tf.keras.models.Sequential(
    [
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(512, activation=tf.nn.relu),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(10, activation=tf.nn.softmax),
    ]
)

# Compile the model.
model.compile(
    optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"]
)

# Fit the model.
model.fit(d_train, epochs=5, steps_per_epoch=200)

In the MNIST example above, the URLs of the dataset files are passed directly to the tfio.IODataset.from_mnist API call. This works because tensorflow-io has built-in support for the HTTP/HTTPS file system, which removes the need to download the datasets and save them to a local directory first.

NOTE: Because tensorflow-io detects and uncompresses the MNIST dataset automatically when needed, the URLs of the compressed (gzip) files can be passed to the API call as is.
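To make the automatic decompression concrete, here is a minimal standard-library sketch of how a reader could detect gzip input by its two-byte magic number and then parse the IDX header used by the MNIST files. It is an illustration only, not tensorflow-io's actual implementation; the helper names `maybe_decompress` and `parse_idx_header` are hypothetical, and the input is a tiny synthetic file rather than real MNIST data.

```python
import gzip
import struct

GZIP_MAGIC = b"\x1f\x8b"  # the first two bytes of every gzip stream


def maybe_decompress(data: bytes) -> bytes:
    """Return gunzipped bytes if data looks like gzip, else data unchanged."""
    if data[:2] == GZIP_MAGIC:
        return gzip.decompress(data)
    return data


def parse_idx_header(data: bytes):
    """Parse an IDX header and return (dtype_code, shape, payload_offset).

    IDX layout: two zero bytes, one data-type byte (0x08 = unsigned byte),
    one byte giving the number of dimensions, then each dimension size as a
    big-endian 32-bit unsigned integer.
    """
    zero, dtype_code, ndim = struct.unpack(">HBB", data[:4])
    if zero != 0:
        raise ValueError("not an IDX file: the first two bytes must be zero")
    shape = struct.unpack(">" + "I" * ndim, data[4 : 4 + 4 * ndim])
    return dtype_code, shape, 4 + 4 * ndim


# Build a tiny synthetic IDX3 file: 2 "images" of 3x3 uint8 pixels.
raw = struct.pack(">HBBIII", 0, 0x08, 3, 2, 3, 3) + bytes(range(18))
compressed = gzip.compress(raw)

# The same code path handles both the plain and the gzip-compressed input.
for blob in (raw, compressed):
    dtype_code, shape, offset = parse_idx_header(maybe_decompress(blob))
    print(hex(dtype_code), shape, offset)
```

Checking the magic number instead of the file extension is what lets a caller pass either form of the file without saying which one it is.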

Please check the official documentation for more detailed and interesting usages of the package.

Installation

Python Package

The tensorflow-io Python package can be installed with pip directly using:

$ pip install tensorflow-io

People who are a little more adventurous can also try our nightly binaries:

$ pip install tensorflow-io-nightly

Docker Images

In addition to the pip packages, Docker images are available for getting started quickly.

For stable builds:

$ docker pull tfsigio/tfio:latest
$ docker run -it --rm --name tfio-latest tfsigio/tfio:latest

For nightly builds:

$ docker pull tfsigio/tfio:nightly
$ docker run -it --rm --name tfio-nightly tfsigio/tfio:nightly

R Package

Once the tensorflow-io Python package has been successfully installed, you can install the development version of the R package from GitHub via the following:

if (!require("remotes")) install.packages("remotes")
remotes::install_github("tensorflow/io", subdir = "R-package")

TensorFlow Version Compatibility

To ensure compatibility with TensorFlow, it is recommended to install a matching version of TensorFlow I/O according to the table below. You can find the list of releases here.

| TensorFlow I/O Version | TensorFlow Compatibility | Release Date |
| --- | --- | --- |
| 0.17.1 | 2.4.x | Apr 16, 2021 |
| 0.17.0 | 2.4.x | Dec 14, 2020 |
| 0.16.0 | 2.3.x | Oct 23, 2020 |
| 0.15.0 | 2.3.x | Aug 03, 2020 |
| 0.14.0 | 2.2.x | Jul 08, 2020 |
| 0.13.0 | 2.2.x | May 10, 2020 |
| 0.12.0 | 2.1.x | Feb 28, 2020 |
| 0.11.0 | 2.1.x | Jan 10, 2020 |
| 0.10.0 | 2.0.x | Dec 05, 2019 |
| 0.9.1 | 2.0.x | Nov 15, 2019 |
| 0.9.0 | 2.0.x | Oct 18, 2019 |
| 0.8.1 | 1.15.x | Nov 15, 2019 |
| 0.8.0 | 1.15.x | Oct 17, 2019 |
| 0.7.2 | 1.14.x | Nov 15, 2019 |
| 0.7.1 | 1.14.x | Oct 18, 2019 |
| 0.7.0 | 1.14.x | Jul 14, 2019 |
| 0.6.0 | 1.13.x | May 29, 2019 |
| 0.5.0 | 1.13.x | Apr 12, 2019 |
| 0.4.0 | 1.13.x | Mar 01, 2019 |
| 0.3.0 | 1.12.0 | Feb 15, 2019 |
| 0.2.0 | 1.12.0 | Jan 29, 2019 |
| 0.1.0 | 1.12.0 | Dec 16, 2018 |
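As a rough sketch of how the table can be used, the snippet below maps a TensorFlow version string to the newest TensorFlow I/O release listed for that release series and builds a matching pip requirement. The dictionary is copied from the table above; the function names `matching_tfio_version` and `pip_requirement` are hypothetical helpers, not part of tensorflow-io, and matching on the major.minor prefix is an assumption (the table lists 1.12.0 exactly rather than 1.12.x).

```python
# TensorFlow release series -> newest TensorFlow I/O version listed for it
# in the compatibility table.
LATEST_TFIO_FOR_TF = {
    "2.4": "0.17.1",
    "2.3": "0.16.0",
    "2.2": "0.14.0",
    "2.1": "0.12.0",
    "2.0": "0.10.0",
    "1.15": "0.8.1",
    "1.14": "0.7.2",
    "1.13": "0.6.0",
    "1.12": "0.3.0",
}


def matching_tfio_version(tf_version: str) -> str:
    """Return the newest listed TensorFlow I/O version for a TensorFlow version."""
    # Keep only "major.minor", e.g. "2.4.1" -> "2.4".
    major_minor = ".".join(tf_version.split(".")[:2])
    try:
        return LATEST_TFIO_FOR_TF[major_minor]
    except KeyError:
        raise ValueError(
            f"no TensorFlow I/O release listed for TensorFlow {tf_version}"
        ) from None


def pip_requirement(tf_version: str) -> str:
    """Build a pinned pip requirement string for the matching release."""
    return f"tensorflow-io=={matching_tfio_version(tf_version)}"


print(pip_requirement("2.4.1"))
print(pip_requirement("1.15.0"))
```

The resulting string can be passed to `pip install` in place of the unpinned package name.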

Performance Benchmarking

We use GitHub Pages to document the results of API performance benchmarks. The benchmark job is triggered on every commit to the master branch, which makes it possible to track performance with respect to individual commits.

Contributing

TensorFlow I/O is a community-led open source project. As such, the project depends on public contributions, bug fixes, and documentation. Please see:

Build Status and CI

Build Status
Linux CPU Python 2 Status
Linux CPU Python 3 Status
Linux GPU Python 2 Status
Linux GPU Python 3 Status

Because of the manylinux2010 requirement, TensorFlow I/O is built with Ubuntu 16.04 + Developer Toolset 7 (GCC 7.3) on Linux. Setting up Ubuntu 16.04 with Developer Toolset 7 is not exactly straightforward. If the system has Docker installed, the following script will automatically build a manylinux2010-compatible whl package:

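#!/usr/bin/env bash

# List the wheels that were built.
ls dist/*
# Repair each wheel inside the manylinux2010 container.
for f in dist/*.whl; do
  docker run -i --rm -v "$PWD":/v -w /v --net=host quay.io/pypa/manylinux2010_x86_64 bash -x -e /v/tools/build/auditwheel repair --plat manylinux2010_x86_64 "$f"
done
# Files written from inside the container may be owned by root; hand them back to the current user.
sudo chown -R "$(id -nu):$(id -ng)" .
ls wheelhouse/*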

The build takes some time, but once it completes, whl packages compatible with Python 3.5, 3.6, and 3.7 will be available in the wheelhouse directory.

On macOS, the same script can be used. However, it expects python to be available in the shell and will only generate a whl package that matches that Python version. To build a whl package for a specific Python version, alias that version to python in the shell. See the Auditwheel step in .github/workflows/build.yml for instructions on how to do that.

Note that the Docker-based auditwheel script above is also what we use when releasing packages for Linux and macOS.
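Since the macOS build only produces a wheel for the Python found in the shell, it can help to check which tag that interpreter corresponds to and whether a given file in wheelhouse matches it. The following is a minimal sketch using only the standard library. It assumes CPython and the standard wheel file name layout (name, version, optional build tag, then Python tag, ABI tag, and platform tag); the helper names are hypothetical and the file name is only an example.

```python
import sys


def interpreter_tag() -> str:
    """Return the CPython tag of the running interpreter, e.g. 'cp38'."""
    return f"cp{sys.version_info.major}{sys.version_info.minor}"


def wheel_tags(filename: str):
    """Split a wheel file name into (python_tag, abi_tag, platform_tag)."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename}")
    # The last three dash-separated fields are always the compatibility tags.
    python_tag, abi_tag, platform_tag = filename[: -len(".whl")].split("-")[-3:]
    return python_tag, abi_tag, platform_tag


def matches_current_python(filename: str) -> bool:
    """True if the wheel's Python tag includes the running interpreter's tag."""
    # A Python tag may be a dot-separated set such as 'py2.py3'.
    return interpreter_tag() in wheel_tags(filename)[0].split(".")


example = (
    "tensorflow_io_plugin_gs_nightly-0.18.0.dev20210430205854"
    "-cp37-cp37m-manylinux2010_x86_64.whl"
)
print(interpreter_tag())
print(wheel_tags(example))
print(matches_current_python(example))
```

Running this with the interpreter that `python` resolves to in the shell shows which wheel the build script would be expected to produce.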

TensorFlow I/O uses both GitHub Workflows and Google CI (Kokoro) for continuous integration. GitHub Workflows is used for the macOS build and test. Kokoro is used for the Linux build and test. Again, because of the manylinux2010 requirement, whl packages on Linux are always built with Ubuntu 16.04 + Developer Toolset 7. Tests are run on a variety of systems with different Python versions to ensure good coverage:

| Python | Ubuntu 18.04 | Ubuntu 20.04 | macOS + osx9 | Windows-2019 |
| --- | --- | --- | --- | --- |
| 2.7 | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | N/A |
| 3.7 | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |
| 3.8 | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |

TensorFlow I/O has integrations with many systems and cloud vendors, such as Prometheus, Apache Kafka, Apache Ignite, Google Cloud PubSub, AWS Kinesis, Microsoft Azure Storage, and Alibaba Cloud OSS.

We try our best to test against those systems in our continuous integration whenever possible. Some tests, such as those for Prometheus, Kafka, and Ignite, are run against live systems, meaning we install Prometheus/Kafka/Ignite on the CI machine before the test is run. Others, such as those for Kinesis, PubSub, and Azure Storage, are run against official or unofficial emulators. Offline tests are also performed whenever possible, though systems covered only by offline tests may not have the same level of coverage as those tested with live systems or emulators.

| | Live System | Emulator | CI Integration | Offline |
| --- | --- | --- | --- | --- |
| Apache Kafka | :heavy_check_mark: | | :heavy_check_mark: | |
| Apache Ignite | :heavy_check_mark: | | :heavy_check_mark: | |
| Prometheus | :heavy_check_mark: | | :heavy_check_mark: | |
| Google PubSub | | :heavy_check_mark: | :heavy_check_mark: | |
| Azure Storage | | :heavy_check_mark: | :heavy_check_mark: | |
| AWS Kinesis | | :heavy_check_mark: | :heavy_check_mark: | |
| Alibaba Cloud OSS | | | | :heavy_check_mark: |
| Google BigTable/BigQuery | | to be added | | |
| Elasticsearch (experimental) | :heavy_check_mark: | | :heavy_check_mark: | |
| MongoDB (experimental) | :heavy_check_mark: | | :heavy_check_mark: | |

References for emulators:

Community

Additional Information

License

Apache License 2.0
