TensorFlow I/O


TensorFlow I/O is a collection of file systems and file formats that are not available in TensorFlow's built-in support. A full list of the file systems and file formats supported by TensorFlow I/O can be found here.

Using tensorflow-io with Keras is straightforward. Below is the Get Started with TensorFlow example, with the data processing step replaced by tensorflow-io:

import tensorflow as tf
import tensorflow_io as tfio

# Read the MNIST data into the IODataset.
dataset_url = "https://storage.googleapis.com/cvdf-datasets/mnist/"
d_train = tfio.IODataset.from_mnist(
    dataset_url + "train-images-idx3-ubyte.gz",
    dataset_url + "train-labels-idx1-ubyte.gz",
)

# Shuffle the elements of the dataset.
d_train = d_train.shuffle(buffer_size=1024)

# By default image data is uint8, so convert to float32 using map().
d_train = d_train.map(lambda x, y: (tf.image.convert_image_dtype(x, tf.float32), y))

# Batch the data just like any other tf.data.Dataset.
d_train = d_train.batch(32)

# Build the model.
model = tf.keras.models.Sequential(
    [
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(512, activation=tf.nn.relu),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(10, activation=tf.nn.softmax),
    ]
)

# Compile the model.
model.compile(
    optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"]
)

# Fit the model.
model.fit(d_train, epochs=5, steps_per_epoch=200)

In the MNIST example above, the URLs of the dataset files are passed directly to the tfio.IODataset.from_mnist API call. This works because tensorflow-io has built-in support for the HTTP/HTTPS file system, which eliminates the need to download and save datasets to a local directory.

NOTE: Since tensorflow-io detects and uncompresses the MNIST dataset automatically when needed, we can pass the URLs of the compressed (gzip) files to the API call as is.
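To illustrate what detecting and uncompressing a file automatically can look like, here is a minimal standard-library sketch. It is not tensorflow-io's actual implementation: it assumes detection by the two-byte gzip magic number and then reads the IDX header used by the MNIST files. The helper names `maybe_gunzip` and `parse_idx_header` are hypothetical and exist only for this example.

```python
import gzip
import struct

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def maybe_gunzip(data: bytes) -> bytes:
    """Return decompressed bytes if data looks gzip-compressed, else data unchanged."""
    if data[:2] == GZIP_MAGIC:
        return gzip.decompress(data)
    return data


def parse_idx_header(data: bytes):
    """Parse an IDX header and return (dtype_code, shape).

    An IDX file starts with two zero bytes, a one-byte type code
    (0x08 means unsigned byte) and a one-byte dimension count,
    followed by one big-endian 32-bit size per dimension.
    """
    zero, dtype_code, ndim = struct.unpack(">HBB", data[:4])
    if zero != 0:
        raise ValueError("not an IDX file")
    shape = struct.unpack(">" + "I" * ndim, data[4 : 4 + 4 * ndim])
    return dtype_code, shape


# Build a tiny synthetic IDX image file (2 images of 28x28) to exercise the helpers.
header = struct.pack(">HBB", 0, 0x08, 3) + struct.pack(">III", 2, 28, 28)
raw = header + bytes(2 * 28 * 28)
compressed = gzip.compress(raw)

# The compressed and the plain form yield the same header.
print(parse_idx_header(maybe_gunzip(compressed)))
print(parse_idx_header(maybe_gunzip(raw)))
```

Because the check looks only at the leading bytes, the same code path handles both compressed and uncompressed input, which is the behavior the note describes from the caller's point of view.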

Please check the official documentation for more detailed and interesting usages of the package.

Installation

Python Package

The tensorflow-io Python package can be installed with pip directly using:

$ pip install tensorflow-io

People who are a little more adventurous can also try our nightly binaries:

$ pip install tensorflow-io-nightly

Docker Images

In addition to the pip packages, Docker images can be used to get started quickly.

For stable builds:

$ docker pull tfsigio/tfio:latest
$ docker run -it --rm --name tfio-latest tfsigio/tfio:latest

For nightly builds:

$ docker pull tfsigio/tfio:nightly
$ docker run -it --rm --name tfio-nightly tfsigio/tfio:nightly

R Package

Once the tensorflow-io Python package has been successfully installed, you can install the development version of the R package from GitHub via the following:

if (!require("remotes")) install.packages("remotes")
remotes::install_github("tensorflow/io", subdir = "R-package")

TensorFlow Version Compatibility

To ensure compatibility with TensorFlow, it is recommended to install a matching version of TensorFlow I/O according to the table below. You can find the list of releases here.

TensorFlow I/O Version TensorFlow Compatibility Release Date
0.17.1 2.4.x Apr 16, 2021
0.17.0 2.4.x Dec 14, 2020
0.16.0 2.3.x Oct 23, 2020
0.15.0 2.3.x Aug 03, 2020
0.14.0 2.2.x Jul 08, 2020
0.13.0 2.2.x May 10, 2020
0.12.0 2.1.x Feb 28, 2020
0.11.0 2.1.x Jan 10, 2020
0.10.0 2.0.x Dec 05, 2019
0.9.1 2.0.x Nov 15, 2019
0.9.0 2.0.x Oct 18, 2019
0.8.1 1.15.x Nov 15, 2019
0.8.0 1.15.x Oct 17, 2019
0.7.2 1.14.x Nov 15, 2019
0.7.1 1.14.x Oct 18, 2019
0.7.0 1.14.x Jul 14, 2019
0.6.0 1.13.x May 29, 2019
0.5.0 1.13.x Apr 12, 2019
0.4.0 1.13.x Mar 01, 2019
0.3.0 1.12.0 Feb 15, 2019
0.2.0 1.12.0 Jan 29, 2019
0.1.0 1.12.0 Dec 16, 2018
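As a rough illustration of how the table can be used programmatically, the sketch below encodes a subset of its rows in a dictionary and checks whether a TensorFlow version falls in the series listed for a TensorFlow I/O release. The names `COMPATIBILITY`, `is_compatible`, and `newest_tfio_for` are hypothetical helpers written for this example and are not part of tensorflow-io.

```python
# Subset of the compatibility table: TensorFlow I/O release -> TensorFlow series.
COMPATIBILITY = {
    "0.17.1": "2.4",
    "0.17.0": "2.4",
    "0.16.0": "2.3",
    "0.15.0": "2.3",
    "0.14.0": "2.2",
    "0.13.0": "2.2",
    "0.12.0": "2.1",
    "0.11.0": "2.1",
    "0.10.0": "2.0",
    "0.9.1": "2.0",
    "0.9.0": "2.0",
}


def _major_minor(version: str) -> str:
    """Reduce a version such as '2.4.1' to its 'major.minor' series."""
    return ".".join(version.split(".")[:2])


def is_compatible(tfio_version: str, tf_version: str) -> bool:
    """Return True if tf_version falls in the series listed for tfio_version."""
    series = COMPATIBILITY.get(tfio_version)
    if series is None:
        raise KeyError(f"no table entry for tensorflow-io {tfio_version}")
    return _major_minor(tf_version) == series


def newest_tfio_for(tf_version: str):
    """Return the newest listed tensorflow-io release for tf_version, or None."""
    matches = [v for v, s in COMPATIBILITY.items() if s == _major_minor(tf_version)]
    # Sort numerically so that "0.10.0" ranks above "0.9.1".
    matches.sort(key=lambda v: tuple(int(p) for p in v.split(".")))
    return matches[-1] if matches else None


print(is_compatible("0.17.1", "2.4.1"))
print(newest_tfio_for("2.0.0"))
```

In a real environment the installed versions could be read with `importlib.metadata.version("tensorflow")` and `importlib.metadata.version("tensorflow-io")` (Python 3.8+) and passed to `is_compatible`.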

Performance Benchmarking

We use github-pages to document the results of API performance benchmarks. The benchmark job is triggered on every commit to the master branch, which makes it possible to track performance across commits.

Contributing

TensorFlow I/O is a community-led open source project. As such, the project depends on public contributions, bug fixes, and documentation. Please see:

Build Status and CI

Build Status
Linux CPU Python 2 Status
Linux CPU Python 3 Status
Linux GPU Python 2 Status
Linux GPU Python 3 Status

Because of the manylinux2010 requirement, TensorFlow I/O is built with Ubuntu 16.04 + Developer Toolset 7 (GCC 7.3) on Linux. Configuring Ubuntu 16.04 with Developer Toolset 7 is not exactly straightforward. If the system has Docker installed, the following command will automatically build a manylinux2010-compatible whl package:

#!/usr/bin/env bash

ls dist/*
for f in dist/*.whl; do
  docker run -i --rm -v "$PWD":/v -w /v --net=host quay.io/pypa/manylinux2010_x86_64 bash -x -e /v/tools/build/auditwheel repair --plat manylinux2010_x86_64 "$f"
done
sudo chown -R $(id -nu):$(id -ng) .
ls wheelhouse/*

It takes some time to build, but once complete, Python 3.5, 3.6, and 3.7 compatible whl packages will be available in the wheelhouse directory.

On macOS, the same command can be used. However, the script expects python in the shell and will only generate a whl package that matches the version of python in the shell. If you want to build a whl package for a specific Python version, you have to alias that version to python in the shell. See the Auditwheel step in .github/workflows/build.yml for instructions on how to do that.

Note that the above command is also the one we use when releasing packages for Linux and macOS.

TensorFlow I/O uses both GitHub Workflows and Google CI (Kokoro) for continuous integration. GitHub Workflows is used for macOS build and test. Kokoro is used for Linux build and test. Again, because of the manylinux2010 requirement, whl packages on Linux are always built with Ubuntu 16.04 + Developer Toolset 7. Tests are run on a variety of systems with different python3 versions to ensure good coverage:

Python Ubuntu 18.04 Ubuntu 20.04 macOS + osx9 Windows-2019
2.7 :heavy_check_mark: :heavy_check_mark: :heavy_check_mark: N/A
3.7 :heavy_check_mark: :heavy_check_mark: :heavy_check_mark: :heavy_check_mark:
3.8 :heavy_check_mark: :heavy_check_mark: :heavy_check_mark: :heavy_check_mark:

TensorFlow I/O has integrations with many systems and cloud vendors such as Prometheus, Apache Kafka, Apache Ignite, Google Cloud PubSub, AWS Kinesis, Microsoft Azure Storage, Alibaba Cloud OSS, etc.

We try our best to test against those systems in our continuous integration whenever possible. Some tests, such as those for Prometheus, Kafka, and Ignite, are run against live systems, meaning we install Prometheus/Kafka/Ignite on the CI machine before the test is run. Some tests, such as those for Kinesis, PubSub, and Azure Storage, are run against official or unofficial emulators. Offline tests are also performed whenever possible, though systems covered by offline tests may not have the same level of coverage as those tested with live systems or emulators.

Live System Emulator CI Integration Offline
Apache Kafka :heavy_check_mark: :heavy_check_mark:
Apache Ignite :heavy_check_mark: :heavy_check_mark:
Prometheus :heavy_check_mark: :heavy_check_mark:
Google PubSub :heavy_check_mark: :heavy_check_mark:
Azure Storage :heavy_check_mark: :heavy_check_mark:
AWS Kinesis :heavy_check_mark: :heavy_check_mark:
Alibaba Cloud OSS :heavy_check_mark:
Google BigTable/BigQuery to be added
Elasticsearch (experimental) :heavy_check_mark: :heavy_check_mark:
MongoDB (experimental) :heavy_check_mark: :heavy_check_mark:

References for emulators:

Community

Additional Information

License

Apache License 2.0

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release. See tutorial on generating distribution archives.

Built Distributions

tensorflow_io_plugin_gs_nightly-0.18.0.dev20210505010904-cp39-cp39-manylinux2010_x86_64.whl (2.5 MB view details)

Uploaded CPython 3.9 manylinux: glibc 2.12+ x86-64

tensorflow_io_plugin_gs_nightly-0.18.0.dev20210505010904-cp38-cp38-manylinux2010_x86_64.whl (2.5 MB view details)

Uploaded CPython 3.8 manylinux: glibc 2.12+ x86-64

tensorflow_io_plugin_gs_nightly-0.18.0.dev20210505010904-cp37-cp37m-manylinux2010_x86_64.whl (2.5 MB view details)

Uploaded CPython 3.7m manylinux: glibc 2.12+ x86-64

tensorflow_io_plugin_gs_nightly-0.18.0.dev20210505010904-cp36-cp36m-manylinux2010_x86_64.whl (2.5 MB view details)

Uploaded CPython 3.6m manylinux: glibc 2.12+ x86-64
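The file names above encode the Python, ABI, and platform tags of each wheel. As a small sketch of how to read them, the following standard-library code splits a wheel file name into its components and selects the files whose Python tag matches the running interpreter. `parse_wheel_filename` and `current_python_tag` are hypothetical helpers for this example, and the tag computation assumes a CPython interpreter.

```python
import sys


def parse_wheel_filename(filename: str) -> dict:
    """Split a wheel file name into name, version, and compatibility tags."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file name")
    parts = filename[: -len(".whl")].split("-")
    # Five components, or six when an optional build tag follows the version.
    if len(parts) not in (5, 6):
        raise ValueError("unexpected number of components")
    python_tag, abi_tag, platform_tag = parts[-3:]
    return {
        "name": parts[0],
        "version": parts[1],
        "python_tag": python_tag,
        "abi_tag": abi_tag,
        "platform_tag": platform_tag,
    }


def current_python_tag() -> str:
    """Return a tag such as 'cp39' for the running interpreter (CPython assumed)."""
    return f"cp{sys.version_info.major}{sys.version_info.minor}"


WHEELS = [
    "tensorflow_io_plugin_gs_nightly-0.18.0.dev20210505010904-cp39-cp39-manylinux2010_x86_64.whl",
    "tensorflow_io_plugin_gs_nightly-0.18.0.dev20210505010904-cp38-cp38-manylinux2010_x86_64.whl",
    "tensorflow_io_plugin_gs_nightly-0.18.0.dev20210505010904-cp37-cp37m-manylinux2010_x86_64.whl",
    "tensorflow_io_plugin_gs_nightly-0.18.0.dev20210505010904-cp36-cp36m-manylinux2010_x86_64.whl",
]

# Keep only the wheels built for the interpreter running this script.
matching = [w for w in WHEELS if parse_wheel_filename(w)["python_tag"] == current_python_tag()]
print(current_python_tag(), matching)
```

In practice pip performs this selection itself, including ABI and platform checks that this sketch leaves out, so the code is only meant to make the naming scheme readable.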

File details

Details for the file tensorflow_io_plugin_gs_nightly-0.18.0.dev20210505010904-cp39-cp39-win_amd64.whl.

File metadata

File hashes

Hashes for tensorflow_io_plugin_gs_nightly-0.18.0.dev20210505010904-cp39-cp39-win_amd64.whl
Algorithm Hash digest
SHA256 7f48fe08eb2d3d56900bd71a6c17a679f734e0b470822fc5c9a363fa3b867a8e
MD5 11bd05c6ef75d328922c6a3b38cf18ff
BLAKE2b-256 b365a05441703187f4619cf619c1f87601dd00feddf8e3ea174d4007a29ea49c

See more details on using hashes here.
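A downloaded file can be checked against a published SHA256 digest with the Python standard library. The sketch below is one way to do it; `sha256_of_file` and `verify_sha256` are hypothetical helper names, and the demonstration uses a temporary file with placeholder contents rather than a real wheel.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path: str, expected_hex: str) -> bool:
    """Return True if the file's digest equals expected_hex (case-insensitive)."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Demonstration with a temporary stand-in file instead of a downloaded wheel.
payload = b"placeholder wheel contents"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
    tmp_path = tmp.name
try:
    expected = hashlib.sha256(payload).hexdigest()
    print(verify_sha256(tmp_path, expected))   # digest matches
    print(verify_sha256(tmp_path, "0" * 64))   # digest does not match
finally:
    os.remove(tmp_path)
```

For a real download, pass the path of the wheel and the SHA256 value listed for that exact file name. pip can also enforce digests at install time through its hash-checking mode.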

File details

Details for the file tensorflow_io_plugin_gs_nightly-0.18.0.dev20210505010904-cp39-cp39-manylinux2010_x86_64.whl.

File metadata

File hashes

Hashes for tensorflow_io_plugin_gs_nightly-0.18.0.dev20210505010904-cp39-cp39-manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 9bd3ddec0f327c1369d3841383afc505e2a78e9f6712af3770fe4b467acd2b70
MD5 5eb22f91ebae31c21c0f1bab7d4ff073
BLAKE2b-256 e6e2b7ccbfd3efaf0a151b55ac75ad042bdf59a4ddcd1756ce7c7dc2e43d0e46


File details

Details for the file tensorflow_io_plugin_gs_nightly-0.18.0.dev20210505010904-cp39-cp39-macosx_10_14_x86_64.whl.

File metadata

File hashes

Hashes for tensorflow_io_plugin_gs_nightly-0.18.0.dev20210505010904-cp39-cp39-macosx_10_14_x86_64.whl
Algorithm Hash digest
SHA256 c0f409c3798434408cbcbf740f365c84417d58242246c418fa59b687d5c8cfcd
MD5 0663fabf3015d0f2131b0a640bca4af7
BLAKE2b-256 fd8a05072d7005181190339cde12ca58ade997688932270b150c828370c7fd73


File details

Details for the file tensorflow_io_plugin_gs_nightly-0.18.0.dev20210505010904-cp38-cp38-win_amd64.whl.

File metadata

File hashes

Hashes for tensorflow_io_plugin_gs_nightly-0.18.0.dev20210505010904-cp38-cp38-win_amd64.whl
Algorithm Hash digest
SHA256 d6d3f8ecbd3640a9a59943bc5e73fb1cd1204573833d5ae289f52c0fd715aff9
MD5 6015baf5bc06e14c4fb6b4accabb80e2
BLAKE2b-256 aaa24397bc464f8decd2bdcc3792d32e770b15741a8b0a75e122923f96b025d0


File details

Details for the file tensorflow_io_plugin_gs_nightly-0.18.0.dev20210505010904-cp38-cp38-manylinux2010_x86_64.whl.

File metadata

File hashes

Hashes for tensorflow_io_plugin_gs_nightly-0.18.0.dev20210505010904-cp38-cp38-manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 1270e8a91dde5846ba7c8fa0270052fef02f354c6fec6a406be9ea744d1e5368
MD5 36af3d6e3553d495931258a698ec0249
BLAKE2b-256 04336a513ecbcb58542b38ed074e42120b77c15d4c8e7d2866fab4bfcca809e3


File details

Details for the file tensorflow_io_plugin_gs_nightly-0.18.0.dev20210505010904-cp38-cp38-macosx_10_14_x86_64.whl.

File metadata

File hashes

Hashes for tensorflow_io_plugin_gs_nightly-0.18.0.dev20210505010904-cp38-cp38-macosx_10_14_x86_64.whl
Algorithm Hash digest
SHA256 6ab74608771f9c000e03885339a2d8a36cf8acde3e2194b3bfcd0532f189a9c6
MD5 0dceeaf5e8efe03e1baa0daa97baeeea
BLAKE2b-256 d0991ee5eeae16cf2588bfc642cbf25013bd4ca44945c84c1f94ef6bf7a6f13c


File details

Details for the file tensorflow_io_plugin_gs_nightly-0.18.0.dev20210505010904-cp37-cp37m-win_amd64.whl.

File metadata

File hashes

Hashes for tensorflow_io_plugin_gs_nightly-0.18.0.dev20210505010904-cp37-cp37m-win_amd64.whl
Algorithm Hash digest
SHA256 834db35df0a17ffb60d4ed1e1444a5ce0a88e57364feb6981c6350d2860a2a31
MD5 5c6f184e5d330fca45dbb0aa26de4b45
BLAKE2b-256 eaf2d584bdeabfdad8d9c97d3c3a3b96b3056f2efc76f5da799569341e8b2342


File details

Details for the file tensorflow_io_plugin_gs_nightly-0.18.0.dev20210505010904-cp37-cp37m-manylinux2010_x86_64.whl.

File metadata

File hashes

Hashes for tensorflow_io_plugin_gs_nightly-0.18.0.dev20210505010904-cp37-cp37m-manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 f34777596423c4ddceb5c2f562ca20a0ed364dd273b6b00292dfa2e5378dfd5a
MD5 eac97351e22f4d2aba80974e90049e45
BLAKE2b-256 99cbee46a2b4686b2952ff25cbf5879e4751c87d7c4bba1b6db8a5d386e3d3ea


File details

Details for the file tensorflow_io_plugin_gs_nightly-0.18.0.dev20210505010904-cp37-cp37m-macosx_10_14_x86_64.whl.

File metadata

File hashes

Hashes for tensorflow_io_plugin_gs_nightly-0.18.0.dev20210505010904-cp37-cp37m-macosx_10_14_x86_64.whl
Algorithm Hash digest
SHA256 e199216a9b76875f594245ecbeacce912d48b2d46db76f0aa832973387caaa8b
MD5 b0be3eaafeae8a297fc1b6cccebd803b
BLAKE2b-256 1ad31e93c659a9a4111ec5410ea4c0c4a956618f99c38038381f7e357199f0f2


File details

Details for the file tensorflow_io_plugin_gs_nightly-0.18.0.dev20210505010904-cp36-cp36m-win_amd64.whl.

File metadata

File hashes

Hashes for tensorflow_io_plugin_gs_nightly-0.18.0.dev20210505010904-cp36-cp36m-win_amd64.whl
Algorithm Hash digest
SHA256 da1c9c10a05b281883c9a5df79c3731ea7f86e75b6447b400bba723c4fc4a7ea
MD5 a57c45ae455dc1cdd3f12ec8b533ceac
BLAKE2b-256 80a79376425446f059ef9a215a0e6200e2b731cd7e61f35b7669792156cf5470


File details

Details for the file tensorflow_io_plugin_gs_nightly-0.18.0.dev20210505010904-cp36-cp36m-manylinux2010_x86_64.whl.

File metadata

File hashes

Hashes for tensorflow_io_plugin_gs_nightly-0.18.0.dev20210505010904-cp36-cp36m-manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 fc196f87053d67c39735e9c2767a239beb7387c468bfeb327d1e8d5d98bf56c3
MD5 7e34a1cb9eb62ea649ed910bd019ec7c
BLAKE2b-256 dc571f4cd933bf46aa622052cbec66fa631ff81e5d3ae8aab24fbc3c63b797c3


File details

Details for the file tensorflow_io_plugin_gs_nightly-0.18.0.dev20210505010904-cp36-cp36m-macosx_10_14_x86_64.whl.

File metadata

File hashes

Hashes for tensorflow_io_plugin_gs_nightly-0.18.0.dev20210505010904-cp36-cp36m-macosx_10_14_x86_64.whl
Algorithm Hash digest
SHA256 a8705cee8faa7c297e6e40dac9cf9007b87f3ce0a8f55a958478c8b4b61c7a1f
MD5 88fe5ccb1b2a9491a23c33cbec46b4b6
BLAKE2b-256 110c5ac07fbe5a986fc3dc4a9d407980d60e9b973ef569af233285e146fb7ac9

