TensorFlow I/O

TensorFlow I/O is a collection of file systems and file formats that are not available in TensorFlow's built-in support. A full list of supported file systems and file formats by TensorFlow I/O can be found here.

Using tensorflow-io with Keras is straightforward. Below is the Get Started with TensorFlow example, with the data processing part replaced by tensorflow-io:

import tensorflow as tf
import tensorflow_io as tfio

# Read the MNIST data into the IODataset.
dataset_url = "https://storage.googleapis.com/cvdf-datasets/mnist/"
d_train = tfio.IODataset.from_mnist(
    dataset_url + "train-images-idx3-ubyte.gz",
    dataset_url + "train-labels-idx1-ubyte.gz",
)

# Shuffle the elements of the dataset.
d_train = d_train.shuffle(buffer_size=1024)

# By default image data is uint8, so convert to float32 using map().
d_train = d_train.map(lambda x, y: (tf.image.convert_image_dtype(x, tf.float32), y))

# Batch the data just like any other tf.data.Dataset.
d_train = d_train.batch(32)

# Build the model.
model = tf.keras.models.Sequential(
    [
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(512, activation=tf.nn.relu),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(10, activation=tf.nn.softmax),
    ]
)

# Compile the model.
model.compile(
    optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"]
)

# Fit the model.
model.fit(d_train, epochs=5, steps_per_epoch=200)

In the MNIST example above, the URLs of the dataset files are passed directly to the tfio.IODataset.from_mnist API call. This works because tensorflow-io has built-in support for the HTTP/HTTPS file system, which eliminates the need to download the datasets and save them to a local directory.

NOTE: Since tensorflow-io detects and uncompresses the MNIST dataset automatically when needed, the URLs of the compressed (gzip) files can be passed to the API call as is.
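
The automatic decompression mentioned in the note can be pictured with a small standard-library sketch. This is not tensorflow-io's implementation: the helper names `maybe_gunzip` and `parse_idx_header` are hypothetical, and the code only illustrates the general idea of checking for the gzip magic bytes (`0x1f 0x8b`) before parsing the IDX header that the MNIST files use.

```python
import gzip
import struct

GZIP_MAGIC = b"\x1f\x8b"  # First two bytes of every gzip stream.


def maybe_gunzip(payload: bytes) -> bytes:
    """Return decompressed bytes if payload looks like gzip, else payload unchanged."""
    if payload[:2] == GZIP_MAGIC:
        return gzip.decompress(payload)
    return payload


def parse_idx_header(raw: bytes):
    """Parse an IDX header and return (dtype_code, shape, data_offset)."""
    # Bytes 0-1 are zero, byte 2 is the data type code, byte 3 is the rank.
    zero, dtype_code, ndim = struct.unpack(">HBB", raw[:4])
    if zero != 0:
        raise ValueError("not an IDX file")
    # Each dimension is a big-endian unsigned 32-bit integer.
    shape = struct.unpack(">" + "I" * ndim, raw[4 : 4 + 4 * ndim])
    return dtype_code, shape, 4 + 4 * ndim


# Build a tiny synthetic IDX "image" file: 2 images of 3x3 uint8 pixels.
header = struct.pack(">HBB", 0, 0x08, 3) + struct.pack(">III", 2, 3, 3)
idx_bytes = header + bytes(range(18))

# The same parser handles both the plain and the gzip-compressed form.
for payload in (idx_bytes, gzip.compress(idx_bytes)):
    dtype_code, shape, offset = parse_idx_header(maybe_gunzip(payload))
    print(dtype_code, shape, offset)
```

Both iterations print the same header fields, which is the behaviour the note describes from the caller's point of view: the compressed and uncompressed files are interchangeable.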

Please check the official documentation for more detailed and interesting usages of the package.

Installation

Python Package

The tensorflow-io Python package can be installed with pip directly using:

$ pip install tensorflow-io

People who are a little more adventurous can also try our nightly binaries:

$ pip install tensorflow-io-nightly

Docker Images

In addition to the pip packages, Docker images are available for getting started quickly.

For stable builds:

$ docker pull tfsigio/tfio:latest
$ docker run -it --rm --name tfio-latest tfsigio/tfio:latest

For nightly builds:

$ docker pull tfsigio/tfio:nightly
$ docker run -it --rm --name tfio-nightly tfsigio/tfio:nightly

R Package

Once the tensorflow-io Python package has been successfully installed, you can install the development version of the R package from GitHub via the following:

if (!require("remotes")) install.packages("remotes")
remotes::install_github("tensorflow/io", subdir = "R-package")

TensorFlow Version Compatibility

To ensure compatibility with TensorFlow, it is recommended to install a matching version of TensorFlow I/O according to the table below. You can find the list of releases here.

| TensorFlow I/O Version | TensorFlow Compatibility | Release Date |
| --- | --- | --- |
| 0.17.1 | 2.4.x | Apr 16, 2021 |
| 0.17.0 | 2.4.x | Dec 14, 2020 |
| 0.16.0 | 2.3.x | Oct 23, 2020 |
| 0.15.0 | 2.3.x | Aug 03, 2020 |
| 0.14.0 | 2.2.x | Jul 08, 2020 |
| 0.13.0 | 2.2.x | May 10, 2020 |
| 0.12.0 | 2.1.x | Feb 28, 2020 |
| 0.11.0 | 2.1.x | Jan 10, 2020 |
| 0.10.0 | 2.0.x | Dec 05, 2019 |
| 0.9.1 | 2.0.x | Nov 15, 2019 |
| 0.9.0 | 2.0.x | Oct 18, 2019 |
| 0.8.1 | 1.15.x | Nov 15, 2019 |
| 0.8.0 | 1.15.x | Oct 17, 2019 |
| 0.7.2 | 1.14.x | Nov 15, 2019 |
| 0.7.1 | 1.14.x | Oct 18, 2019 |
| 0.7.0 | 1.14.x | Jul 14, 2019 |
| 0.6.0 | 1.13.x | May 29, 2019 |
| 0.5.0 | 1.13.x | Apr 12, 2019 |
| 0.4.0 | 1.13.x | Mar 01, 2019 |
| 0.3.0 | 1.12.0 | Feb 15, 2019 |
| 0.2.0 | 1.12.0 | Jan 29, 2019 |
| 0.1.0 | 1.12.0 | Dec 16, 2018 |
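
As a rough illustration of how the table can be used, the sketch below looks up a suggested tensorflow-io pin for a given TensorFlow version. The function names `recommended_tfio` and `installed_version` are hypothetical helpers, not part of tensorflow-io, and the dictionary only carries the newest listed release for each TensorFlow series from the table above. The lookup also assumes TensorFlow is installed under the distribution name `tensorflow`; variants published under other names would not be found.

```python
from importlib import metadata

# Newest tensorflow-io release listed in the table for each TensorFlow series.
# The 1.12 entry generalizes the table's "1.12.0" row to the whole series (assumption).
LATEST_TFIO_FOR_TF = {
    "2.4": "0.17.1",
    "2.3": "0.16.0",
    "2.2": "0.14.0",
    "2.1": "0.12.0",
    "2.0": "0.10.0",
    "1.15": "0.8.1",
    "1.14": "0.7.2",
    "1.13": "0.6.0",
    "1.12": "0.3.0",
}


def recommended_tfio(tf_version: str):
    """Return the newest listed tensorflow-io release for a TensorFlow version, or None."""
    series = ".".join(tf_version.split(".")[:2])
    return LATEST_TFIO_FOR_TF.get(series)


def installed_version(package: str):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


tf_version = installed_version("tensorflow")
if tf_version is None:
    print("tensorflow is not installed")
else:
    pin = recommended_tfio(tf_version)
    if pin is None:
        print("TensorFlow %s is not covered by the table" % tf_version)
    else:
        print("suggested pin: tensorflow-io==%s" % pin)
```

The printed pin can then be passed to pip, for example `pip install tensorflow-io==0.17.1` when TensorFlow 2.4.x is installed.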

Performance Benchmarking

We use github-pages to document the results of API performance benchmarks. The benchmark job is triggered on every commit to the master branch, which makes it possible to track performance across commits.

Contributing

TensorFlow I/O is a community-led open source project. As such, the project depends on public contributions, bug fixes, and documentation.

Build Status and CI

Because of the manylinux2010 requirement, TensorFlow I/O is built with Ubuntu 16.04 + Developer Toolset 7 (GCC 7.3) on Linux. Configuring Ubuntu 16.04 with Developer Toolset 7 is not exactly straightforward. If the system has Docker installed, the following command will automatically build a manylinux2010-compatible whl package:

#!/usr/bin/env bash

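# List the wheels that are about to be repaired.
ls dist/*
# Run auditwheel repair on each wheel inside the manylinux2010 container.
for f in dist/*.whl; do
  docker run -i --rm -v "$PWD":/v -w /v --net=host quay.io/pypa/manylinux2010_x86_64 bash -x -e /v/tools/build/auditwheel repair --plat manylinux2010_x86_64 "$f"
done
# Files written from inside the container may be owned by root; hand them back to the current user.
sudo chown -R "$(id -nu):$(id -ng)" .
ls wheelhouse/*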

It takes some time to build, but once complete, Python 3.5, 3.6, and 3.7 compatible whl packages will be available in the wheelhouse directory.

On macOS, the same command can be used. However, the script expects python to be available in the shell and will only generate a whl package that matches that Python version. To build a whl package for a specific Python version, alias that version to python in the shell. See the Auditwheel step in .github/workflows/build.yml for instructions on how to do that.

Note the above command is also the command we use when releasing packages for Linux and macOS.
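
To check which Python version a generated whl package targets, the tags in its filename can be inspected. The sketch below is a minimal illustration based on the standard wheel naming scheme (name, version, optional build tag, then Python, ABI, and platform tags). The helper names and the example filename are hypothetical and are not part of the build scripts, and compressed tag sets such as `cp36.cp37` are deliberately not handled.

```python
import sys


def parse_wheel_tags(filename: str):
    """Split a wheel filename into (python_tag, abi_tag, platform_tag)."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel: " + filename)
    parts = filename[: -len(".whl")].split("-")
    # name-version[-build]-python-abi-platform: the last three fields are the tags.
    if len(parts) not in (5, 6):
        raise ValueError("unexpected wheel filename: " + filename)
    return tuple(parts[-3:])


def matches_interpreter(filename: str, version_info=sys.version_info) -> bool:
    """True if the wheel's CPython tag equals the given interpreter version."""
    python_tag, _, _ = parse_wheel_tags(filename)
    return python_tag == "cp%d%d" % (version_info[0], version_info[1])


# Hypothetical filename, used only for illustration.
wheel = "tensorflow_io-0.17.1-cp37-cp37m-manylinux2010_x86_64.whl"
print(parse_wheel_tags(wheel))
print(matches_interpreter(wheel, (3, 7)))
print(matches_interpreter(wheel))  # Compares against the running interpreter.
```

Running `matches_interpreter` with the default argument from the shell's python gives a quick way to confirm that the wheel in the wheelhouse directory was built for that interpreter.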

TensorFlow I/O uses both GitHub Workflows and Google CI (Kokoro) for continuous integration. GitHub Workflows is used for macOS build and test. Kokoro is used for Linux build and test. Again, because of the manylinux2010 requirement, whl packages on Linux are always built with Ubuntu 16.04 + Developer Toolset 7. Tests are run on a variety of systems with different Python versions to ensure good coverage:

| Python | Ubuntu 18.04 | Ubuntu 20.04 | macOS + osx9 | Windows-2019 |
| --- | --- | --- | --- | --- |
| 2.7 | ✓ | ✓ | ✓ | N/A |
| 3.7 | ✓ | ✓ | ✓ | ✓ |
| 3.8 | ✓ | ✓ | ✓ | ✓ |

TensorFlow I/O has integrations with many systems and cloud vendors such as Prometheus, Apache Kafka, Apache Ignite, Google Cloud PubSub, AWS Kinesis, Microsoft Azure Storage, Alibaba Cloud OSS etc.

We try our best to test against those systems in our continuous integration whenever possible. Some tests, such as those for Prometheus, Kafka, and Ignite, are run against live systems, meaning we install Prometheus/Kafka/Ignite on the CI machine before the test is run. Some tests, such as those for Kinesis, PubSub, and Azure Storage, are run through official or unofficial emulators. Offline tests are also performed whenever possible, though systems covered through offline tests may not have the same level of coverage as live systems or emulators.

| | Live System | Emulator | CI Integration | Offline |
| --- | --- | --- | --- | --- |
| Apache Kafka | ✓ | | ✓ | |
| Apache Ignite | ✓ | | ✓ | |
| Prometheus | ✓ | | ✓ | |
| Google PubSub | | ✓ | ✓ | |
| Azure Storage | | ✓ | ✓ | |
| AWS Kinesis | | ✓ | ✓ | |
| Alibaba Cloud OSS | | | | ✓ |
| Google BigTable/BigQuery | | to be added | | |
| Elasticsearch (experimental) | ✓ | | ✓ | |
| MongoDB (experimental) | ✓ | | ✓ | |
References for emulators:

Community

Additional Information

License

Apache License 2.0


