Skip to main content

TensorFlow IO

Project description




TensorFlow I/O

GitHub CI PyPI License Documentation

TensorFlow I/O is a collection of file systems and file formats that are not available in TensorFlow's built-in support. A full list of supported file systems and file formats by TensorFlow I/O can be found here.

The use of tensorflow-io is straightforward with keras. Below is an example to Get Started with TensorFlow with the data processing aspect replaced by tensorflow-io:

import tensorflow as tf
import tensorflow_io as tfio

# Read the MNIST data into the IODataset.
dataset_url = "https://storage.googleapis.com/cvdf-datasets/mnist/"
d_train = tfio.IODataset.from_mnist(
    dataset_url + "train-images-idx3-ubyte.gz",
    dataset_url + "train-labels-idx1-ubyte.gz",
)

# Shuffle the elements of the dataset.
d_train = d_train.shuffle(buffer_size=1024)

# By default image data is uint8, so convert to float32 using map().
d_train = d_train.map(lambda x, y: (tf.image.convert_image_dtype(x, tf.float32), y))

# prepare batches the data just like any other tf.data.Dataset
d_train = d_train.batch(32)

# Build the model.
model = tf.keras.models.Sequential(
    [
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(512, activation=tf.nn.relu),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(10, activation=tf.nn.softmax),
    ]
)

# Compile the model.
model.compile(
    optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"]
)

# Fit the model.
model.fit(d_train, epochs=5, steps_per_epoch=200)

In the above MNIST example, the URL's to access the dataset files are passed directly to the tfio.IODataset.from_mnist API call. This is due to the inherent support that tensorflow-io provides for HTTP/HTTPS file system, thus eliminating the need for downloading and saving datasets on a local directory.

NOTE: Since tensorflow-io is able to detect and uncompress the MNIST dataset automatically if needed, we can pass the URL's for the compressed files (gzip) to the API call as is.

Please check the official documentation for more detailed and interesting usages of the package.

Installation

Python Package

The tensorflow-io Python package can be installed with pip directly using:

$ pip install tensorflow-io

People who are a little more adventurous can also try our nightly binaries:

$ pip install tensorflow-io-nightly

Docker Images

In addition to the pip packages, the docker images can be used to quickly get started.

For stable builds:

$ docker pull tfsigio/tfio:latest
$ docker run -it --rm --name tfio-latest tfsigio/tfio:latest

For nightly builds:

$ docker pull tfsigio/tfio:nightly
$ docker run -it --rm --name tfio-nightly tfsigio/tfio:nightly

R Package

Once the tensorflow-io Python package has been successfully installed, you can install the development version of the R package from GitHub via the following:

if (!require("remotes")) install.packages("remotes")
remotes::install_github("tensorflow/io", subdir = "R-package")

TensorFlow Version Compatibility

To ensure compatibility with TensorFlow, it is recommended to install a matching version of TensorFlow I/O according to the table below. You can find the list of releases here.

TensorFlow I/O Version TensorFlow Compatibility Release Date
0.17.1 2.4.x Apr 16, 2021
0.17.0 2.4.x Dec 14, 2020
0.16.0 2.3.x Oct 23, 2020
0.15.0 2.3.x Aug 03, 2020
0.14.0 2.2.x Jul 08, 2020
0.13.0 2.2.x May 10, 2020
0.12.0 2.1.x Feb 28, 2020
0.11.0 2.1.x Jan 10, 2020
0.10.0 2.0.x Dec 05, 2019
0.9.1 2.0.x Nov 15, 2019
0.9.0 2.0.x Oct 18, 2019
0.8.1 1.15.x Nov 15, 2019
0.8.0 1.15.x Oct 17, 2019
0.7.2 1.14.x Nov 15, 2019
0.7.1 1.14.x Oct 18, 2019
0.7.0 1.14.x Jul 14, 2019
0.6.0 1.13.x May 29, 2019
0.5.0 1.13.x Apr 12, 2019
0.4.0 1.13.x Mar 01, 2019
0.3.0 1.12.0 Feb 15, 2019
0.2.0 1.12.0 Jan 29, 2019
0.1.0 1.12.0 Dec 16, 2018

Performance Benchmarking

We use github-pages to document the results of API performance benchmarks. The benchmark job is triggered on every commit to master branch and facilitates tracking performance w.r.t commits.

Contributing

Tensorflow I/O is a community led open source project. As such, the project depends on public contributions, bug-fixes, and documentation. Please see:

Build Status and CI

Build Status
Linux CPU Python 2 Status
Linux CPU Python 3 Status
Linux GPU Python 2 Status
Linux GPU Python 3 Status

Because of manylinux2010 requirement, TensorFlow I/O is built with Ubuntu:16.04 + Developer Toolset 7 (GCC 7.3) on Linux. Configuration with Ubuntu 16.04 with Developer Toolset 7 is not exactly straightforward. If the system have docker installed, then the following command will automatically build manylinux2010 compatible whl package:

#!/usr/bin/env bash

ls dist/*
for f in dist/*.whl; do
  docker run -i --rm -v $PWD:/v -w /v --net=host quay.io/pypa/manylinux2010_x86_64 bash -x -e /v/tools/build/auditwheel repair --plat manylinux2010_x86_64 $f
done
sudo chown -R $(id -nu):$(id -ng) .
ls wheelhouse/*

It takes some time to build, but once complete, there will be python 3.5, 3.6, 3.7 compatible whl packages available in wheelhouse directory.

On macOS, the same command could be used. However, the script expects python in shell and will only generate a whl package that matches the version of python in shell. If you want to build a whl package for a specific python then you have to alias this version of python to python in shell. See .github/workflows/build.yml Auditwheel step for instructions how to do that.

Note the above command is also the command we use when releasing packages for Linux and macOS.

TensorFlow I/O uses both GitHub Workflows and Google CI (Kokoro) for continuous integration. GitHub Workflows is used for macOS build and test. Kokoro is used for Linux build and test. Again, because of the manylinux2010 requirement, on Linux whl packages are always built with Ubuntu 16.04 + Developer Toolset 7. Tests are done on a variatiy of systems with different python3 versions to ensure a good coverage:

Python Ubuntu 18.04 Ubuntu 20.04 macOS + osx9 Windows-2019
2.7 :heavy_check_mark: :heavy_check_mark: :heavy_check_mark: N/A
3.7 :heavy_check_mark: :heavy_check_mark: :heavy_check_mark: :heavy_check_mark:
3.8 :heavy_check_mark: :heavy_check_mark: :heavy_check_mark: :heavy_check_mark:

TensorFlow I/O has integrations with many systems and cloud vendors such as Prometheus, Apache Kafka, Apache Ignite, Google Cloud PubSub, AWS Kinesis, Microsoft Azure Storage, Alibaba Cloud OSS etc.

We tried our best to test against those systems in our continuous integration whenever possible. Some tests such as Prometheus, Kafka, and Ignite are done with live systems, meaning we install Prometheus/Kafka/Ignite on CI machine before the test is run. Some tests such as Kinesis, PubSub, and Azure Storage are done through official or non-official emulators. Offline tests are also performed whenever possible, though systems covered through offine tests may not have the same level of coverage as live systems or emulators.

Live System Emulator CI Integration Offline
Apache Kafka :heavy_check_mark: :heavy_check_mark:
Apache Ignite :heavy_check_mark: :heavy_check_mark:
Prometheus :heavy_check_mark: :heavy_check_mark:
Google PubSub :heavy_check_mark: :heavy_check_mark:
Azure Storage :heavy_check_mark: :heavy_check_mark:
AWS Kinesis :heavy_check_mark: :heavy_check_mark:
Alibaba Cloud OSS :heavy_check_mark:
Google BigTable/BigQuery to be added
Elasticsearch (experimental) :heavy_check_mark: :heavy_check_mark:
MongoDB (experimental) :heavy_check_mark: :heavy_check_mark:

References for emulators:

Community

Additional Information

License

Apache License 2.0

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distributions

tensorflow_io_plugin_gs_nightly-0.18.0.dev20210504011133-cp39-cp39-manylinux2010_x86_64.whl (2.5 MB view details)

Uploaded CPython 3.9 manylinux: glibc 2.12+ x86-64

tensorflow_io_plugin_gs_nightly-0.18.0.dev20210504011133-cp38-cp38-manylinux2010_x86_64.whl (2.5 MB view details)

Uploaded CPython 3.8 manylinux: glibc 2.12+ x86-64

tensorflow_io_plugin_gs_nightly-0.18.0.dev20210504011133-cp37-cp37m-manylinux2010_x86_64.whl (2.5 MB view details)

Uploaded CPython 3.7m manylinux: glibc 2.12+ x86-64

tensorflow_io_plugin_gs_nightly-0.18.0.dev20210504011133-cp36-cp36m-manylinux2010_x86_64.whl (2.5 MB view details)

Uploaded CPython 3.6m manylinux: glibc 2.12+ x86-64

File details

Details for the file tensorflow_io_plugin_gs_nightly-0.18.0.dev20210504011133-cp39-cp39-win_amd64.whl.

File metadata

File hashes

Hashes for tensorflow_io_plugin_gs_nightly-0.18.0.dev20210504011133-cp39-cp39-win_amd64.whl
Algorithm Hash digest
SHA256 24071efaaeb947a277f11e7ef0e460500bb409d51a715e536db4e368d4470936
MD5 c14fdb7e6237f2e2eb235470f98dde1d
BLAKE2b-256 b703d6adffaf0251b464d21b851f67eb7691afc092cfc745cdc8500d029aa74f

See more details on using hashes here.

File details

Details for the file tensorflow_io_plugin_gs_nightly-0.18.0.dev20210504011133-cp39-cp39-manylinux2010_x86_64.whl.

File metadata

File hashes

Hashes for tensorflow_io_plugin_gs_nightly-0.18.0.dev20210504011133-cp39-cp39-manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 333f05a402ff13b5201e6784777b812f43529fc662a0ad44cddc3ac310625c72
MD5 f63fd1189c5e4b3dd64466a5a2ac8265
BLAKE2b-256 933b6992274feac70e273473ac8b1f3988699c2e2ec222a37447663b35c2b6b2

See more details on using hashes here.

File details

Details for the file tensorflow_io_plugin_gs_nightly-0.18.0.dev20210504011133-cp39-cp39-macosx_10_14_x86_64.whl.

File metadata

File hashes

Hashes for tensorflow_io_plugin_gs_nightly-0.18.0.dev20210504011133-cp39-cp39-macosx_10_14_x86_64.whl
Algorithm Hash digest
SHA256 c34c5e8428de0886ea4a5fe4c75d6ad2305fb44baef87cee40c2ce0b3d442388
MD5 de72ee59e775ef930e02017e6764f654
BLAKE2b-256 edb40262a95774bcc3e268041e452b62bdfd444df8108d46a2abba082a1d3dd8

See more details on using hashes here.

File details

Details for the file tensorflow_io_plugin_gs_nightly-0.18.0.dev20210504011133-cp38-cp38-win_amd64.whl.

File metadata

File hashes

Hashes for tensorflow_io_plugin_gs_nightly-0.18.0.dev20210504011133-cp38-cp38-win_amd64.whl
Algorithm Hash digest
SHA256 98a15cd70364578c0abbc9f4c9f2c7b59a74bf1cdd28cb84c74cde6244fd54f6
MD5 b71d995fdf62461c13a080700414f96d
BLAKE2b-256 8d3ccc784712d25b0e156cddc82c8bd4197a94649bdd1c4ecbe7f714f29725ac

See more details on using hashes here.

File details

Details for the file tensorflow_io_plugin_gs_nightly-0.18.0.dev20210504011133-cp38-cp38-manylinux2010_x86_64.whl.

File metadata

File hashes

Hashes for tensorflow_io_plugin_gs_nightly-0.18.0.dev20210504011133-cp38-cp38-manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 aabcdcbce17fabde3358ce9b848986691d862d5e0d57cf77eb421a13e48db354
MD5 9f478dafd478577fdeecd58b704a9dde
BLAKE2b-256 9eb0160865ad54308cff8d251aa53d7b692a159bb485b82e5addd2295400954e

See more details on using hashes here.

File details

Details for the file tensorflow_io_plugin_gs_nightly-0.18.0.dev20210504011133-cp38-cp38-macosx_10_14_x86_64.whl.

File metadata

File hashes

Hashes for tensorflow_io_plugin_gs_nightly-0.18.0.dev20210504011133-cp38-cp38-macosx_10_14_x86_64.whl
Algorithm Hash digest
SHA256 dd8e0b586de29a07ccecdc2e816c97ed49d546efe88361dfa631d511e2aa5860
MD5 d6341d7c0f1fc5b8dbe3d39b7d604392
BLAKE2b-256 d5b962534fee1aefceaa718f375b09da1a5ac3b5ac52c3fd013d1e27e21e58f9

See more details on using hashes here.

File details

Details for the file tensorflow_io_plugin_gs_nightly-0.18.0.dev20210504011133-cp37-cp37m-win_amd64.whl.

File metadata

File hashes

Hashes for tensorflow_io_plugin_gs_nightly-0.18.0.dev20210504011133-cp37-cp37m-win_amd64.whl
Algorithm Hash digest
SHA256 8d0a03fac3e58d66eafb93905d5ed42dec590cd7fcc5daa5e30840b887c81979
MD5 ed39366c31029d68c8242c3775ff3c7a
BLAKE2b-256 0706202e52ceaba9fad7896915ffc0e44e8e516b4fd8874d17bb1ba92520a63c

See more details on using hashes here.

File details

Details for the file tensorflow_io_plugin_gs_nightly-0.18.0.dev20210504011133-cp37-cp37m-manylinux2010_x86_64.whl.

File metadata

File hashes

Hashes for tensorflow_io_plugin_gs_nightly-0.18.0.dev20210504011133-cp37-cp37m-manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 98d263de978d45b262200594acf519e0e30508b93193bc9a629179c600fb3b33
MD5 1fcbbf80ea651522333e11b8a274b990
BLAKE2b-256 e8f3d0f61a9ab81cc2497b83954b759ce22e5ebb8f2aea898bbe2220e9a6097f

See more details on using hashes here.

File details

Details for the file tensorflow_io_plugin_gs_nightly-0.18.0.dev20210504011133-cp37-cp37m-macosx_10_14_x86_64.whl.

File metadata

File hashes

Hashes for tensorflow_io_plugin_gs_nightly-0.18.0.dev20210504011133-cp37-cp37m-macosx_10_14_x86_64.whl
Algorithm Hash digest
SHA256 8e9de1ff07e4a799c76361754018db13ca692c8c510060a83afbc4676eaa91bf
MD5 a57f3457386aee40960b8ec7bec48f5e
BLAKE2b-256 8246c4558bb9e3a4b8f1d85c530c817abfaf394f2863f7e9157177e0ef03e8a9

See more details on using hashes here.

File details

Details for the file tensorflow_io_plugin_gs_nightly-0.18.0.dev20210504011133-cp36-cp36m-win_amd64.whl.

File metadata

File hashes

Hashes for tensorflow_io_plugin_gs_nightly-0.18.0.dev20210504011133-cp36-cp36m-win_amd64.whl
Algorithm Hash digest
SHA256 1244ac5334a336299f72a6f29522fd241a0f8cf16c34926198e3eef74778ca1e
MD5 eff91be56c8040d4b5604b04b1a480b3
BLAKE2b-256 9c32c1a7cc895a0495455242b0c63916327fa53a8c0d4f5ad259afc2b6f1c4f7

See more details on using hashes here.

File details

Details for the file tensorflow_io_plugin_gs_nightly-0.18.0.dev20210504011133-cp36-cp36m-manylinux2010_x86_64.whl.

File metadata

File hashes

Hashes for tensorflow_io_plugin_gs_nightly-0.18.0.dev20210504011133-cp36-cp36m-manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 5bfaa530b592e9573f80038d5586a2620cd6a951ee27f7506604ee82ea740d61
MD5 bf3d129a0474e800cbf67a844b99d1b2
BLAKE2b-256 7761dc814d42cb7fbc609f6e225d7e493754acacb8dbbd3a2622b05b7f376bc1

See more details on using hashes here.

File details

Details for the file tensorflow_io_plugin_gs_nightly-0.18.0.dev20210504011133-cp36-cp36m-macosx_10_14_x86_64.whl.

File metadata

File hashes

Hashes for tensorflow_io_plugin_gs_nightly-0.18.0.dev20210504011133-cp36-cp36m-macosx_10_14_x86_64.whl
Algorithm Hash digest
SHA256 a78538798402833070690408406a13c6d385ffc87169d35477d8a305c3703a96
MD5 2bdfc33e23fdc0d8f8c24e77360af7e1
BLAKE2b-256 79569225206cb359b87d90ee7be9b1b71338f3a7ae84a3b676602494a9e3b2ef

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page