TensorFlow I/O


TensorFlow I/O is a collection of file systems and file formats that are not available in TensorFlow's built-in support. A full list of supported file systems and file formats by TensorFlow I/O can be found here.

Using tensorflow-io with Keras is straightforward. Below is the Get Started with TensorFlow example with the data processing part replaced by tensorflow-io:

import tensorflow as tf
import tensorflow_io as tfio

# Read the MNIST data into the IODataset.
dataset_url = "https://storage.googleapis.com/cvdf-datasets/mnist/"
d_train = tfio.IODataset.from_mnist(
    dataset_url + "train-images-idx3-ubyte.gz",
    dataset_url + "train-labels-idx1-ubyte.gz",
)

# Shuffle the elements of the dataset.
d_train = d_train.shuffle(buffer_size=1024)

# By default image data is uint8, so convert to float32 using map().
d_train = d_train.map(lambda x, y: (tf.image.convert_image_dtype(x, tf.float32), y))

# Batch the data just like any other tf.data.Dataset.
d_train = d_train.batch(32)

# Build the model.
model = tf.keras.models.Sequential(
    [
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(512, activation=tf.nn.relu),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(10, activation=tf.nn.softmax),
    ]
)

# Compile the model.
model.compile(
    optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"]
)

# Fit the model.
model.fit(d_train, epochs=5, steps_per_epoch=200)

In the MNIST example above, the URLs of the dataset files are passed directly to the tfio.IODataset.from_mnist API call. This works because tensorflow-io has built-in support for the HTTP/HTTPS file system, which eliminates the need to download the datasets and save them in a local directory.

NOTE: Since tensorflow-io detects and uncompresses the MNIST dataset automatically when needed, the URLs of the compressed (gzip) files can be passed to the API call as is.
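
The automatic decompression mentioned in the note can be pictured with a small standard-library sketch. It does not show how tensorflow-io performs detection internally. It only illustrates the general idea of recognizing gzip data by its two-byte magic number and decompressing when needed. The helper name maybe_gunzip and the sample payload are hypothetical.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream (RFC 1952)


def maybe_gunzip(data: bytes) -> bytes:
    """Return decompressed bytes if `data` looks like gzip, else `data` unchanged."""
    if data[:2] == GZIP_MAGIC:
        return gzip.decompress(data)
    return data


# Illustrative payload only; these are not real MNIST bytes.
raw = b"\x00\x00\x08\x03" + bytes(12)
compressed = gzip.compress(raw)

# Both the compressed and the uncompressed form yield the same bytes.
assert maybe_gunzip(compressed) == raw
assert maybe_gunzip(raw) == raw
```

With a check of this kind, a caller does not need to know in advance whether a file is compressed, which matches the behavior described for the MNIST reader.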

Please check the official documentation for more detailed and interesting usages of the package.

Installation

Python Package

The tensorflow-io Python package can be installed with pip directly using:

$ pip install tensorflow-io

People who are a little more adventurous can also try our nightly binaries:

$ pip install tensorflow-io-nightly

Docker Images

In addition to the pip packages, Docker images can be used to get started quickly.

For stable builds:

$ docker pull tfsigio/tfio:latest
$ docker run -it --rm --name tfio-latest tfsigio/tfio:latest

For nightly builds:

$ docker pull tfsigio/tfio:nightly
$ docker run -it --rm --name tfio-nightly tfsigio/tfio:nightly

R Package

Once the tensorflow-io Python package has been successfully installed, you can install the development version of the R package from GitHub via the following:

if (!require("remotes")) install.packages("remotes")
remotes::install_github("tensorflow/io", subdir = "R-package")

TensorFlow Version Compatibility

To ensure compatibility with TensorFlow, it is recommended to install a matching version of TensorFlow I/O according to the table below. You can find the list of releases here.

TensorFlow I/O Version TensorFlow Compatibility Release Date
0.19.0 2.5.x Jun 25, 2021
0.18.0 2.5.x May 13, 2021
0.17.1 2.4.x Apr 16, 2021
0.17.0 2.4.x Dec 14, 2020
0.16.0 2.3.x Oct 23, 2020
0.15.0 2.3.x Aug 03, 2020
0.14.0 2.2.x Jul 08, 2020
0.13.0 2.2.x May 10, 2020
0.12.0 2.1.x Feb 28, 2020
0.11.0 2.1.x Jan 10, 2020
0.10.0 2.0.x Dec 05, 2019
0.9.1 2.0.x Nov 15, 2019
0.9.0 2.0.x Oct 18, 2019
0.8.1 1.15.x Nov 15, 2019
0.8.0 1.15.x Oct 17, 2019
0.7.2 1.14.x Nov 15, 2019
0.7.1 1.14.x Oct 18, 2019
0.7.0 1.14.x Jul 14, 2019
0.6.0 1.13.x May 29, 2019
0.5.0 1.13.x Apr 12, 2019
0.4.0 1.13.x Mar 01, 2019
0.3.0 1.12.0 Feb 15, 2019
0.2.0 1.12.0 Jan 29, 2019
0.1.0 1.12.0 Dec 16, 2018
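
As a rough illustration of how the table can be used, the sketch below looks up which TensorFlow I/O releases are listed for a given TensorFlow version. Only a few rows are copied from the table, and the names COMPATIBILITY and tfio_versions_for are hypothetical helpers, not part of tensorflow-io.

```python
from typing import Dict, List

# A few rows copied from the compatibility table; extend as needed.
COMPATIBILITY: Dict[str, str] = {
    "0.19.0": "2.5.x",
    "0.18.0": "2.5.x",
    "0.17.1": "2.4.x",
    "0.17.0": "2.4.x",
    "0.16.0": "2.3.x",
}


def tfio_versions_for(tf_version: str) -> List[str]:
    """Return TensorFlow I/O versions listed as compatible with `tf_version`."""
    # Compare on the major.minor series only, e.g. "2.4.1" -> "2.4".
    major_minor = ".".join(tf_version.split(".")[:2])
    return [
        tfio
        for tfio, tf_series in COMPATIBILITY.items()
        if tf_series.startswith(major_minor + ".")
    ]


print(tfio_versions_for("2.4.1"))  # ['0.17.1', '0.17.0']
```

A version found this way can then be pinned at install time, for example with pip install tensorflow-io==0.17.1.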

Performance Benchmarking

We use GitHub Pages to document the results of API performance benchmarks. The benchmark job is triggered on every commit to the master branch, which makes it possible to track performance across commits.

Contributing

TensorFlow I/O is a community-led open source project. As such, the project depends on public contributions, bug fixes, and documentation. Please see:

Build Status and CI

Build Status
Linux CPU Python 2 Status
Linux CPU Python 3 Status
Linux GPU Python 2 Status
Linux GPU Python 3 Status

Because of the manylinux2010 requirement, TensorFlow I/O is built with Ubuntu 16.04 + Developer Toolset 7 (GCC 7.3) on Linux. Configuring Ubuntu 16.04 with Developer Toolset 7 is not exactly straightforward. If the system has Docker installed, the following command will automatically build a manylinux2010-compatible whl package:

#!/usr/bin/env bash

ls dist/*
for f in dist/*.whl; do
  docker run -i --rm -v "$PWD":/v -w /v --net=host quay.io/pypa/manylinux2010_x86_64 bash -x -e /v/tools/build/auditwheel repair --plat manylinux2010_x86_64 "$f"
done
sudo chown -R "$(id -nu):$(id -ng)" .
ls wheelhouse/*

The build takes some time, but once it completes, Python 3.5, 3.6, and 3.7 compatible whl packages will be available in the wheelhouse directory.
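
To check which Python versions and platforms the resulting packages target, the tags can be read from the wheel filenames. The sketch below assumes the standard wheel naming scheme, in which the last three dash-separated fields are the Python tag, ABI tag, and platform tag. The example filename and the helper wheel_tags are hypothetical and only serve as illustration.

```python
from pathlib import Path
from typing import NamedTuple


class WheelTags(NamedTuple):
    python_tag: str
    abi_tag: str
    platform_tag: str


def wheel_tags(filename: str) -> WheelTags:
    """Extract the last three dash-separated fields of a wheel filename."""
    stem = filename[: -len(".whl")] if filename.endswith(".whl") else filename
    python_tag, abi_tag, platform_tag = stem.split("-")[-3:]
    return WheelTags(python_tag, abi_tag, platform_tag)


# Hypothetical filename, for illustration only.
example = "tensorflow_io-0.19.0-cp37-cp37m-manylinux2010_x86_64.whl"
print(wheel_tags(example))

# List the tags of any wheels present in ./wheelhouse (skipped if it is absent).
wheelhouse = Path("wheelhouse")
if wheelhouse.is_dir():
    for path in sorted(wheelhouse.glob("*.whl")):
        print(path.name, wheel_tags(path.name))
```

A Python tag such as cp37 indicates a wheel built for CPython 3.7, so one wheel per supported interpreter version is expected in the directory.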

On macOS, the same command can be used. However, the script expects python to be available in the shell and only generates a whl package that matches the version of that python. To build a whl package for a specific Python version, alias that version to python in the shell. See the Auditwheel step in .github/workflows/build.yml for instructions on how to do that.

Note that the above command is also the one we use when releasing packages for Linux and macOS.

TensorFlow I/O uses both GitHub Workflows and Google CI (Kokoro) for continuous integration. GitHub Workflows is used for macOS build and test. Kokoro is used for Linux build and test. Again, because of the manylinux2010 requirement, whl packages on Linux are always built with Ubuntu 16.04 + Developer Toolset 7. Tests are run on a variety of systems with different Python versions to ensure good coverage:

Python  Ubuntu 18.04  Ubuntu 20.04  macOS + osx9  Windows-2019
2.7     ✓             ✓             ✓             N/A
3.7     ✓             ✓             ✓             ✓
3.8     ✓             ✓             ✓             ✓

TensorFlow I/O has integrations with many systems and cloud vendors such as Prometheus, Apache Kafka, Apache Ignite, Google Cloud PubSub, AWS Kinesis, Microsoft Azure Storage, Alibaba Cloud OSS etc.

We try our best to test against these systems in our continuous integration whenever possible. Some tests, such as those for Prometheus, Kafka, and Ignite, are run against live systems, meaning we install Prometheus/Kafka/Ignite on the CI machine before the test is run. Other tests, such as those for Kinesis, PubSub, and Azure Storage, are run through official or unofficial emulators. Offline tests are also performed whenever possible, though systems covered through offline tests may not have the same level of coverage as live systems or emulators.

Live System Emulator CI Integration Offline
Apache Kafka :heavy_check_mark: :heavy_check_mark:
Apache Ignite :heavy_check_mark: :heavy_check_mark:
Prometheus :heavy_check_mark: :heavy_check_mark:
Google PubSub :heavy_check_mark: :heavy_check_mark:
Azure Storage :heavy_check_mark: :heavy_check_mark:
AWS Kinesis :heavy_check_mark: :heavy_check_mark:
Alibaba Cloud OSS :heavy_check_mark:
Google BigTable/BigQuery to be added
Elasticsearch (experimental) :heavy_check_mark: :heavy_check_mark:
MongoDB (experimental) :heavy_check_mark: :heavy_check_mark:

References for emulators:

Community

Additional Information

License

Apache License 2.0


