TensorFlow I/O


TensorFlow I/O is a collection of file systems and file formats that are not available in TensorFlow's built-in support. A full list of supported file systems and file formats by TensorFlow I/O can be found here.

Using tensorflow-io with Keras is straightforward. Below is an example adapted from Get Started with TensorFlow, with the data-processing part replaced by tensorflow-io:

import tensorflow as tf
import tensorflow_io as tfio

# Read the MNIST data into the IODataset.
dataset_url = "https://storage.googleapis.com/cvdf-datasets/mnist/"
d_train = tfio.IODataset.from_mnist(
    dataset_url + "train-images-idx3-ubyte.gz",
    dataset_url + "train-labels-idx1-ubyte.gz",
)

# Shuffle the elements of the dataset.
d_train = d_train.shuffle(buffer_size=1024)

# By default image data is uint8, so convert to float32 using map().
d_train = d_train.map(lambda x, y: (tf.image.convert_image_dtype(x, tf.float32), y))

# Prepare batches of the data, just like any other tf.data.Dataset.
d_train = d_train.batch(32)

# Build the model.
model = tf.keras.models.Sequential(
    [
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(512, activation=tf.nn.relu),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(10, activation=tf.nn.softmax),
    ]
)

# Compile the model.
model.compile(
    optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"]
)

# Fit the model.
model.fit(d_train, epochs=5, steps_per_epoch=200)
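The three data-processing calls in the example (shuffle, map, batch) can be approximated in plain Python to show what each stage does to the stream of (image, label) pairs. This is a simplified sketch, not tf.data's implementation: the shuffle keeps a fixed-size buffer in the same spirit as Dataset.shuffle(buffer_size=1024), and the scaling mirrors how tf.image.convert_image_dtype maps uint8 values into [0, 1] by dividing by 255. Images are assumed to be flattened pixel lists for brevity.

```python
import random
from typing import Iterable, Iterator, List, Sequence, Tuple, TypeVar

T = TypeVar("T")
Example = Tuple[Sequence[int], int]  # (flattened uint8 pixels, label)


def shuffle(items: Iterable[T], buffer_size: int, seed: int = 0) -> Iterator[T]:
    """Buffered shuffle: once buffer_size items are held, emit a random one."""
    rng = random.Random(seed)
    buffer: List[T] = []
    for item in items:
        buffer.append(item)
        if len(buffer) >= buffer_size:
            # Move a random element to the end and pop it (O(1) removal).
            i = rng.randrange(len(buffer))
            buffer[i], buffer[-1] = buffer[-1], buffer[i]
            yield buffer.pop()
    # Input exhausted: emit the remaining buffered items in random order.
    rng.shuffle(buffer)
    yield from buffer


def to_float(pixels: Sequence[int]) -> List[float]:
    """Scale uint8 values to [0, 1] by dividing by 255."""
    return [p / 255.0 for p in pixels]


def batch(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Group consecutive items; the final batch may be smaller."""
    current: List[T] = []
    for item in items:
        current.append(item)
        if len(current) == batch_size:
            yield current
            current = []
    if current:
        yield current


def pipeline(
    examples: Iterable[Example], batch_size: int = 32
) -> Iterator[List[Tuple[List[float], int]]]:
    """shuffle(1024) -> map(scale pixels) -> batch(batch_size)."""
    shuffled = shuffle(examples, buffer_size=1024)
    scaled = ((to_float(x), y) for x, y in shuffled)
    return batch(scaled, batch_size)
```

With batch_size=32 and steps_per_epoch=200, each epoch of the Keras example consumes 200 × 32 = 6,400 examples.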

In the MNIST example above, the URLs of the dataset files are passed directly to the tfio.IODataset.from_mnist API call. This works because tensorflow-io has built-in support for the HTTP/HTTPS file system, which removes the need to download and save datasets to a local directory.

NOTE: Since tensorflow-io can detect and decompress the MNIST dataset files automatically when needed, the URLs of the compressed (gzip) files can be passed to the API call as is.
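The MNIST files use the IDX format: a big-endian header made of two zero bytes, a type code, and the number of dimensions (0x00000803 for the image file), followed by one 32-bit size per dimension and then the raw data. The sketch below shows one way to detect gzip input by its magic bytes and decompress it before parsing that header, using only the standard library. It illustrates the idea in the note above; it is not how tensorflow-io implements it, and it only handles the unsigned-byte type (0x08) that MNIST uses.

```python
import gzip
import struct
from typing import Tuple

GZIP_MAGIC = b"\x1f\x8b"
UBYTE = 0x08  # IDX type code for unsigned bytes (used by MNIST)


def maybe_decompress(raw: bytes) -> bytes:
    """Decompress only if the payload starts with the gzip magic bytes."""
    return gzip.decompress(raw) if raw[:2] == GZIP_MAGIC else raw


def parse_idx(raw: bytes) -> Tuple[Tuple[int, ...], bytes]:
    """Return (shape, data) for an unsigned-byte IDX file, gzipped or not."""
    data = maybe_decompress(raw)
    zero1, zero2, dtype, ndim = struct.unpack(">BBBB", data[:4])
    if zero1 != 0 or zero2 != 0 or dtype != UBYTE:
        raise ValueError("not an unsigned-byte IDX file")
    # One big-endian uint32 per dimension, e.g. (60000, 28, 28) for images.
    shape = struct.unpack(f">{ndim}I", data[4 : 4 + 4 * ndim])
    body = data[4 + 4 * ndim :]
    expected = 1
    for dim in shape:
        expected *= dim
    if len(body) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(body)}")
    return shape, body
```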

Please check the official documentation for more detailed and interesting usages of the package.

Installation

Python Package

The tensorflow-io Python package can be installed with pip directly using:

$ pip install tensorflow-io

People who are a little more adventurous can also try our nightly binaries:

$ pip install tensorflow-io-nightly

Docker Images

In addition to the pip packages, Docker images can be used to get started quickly.

For stable builds:

$ docker pull tfsigio/tfio:latest
$ docker run -it --rm --name tfio-latest tfsigio/tfio:latest

For nightly builds:

$ docker pull tfsigio/tfio:nightly
$ docker run -it --rm --name tfio-nightly tfsigio/tfio:nightly

R Package

Once the tensorflow-io Python package has been successfully installed, you can install the development version of the R package from GitHub via the following:

if (!require("remotes")) install.packages("remotes")
remotes::install_github("tensorflow/io", subdir = "R-package")

TensorFlow Version Compatibility

To ensure compatibility with TensorFlow, it is recommended to install a matching version of TensorFlow I/O according to the table below. You can find the list of releases here.

| TensorFlow I/O Version | TensorFlow Compatibility | Release Date |
| --- | --- | --- |
| 0.18.0 | 2.5.x | May 13, 2021 |
| 0.17.1 | 2.4.x | Apr 16, 2021 |
| 0.17.0 | 2.4.x | Dec 14, 2020 |
| 0.16.0 | 2.3.x | Oct 23, 2020 |
| 0.15.0 | 2.3.x | Aug 03, 2020 |
| 0.14.0 | 2.2.x | Jul 08, 2020 |
| 0.13.0 | 2.2.x | May 10, 2020 |
| 0.12.0 | 2.1.x | Feb 28, 2020 |
| 0.11.0 | 2.1.x | Jan 10, 2020 |
| 0.10.0 | 2.0.x | Dec 05, 2019 |
| 0.9.1 | 2.0.x | Nov 15, 2019 |
| 0.9.0 | 2.0.x | Oct 18, 2019 |
| 0.8.1 | 1.15.x | Nov 15, 2019 |
| 0.8.0 | 1.15.x | Oct 17, 2019 |
| 0.7.2 | 1.14.x | Nov 15, 2019 |
| 0.7.1 | 1.14.x | Oct 18, 2019 |
| 0.7.0 | 1.14.x | Jul 14, 2019 |
| 0.6.0 | 1.13.x | May 29, 2019 |
| 0.5.0 | 1.13.x | Apr 12, 2019 |
| 0.4.0 | 1.13.x | Mar 01, 2019 |
| 0.3.0 | 1.12.0 | Feb 15, 2019 |
| 0.2.0 | 1.12.0 | Jan 29, 2019 |
| 0.1.0 | 1.12.0 | Dec 16, 2018 |
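When pinning versions in scripts, the table can be encoded directly. Below is a minimal sketch that returns the tensorflow-io releases listed for a given TensorFlow version. The data is transcribed from the table; "2.4.x" is treated as "any 2.4 patch release" and entries listed as 1.12.0 are matched exactly. The function names are hypothetical helpers.

```python
from typing import List

# (tensorflow-io version, TensorFlow compatibility), transcribed from the table.
COMPAT = [
    ("0.18.0", "2.5.x"), ("0.17.1", "2.4.x"), ("0.17.0", "2.4.x"),
    ("0.16.0", "2.3.x"), ("0.15.0", "2.3.x"), ("0.14.0", "2.2.x"),
    ("0.13.0", "2.2.x"), ("0.12.0", "2.1.x"), ("0.11.0", "2.1.x"),
    ("0.10.0", "2.0.x"), ("0.9.1", "2.0.x"), ("0.9.0", "2.0.x"),
    ("0.8.1", "1.15.x"), ("0.8.0", "1.15.x"), ("0.7.2", "1.14.x"),
    ("0.7.1", "1.14.x"), ("0.7.0", "1.14.x"), ("0.6.0", "1.13.x"),
    ("0.5.0", "1.13.x"), ("0.4.0", "1.13.x"), ("0.3.0", "1.12.0"),
    ("0.2.0", "1.12.0"), ("0.1.0", "1.12.0"),
]


def matches(tf_version: str, spec: str) -> bool:
    """'2.4.x' matches any '2.4.<patch>'; other specs must match exactly."""
    if spec.endswith(".x"):
        return tf_version.startswith(spec[:-1])  # prefix such as "2.4."
    return tf_version == spec


def compatible_tfio(tf_version: str) -> List[str]:
    """Matching tensorflow-io versions, newest first (table order)."""
    return [tfio_version for tfio_version, spec in COMPAT if matches(tf_version, spec)]
```

In practice, the installed TensorFlow version string can be taken from tf.__version__ and passed to compatible_tfio.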

Performance Benchmarking

We use GitHub Pages to publish the results of API performance benchmarks. The benchmark job is triggered on every commit to the master branch, which makes it possible to track performance across commits.
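Tracking performance across commits ultimately comes down to comparing each benchmark's timing with an earlier commit. The sketch below assumes a hypothetical result format (a mapping from benchmark name to mean runtime in seconds for one commit) and flags benchmarks that slowed down by more than a chosen threshold. It is not the project's benchmark tooling.

```python
from typing import Dict, List, Tuple

# Hypothetical: benchmark name -> mean runtime in seconds for one commit.
Results = Dict[str, float]


def regressions(
    before: Results, after: Results, threshold: float = 0.10
) -> List[Tuple[str, float]]:
    """Return (name, relative_slowdown) for benchmarks slower by > threshold."""
    flagged = []
    for name, old in before.items():
        new = after.get(name)
        if new is None or old <= 0:
            continue  # benchmark removed, or baseline unusable
        change = (new - old) / old
        if change > threshold:
            flagged.append((name, change))
    # Worst regressions first.
    return sorted(flagged, key=lambda item: item[1], reverse=True)
```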

Contributing

TensorFlow I/O is a community-led open-source project. As such, the project depends on public contributions, bug fixes, and documentation. Please see:

Build Status and CI

Build Status
Linux CPU Python 2 Status
Linux CPU Python 3 Status
Linux GPU Python 2 Status
Linux GPU Python 3 Status

Because of the manylinux2010 requirement, TensorFlow I/O is built on Linux with Ubuntu 16.04 + Developer Toolset 7 (GCC 7.3). Configuring Ubuntu 16.04 with Developer Toolset 7 is not exactly straightforward. If the system has Docker installed, the following command will automatically build a manylinux2010-compatible whl package:

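#!/usr/bin/env bash

# List the wheels produced by the build.
ls dist/*
# Repair each wheel inside the manylinux2010 container so it is tagged manylinux2010_x86_64.
for f in dist/*.whl; do
  docker run -i --rm -v "$PWD":/v -w /v --net=host quay.io/pypa/manylinux2010_x86_64 bash -x -e /v/tools/build/auditwheel repair --plat manylinux2010_x86_64 "$f"
done
# Files written from inside the container are owned by root; give them back to the current user.
sudo chown -R "$(id -nu)":"$(id -ng)" .
# The repaired wheels end up in wheelhouse/.
ls wheelhouse/*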

The build takes some time, but once it completes, Python 3.5, 3.6, and 3.7 compatible whl packages will be available in the wheelhouse directory.
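Each produced file follows the wheel naming convention from PEP 427, {distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl, where a tag field may hold several dot-separated tags (for example manylinux_2_12_x86_64.manylinux2010_x86_64). The standard-library sketch below splits such a name so you can check which Python versions and platforms ended up in wheelhouse; parse_wheel_name is a hypothetical helper, not part of the build tooling.

```python
from typing import NamedTuple, Optional, Tuple


class WheelName(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python_tags: Tuple[str, ...]
    abi_tags: Tuple[str, ...]
    platform_tags: Tuple[str, ...]


def parse_wheel_name(filename: str) -> WheelName:
    """Split a wheel filename into its PEP 427 fields."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename}")
    parts = filename[: -len(".whl")].split("-")
    build: Optional[str] = None
    if len(parts) == 5:
        dist, version, py, abi, plat = parts
    elif len(parts) == 6:
        dist, version, build, py, abi, plat = parts
    else:
        raise ValueError(f"unexpected wheel filename: {filename}")
    # Compressed tag sets separate alternatives with ".".
    return WheelName(
        dist, version, build,
        tuple(py.split(".")), tuple(abi.split(".")), tuple(plat.split(".")),
    )
```

Collecting python_tags over every file in wheelhouse/ gives a quick check that the expected cp35, cp36 and cp37 wheels were produced.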

The same command can be used on macOS. However, the script expects python to be available in the shell and will only generate a whl package matching that version of Python. To build a whl package for a specific Python version, alias that version to python in the shell. See the Auditwheel step in .github/workflows/build.yml for instructions on how to do that.
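Because the macOS build follows whichever interpreter python resolves to, it can help to check that interpreter's tag before building. A minimal sketch, assuming a CPython interpreter; interpreter_tag and shell_python_tag are hypothetical helper names.

```python
import shutil
import subprocess
import sys
from typing import Optional

# Works on both Python 2.7 and 3.x interpreters.
_PRINT_TAG = "import sys; print('cp%d%d' % sys.version_info[:2])"


def interpreter_tag(executable: str) -> str:
    """Run the interpreter and return its CPython tag, e.g. 'cp37'."""
    result = subprocess.run(
        [executable, "-c", _PRINT_TAG],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def shell_python_tag() -> Optional[str]:
    """Tag of the interpreter that 'python' resolves to on PATH, if any."""
    exe = shutil.which("python")
    return interpreter_tag(exe) if exe else None
```

For example, if shell_python_tag() returns 'cp37', the macOS build would be expected to produce only cp37 wheels.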

Note that the above command is also what we use when releasing packages for Linux and macOS.

TensorFlow I/O uses both GitHub Workflows and Google CI (Kokoro) for continuous integration. GitHub Workflows is used for macOS build and test, and Kokoro is used for Linux build and test. Again, because of the manylinux2010 requirement, whl packages on Linux are always built with Ubuntu 16.04 + Developer Toolset 7. Tests are run on a variety of systems with different Python versions to ensure good coverage:

| Python | Ubuntu 18.04 | Ubuntu 20.04 | macOS + osx9 | Windows-2019 |
| --- | --- | --- | --- | --- |
| 2.7 | ✓ | ✓ | ✓ | N/A |
| 3.7 | ✓ | ✓ | ✓ | ✓ |
| 3.8 | ✓ | ✓ | ✓ | ✓ |

TensorFlow I/O has integrations with many systems and cloud vendors, such as Prometheus, Apache Kafka, Apache Ignite, Google Cloud PubSub, AWS Kinesis, Microsoft Azure Storage, and Alibaba Cloud OSS.

We try our best to test against those systems in our continuous integration whenever possible. Some tests, such as those for Prometheus, Kafka, and Ignite, are run against live systems, meaning we install Prometheus/Kafka/Ignite on the CI machine before the tests run. Others, such as those for Kinesis, PubSub, and Azure Storage, are run against official or unofficial emulators. Offline tests are also performed whenever possible, though systems covered only by offline tests may not have the same level of coverage as those tested against live systems or emulators.

| System | Live System | Emulator | CI Integration | Offline |
| --- | --- | --- | --- | --- |
| Apache Kafka | ✓ | | ✓ | |
| Apache Ignite | ✓ | | ✓ | |
| Prometheus | ✓ | | ✓ | |
| Google PubSub | | ✓ | ✓ | |
| Azure Storage | | ✓ | ✓ | |
| AWS Kinesis | | ✓ | ✓ | |
| Alibaba Cloud OSS | | | | ✓ |
| Google BigTable/BigQuery (to be added) | | | | |
| Elasticsearch (experimental) | ✓ | | ✓ | |
| MongoDB (experimental) | ✓ | | ✓ | |
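Tests against live systems only make sense when the service is actually running on the CI machine. One common pattern, sketched below with the standard unittest module, is to probe the service's TCP port and skip the tests otherwise. The host, the port (9092 is Kafka's conventional default), and the test names are assumptions for illustration, not taken from the tensorflow-io test suite.

```python
import socket
import unittest


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Hypothetical location of a Kafka broker started by the CI job.
KAFKA_HOST, KAFKA_PORT = "localhost", 9092


@unittest.skipUnless(port_open(KAFKA_HOST, KAFKA_PORT), "Kafka is not running")
class KafkaLiveTest(unittest.TestCase):
    def test_broker_reachable(self) -> None:
        self.assertTrue(port_open(KAFKA_HOST, KAFKA_PORT))
```

Run it with python -m unittest; on a machine without a broker the class is reported as skipped rather than failed.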

References for emulators:

Community

Additional Information

License

Apache License 2.0


Download files


Source Distributions

No source distribution files are available for this release.

Built Distributions

File details

Hashes for tensorflow_io_gcs_filesystem_nightly-0.18.0.dev20210519192421-cp39-cp39-win_amd64.whl
Algorithm Hash digest
SHA256 7c90e521d01563b90d796b42fad997e386ab5f4be16305f93f68e8b2028ddb2c
MD5 a6c0c1db657da1d8bf3ef64f574745d1
BLAKE2b-256 8ad26ae1a89813732158470956ab74c665490404ad3fdf3408e0b14f52331686

See more details on using hashes here.

Hashes for tensorflow_io_gcs_filesystem_nightly-0.18.0.dev20210519192421-cp39-cp39-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 399d1f31371d15153b3f382278d399067fda5abd11ed5702e04ce6583d8b86ca
MD5 68f7340a5d5c18f134a850fc4e7f2e3f
BLAKE2b-256 2887309cc669c0d8dbf53fb3152747aa25c9332cf0f7c8ebe3b2b5e3d0fed0c1

Hashes for tensorflow_io_gcs_filesystem_nightly-0.18.0.dev20210519192421-cp39-cp39-macosx_10_14_x86_64.whl
Algorithm Hash digest
SHA256 b286652381c8dfd3669c5750085850bb58ee3ea5fc13ba4625a2d860b5f7ddb6
MD5 2004dbc3f21aa1433a4740386ad6c45d
BLAKE2b-256 4decea56a01bfa8662e05c1ff525f7222ee0625fd6df598545527fb1a9866d57

Hashes for tensorflow_io_gcs_filesystem_nightly-0.18.0.dev20210519192421-cp38-cp38-win_amd64.whl
Algorithm Hash digest
SHA256 473885f12dd66893cd75f80d092319f4ffa0697bf5ba50ec93db36521782e079
MD5 c98ff65e61cb38e861be1efa78b0e051
BLAKE2b-256 06f3206040f15cb50af4634e26ef234358f0db6dc4cd2ac94c01194378a7aa7f

Hashes for tensorflow_io_gcs_filesystem_nightly-0.18.0.dev20210519192421-cp38-cp38-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 38eba24674a961142259bbe08c36d44323c1ea2fa34740cd2c12b58cc3bb8bcd
MD5 a9cefb77e35f01826a6c33a1410347b1
BLAKE2b-256 0246bb3bbf72ac624d07ce3c1becd93ae8f978427022c39e8b5fa9fc05acfcc3

Hashes for tensorflow_io_gcs_filesystem_nightly-0.18.0.dev20210519192421-cp38-cp38-macosx_10_14_x86_64.whl
Algorithm Hash digest
SHA256 81d0104c744591ff2b6ef4641df677a9c0672224c44ea4568fa019107cd03bda
MD5 3ae70e322437fe08d8fcf55fa47c0d75
BLAKE2b-256 18b53853b8b78278b61424f70c53dafb086e92347f2d8cd8da2645c6350465e5

Hashes for tensorflow_io_gcs_filesystem_nightly-0.18.0.dev20210519192421-cp37-cp37m-win_amd64.whl
Algorithm Hash digest
SHA256 1d8c901358a9aa0a1c98c00af99a078de23c33e8e5f0df15b611be5188bff3a6
MD5 23bab8889fcf1f5d8e4c99e5ad62a507
BLAKE2b-256 96d60466123ad59ffa4d1efb48a53665d63921cea73729903697dc99c0c3c4d0

Hashes for tensorflow_io_gcs_filesystem_nightly-0.18.0.dev20210519192421-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 a4559912a57255a2fd6ac357e925ddec10794a048a759760a451ac79a3748c91
MD5 e6a62fcbf24d76002ada04d1100c3f3e
BLAKE2b-256 6a1ad3095eb3bc6cc977154dcf4f2a2cf44e3135c492c73ce08070a221f2eb7c

Hashes for tensorflow_io_gcs_filesystem_nightly-0.18.0.dev20210519192421-cp37-cp37m-macosx_10_14_x86_64.whl
Algorithm Hash digest
SHA256 280a936222ffb894d950d01b2413292228467a9f22023bf4b66b78944712b4a1
MD5 ecfb58f3cf5d8e0f8b15f917521baf5a
BLAKE2b-256 acaa5075e0ec49fa0f3f3bf22a4d37b19fee0a1b9a8c463d0d1881d2b9fa70a9

Hashes for tensorflow_io_gcs_filesystem_nightly-0.18.0.dev20210519192421-cp36-cp36m-win_amd64.whl
Algorithm Hash digest
SHA256 a84dd5d9349d234cd90caabe02e3820da5f2d76b531891745db4790f876006db
MD5 6b9b3f01b3f0d58e1700565b0d8f0029
BLAKE2b-256 d49d95318bb12931fdf40c25ad38f010924c9eba27a9df7f4b024e667a06a8c2

Hashes for tensorflow_io_gcs_filesystem_nightly-0.18.0.dev20210519192421-cp36-cp36m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 968358cea3798e5a7284219aa69f0a2ce5378c5d23e8ad332e5fa008a16c298c
MD5 ef85c360e3eafe40d1609f6bcf927b0e
BLAKE2b-256 49e0f774740beb839007648e25598d0ba0f6d9f90058b21d9ad817bffefe3e4d

Hashes for tensorflow_io_gcs_filesystem_nightly-0.18.0.dev20210519192421-cp36-cp36m-macosx_10_14_x86_64.whl
Algorithm Hash digest
SHA256 ce6a6a40b73c153225c980ad765a72f6094988b95a68e78140614057ddc5b926
MD5 d7c2371e593783eebde73559ee9edbec
BLAKE2b-256 3ce9e9746c2397f3368585053a492460100372154158492206b4a10dbb2fc175
