TensorFlow I/O

TensorFlow I/O is a collection of file systems and file formats that are not available in TensorFlow's built-in support. A full list of the file systems and file formats supported by TensorFlow I/O can be found in the project documentation.

Using tensorflow-io with Keras is straightforward. Below is the Get Started with TensorFlow example, with the data-processing step replaced by tensorflow-io:

import tensorflow as tf
import tensorflow_io as tfio

# Read the MNIST data into the IODataset.
dataset_url = "https://storage.googleapis.com/cvdf-datasets/mnist/"
d_train = tfio.IODataset.from_mnist(
    dataset_url + "train-images-idx3-ubyte.gz",
    dataset_url + "train-labels-idx1-ubyte.gz",
)

# Shuffle the elements of the dataset.
d_train = d_train.shuffle(buffer_size=1024)

# By default image data is uint8, so convert to float32 using map().
d_train = d_train.map(lambda x, y: (tf.image.convert_image_dtype(x, tf.float32), y))

# Prepare batches of the data, just like any other tf.data.Dataset.
d_train = d_train.batch(32)

# Build the model.
model = tf.keras.models.Sequential(
    [
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(512, activation=tf.nn.relu),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(10, activation=tf.nn.softmax),
    ]
)

# Compile the model.
model.compile(
    optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"]
)

# Fit the model.
model.fit(d_train, epochs=5, steps_per_epoch=200)
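
Evaluation follows the same pattern. The sketch below continues the example above and assumes the standard MNIST test-set files (t10k-*) are available under the same URL prefix:

# Build the test dataset the same way and evaluate the trained model.
d_test = tfio.IODataset.from_mnist(
    dataset_url + "t10k-images-idx3-ubyte.gz",
    dataset_url + "t10k-labels-idx1-ubyte.gz",
)
d_test = d_test.map(lambda x, y: (tf.image.convert_image_dtype(x, tf.float32), y))
d_test = d_test.batch(32)

model.evaluate(d_test)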

In the above MNIST example, the URLs of the dataset files are passed directly to the tfio.IODataset.from_mnist API call. This is possible because of the inherent support that tensorflow-io provides for the HTTP/HTTPS file system, which eliminates the need to download and save the datasets to a local directory.

NOTE: Since tensorflow-io is able to detect and decompress the MNIST dataset automatically if needed, we can pass the URLs of the compressed (gzip) files to the API call as is.
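
The same HTTP/HTTPS file system support also applies to generic file I/O. Below is a minimal sketch, assuming the HTTP(S) filesystem plugin is registered when tensorflow_io is imported:

import tensorflow as tf
import tensorflow_io as tfio  # assumption: importing registers the HTTP/HTTPS filesystem plugin

# Read the first few bytes of a remote file directly, without downloading it first.
url = "https://storage.googleapis.com/cvdf-datasets/mnist/train-labels-idx1-ubyte.gz"
with tf.io.gfile.GFile(url, "rb") as f:
    header = f.read(4)
print(len(header))  # 4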

Please check the official documentation for more detailed and interesting usages of the package.
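
For example, audio files can be opened lazily as an IOTensor. A minimal sketch follows; the file path is hypothetical and exact attribute names may vary across releases:

import tensorflow_io as tfio

# Open an audio file lazily; data is only read when a slice is requested.
audio = tfio.audio.AudioIOTensor("sample.wav")  # hypothetical local file
print(audio.shape, audio.rate, audio.dtype)

# Materialize the whole clip as a regular tf.Tensor.
audio_tensor = audio.to_tensor()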

Installation

Python Package

The tensorflow-io Python package can be installed with pip directly using:

$ pip install tensorflow-io

People who are a little more adventurous can also try our nightly binaries:

$ pip install tensorflow-io-nightly

To ensure you have a version of TensorFlow that is compatible with TensorFlow-IO, you can specify the tensorflow extra requirement during install:

$ pip install tensorflow-io[tensorflow]

Similar extras exist for the tensorflow-gpu, tensorflow-cpu and tensorflow-rocm packages.

Docker Images

In addition to the pip packages, Docker images can be used to get started quickly.

For stable builds:

$ docker pull tfsigio/tfio:latest
$ docker run -it --rm --name tfio-latest tfsigio/tfio:latest

For nightly builds:

$ docker pull tfsigio/tfio:nightly
$ docker run -it --rm --name tfio-nightly tfsigio/tfio:nightly

R Package

Once the tensorflow-io Python package has been successfully installed, you can install the development version of the R package from GitHub via the following:

if (!require("remotes")) install.packages("remotes")
remotes::install_github("tensorflow/io", subdir = "R-package")

TensorFlow Version Compatibility

To ensure compatibility with TensorFlow, it is recommended to install a matching version of TensorFlow I/O according to the table below. The list of releases is available on the GitHub releases page.

TensorFlow I/O Version TensorFlow Compatibility Release Date
0.23.1 2.7.x Dec 15, 2021
0.23.0 2.7.x Dec 14, 2021
0.22.0 2.7.x Nov 10, 2021
0.21.0 2.6.x Sep 12, 2021
0.20.0 2.6.x Aug 11, 2021
0.19.1 2.5.x Jul 25, 2021
0.19.0 2.5.x Jun 25, 2021
0.18.0 2.5.x May 13, 2021
0.17.1 2.4.x Apr 16, 2021
0.17.0 2.4.x Dec 14, 2020
0.16.0 2.3.x Oct 23, 2020
0.15.0 2.3.x Aug 03, 2020
0.14.0 2.2.x Jul 08, 2020
0.13.0 2.2.x May 10, 2020
0.12.0 2.1.x Feb 28, 2020
0.11.0 2.1.x Jan 10, 2020
0.10.0 2.0.x Dec 05, 2019
0.9.1 2.0.x Nov 15, 2019
0.9.0 2.0.x Oct 18, 2019
0.8.1 1.15.x Nov 15, 2019
0.8.0 1.15.x Oct 17, 2019
0.7.2 1.14.x Nov 15, 2019
0.7.1 1.14.x Oct 18, 2019
0.7.0 1.14.x Jul 14, 2019
0.6.0 1.13.x May 29, 2019
0.5.0 1.13.x Apr 12, 2019
0.4.0 1.13.x Mar 01, 2019
0.3.0 1.12.0 Feb 15, 2019
0.2.0 1.12.0 Jan 29, 2019
0.1.0 1.12.0 Dec 16, 2018
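
To confirm at runtime that the installed pair matches a row in the table above, a quick sketch (assuming tensorflow_io exposes __version__, which may differ across releases):

import tensorflow as tf
import tensorflow_io as tfio

# Print the installed versions; compare against the compatibility table above.
print("TensorFlow:", tf.__version__)
print("TensorFlow I/O:", tfio.__version__)  # assumption: __version__ is exposed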

Performance Benchmarking

We use GitHub Pages to document the results of API performance benchmarks. The benchmark job is triggered on every commit to the master branch and makes it easy to track performance across commits.

Contributing

TensorFlow I/O is a community-led open source project. As such, the project depends on public contributions, bug fixes, and documentation. Please see the project's contribution guidelines for details.

Build Status and CI


Because of the manylinux2010 requirement, TensorFlow I/O is built with Ubuntu 16.04 + Developer Toolset 7 (GCC 7.3) on Linux. Configuring Ubuntu 16.04 with Developer Toolset 7 is not exactly straightforward. If the system has Docker installed, then the following command will automatically build a manylinux2010-compatible whl package:

#!/usr/bin/env bash

# Repair every wheel in dist/ inside the manylinux2010 build container.
ls dist/*
for f in dist/*.whl; do
  docker run -i --rm -v "$PWD":/v -w /v --net=host \
    quay.io/pypa/manylinux2010_x86_64 \
    bash -x -e /v/tools/build/auditwheel repair --plat manylinux2010_x86_64 "$f"
done
# Restore ownership of files created by the container and list the results.
sudo chown -R "$(id -nu)":"$(id -ng)" .
ls wheelhouse/*

It takes some time to build, but once complete, there will be Python 3.5, 3.6, and 3.7 compatible whl packages available in the wheelhouse directory.

On macOS, the same command can be used. However, the script expects python to be available in the shell and will only generate a whl package that matches the version of python in the shell. If you want to build a whl package for a specific Python version, you have to alias that version of python to python in the shell. See the Auditwheel step in .github/workflows/build.yml for instructions on how to do that.

Note that the above command is also the one we use when releasing packages for Linux and macOS.

TensorFlow I/O uses both GitHub Workflows and Google CI (Kokoro) for continuous integration. GitHub Workflows is used for macOS build and test; Kokoro is used for Linux build and test. Again, because of the manylinux2010 requirement, on Linux whl packages are always built with Ubuntu 16.04 + Developer Toolset 7. Tests are done on a variety of systems with different Python 3 versions to ensure good coverage:

Python Ubuntu 18.04 Ubuntu 20.04 macOS + osx9 Windows-2019
2.7 ✓ ✓ ✓ N/A
3.7 ✓ ✓ ✓ ✓
3.8 ✓ ✓ ✓ ✓

TensorFlow I/O has integrations with many systems and cloud vendors such as Prometheus, Apache Kafka, Apache Ignite, Google Cloud PubSub, AWS Kinesis, Microsoft Azure Storage, Alibaba Cloud OSS etc.

We try our best to test against those systems in our continuous integration whenever possible. Some tests, such as Prometheus, Kafka, and Ignite, are done with live systems, meaning we install Prometheus/Kafka/Ignite on the CI machine before the test is run. Other tests, such as Kinesis, PubSub, and Azure Storage, are done through official or unofficial emulators. Offline tests are also performed whenever possible, though systems covered only by offline tests may not have the same level of coverage as live systems or emulators.

Live System Emulator CI Integration Offline
Apache Kafka ✓ ✓
Apache Ignite ✓ ✓
Prometheus ✓ ✓
Google PubSub ✓ ✓
Azure Storage ✓ ✓
AWS Kinesis ✓ ✓
Alibaba Cloud OSS ✓
Google BigTable/BigQuery to be added
Elasticsearch (experimental) ✓ ✓
MongoDB (experimental) ✓ ✓


Community

Additional Information

License

Apache License 2.0


