TensorFlow I/O


TensorFlow I/O is a collection of file systems and file formats that are not available in TensorFlow's built-in support. A full list of supported file systems and file formats by TensorFlow I/O can be found here.

Using tensorflow-io with Keras is straightforward. Below is the Get Started with TensorFlow example with the data processing step replaced by tensorflow-io:

import tensorflow as tf
import tensorflow_io as tfio

# Read the MNIST data into the IODataset.
dataset_url = "https://storage.googleapis.com/cvdf-datasets/mnist/"
d_train = tfio.IODataset.from_mnist(
    dataset_url + "train-images-idx3-ubyte.gz",
    dataset_url + "train-labels-idx1-ubyte.gz",
)

# Shuffle the elements of the dataset.
d_train = d_train.shuffle(buffer_size=1024)

# By default image data is uint8, so convert to float32 using map().
d_train = d_train.map(lambda x, y: (tf.image.convert_image_dtype(x, tf.float32), y))

# Batch the data just like any other tf.data.Dataset.
d_train = d_train.batch(32)

# Build the model.
model = tf.keras.models.Sequential(
    [
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(512, activation=tf.nn.relu),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(10, activation=tf.nn.softmax),
    ]
)

# Compile the model.
model.compile(
    optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"]
)

# Fit the model.
model.fit(d_train, epochs=5, steps_per_epoch=200)

In the above MNIST example, the URLs of the dataset files are passed directly to the tfio.IODataset.from_mnist API call. This works because tensorflow-io has built-in support for the HTTP/HTTPS file system, so there is no need to download the datasets and save them to a local directory first.

NOTE: Since tensorflow-io detects and uncompresses the MNIST dataset automatically when needed, the URLs of the compressed (gzip) files can be passed to the API call as is.
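To make the note above concrete, here is a minimal standard-library sketch of what "detect and uncompress if needed" can look like for an IDX file such as the MNIST images. It is not tensorflow-io's implementation, and the helper names `maybe_gunzip` and `read_idx_header` are hypothetical. It relies only on the gzip magic bytes (0x1f 0x8b) and the IDX header layout (two zero bytes, a type code, a dimension count, then big-endian 32-bit dimension sizes). The sample data is built in memory for illustration.

```python
import gzip
import struct

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def maybe_gunzip(data: bytes) -> bytes:
    """Return decompressed bytes if data looks like gzip, else data unchanged."""
    if data[:2] == GZIP_MAGIC:
        return gzip.decompress(data)
    return data


def read_idx_header(data: bytes):
    """Parse an IDX header and return (type_code, shape)."""
    raw = maybe_gunzip(data)
    zero1, zero2, type_code, ndim = struct.unpack(">BBBB", raw[:4])
    if zero1 != 0 or zero2 != 0:
        raise ValueError("not an IDX file")
    shape = struct.unpack(">" + "I" * ndim, raw[4 : 4 + 4 * ndim])
    return type_code, shape


# Hypothetical in-memory sample: two 28x28 uint8 images (type code 0x08).
sample = struct.pack(">BBBB", 0, 0, 0x08, 3) + struct.pack(">III", 2, 28, 28)
sample += bytes(2 * 28 * 28)

# The same header is recovered from the plain and the gzip-compressed bytes.
plain_header = read_idx_header(sample)
gzipped_header = read_idx_header(gzip.compress(sample))
print(plain_header, gzipped_header)
```

The caller never has to say whether the input is compressed, which is the same convenience the note describes for the MNIST URLs.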

Please check the official documentation for more detailed and interesting usages of the package.

Installation

Python Package

The tensorflow-io Python package can be installed with pip directly using:

$ pip install tensorflow-io

People who are a little more adventurous can also try our nightly binaries:

$ pip install tensorflow-io-nightly
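After installing, it can be useful to confirm which of the two packages is present in the current environment. The following is a small sketch that uses only the standard library (`importlib.metadata`, Python 3.8+); the helper name `installed_version` is an assumption made for illustration and is not part of tensorflow-io.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a pip package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report on the stable and nightly distributions named above.
for name in ("tensorflow-io", "tensorflow-io-nightly"):
    print(name, "->", installed_version(name) or "not installed")
```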

Docker Images

In addition to the pip packages, Docker images can be used to get started quickly.

For stable builds:

$ docker pull tfsigio/tfio:latest
$ docker run -it --rm --name tfio-latest tfsigio/tfio:latest

For nightly builds:

$ docker pull tfsigio/tfio:nightly
$ docker run -it --rm --name tfio-nightly tfsigio/tfio:nightly

R Package

Once the tensorflow-io Python package has been successfully installed, you can install the development version of the R package from GitHub via the following:

if (!require("remotes")) install.packages("remotes")
remotes::install_github("tensorflow/io", subdir = "R-package")

TensorFlow Version Compatibility

To ensure compatibility with TensorFlow, it is recommended to install a matching version of TensorFlow I/O according to the table below. You can find the list of releases here.

| TensorFlow I/O Version | TensorFlow Compatibility | Release Date |
| --- | --- | --- |
| 0.19.1 | 2.5.x | Jul 25, 2021 |
| 0.19.0 | 2.5.x | Jun 25, 2021 |
| 0.18.0 | 2.5.x | May 13, 2021 |
| 0.17.1 | 2.4.x | Apr 16, 2021 |
| 0.17.0 | 2.4.x | Dec 14, 2020 |
| 0.16.0 | 2.3.x | Oct 23, 2020 |
| 0.15.0 | 2.3.x | Aug 03, 2020 |
| 0.14.0 | 2.2.x | Jul 08, 2020 |
| 0.13.0 | 2.2.x | May 10, 2020 |
| 0.12.0 | 2.1.x | Feb 28, 2020 |
| 0.11.0 | 2.1.x | Jan 10, 2020 |
| 0.10.0 | 2.0.x | Dec 05, 2019 |
| 0.9.1 | 2.0.x | Nov 15, 2019 |
| 0.9.0 | 2.0.x | Oct 18, 2019 |
| 0.8.1 | 1.15.x | Nov 15, 2019 |
| 0.8.0 | 1.15.x | Oct 17, 2019 |
| 0.7.2 | 1.14.x | Nov 15, 2019 |
| 0.7.1 | 1.14.x | Oct 18, 2019 |
| 0.7.0 | 1.14.x | Jul 14, 2019 |
| 0.6.0 | 1.13.x | May 29, 2019 |
| 0.5.0 | 1.13.x | Apr 12, 2019 |
| 0.4.0 | 1.13.x | Mar 01, 2019 |
| 0.3.0 | 1.12.0 | Feb 15, 2019 |
| 0.2.0 | 1.12.0 | Jan 29, 2019 |
| 0.1.0 | 1.12.0 | Dec 16, 2018 |
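The table above can also be used programmatically. The sketch below maps a TensorFlow version string to the highest TensorFlow I/O version listed for that series. The mapping is copied from the table; the names `LATEST_TFIO_FOR_TF` and `recommended_tfio` are hypothetical, and matching on the major.minor prefix is a simplifying assumption (the table lists 1.12.0 specifically rather than 1.12.x).

```python
from typing import Optional

# Highest TensorFlow I/O version listed per TensorFlow series in the table above.
LATEST_TFIO_FOR_TF = {
    "2.5": "0.19.1",
    "2.4": "0.17.1",
    "2.3": "0.16.0",
    "2.2": "0.14.0",
    "2.1": "0.12.0",
    "2.0": "0.10.0",
    "1.15": "0.8.1",
    "1.14": "0.7.2",
    "1.13": "0.6.0",
    "1.12": "0.3.0",  # the table lists TensorFlow 1.12.0 only
}


def recommended_tfio(tf_version: str) -> Optional[str]:
    """Return the matching TensorFlow I/O version, or None if the series is not listed."""
    major_minor = ".".join(tf_version.split(".")[:2])
    return LATEST_TFIO_FOR_TF.get(major_minor)


for tf_version in ("2.5.0", "2.4.1", "3.0.0"):
    tfio_version = recommended_tfio(tf_version)
    if tfio_version is None:
        print(f"TensorFlow {tf_version}: not covered by the table")
    else:
        print(f"TensorFlow {tf_version}: pip install tensorflow-io=={tfio_version}")
```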

Performance Benchmarking

We use github-pages to document the results of API performance benchmarks. The benchmark job is triggered on every commit to the master branch, which makes it possible to track performance from commit to commit.

Contributing

TensorFlow I/O is a community-led open source project. As such, the project depends on public contributions, bug fixes, and documentation. Please see:

Build Status and CI

Build Status
Linux CPU Python 2 Status
Linux CPU Python 3 Status
Linux GPU Python 2 Status
Linux GPU Python 3 Status

Because of the manylinux2010 requirement, TensorFlow I/O is built with Ubuntu 16.04 + Developer Toolset 7 (GCC 7.3) on Linux. Configuring Ubuntu 16.04 with Developer Toolset 7 is not exactly straightforward. If the system has Docker installed, the following command will automatically build a manylinux2010-compatible whl package:

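#!/usr/bin/env bash

# List the wheels that were built into dist/.
ls dist/*
# Repair each wheel inside the manylinux2010 container.
for f in dist/*.whl; do
  docker run -i --rm -v "$PWD":/v -w /v --net=host quay.io/pypa/manylinux2010_x86_64 bash -x -e /v/tools/build/auditwheel repair --plat manylinux2010_x86_64 "$f"
done
# Restore ownership of files created by the container to the current user.
sudo chown -R "$(id -nu):$(id -ng)" .
ls wheelhouse/*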

The build takes some time, but once it completes, whl packages compatible with Python 3.5, 3.6, and 3.7 will be available in the wheelhouse directory.

On macOS, the same command can be used. However, the script expects python in the shell and will only generate a whl package that matches that version of Python. To build a whl package for a specific Python version, alias that version to python in the shell. See the Auditwheel step in .github/workflows/build.yml for instructions on how to do that.
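Whether a generated whl package matches the Python in the shell can be read off its filename, since the last three dash-separated fields of a wheel filename are the Python tag, the ABI tag, and the platform tag. The sketch below is illustrative only: `parse_wheel_tags` and `current_cpython_tag` are hypothetical helpers, the example filename is one of the files listed later on this page, and the tag comparison assumes a CPython interpreter.

```python
import sys
from typing import NamedTuple


class WheelTags(NamedTuple):
    python: str
    abi: str
    platform: str


def parse_wheel_tags(filename: str) -> WheelTags:
    """Extract the python, abi, and platform tags from a wheel filename."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) < 5:
        raise ValueError(f"unexpected wheel filename layout: {filename}")
    return WheelTags(*parts[-3:])


def current_cpython_tag() -> str:
    """Return the CPython tag (for example cp38) of the running interpreter."""
    return f"cp{sys.version_info.major}{sys.version_info.minor}"


example = (
    "tensorflow_io_gcs_filesystem_nightly-0.20.0.dev20210730101436"
    "-cp38-cp38-macosx_10_14_x86_64.whl"
)
tags = parse_wheel_tags(example)
print(tags)
print("matches this interpreter:", tags.python == current_cpython_tag())
```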

Note that the above command is also the one we use when releasing packages for Linux and macOS.

TensorFlow I/O uses both GitHub Workflows and Google CI (Kokoro) for continuous integration. GitHub Workflows is used for the macOS build and test. Kokoro is used for the Linux build and test. Again, because of the manylinux2010 requirement, whl packages on Linux are always built with Ubuntu 16.04 + Developer Toolset 7. Tests are run on a variety of systems with different python3 versions to ensure good coverage:

| Python | Ubuntu 18.04 | Ubuntu 20.04 | macOS + osx9 | Windows-2019 |
| --- | --- | --- | --- | --- |
| 2.7 | ✔ | ✔ | ✔ | N/A |
| 3.7 | ✔ | ✔ | ✔ | ✔ |
| 3.8 | ✔ | ✔ | ✔ | ✔ |

TensorFlow I/O has integrations with many systems and cloud vendors such as Prometheus, Apache Kafka, Apache Ignite, Google Cloud PubSub, AWS Kinesis, Microsoft Azure Storage, Alibaba Cloud OSS etc.

We tried our best to test against those systems in our continuous integration whenever possible. Some tests, such as those for Prometheus, Kafka, and Ignite, run against live systems, meaning we install Prometheus/Kafka/Ignite on the CI machine before the test is run. Other tests, such as those for Kinesis, PubSub, and Azure Storage, run through official or non-official emulators. Offline tests are also performed whenever possible, though systems covered only by offline tests may not have the same level of coverage as those tested with live systems or emulators.

Live System Emulator CI Integration Offline
Apache Kafka :heavy_check_mark: :heavy_check_mark:
Apache Ignite :heavy_check_mark: :heavy_check_mark:
Prometheus :heavy_check_mark: :heavy_check_mark:
Google PubSub :heavy_check_mark: :heavy_check_mark:
Azure Storage :heavy_check_mark: :heavy_check_mark:
AWS Kinesis :heavy_check_mark: :heavy_check_mark:
Alibaba Cloud OSS :heavy_check_mark:
Google BigTable/BigQuery to be added
Elasticsearch (experimental) :heavy_check_mark: :heavy_check_mark:
MongoDB (experimental) :heavy_check_mark: :heavy_check_mark:

References for emulators:

Community

Additional Information

License

Apache License 2.0


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release. See tutorial on generating distribution archives.

Built Distributions

File details

Details for the file tensorflow_io_gcs_filesystem_nightly-0.20.0.dev20210730101436-cp39-cp39-win_amd64.whl.

File metadata

File hashes

Hashes for tensorflow_io_gcs_filesystem_nightly-0.20.0.dev20210730101436-cp39-cp39-win_amd64.whl
Algorithm Hash digest
SHA256 a3ff1d698d77e8b096a543b1aabcbadb54234cfc18852ada714e3e55395beb89
MD5 2b02dc51ee8d7d424088195d1058ae65
BLAKE2b-256 2f8eb44087e03b48f51671046b286604876fcf996e4f06165d190715de8cceb2

See more details on using hashes here.
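A downloaded file can be checked against the SHA256 value published for it. The following is a minimal standard-library sketch; the helper names `sha256_of_file` and `matches_published_hash` are hypothetical, and the demo hashes a small temporary file rather than a real wheel so that it runs anywhere.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Demo on a temporary file standing in for a downloaded wheel.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
    demo_path = tmp.name
try:
    expected = hashlib.sha256(b"hello").hexdigest()
    demo_ok = matches_published_hash(demo_path, expected)
    demo_bad = matches_published_hash(demo_path, "0" * 64)
    print("digest matches:", demo_ok)
finally:
    os.remove(demo_path)
```

For a real check, pass the path of the downloaded whl file and the SHA256 value listed for that file on this page.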

File details

Details for the file tensorflow_io_gcs_filesystem_nightly-0.20.0.dev20210730101436-cp39-cp39-manylinux_2_12_x86_64.manylinux2010_x86_64.whl.

File metadata

File hashes

Hashes for tensorflow_io_gcs_filesystem_nightly-0.20.0.dev20210730101436-cp39-cp39-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 5eb72a3907f55f800eb3295bbf4fc3c85157b16285e79f8c9f4d63a89cf0a624
MD5 d2243edce5a0ebfbdea2f4e249db80f8
BLAKE2b-256 ad50356056f8e8ebd936fc6678cde5df2c6578a93edc226c894a515bac7947d4


File details

Details for the file tensorflow_io_gcs_filesystem_nightly-0.20.0.dev20210730101436-cp39-cp39-macosx_10_14_x86_64.whl.

File metadata

File hashes

Hashes for tensorflow_io_gcs_filesystem_nightly-0.20.0.dev20210730101436-cp39-cp39-macosx_10_14_x86_64.whl
Algorithm Hash digest
SHA256 259254e95c56faf0acc8a1794a984fd3597b72b214a5c5b9c11078cd76577ee4
MD5 d04d2c09b2e8f55f1638c6e9101a6ebe
BLAKE2b-256 9f156030f101fdeb0ea1516eaa0ab3a45674930d6d5c2104b381025d8d1dc88b


File details

Details for the file tensorflow_io_gcs_filesystem_nightly-0.20.0.dev20210730101436-cp38-cp38-win_amd64.whl.

File metadata

File hashes

Hashes for tensorflow_io_gcs_filesystem_nightly-0.20.0.dev20210730101436-cp38-cp38-win_amd64.whl
Algorithm Hash digest
SHA256 cb229013bb344ec8aec6f50532287a68017054be5892e05e1e6b936a40db2ee3
MD5 5af42be3c72d1f2dd5874349f3caccde
BLAKE2b-256 a16d4b62ad17c3dbc5fe102c4a0270c7f1f3b0b40505a2b167ba539428e61cff


File details

Details for the file tensorflow_io_gcs_filesystem_nightly-0.20.0.dev20210730101436-cp38-cp38-manylinux_2_12_x86_64.manylinux2010_x86_64.whl.

File metadata

File hashes

Hashes for tensorflow_io_gcs_filesystem_nightly-0.20.0.dev20210730101436-cp38-cp38-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 f03203de404768d3b8798eb22a35a1072bc7f07c533458b5cd33820aac47caca
MD5 a35d3b2530aafd6bc1d2ba7d8afaa67b
BLAKE2b-256 d04b50affe84e43ce87ca95c7fe35c9ffe4ff410324d8f9ba90092c3acbf2a10


File details

Details for the file tensorflow_io_gcs_filesystem_nightly-0.20.0.dev20210730101436-cp38-cp38-macosx_10_14_x86_64.whl.

File metadata

File hashes

Hashes for tensorflow_io_gcs_filesystem_nightly-0.20.0.dev20210730101436-cp38-cp38-macosx_10_14_x86_64.whl
Algorithm Hash digest
SHA256 593c55e36aa36ed1dbb24d827cb2877418d7615f0d65832bdb46e03ac608cc3a
MD5 4a33aa2d99a28d08304d06bf3fbee887
BLAKE2b-256 49feb93003e94b8bc078e91b6132ec0a447adc8a141486fef46b65b87d1a6c72


File details

Details for the file tensorflow_io_gcs_filesystem_nightly-0.20.0.dev20210730101436-cp37-cp37m-win_amd64.whl.

File metadata

File hashes

Hashes for tensorflow_io_gcs_filesystem_nightly-0.20.0.dev20210730101436-cp37-cp37m-win_amd64.whl
Algorithm Hash digest
SHA256 279abb0565e12445949a1dc99e0251d1069f11b3c7504699822b46212b3a9226
MD5 35f20d0d8e10bf74fd2837547589899f
BLAKE2b-256 1d057980cae790b3b255995cef68e4c497877b93a1d15885321a97af3b284b25


File details

Details for the file tensorflow_io_gcs_filesystem_nightly-0.20.0.dev20210730101436-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl.

File metadata

File hashes

Hashes for tensorflow_io_gcs_filesystem_nightly-0.20.0.dev20210730101436-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 44d10148684f6eae56c982b6d1f5716c45a0586d28f9f006236e2c7463188435
MD5 5211798e4fb0ae36a89b872bbee09c2d
BLAKE2b-256 dcebadae2c73c70a580b0946b0fae8c55ca929d84cad4a613990b93565f2719e


File details

Details for the file tensorflow_io_gcs_filesystem_nightly-0.20.0.dev20210730101436-cp37-cp37m-macosx_10_14_x86_64.whl.

File metadata

File hashes

Hashes for tensorflow_io_gcs_filesystem_nightly-0.20.0.dev20210730101436-cp37-cp37m-macosx_10_14_x86_64.whl
Algorithm Hash digest
SHA256 7e6144b7e70e39f91c6b72440337e4011a93427889d082949c9f2916a7427e9b
MD5 c7d035117fd66f7ee14cf0e491b765e5
BLAKE2b-256 83a6311c0aca455727047c5005da7f37b7ff39377d0726f117733f5bb3f46366


File details

Details for the file tensorflow_io_gcs_filesystem_nightly-0.20.0.dev20210730101436-cp36-cp36m-win_amd64.whl.

File metadata

File hashes

Hashes for tensorflow_io_gcs_filesystem_nightly-0.20.0.dev20210730101436-cp36-cp36m-win_amd64.whl
Algorithm Hash digest
SHA256 88c964ebe61eeae078ad5b92122f342294596ce2710f958a820850bd497b08fe
MD5 7f401efad8c8d4ebfe2b083773372197
BLAKE2b-256 09d6941d1f789f09a0437bd3610efc4a8e744e8202894098e2e5fc13ce78ab43


File details

Details for the file tensorflow_io_gcs_filesystem_nightly-0.20.0.dev20210730101436-cp36-cp36m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl.

File metadata

File hashes

Hashes for tensorflow_io_gcs_filesystem_nightly-0.20.0.dev20210730101436-cp36-cp36m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 df8759c94347cc96b580b1cd6fbde639374b54be7531af9cf83cf4d214d6b399
MD5 08914cc23ebaf3c84e46318de252cc76
BLAKE2b-256 cecf46e1c3d6546d314d93fd302bbedb4670e75228b35639d90205005b46c13d


File details

Details for the file tensorflow_io_gcs_filesystem_nightly-0.20.0.dev20210730101436-cp36-cp36m-macosx_10_14_x86_64.whl.

File metadata

File hashes

Hashes for tensorflow_io_gcs_filesystem_nightly-0.20.0.dev20210730101436-cp36-cp36m-macosx_10_14_x86_64.whl
Algorithm Hash digest
SHA256 c677b3520f4816f00962cf9bf2d07a6f16ae8f7cd42e4c95c825aa4e671dbb1e
MD5 7310ffc0ae9c743ae62d572ec842a12e
BLAKE2b-256 c1c860052e38c3e5672a39c19bc4b35c662119f7dc8d68b5c312db2bb418e446

