Skip to main content

TensorFlow IO

Project description




TensorFlow I/O

GitHub CI PyPI License Documentation

TensorFlow I/O is a collection of file systems and file formats that are not available in TensorFlow's built-in support. A full list of supported file systems and file formats by TensorFlow I/O can be found here.

The use of tensorflow-io is straightforward with keras. Below is an example to Get Started with TensorFlow with the data processing aspect replaced by tensorflow-io:

import tensorflow as tf
import tensorflow_io as tfio

# Read the MNIST data into the IODataset.
dataset_url = "https://storage.googleapis.com/cvdf-datasets/mnist/"
d_train = tfio.IODataset.from_mnist(
    dataset_url + "train-images-idx3-ubyte.gz",
    dataset_url + "train-labels-idx1-ubyte.gz",
)

# Shuffle the elements of the dataset.
d_train = d_train.shuffle(buffer_size=1024)

# By default image data is uint8, so convert to float32 using map().
d_train = d_train.map(lambda x, y: (tf.image.convert_image_dtype(x, tf.float32), y))

# prepare batches the data just like any other tf.data.Dataset
d_train = d_train.batch(32)

# Build the model.
model = tf.keras.models.Sequential(
    [
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(512, activation=tf.nn.relu),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(10, activation=tf.nn.softmax),
    ]
)

# Compile the model.
model.compile(
    optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"]
)

# Fit the model.
model.fit(d_train, epochs=5, steps_per_epoch=200)

In the above MNIST example, the URL's to access the dataset files are passed directly to the tfio.IODataset.from_mnist API call. This is due to the inherent support that tensorflow-io provides for HTTP/HTTPS file system, thus eliminating the need for downloading and saving datasets on a local directory.

NOTE: Since tensorflow-io is able to detect and uncompress the MNIST dataset automatically if needed, we can pass the URL's for the compressed files (gzip) to the API call as is.

Please check the official documentation for more detailed and interesting usages of the package.

Installation

Python Package

The tensorflow-io Python package can be installed with pip directly using:

$ pip install tensorflow-io

People who are a little more adventurous can also try our nightly binaries:

$ pip install tensorflow-io-nightly

Docker Images

In addition to the pip packages, the docker images can be used to quickly get started.

For stable builds:

$ docker pull tfsigio/tfio:latest
$ docker run -it --rm --name tfio-latest tfsigio/tfio:latest

For nightly builds:

$ docker pull tfsigio/tfio:nightly
$ docker run -it --rm --name tfio-nightly tfsigio/tfio:nightly

R Package

Once the tensorflow-io Python package has been successfully installed, you can install the development version of the R package from GitHub via the following:

if (!require("remotes")) install.packages("remotes")
remotes::install_github("tensorflow/io", subdir = "R-package")

TensorFlow Version Compatibility

To ensure compatibility with TensorFlow, it is recommended to install a matching version of TensorFlow I/O according to the table below. You can find the list of releases here.

TensorFlow I/O Version TensorFlow Compatibility Release Date
0.20.0 2.6.x Aug 11, 2021
0.19.1 2.5.x Jul 25, 2021
0.19.0 2.5.x Jun 25, 2021
0.18.0 2.5.x May 13, 2021
0.17.1 2.4.x Apr 16, 2021
0.17.0 2.4.x Dec 14, 2020
0.16.0 2.3.x Oct 23, 2020
0.15.0 2.3.x Aug 03, 2020
0.14.0 2.2.x Jul 08, 2020
0.13.0 2.2.x May 10, 2020
0.12.0 2.1.x Feb 28, 2020
0.11.0 2.1.x Jan 10, 2020
0.10.0 2.0.x Dec 05, 2019
0.9.1 2.0.x Nov 15, 2019
0.9.0 2.0.x Oct 18, 2019
0.8.1 1.15.x Nov 15, 2019
0.8.0 1.15.x Oct 17, 2019
0.7.2 1.14.x Nov 15, 2019
0.7.1 1.14.x Oct 18, 2019
0.7.0 1.14.x Jul 14, 2019
0.6.0 1.13.x May 29, 2019
0.5.0 1.13.x Apr 12, 2019
0.4.0 1.13.x Mar 01, 2019
0.3.0 1.12.0 Feb 15, 2019
0.2.0 1.12.0 Jan 29, 2019
0.1.0 1.12.0 Dec 16, 2018

Performance Benchmarking

We use github-pages to document the results of API performance benchmarks. The benchmark job is triggered on every commit to master branch and facilitates tracking performance w.r.t commits.

Contributing

Tensorflow I/O is a community led open source project. As such, the project depends on public contributions, bug-fixes, and documentation. Please see:

Build Status and CI

Build Status
Linux CPU Python 2 Status
Linux CPU Python 3 Status
Linux GPU Python 2 Status
Linux GPU Python 3 Status

Because of manylinux2010 requirement, TensorFlow I/O is built with Ubuntu:16.04 + Developer Toolset 7 (GCC 7.3) on Linux. Configuration with Ubuntu 16.04 with Developer Toolset 7 is not exactly straightforward. If the system have docker installed, then the following command will automatically build manylinux2010 compatible whl package:

#!/usr/bin/env bash

ls dist/*
for f in dist/*.whl; do
  docker run -i --rm -v $PWD:/v -w /v --net=host quay.io/pypa/manylinux2010_x86_64 bash -x -e /v/tools/build/auditwheel repair --plat manylinux2010_x86_64 $f
done
sudo chown -R $(id -nu):$(id -ng) .
ls wheelhouse/*

It takes some time to build, but once complete, there will be python 3.5, 3.6, 3.7 compatible whl packages available in wheelhouse directory.

On macOS, the same command could be used. However, the script expects python in shell and will only generate a whl package that matches the version of python in shell. If you want to build a whl package for a specific python then you have to alias this version of python to python in shell. See .github/workflows/build.yml Auditwheel step for instructions how to do that.

Note the above command is also the command we use when releasing packages for Linux and macOS.

TensorFlow I/O uses both GitHub Workflows and Google CI (Kokoro) for continuous integration. GitHub Workflows is used for macOS build and test. Kokoro is used for Linux build and test. Again, because of the manylinux2010 requirement, on Linux whl packages are always built with Ubuntu 16.04 + Developer Toolset 7. Tests are done on a variatiy of systems with different python3 versions to ensure a good coverage:

Python Ubuntu 18.04 Ubuntu 20.04 macOS + osx9 Windows-2019
2.7 :heavy_check_mark: :heavy_check_mark: :heavy_check_mark: N/A
3.7 :heavy_check_mark: :heavy_check_mark: :heavy_check_mark: :heavy_check_mark:
3.8 :heavy_check_mark: :heavy_check_mark: :heavy_check_mark: :heavy_check_mark:

TensorFlow I/O has integrations with many systems and cloud vendors such as Prometheus, Apache Kafka, Apache Ignite, Google Cloud PubSub, AWS Kinesis, Microsoft Azure Storage, Alibaba Cloud OSS etc.

We tried our best to test against those systems in our continuous integration whenever possible. Some tests such as Prometheus, Kafka, and Ignite are done with live systems, meaning we install Prometheus/Kafka/Ignite on CI machine before the test is run. Some tests such as Kinesis, PubSub, and Azure Storage are done through official or non-official emulators. Offline tests are also performed whenever possible, though systems covered through offine tests may not have the same level of coverage as live systems or emulators.

Live System Emulator CI Integration Offline
Apache Kafka :heavy_check_mark: :heavy_check_mark:
Apache Ignite :heavy_check_mark: :heavy_check_mark:
Prometheus :heavy_check_mark: :heavy_check_mark:
Google PubSub :heavy_check_mark: :heavy_check_mark:
Azure Storage :heavy_check_mark: :heavy_check_mark:
AWS Kinesis :heavy_check_mark: :heavy_check_mark:
Alibaba Cloud OSS :heavy_check_mark:
Google BigTable/BigQuery to be added
Elasticsearch (experimental) :heavy_check_mark: :heavy_check_mark:
MongoDB (experimental) :heavy_check_mark: :heavy_check_mark:

References for emulators:

Community

Additional Information

License

Apache License 2.0

Release history Release notifications | RSS feed

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distributions

File details

Details for the file tensorflow_io_gcs_filesystem_nightly-0.20.0.dev20210815170710-cp39-cp39-win_amd64.whl.

File metadata

File hashes

Hashes for tensorflow_io_gcs_filesystem_nightly-0.20.0.dev20210815170710-cp39-cp39-win_amd64.whl
Algorithm Hash digest
SHA256 59d69ef02b34179f082da6b6b880de268b816ceffd909b0dad8318156f621913
MD5 87424ab4a9eb2f07e83a1f89b131df16
BLAKE2b-256 46780f0a75934b058ae4180646abf4e3ed4558d573f9dec714a167e687e22838

See more details on using hashes here.

File details

Details for the file tensorflow_io_gcs_filesystem_nightly-0.20.0.dev20210815170710-cp39-cp39-manylinux_2_12_x86_64.manylinux2010_x86_64.whl.

File metadata

File hashes

Hashes for tensorflow_io_gcs_filesystem_nightly-0.20.0.dev20210815170710-cp39-cp39-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 0073e3d0ab57513aae93fc4bb3ffbad1d7b6b7f473e2ad10e51442938449d157
MD5 9ca26fdd28abc76ed56031c5c4a0de8e
BLAKE2b-256 9ce23b7278181555f4862fb91bc6c4af1ba80a5d3e2b12044887e22df2c0038f

See more details on using hashes here.

File details

Details for the file tensorflow_io_gcs_filesystem_nightly-0.20.0.dev20210815170710-cp39-cp39-macosx_10_14_x86_64.whl.

File metadata

File hashes

Hashes for tensorflow_io_gcs_filesystem_nightly-0.20.0.dev20210815170710-cp39-cp39-macosx_10_14_x86_64.whl
Algorithm Hash digest
SHA256 3f0f3d6662ca89a35df2f79f9f9f6dd013ac598323972a58b8b4aa8c4e1dbc99
MD5 f0d3e54825fd6ecbc553c0dd4d38997b
BLAKE2b-256 49de887869bc00df5145f60434df96dd22a31e08f85f5242ac9999ac1706cf49

See more details on using hashes here.

File details

Details for the file tensorflow_io_gcs_filesystem_nightly-0.20.0.dev20210815170710-cp38-cp38-win_amd64.whl.

File metadata

File hashes

Hashes for tensorflow_io_gcs_filesystem_nightly-0.20.0.dev20210815170710-cp38-cp38-win_amd64.whl
Algorithm Hash digest
SHA256 fa4ebe6bdb8ae05a743a955b9d5c87ef2cd9378b7a3d09c2f4c1c8ff108eb22c
MD5 6931c2ccce23726611013ef44ac9405b
BLAKE2b-256 b9866d8e4fbe9dd0cd8d5f818f0f15932d5010b5cc2098785dc9d6059fd714e4

See more details on using hashes here.

File details

Details for the file tensorflow_io_gcs_filesystem_nightly-0.20.0.dev20210815170710-cp38-cp38-manylinux_2_12_x86_64.manylinux2010_x86_64.whl.

File metadata

File hashes

Hashes for tensorflow_io_gcs_filesystem_nightly-0.20.0.dev20210815170710-cp38-cp38-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 208c3c648724cee905d55ecdb6990c35fcf505f2753c9dbcfc4f71e3ef11c3c9
MD5 67fc534b686b59b45dca2f8e57754299
BLAKE2b-256 b64cc79276b15251d92ebcda5fab8d8e927fdd3f7a850bccc2e0df95e71f9083

See more details on using hashes here.

File details

Details for the file tensorflow_io_gcs_filesystem_nightly-0.20.0.dev20210815170710-cp38-cp38-macosx_10_14_x86_64.whl.

File metadata

File hashes

Hashes for tensorflow_io_gcs_filesystem_nightly-0.20.0.dev20210815170710-cp38-cp38-macosx_10_14_x86_64.whl
Algorithm Hash digest
SHA256 ef161dd415b338707d2156ee43602fe5fa60ca158af1d9df57fbc618ee4103e6
MD5 2ceba25ff11f958906b873c20bf8059b
BLAKE2b-256 363a6ace1a69c7c2b3e3794b0208d8ba3e2231a2bcc6e9c3e7ca123f676cf353

See more details on using hashes here.

File details

Details for the file tensorflow_io_gcs_filesystem_nightly-0.20.0.dev20210815170710-cp37-cp37m-win_amd64.whl.

File metadata

File hashes

Hashes for tensorflow_io_gcs_filesystem_nightly-0.20.0.dev20210815170710-cp37-cp37m-win_amd64.whl
Algorithm Hash digest
SHA256 84a7040e76d0b53db278553e00054950cc6f9ba0985b4efb1d20a671fb88db28
MD5 ca9ff3884cdba6e1ac950d50d818b400
BLAKE2b-256 a0577134ff3bd46aac27ee572c47c08db4ba21fd63039796db361aefb8effeb3

See more details on using hashes here.

File details

Details for the file tensorflow_io_gcs_filesystem_nightly-0.20.0.dev20210815170710-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl.

File metadata

File hashes

Hashes for tensorflow_io_gcs_filesystem_nightly-0.20.0.dev20210815170710-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 671f9b68d9da2dba7791192638afea156ba1882d5f25bee44c796379c0718bba
MD5 a239743043af02dee7d266f81d821033
BLAKE2b-256 ce49ada2f7eadd12f26b34d7d28a0b593e1446f36a23a3eb5fccb2e80d2d9c5a

See more details on using hashes here.

File details

Details for the file tensorflow_io_gcs_filesystem_nightly-0.20.0.dev20210815170710-cp37-cp37m-macosx_10_14_x86_64.whl.

File metadata

File hashes

Hashes for tensorflow_io_gcs_filesystem_nightly-0.20.0.dev20210815170710-cp37-cp37m-macosx_10_14_x86_64.whl
Algorithm Hash digest
SHA256 f8bf96da5aa4b99a90548b1025e1045c73bd0ccb29463313cf82d311c5db589b
MD5 df6af36199ad1ad240d3c25fccc8fed9
BLAKE2b-256 741c4847bd50332b61f97a4b098bf56cabf50bb033cc7db9383c999826427a99

See more details on using hashes here.

File details

Details for the file tensorflow_io_gcs_filesystem_nightly-0.20.0.dev20210815170710-cp36-cp36m-win_amd64.whl.

File metadata

File hashes

Hashes for tensorflow_io_gcs_filesystem_nightly-0.20.0.dev20210815170710-cp36-cp36m-win_amd64.whl
Algorithm Hash digest
SHA256 ad8b8423cb8a363299790572beb9972741111d888d5f6390fa8a7335ce1a0f16
MD5 937e810af76d47955ce566439fe3ae2f
BLAKE2b-256 9dedb437567d4f5b8b27d25d00eec558698cf21061a40b16d54d58855675a2a9

See more details on using hashes here.

File details

Details for the file tensorflow_io_gcs_filesystem_nightly-0.20.0.dev20210815170710-cp36-cp36m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl.

File metadata

File hashes

Hashes for tensorflow_io_gcs_filesystem_nightly-0.20.0.dev20210815170710-cp36-cp36m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 6ac42717d4bf5ace0f2ff036fb5b079bd8ef602a3d96e8299cb1621afc2d658a
MD5 ee8dedc9d337dc4fd3fb6a558cd5f335
BLAKE2b-256 7f619e5792000984d6a47e3ddc49e50cfd6b03c81bdb7f1bdf4c360877eab788

See more details on using hashes here.

File details

Details for the file tensorflow_io_gcs_filesystem_nightly-0.20.0.dev20210815170710-cp36-cp36m-macosx_10_14_x86_64.whl.

File metadata

File hashes

Hashes for tensorflow_io_gcs_filesystem_nightly-0.20.0.dev20210815170710-cp36-cp36m-macosx_10_14_x86_64.whl
Algorithm Hash digest
SHA256 ae8f9955d13522e193e6f0bf249ee3d7cddf6df81bec69eff2e213a60ab98521
MD5 26d4f4f50d1bec5087297ba498d0f4cc
BLAKE2b-256 53243f1436dfdbfed9da5f140f760701d7e2c6e3b911010c7fce8f1b06df924c

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page