TensorFlow I/O

TensorFlow I/O is a collection of file systems and file formats that are not available in TensorFlow's built-in support. A full list of the file systems and file formats supported by TensorFlow I/O can be found here.

Using tensorflow-io with Keras is straightforward. Below is the Get Started with TensorFlow example, with the data processing step replaced by tensorflow-io:

import tensorflow as tf
import tensorflow_io as tfio

# Read the MNIST data into the IODataset.
dataset_url = "https://storage.googleapis.com/cvdf-datasets/mnist/"
d_train = tfio.IODataset.from_mnist(
    dataset_url + "train-images-idx3-ubyte.gz",
    dataset_url + "train-labels-idx1-ubyte.gz",
)

# Shuffle the elements of the dataset.
d_train = d_train.shuffle(buffer_size=1024)

# By default image data is uint8, so convert to float32 using map().
d_train = d_train.map(lambda x, y: (tf.image.convert_image_dtype(x, tf.float32), y))

# Batch the data just like any other tf.data.Dataset.
d_train = d_train.batch(32)

# Build the model.
model = tf.keras.models.Sequential(
    [
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(512, activation=tf.nn.relu),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(10, activation=tf.nn.softmax),
    ]
)

# Compile the model.
model.compile(
    optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"]
)

# Fit the model.
model.fit(d_train, epochs=5, steps_per_epoch=200)

In the above MNIST example, the URLs of the dataset files are passed directly to the tfio.IODataset.from_mnist API call. This works because tensorflow-io has built-in support for the HTTP/HTTPS file system, which eliminates the need to download the datasets and save them in a local directory.

NOTE: Since tensorflow-io detects and uncompresses the MNIST dataset automatically when needed, the URLs of the compressed (gzip) files can be passed to the API call as is.
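To illustrate what detecting a gzip-compressed MNIST file can involve, the sketch below checks for the gzip magic bytes and then parses the IDX header using only the Python standard library. It is not how tensorflow-io implements this internally; the function name read_idx_header and the synthetic payload are assumptions made for illustration.

```python
import gzip
import struct

GZIP_MAGIC = b"\x1f\x8b"  # the first two bytes of every gzip stream


def read_idx_header(data: bytes):
    """Return (dtype_code, shape) of an IDX payload, gzip-compressed or not.

    Hypothetical helper for illustration; not part of tensorflow-io.
    """
    # Decompress only when the payload starts with the gzip magic bytes.
    if data[:2] == GZIP_MAGIC:
        data = gzip.decompress(data)
    # IDX magic number: two zero bytes, a data-type code, the number of dimensions.
    zero, dtype_code, ndim = struct.unpack(">HBB", data[:4])
    if zero != 0:
        raise ValueError("not an IDX payload")
    # Each dimension size is a big-endian unsigned 32-bit integer.
    shape = struct.unpack(">" + "I" * ndim, data[4 : 4 + 4 * ndim])
    return dtype_code, shape


# Synthetic stand-in for an MNIST image file: 2 images of 28x28 uint8 pixels.
raw = struct.pack(">HBBIII", 0, 0x08, 3, 2, 28, 28) + bytes(2 * 28 * 28)

print(read_idx_header(raw))                 # uncompressed input
print(read_idx_header(gzip.compress(raw)))  # gzip-compressed input
```

Both calls report the same header, which is the behavior the note describes: the caller does not need to know whether the file is compressed.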

Please check the official documentation for more detailed and interesting usages of the package.

Installation

Python Package

The tensorflow-io Python package can be installed with pip directly using:

$ pip install tensorflow-io

People who are a little more adventurous can also try our nightly binaries:

$ pip install tensorflow-io-nightly
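After installing, it can be useful to confirm which of the two packages is present in the current environment. The following is a minimal sketch using the standard library's importlib.metadata (Python 3.8 or later); the helper name installed_version is an assumption made for illustration.

```python
from importlib import metadata


def installed_version(package: str):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report the stable and nightly distributions side by side.
for name in ("tensorflow-io", "tensorflow-io-nightly"):
    print(name, installed_version(name))
```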

Docker Images

In addition to the pip packages, Docker images can be used to get started quickly.

For stable builds:

$ docker pull tfsigio/tfio:latest
$ docker run -it --rm --name tfio-latest tfsigio/tfio:latest

For nightly builds:

$ docker pull tfsigio/tfio:nightly
$ docker run -it --rm --name tfio-nightly tfsigio/tfio:nightly

R Package

Once the tensorflow-io Python package has been successfully installed, you can install the development version of the R package from GitHub via the following:

if (!require("remotes")) install.packages("remotes")
remotes::install_github("tensorflow/io", subdir = "R-package")

TensorFlow Version Compatibility

To ensure compatibility with TensorFlow, it is recommended to install a matching version of TensorFlow I/O according to the table below. You can find the list of releases here.

TensorFlow I/O Version   TensorFlow Compatibility   Release Date
0.18.0                   2.5.x                      May 13, 2021
0.17.1                   2.4.x                      Apr 16, 2021
0.17.0                   2.4.x                      Dec 14, 2020
0.16.0                   2.3.x                      Oct 23, 2020
0.15.0                   2.3.x                      Aug 03, 2020
0.14.0                   2.2.x                      Jul 08, 2020
0.13.0                   2.2.x                      May 10, 2020
0.12.0                   2.1.x                      Feb 28, 2020
0.11.0                   2.1.x                      Jan 10, 2020
0.10.0                   2.0.x                      Dec 05, 2019
0.9.1                    2.0.x                      Nov 15, 2019
0.9.0                    2.0.x                      Oct 18, 2019
0.8.1                    1.15.x                     Nov 15, 2019
0.8.0                    1.15.x                     Oct 17, 2019
0.7.2                    1.14.x                     Nov 15, 2019
0.7.1                    1.14.x                     Oct 18, 2019
0.7.0                    1.14.x                     Jul 14, 2019
0.6.0                    1.13.x                     May 29, 2019
0.5.0                    1.13.x                     Apr 12, 2019
0.4.0                    1.13.x                     Mar 01, 2019
0.3.0                    1.12.0                     Feb 15, 2019
0.2.0                    1.12.0                     Jan 29, 2019
0.1.0                    1.12.0                     Dec 16, 2018
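
The table can also be consulted programmatically. The sketch below copies a handful of its rows into a list and looks up the TensorFlow I/O versions that match a given TensorFlow version. The helper names are assumptions made for illustration, and only a subset of rows is included; extend the list from the table as needed.

```python
# A few rows copied from the compatibility table above, newest first.
COMPATIBILITY = [
    ("0.18.0", "2.5.x"),
    ("0.17.1", "2.4.x"),
    ("0.17.0", "2.4.x"),
    ("0.16.0", "2.3.x"),
    ("0.15.0", "2.3.x"),
    ("0.3.0", "1.12.0"),
]


def matches(tf_version: str, pattern: str) -> bool:
    """Check a TensorFlow version such as "2.4.1" against "2.4.x" or "1.12.0"."""
    if pattern.endswith(".x"):
        # "2.4.x" matches any patch release with the same major and minor numbers.
        return tf_version.split(".")[:2] == pattern.split(".")[:2]
    # Entries without ".x" are treated as exact versions.
    return tf_version == pattern


def compatible_tfio_versions(tf_version: str):
    """Return the listed TensorFlow I/O versions compatible with `tf_version`."""
    return [tfio for tfio, pattern in COMPATIBILITY if matches(tf_version, pattern)]


print(compatible_tfio_versions("2.4.1"))
```

Treating entries such as 1.12.0 as exact matches is a conservative reading of the table, not a documented rule.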

Performance Benchmarking

We use GitHub Pages to document the results of API performance benchmarks. The benchmark job is triggered on every commit to the master branch, which makes it possible to track performance across commits.

Contributing

TensorFlow I/O is a community-led open source project. As such, the project depends on public contributions, bug fixes, and documentation. Please see:

Build Status and CI

Build Status
Linux CPU Python 2 Status
Linux CPU Python 3 Status
Linux GPU Python 2 Status
Linux GPU Python 3 Status

Because of the manylinux2010 requirement, TensorFlow I/O is built with Ubuntu 16.04 + Developer Toolset 7 (GCC 7.3) on Linux. Configuring Ubuntu 16.04 with Developer Toolset 7 is not exactly straightforward. If the system has Docker installed, the following command will automatically build a manylinux2010-compatible whl package:

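#!/usr/bin/env bash

# List the built wheels, then repair each one inside the manylinux2010 container.
ls dist/*
for f in dist/*.whl; do
  docker run -i --rm -v "$PWD":/v -w /v --net=host quay.io/pypa/manylinux2010_x86_64 bash -x -e /v/tools/build/auditwheel repair --plat manylinux2010_x86_64 "$f"
done
# Restore ownership of files created by the container to the current user.
sudo chown -R "$(id -nu)":"$(id -ng)" .
ls wheelhouse/*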

It takes some time to build, but once complete, there will be Python 3.5, 3.6, and 3.7 compatible whl packages available in the wheelhouse directory.
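
To check which Python versions and platforms the resulting packages target, the tags can be read from the wheel filenames. The following is a minimal sketch based on the standard wheel filename layout, in which the last three dash-separated fields are the Python, ABI, and platform tags. The helper names and the example filename are assumptions made for illustration.

```python
from pathlib import Path


def wheel_tags(filename: str):
    """Split a wheel filename into (python_tag, abi_tag, platform_tag)."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename}")
    stem = filename[: -len(".whl")]
    # The last three dash-separated fields are the python, ABI, and platform tags.
    python_tag, abi_tag, platform_tag = stem.split("-")[-3:]
    return python_tag, abi_tag, platform_tag


def summarize(wheelhouse: str = "wheelhouse"):
    """Print the Python tag and manylinux2010 status of each wheel in a directory."""
    for wheel in sorted(Path(wheelhouse).glob("*.whl")):
        python_tag, _, platform_tag = wheel_tags(wheel.name)
        is_manylinux2010 = "manylinux2010" in platform_tag
        print(f"{wheel.name}: python={python_tag} manylinux2010={is_manylinux2010}")


# Hypothetical filename used only to demonstrate the parsing.
print(wheel_tags("example_pkg-1.0.0-cp37-cp37m-manylinux2010_x86_64.whl"))
```

Calling summarize() from the directory that contains wheelhouse lists one line per repaired package.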

On macOS, the same command can be used. However, the script expects python to be available in the shell and will only generate a whl package that matches that Python version. To build a whl package for a specific Python version, alias that version to python in the shell. See the Auditwheel step in .github/workflows/build.yml for instructions on how to do that.

Note that the above command is also the one we use when releasing packages for Linux and macOS.

TensorFlow I/O uses both GitHub Workflows and Google CI (Kokoro) for continuous integration. GitHub Workflows is used for macOS build and test. Kokoro is used for Linux build and test. Again, because of the manylinux2010 requirement, whl packages on Linux are always built with Ubuntu 16.04 + Developer Toolset 7. Tests are run on a variety of systems with different Python 3 versions to ensure good coverage:

Python   Ubuntu 18.04   Ubuntu 20.04   macOS + osx9   Windows-2019
2.7      ✓              ✓              ✓              N/A
3.7      ✓              ✓              ✓              ✓
3.8      ✓              ✓              ✓              ✓

TensorFlow I/O has integrations with many systems and cloud vendors, such as Prometheus, Apache Kafka, Apache Ignite, Google Cloud PubSub, AWS Kinesis, Microsoft Azure Storage, and Alibaba Cloud OSS.

We tried our best to test against those systems in our continuous integration whenever possible. Some tests, such as those for Prometheus, Kafka, and Ignite, are run against live systems, meaning we install Prometheus/Kafka/Ignite on the CI machine before the test is run. Some tests, such as those for Kinesis, PubSub, and Azure Storage, are run through official or unofficial emulators. Offline tests are also performed whenever possible, though systems covered through offline tests may not have the same level of coverage as live systems or emulators.

Live System Emulator CI Integration Offline
Apache Kafka ✓ ✓
Apache Ignite ✓ ✓
Prometheus ✓ ✓
Google PubSub ✓ ✓
Azure Storage ✓ ✓
AWS Kinesis ✓ ✓
Alibaba Cloud OSS ✓
Google BigTable/BigQuery to be added
Elasticsearch (experimental) ✓ ✓
MongoDB (experimental) ✓ ✓

References for emulators:

Community

Additional Information

License

Apache License 2.0

Download files

Source Distributions

No source distribution files available for this release. See tutorial on generating distribution archives.

Built Distributions

File details

Details for the file tensorflow_io_gcs_filesystem_nightly-0.18.0.dev20210528075532-cp39-cp39-win_amd64.whl.

File metadata

File hashes

Hashes for tensorflow_io_gcs_filesystem_nightly-0.18.0.dev20210528075532-cp39-cp39-win_amd64.whl
Algorithm Hash digest
SHA256 e32a49109909057218bb7e93c0d5d6cc0de18ae868b0aaffe2d871a338b9b2a0
MD5 0feba763d3a7c70bdf2dd30c65696c02
BLAKE2b-256 052d23da212a83e7ff37c24e8ded1f6e9282ae915e40dc3e045cac25dbca5e69

See more details on using hashes here.
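
A downloaded file can be checked against a listed SHA256 digest with a few lines of Python. The sketch below uses only the standard library's hashlib. The helper names sha256_of and verify are assumptions made for illustration, and the path passed in is whatever location the file was saved to.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large wheels need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_sha256):
    """Return True when the file's digest matches the expected hex string."""
    return sha256_of(path) == expected_sha256.strip().lower()
```

For example, verify(path_to_wheel, listed_digest) returns True only when the local file matches the SHA256 value published for that file.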

File details

Details for the file tensorflow_io_gcs_filesystem_nightly-0.18.0.dev20210528075532-cp39-cp39-manylinux_2_12_x86_64.manylinux2010_x86_64.whl.

File metadata

File hashes

Hashes for tensorflow_io_gcs_filesystem_nightly-0.18.0.dev20210528075532-cp39-cp39-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 0f89d6b2cd2bd139d8d57ce1af35048491e7d6855add1cdfd791c19cf9b03bd5
MD5 cd2a147d5c98e1b13cf0d4da30567738
BLAKE2b-256 9a6c67548e329fc2eaf1867c44e8578c7ed2ef2b31d5c1854e2f14e11e1cd681

See more details on using hashes here.

File details

Details for the file tensorflow_io_gcs_filesystem_nightly-0.18.0.dev20210528075532-cp39-cp39-macosx_10_14_x86_64.whl.

File metadata

File hashes

Hashes for tensorflow_io_gcs_filesystem_nightly-0.18.0.dev20210528075532-cp39-cp39-macosx_10_14_x86_64.whl
Algorithm Hash digest
SHA256 b57e7fef63fac25405bc759f87dff4058f6ec6a1d2d180822dbce4b195995748
MD5 e7585df6c60d4c0b1e2e8addcf2885fe
BLAKE2b-256 f41bc73ac8a96eb2b1bec8319de3477fd9deb9b3980138042586f54c4df2eee5

See more details on using hashes here.

File details

Details for the file tensorflow_io_gcs_filesystem_nightly-0.18.0.dev20210528075532-cp38-cp38-win_amd64.whl.

File metadata

File hashes

Hashes for tensorflow_io_gcs_filesystem_nightly-0.18.0.dev20210528075532-cp38-cp38-win_amd64.whl
Algorithm Hash digest
SHA256 921358ea80cdf4c9bf23c1ca32d57a932655da195ff9826205629ca8e729a999
MD5 80afbcfcb0747a59a25ccc6329dc87b4
BLAKE2b-256 081f1253f42f522aeec43ab9858b5a92902d42d6e6cf384cbe2892bb3f7621eb

See more details on using hashes here.

File details

Details for the file tensorflow_io_gcs_filesystem_nightly-0.18.0.dev20210528075532-cp38-cp38-manylinux_2_12_x86_64.manylinux2010_x86_64.whl.

File metadata

File hashes

Hashes for tensorflow_io_gcs_filesystem_nightly-0.18.0.dev20210528075532-cp38-cp38-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 f8e4154c3d9c552ac582f97d58cd1718d77b3424c94ef15330d824e3134f1f67
MD5 5e063b8b444472197bf330472c8e1501
BLAKE2b-256 13b8950cae846e73bc4975836a3ea1bb6179c8c5abdeaed18d317a9863f3d73f

See more details on using hashes here.

File details

Details for the file tensorflow_io_gcs_filesystem_nightly-0.18.0.dev20210528075532-cp38-cp38-macosx_10_14_x86_64.whl.

File metadata

File hashes

Hashes for tensorflow_io_gcs_filesystem_nightly-0.18.0.dev20210528075532-cp38-cp38-macosx_10_14_x86_64.whl
Algorithm Hash digest
SHA256 6fdf182d29f0107e9f695d5b5af80fac89eb41ac9121620d5e51ea99f98a7dce
MD5 cb6284b3ca7c3ea9cfd0fdac3e080e16
BLAKE2b-256 5cce4f2b1de8fe5e55db29c4acf04237afed05022b3a8fcee3273d696c4c4791

See more details on using hashes here.

File details

Details for the file tensorflow_io_gcs_filesystem_nightly-0.18.0.dev20210528075532-cp37-cp37m-win_amd64.whl.

File metadata

File hashes

Hashes for tensorflow_io_gcs_filesystem_nightly-0.18.0.dev20210528075532-cp37-cp37m-win_amd64.whl
Algorithm Hash digest
SHA256 314c302c4a7b57742fb62c9ce56c548d1ed69fafa5b0552aeee901378f928830
MD5 6afa2c98f1f7e3e59afe47208e3aaa2a
BLAKE2b-256 d76420fcb2f8ac905e146db3588f8d767d838e1c09b8cc2a35e6c132a5d6f2f3

See more details on using hashes here.

File details

Details for the file tensorflow_io_gcs_filesystem_nightly-0.18.0.dev20210528075532-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl.

File metadata

File hashes

Hashes for tensorflow_io_gcs_filesystem_nightly-0.18.0.dev20210528075532-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 1868c0df7d957af1acf526258a43ed6b80ebdb1eb2c3ad82c25e5083d6699e49
MD5 59a33e3b9f15cedae7b559f1f97ae274
BLAKE2b-256 284708c85010aef06f13c9e1c2cf85d5f9ed74b851fb5e486ed1fe2540fd39a7

See more details on using hashes here.

File details

Details for the file tensorflow_io_gcs_filesystem_nightly-0.18.0.dev20210528075532-cp37-cp37m-macosx_10_14_x86_64.whl.

File metadata

File hashes

Hashes for tensorflow_io_gcs_filesystem_nightly-0.18.0.dev20210528075532-cp37-cp37m-macosx_10_14_x86_64.whl
Algorithm Hash digest
SHA256 cb3cf5938b772d98bdfacc3e708b32e7729d931eb674ea4c30b0fc7ddce59a5a
MD5 82b8d3a2dc00b6897f7a2c929331bd6a
BLAKE2b-256 ff67977de9b8e2461647c0cb0f46673110292cf16a5927557d813d0b42066601

See more details on using hashes here.

File details

Details for the file tensorflow_io_gcs_filesystem_nightly-0.18.0.dev20210528075532-cp36-cp36m-win_amd64.whl.

File metadata

File hashes

Hashes for tensorflow_io_gcs_filesystem_nightly-0.18.0.dev20210528075532-cp36-cp36m-win_amd64.whl
Algorithm Hash digest
SHA256 72c59fadbf8fd85301c1cb014ceebc945a6b25342c56291524bf2ba642af85ef
MD5 cf7fa4e74cfb2b8b12041595e341c889
BLAKE2b-256 9995936164403b503556c958ee21dc2b48a201ca1aa0a1d7465ed16281d1e62c

See more details on using hashes here.

File details

Details for the file tensorflow_io_gcs_filesystem_nightly-0.18.0.dev20210528075532-cp36-cp36m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl.

File metadata

File hashes

Hashes for tensorflow_io_gcs_filesystem_nightly-0.18.0.dev20210528075532-cp36-cp36m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 dbc406ad6ae17635705b48f7cfea2cb7cdfcc35483c16372a0bf77aa0d4328c4
MD5 f2d056035d7c6e1dee901ee58fb6b51c
BLAKE2b-256 8c0164d8c4edb8f98eaa740c2e2efe68ee6412fac3bde79411bcfb6dcb26385e

See more details on using hashes here.

File details

Details for the file tensorflow_io_gcs_filesystem_nightly-0.18.0.dev20210528075532-cp36-cp36m-macosx_10_14_x86_64.whl.

File metadata

File hashes

Hashes for tensorflow_io_gcs_filesystem_nightly-0.18.0.dev20210528075532-cp36-cp36m-macosx_10_14_x86_64.whl
Algorithm Hash digest
SHA256 783e9701da6c35a7f47ffcb7b55ba49732b76e0dafaa0d19d03b553d3468b753
MD5 c6a16da4c24c47457a42196620084d4c
BLAKE2b-256 488191f476340ea2e16357d002facbb058c315d3f4de3d329bc1761b359bad63

See more details on using hashes here.
