TensorFlow I/O


TensorFlow I/O is a collection of file systems and file formats that are not available in TensorFlow's built-in support. A full list of the file systems and file formats supported by TensorFlow I/O can be found here.

Using tensorflow-io with Keras is straightforward. Below is the Get Started with TensorFlow example, with the data-processing steps replaced by tensorflow-io:

import tensorflow as tf
import tensorflow_io as tfio

# Read the MNIST data into the IODataset.
dataset_url = "https://storage.googleapis.com/cvdf-datasets/mnist/"
d_train = tfio.IODataset.from_mnist(
    dataset_url + "train-images-idx3-ubyte.gz",
    dataset_url + "train-labels-idx1-ubyte.gz",
)

# Shuffle the elements of the dataset.
d_train = d_train.shuffle(buffer_size=1024)

# By default image data is uint8, so convert to float32 using map().
d_train = d_train.map(lambda x, y: (tf.image.convert_image_dtype(x, tf.float32), y))

# Prepare batches of the data, just like with any other tf.data.Dataset.
d_train = d_train.batch(32)

# Build the model.
model = tf.keras.models.Sequential(
    [
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(512, activation=tf.nn.relu),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(10, activation=tf.nn.softmax),
    ]
)

# Compile the model.
model.compile(
    optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"]
)

# Fit the model.
model.fit(d_train, epochs=5, steps_per_epoch=200)

In the MNIST example above, the URLs of the dataset files are passed directly to the tfio.IODataset.from_mnist API call. This works because tensorflow-io provides built-in support for the HTTP/HTTPS file system, eliminating the need to download and save datasets to a local directory.

NOTE: Since tensorflow-io can detect and decompress the MNIST dataset automatically when needed, the URLs of the compressed (gzip) files can be passed to the API call as is.
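
As an illustration of the file system support, importing tensorflow_io registers the HTTP/HTTPS schemes with TensorFlow's file APIs, so a remote file can be read directly. A minimal sketch (the URL is the MNIST labels file from the example above):

import tensorflow as tf
import tensorflow_io as tfio  # importing registers the http/https file systems

# Read the raw bytes of a remote, gzip-compressed file without saving it locally.
raw = tf.io.read_file(
    "https://storage.googleapis.com/cvdf-datasets/mnist/train-labels-idx1-ubyte.gz"
)
print("fetched", len(raw.numpy()), "bytes")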

Please check the official documentation for more detailed and interesting use cases of the package.

Installation

Python Package

The tensorflow-io Python package can be installed with pip directly using:

$ pip install tensorflow-io

People who are a little more adventurous can also try our nightly binaries:

$ pip install tensorflow-io-nightly

Docker Images

In addition to the pip packages, Docker images can be used to quickly get started.

For stable builds:

$ docker pull tfsigio/tfio:latest
$ docker run -it --rm --name tfio-latest tfsigio/tfio:latest

For nightly builds:

$ docker pull tfsigio/tfio:nightly
$ docker run -it --rm --name tfio-nightly tfsigio/tfio:nightly
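
To quickly check that the package imports correctly inside the container, something like the following can be used (an illustrative command, assuming python is on the image's path):

$ docker run -it --rm tfsigio/tfio:latest python -c "import tensorflow_io as tfio; print('ok')"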

R Package

Once the tensorflow-io Python package has been successfully installed, you can install the development version of the R package from GitHub via the following:

if (!require("remotes")) install.packages("remotes")
remotes::install_github("tensorflow/io", subdir = "R-package")

TensorFlow Version Compatibility

To ensure compatibility with TensorFlow, it is recommended to install a matching version of TensorFlow I/O according to the table below. You can find the list of releases here.

| TensorFlow I/O Version | TensorFlow Compatibility | Release Date |
| --- | --- | --- |
| 0.18.0 | 2.5.x | May 13, 2021 |
| 0.17.1 | 2.4.x | Apr 16, 2021 |
| 0.17.0 | 2.4.x | Dec 14, 2020 |
| 0.16.0 | 2.3.x | Oct 23, 2020 |
| 0.15.0 | 2.3.x | Aug 03, 2020 |
| 0.14.0 | 2.2.x | Jul 08, 2020 |
| 0.13.0 | 2.2.x | May 10, 2020 |
| 0.12.0 | 2.1.x | Feb 28, 2020 |
| 0.11.0 | 2.1.x | Jan 10, 2020 |
| 0.10.0 | 2.0.x | Dec 05, 2019 |
| 0.9.1 | 2.0.x | Nov 15, 2019 |
| 0.9.0 | 2.0.x | Oct 18, 2019 |
| 0.8.1 | 1.15.x | Nov 15, 2019 |
| 0.8.0 | 1.15.x | Oct 17, 2019 |
| 0.7.2 | 1.14.x | Nov 15, 2019 |
| 0.7.1 | 1.14.x | Oct 18, 2019 |
| 0.7.0 | 1.14.x | Jul 14, 2019 |
| 0.6.0 | 1.13.x | May 29, 2019 |
| 0.5.0 | 1.13.x | Apr 12, 2019 |
| 0.4.0 | 1.13.x | Mar 01, 2019 |
| 0.3.0 | 1.12.0 | Feb 15, 2019 |
| 0.2.0 | 1.12.0 | Jan 29, 2019 |
| 0.1.0 | 1.12.0 | Dec 16, 2018 |
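
For example, to pair TensorFlow 2.5.x with its matching TensorFlow I/O release from the table:

$ pip install tensorflow==2.5.0 tensorflow-io==0.18.0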

Performance Benchmarking

We use GitHub Pages to document the results of API performance benchmarks. The benchmark job is triggered on every commit to the master branch and facilitates tracking performance with respect to commits.

Contributing

TensorFlow I/O is a community-led open source project. As such, the project depends on public contributions, bug fixes, and documentation. Please see the contribution guidelines.

Build Status and CI


Because of the manylinux2010 requirement, TensorFlow I/O is built with Ubuntu 16.04 + Developer Toolset 7 (GCC 7.3) on Linux. Configuring Ubuntu 16.04 with Developer Toolset 7 is not exactly straightforward. If the system has Docker installed, the following commands will automatically build manylinux2010-compatible whl packages:

#!/usr/bin/env bash

# Repair each wheel in dist/ inside the manylinux2010 container so the
# resulting packages are manylinux2010-compatible.
ls dist/*
for f in dist/*.whl; do
  docker run -i --rm -v "$PWD":/v -w /v --net=host \
    quay.io/pypa/manylinux2010_x86_64 \
    bash -x -e /v/tools/build/auditwheel repair --plat manylinux2010_x86_64 "$f"
done
# The container runs as root, so restore ownership of the output files.
sudo chown -R "$(id -nu):$(id -ng)" .
ls wheelhouse/*

It takes some time to build, but once complete, Python 3.5, 3.6, and 3.7 compatible whl packages will be available in the wheelhouse directory.
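
The repaired wheels can then be installed directly, for example (an illustrative command; the exact file name depends on the version built):

$ pip install wheelhouse/tensorflow_io-*.whl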

On macOS, the same command can be used. However, the script expects python in the shell and will only generate a whl package matching that Python version. If you want to build a whl package for a specific Python version, you have to alias that version to python in the shell. See the Auditwheel step in .github/workflows/build.yml for instructions on how to do that.
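
One way to do this (a minimal sketch, assuming python3.7 is installed) is to activate a virtual environment created from the desired interpreter, so that python resolves to it:

$ python3.7 -m venv .venv && source .venv/bin/activate
$ python --version  # now reports Python 3.7.x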

Note that the Docker-based command above is also the command we use when releasing packages for Linux and macOS.

TensorFlow I/O uses both GitHub Workflows and Google CI (Kokoro) for continuous integration. GitHub Workflows is used for macOS builds and tests; Kokoro is used for Linux builds and tests. Again, because of the manylinux2010 requirement, whl packages on Linux are always built with Ubuntu 16.04 + Developer Toolset 7. Tests are run on a variety of systems with different Python 3 versions to ensure good coverage:

| Python | Ubuntu 18.04 | Ubuntu 20.04 | macOS + osx9 | Windows-2019 |
| --- | --- | --- | --- | --- |
| 2.7 | ✓ | ✓ | ✓ | N/A |
| 3.7 | ✓ | ✓ | ✓ | ✓ |
| 3.8 | ✓ | ✓ | ✓ | ✓ |

TensorFlow I/O has integrations with many systems and cloud vendors, such as Prometheus, Apache Kafka, Apache Ignite, Google Cloud PubSub, AWS Kinesis, Microsoft Azure Storage, and Alibaba Cloud OSS.
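
As one example of these integrations, Kafka topics can be consumed as a dataset. A hypothetical sketch, assuming a broker on localhost and an existing topic named "demo" (see the official documentation for the full set of options):

import tensorflow_io as tfio

# Stream records from the Kafka topic "demo" on a local broker.
kafka_dataset = tfio.IODataset.from_kafka("demo")
for record in kafka_dataset.take(5):
    print(record)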

We try our best to test against those systems in our continuous integration whenever possible. Some tests, such as those for Prometheus, Kafka, and Ignite, are done with live systems, meaning we install Prometheus/Kafka/Ignite on the CI machine before the test is run. Other tests, such as those for Kinesis, PubSub, and Azure Storage, are done through official or unofficial emulators. Offline tests are also performed where possible, though systems covered by offline tests may not have the same level of coverage as live systems or emulators.

| | Live System | Emulator | CI Integration | Offline |
| --- | --- | --- | --- | --- |
| Apache Kafka | ✓ | | ✓ | |
| Apache Ignite | ✓ | | ✓ | |
| Prometheus | ✓ | | ✓ | |
| Google PubSub | | ✓ | ✓ | |
| Azure Storage | | ✓ | ✓ | |
| AWS Kinesis | | ✓ | ✓ | |
| Alibaba Cloud OSS | | | | ✓ |
| Google BigTable/BigQuery | to be added | | | |
| Elasticsearch (experimental) | ✓ | | ✓ | |
| MongoDB (experimental) | ✓ | | ✓ | |


License

Apache License 2.0
