Skip to main content

TensorFlow IO

Project description




TensorFlow I/O

GitHub CI PyPI License Documentation

TensorFlow I/O is a collection of file systems and file formats that are not available in TensorFlow's built-in support. A full list of supported file systems and file formats by TensorFlow I/O can be found here.

The use of tensorflow-io is straightforward with keras. Below is an example to Get Started with TensorFlow with the data processing aspect replaced by tensorflow-io:

import tensorflow as tf
import tensorflow_io as tfio

# Read the MNIST data into the IODataset.
dataset_url = "https://storage.googleapis.com/cvdf-datasets/mnist/"
d_train = tfio.IODataset.from_mnist(
    dataset_url + "train-images-idx3-ubyte.gz",
    dataset_url + "train-labels-idx1-ubyte.gz",
)

# Shuffle the elements of the dataset.
d_train = d_train.shuffle(buffer_size=1024)

# By default image data is uint8, so convert to float32 using map().
d_train = d_train.map(lambda x, y: (tf.image.convert_image_dtype(x, tf.float32), y))

# prepare batches the data just like any other tf.data.Dataset
d_train = d_train.batch(32)

# Build the model.
model = tf.keras.models.Sequential(
    [
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(512, activation=tf.nn.relu),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(10, activation=tf.nn.softmax),
    ]
)

# Compile the model.
model.compile(
    optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"]
)

# Fit the model.
model.fit(d_train, epochs=5, steps_per_epoch=200)

In the above MNIST example, the URL's to access the dataset files are passed directly to the tfio.IODataset.from_mnist API call. This is due to the inherent support that tensorflow-io provides for HTTP/HTTPS file system, thus eliminating the need for downloading and saving datasets on a local directory.

NOTE: Since tensorflow-io is able to detect and uncompress the MNIST dataset automatically if needed, we can pass the URL's for the compressed files (gzip) to the API call as is.

Please check the official documentation for more detailed and interesting usages of the package.

Installation

Python Package

The tensorflow-io Python package can be installed with pip directly using:

$ pip install tensorflow-io

People who are a little more adventurous can also try our nightly binaries:

$ pip install tensorflow-io-nightly

Docker Images

In addition to the pip packages, the docker images can be used to quickly get started.

For stable builds:

$ docker pull tfsigio/tfio:latest
$ docker run -it --rm --name tfio-latest tfsigio/tfio:latest

For nightly builds:

$ docker pull tfsigio/tfio:nightly
$ docker run -it --rm --name tfio-nightly tfsigio/tfio:nightly

R Package

Once the tensorflow-io Python package has been successfully installed, you can install the development version of the R package from GitHub via the following:

if (!require("remotes")) install.packages("remotes")
remotes::install_github("tensorflow/io", subdir = "R-package")

TensorFlow Version Compatibility

To ensure compatibility with TensorFlow, it is recommended to install a matching version of TensorFlow I/O according to the table below. You can find the list of releases here.

TensorFlow I/O Version TensorFlow Compatibility Release Date
0.17.1 2.4.x Apr 16, 2021
0.17.0 2.4.x Dec 14, 2020
0.16.0 2.3.x Oct 23, 2020
0.15.0 2.3.x Aug 03, 2020
0.14.0 2.2.x Jul 08, 2020
0.13.0 2.2.x May 10, 2020
0.12.0 2.1.x Feb 28, 2020
0.11.0 2.1.x Jan 10, 2020
0.10.0 2.0.x Dec 05, 2019
0.9.1 2.0.x Nov 15, 2019
0.9.0 2.0.x Oct 18, 2019
0.8.1 1.15.x Nov 15, 2019
0.8.0 1.15.x Oct 17, 2019
0.7.2 1.14.x Nov 15, 2019
0.7.1 1.14.x Oct 18, 2019
0.7.0 1.14.x Jul 14, 2019
0.6.0 1.13.x May 29, 2019
0.5.0 1.13.x Apr 12, 2019
0.4.0 1.13.x Mar 01, 2019
0.3.0 1.12.0 Feb 15, 2019
0.2.0 1.12.0 Jan 29, 2019
0.1.0 1.12.0 Dec 16, 2018

Performance Benchmarking

We use github-pages to document the results of API performance benchmarks. The benchmark job is triggered on every commit to master branch and facilitates tracking performance w.r.t commits.

Contributing

Tensorflow I/O is a community led open source project. As such, the project depends on public contributions, bug-fixes, and documentation. Please see:

Build Status and CI

Build Status
Linux CPU Python 2 Status
Linux CPU Python 3 Status
Linux GPU Python 2 Status
Linux GPU Python 3 Status

Because of manylinux2010 requirement, TensorFlow I/O is built with Ubuntu:16.04 + Developer Toolset 7 (GCC 7.3) on Linux. Configuration with Ubuntu 16.04 with Developer Toolset 7 is not exactly straightforward. If the system have docker installed, then the following command will automatically build manylinux2010 compatible whl package:

#!/usr/bin/env bash

ls dist/*
for f in dist/*.whl; do
  docker run -i --rm -v $PWD:/v -w /v --net=host quay.io/pypa/manylinux2010_x86_64 bash -x -e /v/tools/build/auditwheel repair --plat manylinux2010_x86_64 $f
done
sudo chown -R $(id -nu):$(id -ng) .
ls wheelhouse/*

It takes some time to build, but once complete, there will be python 3.5, 3.6, 3.7 compatible whl packages available in wheelhouse directory.

On macOS, the same command could be used. However, the script expects python in shell and will only generate a whl package that matches the version of python in shell. If you want to build a whl package for a specific python then you have to alias this version of python to python in shell. See .github/workflows/build.yml Auditwheel step for instructions how to do that.

Note the above command is also the command we use when releasing packages for Linux and macOS.

TensorFlow I/O uses both GitHub Workflows and Google CI (Kokoro) for continuous integration. GitHub Workflows is used for macOS build and test. Kokoro is used for Linux build and test. Again, because of the manylinux2010 requirement, on Linux whl packages are always built with Ubuntu 16.04 + Developer Toolset 7. Tests are done on a variatiy of systems with different python3 versions to ensure a good coverage:

Python Ubuntu 18.04 Ubuntu 20.04 macOS + osx9 Windows-2019
2.7 :heavy_check_mark: :heavy_check_mark: :heavy_check_mark: N/A
3.7 :heavy_check_mark: :heavy_check_mark: :heavy_check_mark: :heavy_check_mark:
3.8 :heavy_check_mark: :heavy_check_mark: :heavy_check_mark: :heavy_check_mark:

TensorFlow I/O has integrations with many systems and cloud vendors such as Prometheus, Apache Kafka, Apache Ignite, Google Cloud PubSub, AWS Kinesis, Microsoft Azure Storage, Alibaba Cloud OSS etc.

We tried our best to test against those systems in our continuous integration whenever possible. Some tests such as Prometheus, Kafka, and Ignite are done with live systems, meaning we install Prometheus/Kafka/Ignite on CI machine before the test is run. Some tests such as Kinesis, PubSub, and Azure Storage are done through official or non-official emulators. Offline tests are also performed whenever possible, though systems covered through offine tests may not have the same level of coverage as live systems or emulators.

Live System Emulator CI Integration Offline
Apache Kafka :heavy_check_mark: :heavy_check_mark:
Apache Ignite :heavy_check_mark: :heavy_check_mark:
Prometheus :heavy_check_mark: :heavy_check_mark:
Google PubSub :heavy_check_mark: :heavy_check_mark:
Azure Storage :heavy_check_mark: :heavy_check_mark:
AWS Kinesis :heavy_check_mark: :heavy_check_mark:
Alibaba Cloud OSS :heavy_check_mark:
Google BigTable/BigQuery to be added
Elasticsearch (experimental) :heavy_check_mark: :heavy_check_mark:
MongoDB (experimental) :heavy_check_mark: :heavy_check_mark:

References for emulators:

Community

Additional Information

License

Apache License 2.0

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distributions

File details

Details for the file tensorflow_io_plugin_gs_nightly-0.18.0.dev20210506015400-cp39-cp39-win_amd64.whl.

File metadata

File hashes

Hashes for tensorflow_io_plugin_gs_nightly-0.18.0.dev20210506015400-cp39-cp39-win_amd64.whl
Algorithm Hash digest
SHA256 a4b9d225f53c074cc533cb63516e9475d73031fc2754efefa2f30692ce54268e
MD5 07f864abf64f38174b9f2e5b200093a3
BLAKE2b-256 b07ec2310ef0d078353ea1b24820042e53fb3fa354679dbf9251e2e034d6f2b6

See more details on using hashes here.

File details

Details for the file tensorflow_io_plugin_gs_nightly-0.18.0.dev20210506015400-cp39-cp39-manylinux_2_12_x86_64.manylinux2010_x86_64.whl.

File metadata

File hashes

Hashes for tensorflow_io_plugin_gs_nightly-0.18.0.dev20210506015400-cp39-cp39-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 e3a98e519bb5eb379e2c57adf70432294d5bf1ff556afc6455c242ab0018bc8f
MD5 710a98053d03313d6d586f4cde8b89f4
BLAKE2b-256 8e29b90b0ef8e70ea16e73886754204a45a7f56fe57ebea067713fde63b2cdb4

See more details on using hashes here.

File details

Details for the file tensorflow_io_plugin_gs_nightly-0.18.0.dev20210506015400-cp39-cp39-macosx_10_14_x86_64.whl.

File metadata

File hashes

Hashes for tensorflow_io_plugin_gs_nightly-0.18.0.dev20210506015400-cp39-cp39-macosx_10_14_x86_64.whl
Algorithm Hash digest
SHA256 5950833f8a9d752128c31f9bb60dfae2d9f068c381326b21c1bee86097247b1e
MD5 bc312162edffbd602c902bf3ddb12755
BLAKE2b-256 d172b08e5e4d0dd8782804ed70bb3a95f53b19bc4439e39717bbfd6e002f5976

See more details on using hashes here.

File details

Details for the file tensorflow_io_plugin_gs_nightly-0.18.0.dev20210506015400-cp38-cp38-win_amd64.whl.

File metadata

File hashes

Hashes for tensorflow_io_plugin_gs_nightly-0.18.0.dev20210506015400-cp38-cp38-win_amd64.whl
Algorithm Hash digest
SHA256 90ed206c74ea845f57050fea71dee9390e7ca5fabf4e6375cf00adf10f9a18ff
MD5 76230eaf59123ff512bd685b267a295d
BLAKE2b-256 3a77b3c7bd9a15f62ce102ed1cf32038d5fb9c2d7933be08dcf9627178a683d3

See more details on using hashes here.

File details

Details for the file tensorflow_io_plugin_gs_nightly-0.18.0.dev20210506015400-cp38-cp38-manylinux_2_12_x86_64.manylinux2010_x86_64.whl.

File metadata

File hashes

Hashes for tensorflow_io_plugin_gs_nightly-0.18.0.dev20210506015400-cp38-cp38-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 4a6827fd7307d7aaa918c5eef4c42bfe79d6337ded14ced0c00a44bb22403458
MD5 83023f6b1e1b1387f61b9ec707b248e4
BLAKE2b-256 f5bf2071b00f4fa2a99d404ba505319d9063f00865ed9c1537570d44e9d3c980

See more details on using hashes here.

File details

Details for the file tensorflow_io_plugin_gs_nightly-0.18.0.dev20210506015400-cp38-cp38-macosx_10_14_x86_64.whl.

File metadata

File hashes

Hashes for tensorflow_io_plugin_gs_nightly-0.18.0.dev20210506015400-cp38-cp38-macosx_10_14_x86_64.whl
Algorithm Hash digest
SHA256 34a9bcf45bb41e8771295e23d3500df6798505ad3778a3973806534ca5e679b3
MD5 77b09194ecc74cfd7429ef4f727a151c
BLAKE2b-256 5c7666beb359baad6c5dd171193c36af0f65cac0263a9b88cf4f23583d411f0b

See more details on using hashes here.

File details

Details for the file tensorflow_io_plugin_gs_nightly-0.18.0.dev20210506015400-cp37-cp37m-win_amd64.whl.

File metadata

File hashes

Hashes for tensorflow_io_plugin_gs_nightly-0.18.0.dev20210506015400-cp37-cp37m-win_amd64.whl
Algorithm Hash digest
SHA256 82d0534ff7b497738adc0b62e2a71a7bb2ce75c566b92440dee89762e8b8cbb6
MD5 52e6bf20d1b13578a1190d57e855b08f
BLAKE2b-256 0488b18bab7281d952b49fb13ec74008a9d807d2b8cf0550d9d40e1103e439ea

See more details on using hashes here.

File details

Details for the file tensorflow_io_plugin_gs_nightly-0.18.0.dev20210506015400-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl.

File metadata

File hashes

Hashes for tensorflow_io_plugin_gs_nightly-0.18.0.dev20210506015400-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 baebcd6971de099ff5b88a9edf1fef783a5dd4f14dca1c860f3b9a5a46ddd30f
MD5 e1fedc51858c86f7f3888ea95171d638
BLAKE2b-256 62154b2e6ff67724fd6e0a13b1ccb6a8f5418e2e8b2b87d6525526d6649b0c01

See more details on using hashes here.

File details

Details for the file tensorflow_io_plugin_gs_nightly-0.18.0.dev20210506015400-cp37-cp37m-macosx_10_14_x86_64.whl.

File metadata

File hashes

Hashes for tensorflow_io_plugin_gs_nightly-0.18.0.dev20210506015400-cp37-cp37m-macosx_10_14_x86_64.whl
Algorithm Hash digest
SHA256 ed69f5aad522e0748c79a936f3e382c83dc3ed71e5a38e0f5f8b1034c7668518
MD5 0bfbee31f3d22e54e3b8de1b54cb6d40
BLAKE2b-256 57a0bf4f3869ac0b586255de9f5b24717bcdad99eca4978b3dc516b4f0345fe4

See more details on using hashes here.

File details

Details for the file tensorflow_io_plugin_gs_nightly-0.18.0.dev20210506015400-cp36-cp36m-win_amd64.whl.

File metadata

File hashes

Hashes for tensorflow_io_plugin_gs_nightly-0.18.0.dev20210506015400-cp36-cp36m-win_amd64.whl
Algorithm Hash digest
SHA256 8f291e7fc006870d73c5c08c5a3180529a24f89e3d8d19c23de9d0d56362d369
MD5 3113d8ad738c0fe6bed5986717b6a196
BLAKE2b-256 cbfa28904ff229ffbabd577d08e1d03cd7566c5fa571fc4f3fe130c976f7aa84

See more details on using hashes here.

File details

Details for the file tensorflow_io_plugin_gs_nightly-0.18.0.dev20210506015400-cp36-cp36m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl.

File metadata

File hashes

Hashes for tensorflow_io_plugin_gs_nightly-0.18.0.dev20210506015400-cp36-cp36m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 1347bb8d2142ab12912619f8cc109a24a36db6baa615990052cd6e7eb22886d0
MD5 bbb65f604bec105a53763d4bed9a5dd1
BLAKE2b-256 d5c20a11aedc1f85e279ef194ccb3478f989a906ffb1d78fb10fc8fae998c0aa

See more details on using hashes here.

File details

Details for the file tensorflow_io_plugin_gs_nightly-0.18.0.dev20210506015400-cp36-cp36m-macosx_10_14_x86_64.whl.

File metadata

File hashes

Hashes for tensorflow_io_plugin_gs_nightly-0.18.0.dev20210506015400-cp36-cp36m-macosx_10_14_x86_64.whl
Algorithm Hash digest
SHA256 cbf6523cb5c51872fc37cb410a28dab0309dd493c085b5c0eb47391c7bfc33ea
MD5 d73a59c2fbecb3431e02556e7c2db6ce
BLAKE2b-256 adacc249d5c0687a599d7f725f8979df87a2b22dcf7f89306f9d1e0ae2b8c094

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page