TensorFlow I/O


TensorFlow I/O is a collection of file systems and file formats that are not available in TensorFlow's built-in support. A full list of the file systems and file formats supported by TensorFlow I/O can be found here.

The use of tensorflow-io is straightforward with Keras. Below is the Get Started with TensorFlow example, with the data-processing step replaced by tensorflow-io:

import tensorflow as tf
import tensorflow_io as tfio

# Read the MNIST data into the IODataset.
dataset_url = "https://storage.googleapis.com/cvdf-datasets/mnist/"
d_train = tfio.IODataset.from_mnist(
    dataset_url + "train-images-idx3-ubyte.gz",
    dataset_url + "train-labels-idx1-ubyte.gz",
)

# Shuffle the elements of the dataset.
d_train = d_train.shuffle(buffer_size=1024)

# By default image data is uint8, so convert to float32 using map().
d_train = d_train.map(lambda x, y: (tf.image.convert_image_dtype(x, tf.float32), y))

# Batch the data, just like any other tf.data.Dataset.
d_train = d_train.batch(32)

# Build the model.
model = tf.keras.models.Sequential(
    [
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(512, activation=tf.nn.relu),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(10, activation=tf.nn.softmax),
    ]
)

# Compile the model.
model.compile(
    optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"]
)

# Fit the model.
model.fit(d_train, epochs=5, steps_per_epoch=200)

In the above MNIST example, the URLs of the dataset files are passed directly to the tfio.IODataset.from_mnist API call. This is possible because tensorflow-io provides built-in support for the HTTP/HTTPS file system, which eliminates the need to download and save datasets to a local directory.

NOTE: Since tensorflow-io can detect and decompress the MNIST dataset automatically when needed, the URLs of the compressed (gzip) files can be passed to the API call as is.
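The detect-and-decompress behavior can be illustrated with a small stand-alone sketch. This is not tensorflow-io's actual implementation; it only shows the kind of magic-byte check a reader can use to decide whether a payload is gzip-compressed:

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # every gzip stream starts with these two bytes


def maybe_decompress(raw):
    """Return raw decompressed if it looks like gzip, otherwise unchanged."""
    if raw[:2] == GZIP_MAGIC:
        return gzip.decompress(raw)
    return raw


# Works transparently for both compressed and plain payloads.
payload = b"some idx-formatted bytes"
assert maybe_decompress(gzip.compress(payload)) == payload
assert maybe_decompress(payload) == payload
```

Because the check inspects the bytes rather than the file name, the same code path serves compressed and uncompressed inputs, which is why the `.gz` URLs above need no special handling by the caller.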

Please check the official documentation for more detailed and interesting uses of the package.

Installation

Python Package

The tensorflow-io Python package can be installed with pip directly using:

$ pip install tensorflow-io

People who are a little more adventurous can also try our nightly binaries:

$ pip install tensorflow-io-nightly

To ensure you have a version of TensorFlow that is compatible with TensorFlow I/O, you can specify the tensorflow extra requirement during install:

$ pip install tensorflow-io[tensorflow]

Similar extras exist for the tensorflow-gpu, tensorflow-cpu and tensorflow-rocm packages.
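To see at runtime which versions of the pair are actually installed (for instance before debugging a compatibility issue), the standard-library importlib.metadata is enough. The package names below are the real PyPI distribution names; the helper itself is just an illustrative sketch, not part of tensorflow-io:

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# e.g. compare installed_version("tensorflow") against
# installed_version("tensorflow-io") using the compatibility table below.
```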

Docker Images

In addition to the pip packages, Docker images can be used to quickly get started.

For stable builds:

$ docker pull tfsigio/tfio:latest
$ docker run -it --rm --name tfio-latest tfsigio/tfio:latest

For nightly builds:

$ docker pull tfsigio/tfio:nightly
$ docker run -it --rm --name tfio-nightly tfsigio/tfio:nightly

R Package

Once the tensorflow-io Python package has been successfully installed, you can install the development version of the R package from GitHub via the following:

if (!require("remotes")) install.packages("remotes")
remotes::install_github("tensorflow/io", subdir = "R-package")

TensorFlow Version Compatibility

To ensure compatibility with TensorFlow, it is recommended to install a matching version of TensorFlow I/O according to the table below. You can find the list of releases here.

TensorFlow I/O Version TensorFlow Compatibility Release Date
0.24.0 2.8.x Feb 04, 2022
0.23.1 2.7.x Dec 15, 2021
0.23.0 2.7.x Dec 14, 2021
0.22.0 2.7.x Nov 10, 2021
0.21.0 2.6.x Sep 12, 2021
0.20.0 2.6.x Aug 11, 2021
0.19.1 2.5.x Jul 25, 2021
0.19.0 2.5.x Jun 25, 2021
0.18.0 2.5.x May 13, 2021
0.17.1 2.4.x Apr 16, 2021
0.17.0 2.4.x Dec 14, 2020
0.16.0 2.3.x Oct 23, 2020
0.15.0 2.3.x Aug 03, 2020
0.14.0 2.2.x Jul 08, 2020
0.13.0 2.2.x May 10, 2020
0.12.0 2.1.x Feb 28, 2020
0.11.0 2.1.x Jan 10, 2020
0.10.0 2.0.x Dec 05, 2019
0.9.1 2.0.x Nov 15, 2019
0.9.0 2.0.x Oct 18, 2019
0.8.1 1.15.x Nov 15, 2019
0.8.0 1.15.x Oct 17, 2019
0.7.2 1.14.x Nov 15, 2019
0.7.1 1.14.x Oct 18, 2019
0.7.0 1.14.x Jul 14, 2019
0.6.0 1.13.x May 29, 2019
0.5.0 1.13.x Apr 12, 2019
0.4.0 1.13.x Mar 01, 2019
0.3.0 1.12.0 Feb 15, 2019
0.2.0 1.12.0 Jan 29, 2019
0.1.0 1.12.0 Dec 16, 2018
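The table above can also be encoded as a lookup to sanity-check a pairing programmatically. The mapping below transcribes a few recent rows (tensorflow-io release to compatible TensorFlow series); the helper function is an illustrative sketch, not a tensorflow-io API:

```python
# tensorflow-io release -> compatible TensorFlow series (from the table above)
TFIO_TO_TF = {
    "0.24.0": "2.8",
    "0.23.1": "2.7",
    "0.23.0": "2.7",
    "0.22.0": "2.7",
    "0.21.0": "2.6",
    "0.20.0": "2.6",
    "0.19.1": "2.5",
    "0.19.0": "2.5",
    "0.18.0": "2.5",
}


def compatible(tfio_version, tf_version):
    """True if tf_version falls in the series paired with tfio_version above."""
    series = TFIO_TO_TF.get(tfio_version)
    return series is not None and tf_version.startswith(series + ".")


assert compatible("0.24.0", "2.8.0")
assert not compatible("0.24.0", "2.7.1")
```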

Performance Benchmarking

We use GitHub Pages to document the results of API performance benchmarks. The benchmark job is triggered on every commit to the master branch and facilitates tracking performance with respect to commits.

Contributing

TensorFlow I/O is a community-led open source project. As such, the project depends on public contributions, bug fixes, and documentation.

Build Status and CI


Because of the manylinux2010 requirement, TensorFlow I/O is built with Ubuntu 16.04 + Developer Toolset 7 (GCC 7.3) on Linux. Configuring Ubuntu 16.04 with Developer Toolset 7 is not exactly straightforward. If the system has Docker installed, then the following command will automatically build a manylinux2010-compatible whl package:

#!/usr/bin/env bash

ls dist/*
for f in dist/*.whl; do
  docker run -i --rm -v "$PWD":/v -w /v --net=host quay.io/pypa/manylinux2010_x86_64 bash -x -e /v/tools/build/auditwheel repair --plat manylinux2010_x86_64 "$f"
done
sudo chown -R "$(id -nu):$(id -ng)" .
ls wheelhouse/*

It takes some time to build, but once complete, Python 3.5, 3.6, and 3.7 compatible whl packages will be available in the wheelhouse directory.

On macOS, the same command can be used. However, the script expects python on the shell PATH and will only generate a whl package that matches that python's version. If you want to build a whl package for a specific Python version, you have to alias that version to python in the shell. See the Auditwheel step in .github/workflows/build.yml for instructions on how to do that.

Note that the above command is also the command we use when releasing packages for Linux and macOS.

TensorFlow I/O uses both GitHub Workflows and Google CI (Kokoro) for continuous integration. GitHub Workflows is used for macOS builds and tests; Kokoro is used for Linux builds and tests. Again, because of the manylinux2010 requirement, on Linux whl packages are always built with Ubuntu 16.04 + Developer Toolset 7. Tests are run on a variety of systems with different Python 3 versions to ensure good coverage:

Python   Ubuntu 18.04   Ubuntu 20.04   macOS + osx9   Windows-2019
2.7      ✓              ✓              ✓              N/A
3.7      ✓              ✓              ✓              ✓
3.8      ✓              ✓              ✓              ✓

TensorFlow I/O has integrations with many systems and cloud vendors such as Prometheus, Apache Kafka, Apache Ignite, Google Cloud PubSub, AWS Kinesis, Microsoft Azure Storage, Alibaba Cloud OSS etc.

We try our best to test against those systems in our continuous integration whenever possible. Some tests, such as Prometheus, Kafka, and Ignite, run against live systems: we install Prometheus/Kafka/Ignite on the CI machine before the test is run. Other tests, such as Kinesis, PubSub, and Azure Storage, run against official or unofficial emulators. Offline tests are also performed whenever possible, though systems covered only by offline tests may not have the same level of coverage as live systems or emulators.

                              Live System   Emulator   CI Integration   Offline
Apache Kafka                  ✓                        ✓
Apache Ignite                 ✓                        ✓
Prometheus                    ✓                        ✓
Google PubSub                               ✓          ✓
Azure Storage                               ✓          ✓
AWS Kinesis                                 ✓          ✓
Alibaba Cloud OSS                                                       ✓
Google BigTable/BigQuery      (to be added)
Elasticsearch (experimental)  ✓                        ✓
MongoDB (experimental)        ✓                        ✓



License

Apache License 2.0

