Skip to main content

TensorFlow IO

Project description




TensorFlow I/O

GitHub CI PyPI License Documentation

TensorFlow I/O is a collection of file systems and file formats that are not available in TensorFlow's built-in support. A full list of supported file systems and file formats by TensorFlow I/O can be found here.

The use of tensorflow-io is straightforward with keras. Below is an example to Get Started with TensorFlow with the data processing aspect replaced by tensorflow-io:

import tensorflow as tf
import tensorflow_io as tfio

# Read the MNIST data into the IODataset.
dataset_url = "https://storage.googleapis.com/cvdf-datasets/mnist/"
d_train = tfio.IODataset.from_mnist(
    dataset_url + "train-images-idx3-ubyte.gz",
    dataset_url + "train-labels-idx1-ubyte.gz",
)

# Shuffle the elements of the dataset.
d_train = d_train.shuffle(buffer_size=1024)

# By default image data is uint8, so convert to float32 using map().
d_train = d_train.map(lambda x, y: (tf.image.convert_image_dtype(x, tf.float32), y))

# prepare batches the data just like any other tf.data.Dataset
d_train = d_train.batch(32)

# Build the model.
model = tf.keras.models.Sequential(
    [
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(512, activation=tf.nn.relu),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(10, activation=tf.nn.softmax),
    ]
)

# Compile the model.
model.compile(
    optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"]
)

# Fit the model.
model.fit(d_train, epochs=5, steps_per_epoch=200)

In the above MNIST example, the URL's to access the dataset files are passed directly to the tfio.IODataset.from_mnist API call. This is due to the inherent support that tensorflow-io provides for HTTP/HTTPS file system, thus eliminating the need for downloading and saving datasets on a local directory.

NOTE: Since tensorflow-io is able to detect and uncompress the MNIST dataset automatically if needed, we can pass the URL's for the compressed files (gzip) to the API call as is.

Please check the official documentation for more detailed and interesting usages of the package.

Installation

Python Package

The tensorflow-io Python package can be installed with pip directly using:

$ pip install tensorflow-io

People who are a little more adventurous can also try our nightly binaries:

$ pip install tensorflow-io-nightly

To ensure you have a version of TensorFlow that is compatible with TensorFlow-IO, you can specify the tensorflow extra requirement during install:

pip install tensorflow-io[tensorflow]

Similar extras exist for the tensorflow-gpu, tensorflow-cpu and tensorflow-rocm packages.

Docker Images

In addition to the pip packages, the docker images can be used to quickly get started.

For stable builds:

$ docker pull tfsigio/tfio:latest
$ docker run -it --rm --name tfio-latest tfsigio/tfio:latest

For nightly builds:

$ docker pull tfsigio/tfio:nightly
$ docker run -it --rm --name tfio-nightly tfsigio/tfio:nightly

R Package

Once the tensorflow-io Python package has been successfully installed, you can install the development version of the R package from GitHub via the following:

if (!require("remotes")) install.packages("remotes")
remotes::install_github("tensorflow/io", subdir = "R-package")

TensorFlow Version Compatibility

To ensure compatibility with TensorFlow, it is recommended to install a matching version of TensorFlow I/O according to the table below. You can find the list of releases here.

TensorFlow I/O Version TensorFlow Compatibility Release Date
0.22.0 2.7.x Nov 10, 2021
0.21.0 2.6.x Sep 12, 2021
0.20.0 2.6.x Aug 11, 2021
0.19.1 2.5.x Jul 25, 2021
0.19.0 2.5.x Jun 25, 2021
0.18.0 2.5.x May 13, 2021
0.17.1 2.4.x Apr 16, 2021
0.17.0 2.4.x Dec 14, 2020
0.16.0 2.3.x Oct 23, 2020
0.15.0 2.3.x Aug 03, 2020
0.14.0 2.2.x Jul 08, 2020
0.13.0 2.2.x May 10, 2020
0.12.0 2.1.x Feb 28, 2020
0.11.0 2.1.x Jan 10, 2020
0.10.0 2.0.x Dec 05, 2019
0.9.1 2.0.x Nov 15, 2019
0.9.0 2.0.x Oct 18, 2019
0.8.1 1.15.x Nov 15, 2019
0.8.0 1.15.x Oct 17, 2019
0.7.2 1.14.x Nov 15, 2019
0.7.1 1.14.x Oct 18, 2019
0.7.0 1.14.x Jul 14, 2019
0.6.0 1.13.x May 29, 2019
0.5.0 1.13.x Apr 12, 2019
0.4.0 1.13.x Mar 01, 2019
0.3.0 1.12.0 Feb 15, 2019
0.2.0 1.12.0 Jan 29, 2019
0.1.0 1.12.0 Dec 16, 2018

Performance Benchmarking

We use github-pages to document the results of API performance benchmarks. The benchmark job is triggered on every commit to master branch and facilitates tracking performance w.r.t commits.

Contributing

Tensorflow I/O is a community led open source project. As such, the project depends on public contributions, bug-fixes, and documentation. Please see:

Build Status and CI

Build Status
Linux CPU Python 2 Status
Linux CPU Python 3 Status
Linux GPU Python 2 Status
Linux GPU Python 3 Status

Because of manylinux2010 requirement, TensorFlow I/O is built with Ubuntu:16.04 + Developer Toolset 7 (GCC 7.3) on Linux. Configuration with Ubuntu 16.04 with Developer Toolset 7 is not exactly straightforward. If the system have docker installed, then the following command will automatically build manylinux2010 compatible whl package:

#!/usr/bin/env bash

ls dist/*
for f in dist/*.whl; do
  docker run -i --rm -v $PWD:/v -w /v --net=host quay.io/pypa/manylinux2010_x86_64 bash -x -e /v/tools/build/auditwheel repair --plat manylinux2010_x86_64 $f
done
sudo chown -R $(id -nu):$(id -ng) .
ls wheelhouse/*

It takes some time to build, but once complete, there will be python 3.5, 3.6, 3.7 compatible whl packages available in wheelhouse directory.

On macOS, the same command could be used. However, the script expects python in shell and will only generate a whl package that matches the version of python in shell. If you want to build a whl package for a specific python then you have to alias this version of python to python in shell. See .github/workflows/build.yml Auditwheel step for instructions how to do that.

Note the above command is also the command we use when releasing packages for Linux and macOS.

TensorFlow I/O uses both GitHub Workflows and Google CI (Kokoro) for continuous integration. GitHub Workflows is used for macOS build and test. Kokoro is used for Linux build and test. Again, because of the manylinux2010 requirement, on Linux whl packages are always built with Ubuntu 16.04 + Developer Toolset 7. Tests are done on a variatiy of systems with different python3 versions to ensure a good coverage:

Python Ubuntu 18.04 Ubuntu 20.04 macOS + osx9 Windows-2019
2.7 :heavy_check_mark: :heavy_check_mark: :heavy_check_mark: N/A
3.7 :heavy_check_mark: :heavy_check_mark: :heavy_check_mark: :heavy_check_mark:
3.8 :heavy_check_mark: :heavy_check_mark: :heavy_check_mark: :heavy_check_mark:

TensorFlow I/O has integrations with many systems and cloud vendors such as Prometheus, Apache Kafka, Apache Ignite, Google Cloud PubSub, AWS Kinesis, Microsoft Azure Storage, Alibaba Cloud OSS etc.

We tried our best to test against those systems in our continuous integration whenever possible. Some tests such as Prometheus, Kafka, and Ignite are done with live systems, meaning we install Prometheus/Kafka/Ignite on CI machine before the test is run. Some tests such as Kinesis, PubSub, and Azure Storage are done through official or non-official emulators. Offline tests are also performed whenever possible, though systems covered through offine tests may not have the same level of coverage as live systems or emulators.

Live System Emulator CI Integration Offline
Apache Kafka :heavy_check_mark: :heavy_check_mark:
Apache Ignite :heavy_check_mark: :heavy_check_mark:
Prometheus :heavy_check_mark: :heavy_check_mark:
Google PubSub :heavy_check_mark: :heavy_check_mark:
Azure Storage :heavy_check_mark: :heavy_check_mark:
AWS Kinesis :heavy_check_mark: :heavy_check_mark:
Alibaba Cloud OSS :heavy_check_mark:
Google BigTable/BigQuery to be added
Elasticsearch (experimental) :heavy_check_mark: :heavy_check_mark:
MongoDB (experimental) :heavy_check_mark: :heavy_check_mark:

References for emulators:

Community

Additional Information

License

Apache License 2.0

Release history Release notifications | RSS feed

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distributions

File details

Details for the file tensorflow_io_gcs_filesystem_nightly-0.22.0.dev20211114023136-cp39-cp39-win_amd64.whl.

File metadata

File hashes

Hashes for tensorflow_io_gcs_filesystem_nightly-0.22.0.dev20211114023136-cp39-cp39-win_amd64.whl
Algorithm Hash digest
SHA256 102ed5e71abbe69ddc3c0a1be7b3ea5f0b32728148bb4e6e28e16b2d816e2470
MD5 6a9aca2d5d75070c4d7ec665d3738f4e
BLAKE2b-256 2da685a92b8e0d5169db22436d0101045517dc475754e8ed47c939c061742139

See more details on using hashes here.

File details

Details for the file tensorflow_io_gcs_filesystem_nightly-0.22.0.dev20211114023136-cp39-cp39-manylinux_2_12_x86_64.manylinux2010_x86_64.whl.

File metadata

File hashes

Hashes for tensorflow_io_gcs_filesystem_nightly-0.22.0.dev20211114023136-cp39-cp39-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 fce13dd96d46788c35592839783f4d397285030e71122020046550380e643972
MD5 921bb6368da5653583bfb15f662ac782
BLAKE2b-256 4e13c75225f7a4a01f2000d23e59adb85068d1c9a29824029d92d9321b43ba07

See more details on using hashes here.

File details

Details for the file tensorflow_io_gcs_filesystem_nightly-0.22.0.dev20211114023136-cp39-cp39-macosx_10_14_x86_64.whl.

File metadata

File hashes

Hashes for tensorflow_io_gcs_filesystem_nightly-0.22.0.dev20211114023136-cp39-cp39-macosx_10_14_x86_64.whl
Algorithm Hash digest
SHA256 035facb5f59d44d125f69c67432a3abf81bfad31d294c611375f78f87bcaee87
MD5 71e61e04ae5080aa89fc6a67c783fdfe
BLAKE2b-256 928691364346a2176f95ad1fe92e62018c7a2ab43b620f026b8f82377f905ec4

See more details on using hashes here.

File details

Details for the file tensorflow_io_gcs_filesystem_nightly-0.22.0.dev20211114023136-cp38-cp38-win_amd64.whl.

File metadata

File hashes

Hashes for tensorflow_io_gcs_filesystem_nightly-0.22.0.dev20211114023136-cp38-cp38-win_amd64.whl
Algorithm Hash digest
SHA256 6bd55007887ac8d1ee20e145d218be4aab720c8e94f5eacfff04e09ac3b4ce92
MD5 1a221c890df6b9360815fff39095e35f
BLAKE2b-256 62582828aedf44087737b3a39bd505bbcbbfe4028d281e3480decf0f4fa622f1

See more details on using hashes here.

File details

Details for the file tensorflow_io_gcs_filesystem_nightly-0.22.0.dev20211114023136-cp38-cp38-manylinux_2_12_x86_64.manylinux2010_x86_64.whl.

File metadata

File hashes

Hashes for tensorflow_io_gcs_filesystem_nightly-0.22.0.dev20211114023136-cp38-cp38-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 c11d844c20199912e6e4144b9d44afdea29c4373cd8277843342907c6353fd1b
MD5 da41fbc99bc4b2fbfeeea68adce9a45d
BLAKE2b-256 467e3e76102bec7e04912a6cae2c6c7f9e4db3ab11b8386dc656c141664b51bc

See more details on using hashes here.

File details

Details for the file tensorflow_io_gcs_filesystem_nightly-0.22.0.dev20211114023136-cp38-cp38-macosx_10_14_x86_64.whl.

File metadata

File hashes

Hashes for tensorflow_io_gcs_filesystem_nightly-0.22.0.dev20211114023136-cp38-cp38-macosx_10_14_x86_64.whl
Algorithm Hash digest
SHA256 3a46a666250cd03db759f99c23ebac27c2f1622149df41a541dfaacdb39cd05a
MD5 7f3d4addb5f6f12fa73877a22c23edde
BLAKE2b-256 c4e57459f32e620fa3945a9fae322c2003bc950a980078ab828f1e9db61e5242

See more details on using hashes here.

File details

Details for the file tensorflow_io_gcs_filesystem_nightly-0.22.0.dev20211114023136-cp37-cp37m-win_amd64.whl.

File metadata

File hashes

Hashes for tensorflow_io_gcs_filesystem_nightly-0.22.0.dev20211114023136-cp37-cp37m-win_amd64.whl
Algorithm Hash digest
SHA256 9a8103821c225291912b928cf7d5eec7506f6298da26fefbacb427d28deb518a
MD5 6212be46a9d1e9e477d07e4d8357bff2
BLAKE2b-256 7ea0e1fcaeb2c20e7d4d14fb3f77f004f35c8148730bcaf51960461e410db9c9

See more details on using hashes here.

File details

Details for the file tensorflow_io_gcs_filesystem_nightly-0.22.0.dev20211114023136-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl.

File metadata

File hashes

Hashes for tensorflow_io_gcs_filesystem_nightly-0.22.0.dev20211114023136-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 df32167628193ff94bee0f6b3eaf482f1eb1e7b43ebb4007a30ee9d34d59f115
MD5 ca746dc08432cb58b438bf14d833fed8
BLAKE2b-256 4ba0a0841a02371782b4d4833ef80f7a0fd272c5769d9e0d40f110b16831ec7c

See more details on using hashes here.

File details

Details for the file tensorflow_io_gcs_filesystem_nightly-0.22.0.dev20211114023136-cp37-cp37m-macosx_10_14_x86_64.whl.

File metadata

File hashes

Hashes for tensorflow_io_gcs_filesystem_nightly-0.22.0.dev20211114023136-cp37-cp37m-macosx_10_14_x86_64.whl
Algorithm Hash digest
SHA256 3ac295c8ab38647d43d6653c163b9bb4b3383904048de061c7add5c5f303c95d
MD5 474b91655efbf9217e34cc3176f5acac
BLAKE2b-256 a0fcc46ab36fb462849dec0bb0ee77c1394b225aac3795b8165b5847b54bfce0

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page