TensorFlow I/O

TensorFlow I/O is a collection of file systems and file formats that are not available in TensorFlow's built-in support. The full list of file systems and file formats supported by TensorFlow I/O can be found here.

Using tensorflow-io with Keras is straightforward. Below is the Get Started with TensorFlow example, with the data processing portion replaced by tensorflow-io:

import tensorflow as tf
import tensorflow_io as tfio

# Read the MNIST data into the IODataset.
dataset_url = "https://storage.googleapis.com/cvdf-datasets/mnist/"
d_train = tfio.IODataset.from_mnist(
    dataset_url + "train-images-idx3-ubyte.gz",
    dataset_url + "train-labels-idx1-ubyte.gz",
)

# Shuffle the elements of the dataset.
d_train = d_train.shuffle(buffer_size=1024)

# By default image data is uint8, so convert to float32 using map().
d_train = d_train.map(lambda x, y: (tf.image.convert_image_dtype(x, tf.float32), y))

# Batch the data, just like any other tf.data.Dataset.
d_train = d_train.batch(32)

# Build the model.
model = tf.keras.models.Sequential(
    [
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(512, activation=tf.nn.relu),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(10, activation=tf.nn.softmax),
    ]
)

# Compile the model.
model.compile(
    optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"]
)

# Fit the model.
model.fit(d_train, epochs=5, steps_per_epoch=200)

In the above MNIST example, the URLs of the dataset files are passed directly to the tfio.IODataset.from_mnist API call. This is possible because of tensorflow-io's inherent support for the HTTP/HTTPS file system, which eliminates the need to download and save datasets to a local directory.

NOTE: Since tensorflow-io is able to detect and decompress the MNIST dataset automatically if needed, we can pass the URLs of the compressed (gzip) files to the API call as-is.
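As an aside, the automatic gzip detection described in the note boils down to sniffing the stream's magic bytes. The sketch below is only a hedged illustration of that idea using the Python standard library; it is not tensorflow-io's actual implementation:

```python
import gzip


def maybe_decompress(data: bytes) -> bytes:
    """Return the decompressed payload if `data` looks like a gzip
    stream (magic bytes 0x1f 0x8b); otherwise return it unchanged."""
    if data[:2] == b"\x1f\x8b":
        return gzip.decompress(data)
    return data


# Round-trip check: compressed input is transparently unpacked,
# while uncompressed input passes through untouched.
payload = b"MNIST-like raw bytes"
assert maybe_decompress(gzip.compress(payload)) == payload
assert maybe_decompress(payload) == payload
```

The same sniff-then-dispatch pattern generalizes to other container formats (e.g. zip's `PK` signature), which is why the compressed URLs can be handed to the API unchanged.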

Please check the official documentation for more detailed and interesting usages of the package.

Installation

Python Package

The tensorflow-io Python package can be installed with pip directly using:

$ pip install tensorflow-io

People who are a little more adventurous can also try our nightly binaries:

$ pip install tensorflow-io-nightly

Docker Images

In addition to the pip packages, Docker images can be used to quickly get started.

For stable builds:

$ docker pull tfsigio/tfio:latest
$ docker run -it --rm --name tfio-latest tfsigio/tfio:latest

For nightly builds:

$ docker pull tfsigio/tfio:nightly
$ docker run -it --rm --name tfio-nightly tfsigio/tfio:nightly

R Package

Once the tensorflow-io Python package has been successfully installed, you can install the development version of the R package from GitHub via the following:

if (!require("remotes")) install.packages("remotes")
remotes::install_github("tensorflow/io", subdir = "R-package")

TensorFlow Version Compatibility

To ensure compatibility with TensorFlow, it is recommended to install a matching version of TensorFlow I/O according to the table below. You can find the list of releases here.

TensorFlow I/O Version TensorFlow Compatibility Release Date
0.21.0 2.6.x Sep 12, 2021
0.20.0 2.6.x Aug 11, 2021
0.19.1 2.5.x Jul 25, 2021
0.19.0 2.5.x Jun 25, 2021
0.18.0 2.5.x May 13, 2021
0.17.1 2.4.x Apr 16, 2021
0.17.0 2.4.x Dec 14, 2020
0.16.0 2.3.x Oct 23, 2020
0.15.0 2.3.x Aug 03, 2020
0.14.0 2.2.x Jul 08, 2020
0.13.0 2.2.x May 10, 2020
0.12.0 2.1.x Feb 28, 2020
0.11.0 2.1.x Jan 10, 2020
0.10.0 2.0.x Dec 05, 2019
0.9.1 2.0.x Nov 15, 2019
0.9.0 2.0.x Oct 18, 2019
0.8.1 1.15.x Nov 15, 2019
0.8.0 1.15.x Oct 17, 2019
0.7.2 1.14.x Nov 15, 2019
0.7.1 1.14.x Oct 18, 2019
0.7.0 1.14.x Jul 14, 2019
0.6.0 1.13.x May 29, 2019
0.5.0 1.13.x Apr 12, 2019
0.4.0 1.13.x Mar 01, 2019
0.3.0 1.12.0 Feb 15, 2019
0.2.0 1.12.0 Jan 29, 2019
0.1.0 1.12.0 Dec 16, 2018
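The compatibility table above is easy to encode as data. The following helper is purely illustrative and not part of tensorflow-io; the dictionary transcribes the most recent rows of the table:

```python
# Recent rows of the compatibility table above, transcribed as data.
# Illustrative only; this helper is not part of tensorflow-io.
TFIO_TO_TF = {
    "0.21.0": "2.6.x",
    "0.20.0": "2.6.x",
    "0.19.1": "2.5.x",
    "0.19.0": "2.5.x",
    "0.18.0": "2.5.x",
    "0.17.1": "2.4.x",
    "0.17.0": "2.4.x",
    "0.16.0": "2.3.x",
}


def required_tf_series(tfio_version: str) -> str:
    """Return the TensorFlow series a given tensorflow-io release targets."""
    try:
        return TFIO_TO_TF[tfio_version]
    except KeyError:
        raise ValueError(f"Unknown tensorflow-io version: {tfio_version}")


print(required_tf_series("0.18.0"))  # 2.5.x
```

A lookup like this can guard an environment bootstrap script: fail fast when the pinned tensorflow and tensorflow-io versions fall in different rows of the table.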

Performance Benchmarking

We use GitHub Pages to document the results of API performance benchmarks. The benchmark job is triggered on every commit to the master branch and facilitates tracking performance across commits.

Contributing

TensorFlow I/O is a community-led open source project. As such, it depends on public contributions, bug fixes, and documentation. Please see the project's contribution guidelines for details.

Build Status and CI


Because of the manylinux2010 requirement, TensorFlow I/O is built with Ubuntu 16.04 + Developer Toolset 7 (GCC 7.3) on Linux. Configuring Ubuntu 16.04 with Developer Toolset 7 is not exactly straightforward. If the system has Docker installed, then the following command will automatically build a manylinux2010-compatible whl package:

#!/usr/bin/env bash

# Repair each built wheel inside the manylinux2010 container so the
# result is portable across Linux distributions.
ls dist/*
for f in dist/*.whl; do
  docker run -i --rm -v "$PWD":/v -w /v --net=host \
    quay.io/pypa/manylinux2010_x86_64 \
    bash -x -e /v/tools/build/auditwheel repair --plat manylinux2010_x86_64 "$f"
done
# Files created inside the container are owned by root; reclaim them.
sudo chown -R "$(id -nu):$(id -ng)" .
ls wheelhouse/*

It takes some time to build, but once complete, there will be Python 3.5, 3.6, and 3.7 compatible whl packages available in the wheelhouse directory.

On macOS, the same command can be used. However, the script expects python in the shell and will only generate a whl package matching the version of python in the shell. If you want to build a whl package for a specific Python version, you have to alias that version of Python to python in the shell. See the Auditwheel step in .github/workflows/build.yml for instructions on how to do that.

Note that the above command is also the command we use when releasing packages for Linux and macOS.
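One way to alias a specific interpreter to python for the duration of a build is to put a shim first on PATH. This is a hypothetical sketch, not the project's official recipe; substitute the interpreter you actually want (e.g. python3.7) for python3 below:

```shell
#!/usr/bin/env bash
set -e

# Create a throwaway directory holding a `python` symlink to the
# desired interpreter, and put it at the front of PATH.
shim_dir="$(mktemp -d)"
ln -s "$(command -v python3)" "$shim_dir/python"
export PATH="$shim_dir:$PATH"

# Subsequent build steps now resolve `python` to the aliased interpreter.
python --version
```

Because the shim only lives in this shell session's PATH, the system-wide python is left untouched once the build finishes.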

TensorFlow I/O uses both GitHub Workflows and Google CI (Kokoro) for continuous integration. GitHub Workflows is used for macOS build and test; Kokoro is used for Linux build and test. Again, because of the manylinux2010 requirement, on Linux whl packages are always built with Ubuntu 16.04 + Developer Toolset 7. Tests are run on a variety of systems with different Python versions to ensure good coverage:

Python  Ubuntu 18.04  Ubuntu 20.04  macOS + osx9  Windows-2019
2.7     ✓             ✓             ✓             N/A
3.7     ✓             ✓             ✓             ✓
3.8     ✓             ✓             ✓             ✓

TensorFlow I/O has integrations with many systems and cloud vendors, such as Prometheus, Apache Kafka, Apache Ignite, Google Cloud PubSub, AWS Kinesis, Microsoft Azure Storage, and Alibaba Cloud OSS.

We try our best to test against those systems in our continuous integration whenever possible. Some tests, such as those for Prometheus, Kafka, and Ignite, are done against live systems, meaning we install Prometheus/Kafka/Ignite on the CI machine before the test is run. Other tests, such as those for Kinesis, PubSub, and Azure Storage, are done through official or unofficial emulators. Offline tests are also performed whenever possible, though systems covered by offline tests may not have the same level of coverage as live systems or emulators.

System                        Live System  Emulator  CI Integration  Offline
Apache Kafka                  ✓                      ✓
Apache Ignite                 ✓                      ✓
Prometheus                    ✓                      ✓
Google PubSub                              ✓         ✓
Azure Storage                              ✓         ✓
AWS Kinesis                                ✓         ✓
Alibaba Cloud OSS                                                    ✓
Google BigTable/BigQuery      (to be added)
Elasticsearch (experimental)  ✓                      ✓
MongoDB (experimental)        ✓                      ✓


Community

Additional Information

License

Apache License 2.0


Download files

Download the file for your platform.

Source Distributions

No source distribution files are available for this release.

Built Distributions

File details

Hashes for tensorflow_io_gcs_filesystem_nightly-0.21.0.dev20211012172427-cp39-cp39-win_amd64.whl
Algorithm Hash digest
SHA256 b7b9867926d179ea9f8b4037b154694bd15b47c51ffff6dfaa20712770974d05
MD5 7c8a6e3d2fe0b84cc8a241c0c0152604
BLAKE2b-256 2e9edef8c72fc5454eb45d0d7a7b7939f23bd22ca096681f2d349a0dc1477410

See more details on using hashes here.

Hashes for tensorflow_io_gcs_filesystem_nightly-0.21.0.dev20211012172427-cp39-cp39-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 389b651367f3f2c596b2fa689b037068cabb1770d505a4947287b6566c72abfd
MD5 d5f8665f5df1a5b94b378d6a19fad9fd
BLAKE2b-256 1ddff3ce643af2f69c469df69e3e1cac315de0d616445c4080eafc9e810807cf

Hashes for tensorflow_io_gcs_filesystem_nightly-0.21.0.dev20211012172427-cp39-cp39-macosx_10_14_x86_64.whl
Algorithm Hash digest
SHA256 1a74539ea6c6a1cd3a1ace5e100c8d685dc2fcec554ce7c09159a3637090e89f
MD5 30a8fb5e7dc99f542fd8f3432b655a5f
BLAKE2b-256 b3bebea92494eca30cd2035f5af4d9c3bcd2751dcc6209889454d27eb64667ca

Hashes for tensorflow_io_gcs_filesystem_nightly-0.21.0.dev20211012172427-cp38-cp38-win_amd64.whl
Algorithm Hash digest
SHA256 db19ba2e77fdc988df742099a73343e21100492f8d5441df6e0f6834d11c9a41
MD5 27ac5f7c1d1a6d4d482d6045f4d50838
BLAKE2b-256 20963e7ae0ef09d99fd06f8ffdbbb1043837457cea1c1572bb62c8b08ff6e55b

Hashes for tensorflow_io_gcs_filesystem_nightly-0.21.0.dev20211012172427-cp38-cp38-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 96535625512b56adbfbc7ab07f6b080648dba9e527bf91cee9c4edbbdd01cf29
MD5 ab242081338dc428054820e160c4dc5f
BLAKE2b-256 6e65ad6af0177e6ce3964f4e7d01702e8070a21f23d71d5f0bdfbcdcca8d2d43

Hashes for tensorflow_io_gcs_filesystem_nightly-0.21.0.dev20211012172427-cp38-cp38-macosx_10_14_x86_64.whl
Algorithm Hash digest
SHA256 2a66f433ada6c1be8b590daa2fa8c402d8492f0ed3868a85663742945f51d322
MD5 71f03d7ed3ac3282a01f86d8150c03e2
BLAKE2b-256 2652704b5f71ac43cfc7c67f1e12899d6cabde82ddbb5017e91e9d7f9afa1f21

Hashes for tensorflow_io_gcs_filesystem_nightly-0.21.0.dev20211012172427-cp37-cp37m-win_amd64.whl
Algorithm Hash digest
SHA256 0f6192258a2342f49c67abf82404714bd62d96ceb24ba65d89c42f4b57cf9941
MD5 fc240002b578af0c2b042646bfecd0fb
BLAKE2b-256 1b96d04acc28776b15c9fa65f3212a2f099c5835a5a79a8cc248807bcb424b77

Hashes for tensorflow_io_gcs_filesystem_nightly-0.21.0.dev20211012172427-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 77eb9d467382d36e05943b4352b9116ba1a15e3b67579411a092f1e9187772bb
MD5 f0161d94ad3617ff8899ecd7735fe575
BLAKE2b-256 41e5bfffe5e22a764020566cdb95481fb88f4792dddc73a8901c454869d06b17

Hashes for tensorflow_io_gcs_filesystem_nightly-0.21.0.dev20211012172427-cp37-cp37m-macosx_10_14_x86_64.whl
Algorithm Hash digest
SHA256 26328941bbd7bc705dfb7325650b391ad7b062acb726fe897340788289ead872
MD5 138e543cce1e9d942e12d31a7a6a0afb
BLAKE2b-256 1a2d39b10b79704ed3e3b6454b4e00a9edf705137f554b1981d8c87d8349866e

Hashes for tensorflow_io_gcs_filesystem_nightly-0.21.0.dev20211012172427-cp36-cp36m-win_amd64.whl
Algorithm Hash digest
SHA256 d8b5ecb9116c7168b8f939acd8bf1930044dfb7a97e68420984faf5538c752de
MD5 dc47624ed94b29049a3f71b2d7b348f4
BLAKE2b-256 667034e861fb2796dc0fd3194b3650ca43acdb0db995b0a22a0e11c444947d15

Hashes for tensorflow_io_gcs_filesystem_nightly-0.21.0.dev20211012172427-cp36-cp36m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 726a2552c5d4b47a4297d8a58666d904b40442b760fb2cd6d539f01653ca2a86
MD5 aab8e01e3edd4217729a76bd9c914fb1
BLAKE2b-256 6668396b8839d05c32aab8547d7bc3afa4313b800f2964b43a43235981d4aa6b

Hashes for tensorflow_io_gcs_filesystem_nightly-0.21.0.dev20211012172427-cp36-cp36m-macosx_10_14_x86_64.whl
Algorithm Hash digest
SHA256 933b30d94b29a06c8db24811eb9f3895d7b2e6c20b92de40d666027c2444f6b8
MD5 e351b316197bb332aa91eba13fee4f76
BLAKE2b-256 74c63eec6c468ed9eb31e1a5f44d83c398c5b1cf1b3350c20e5b9a1880b4a646

