TensorFlow I/O

TensorFlow I/O is a collection of file systems and file formats that are not available in TensorFlow's built-in support. A full list of the file systems and file formats supported by TensorFlow I/O can be found here.

Using tensorflow-io with Keras is straightforward. Below is the Get Started with TensorFlow example, with the data-processing step replaced by tensorflow-io:

```python
import tensorflow as tf
import tensorflow_io as tfio

# Read the MNIST data into the IODataset.
dataset_url = "https://storage.googleapis.com/cvdf-datasets/mnist/"
d_train = tfio.IODataset.from_mnist(
    dataset_url + "train-images-idx3-ubyte.gz",
    dataset_url + "train-labels-idx1-ubyte.gz",
)

# Shuffle the elements of the dataset.
d_train = d_train.shuffle(buffer_size=1024)

# By default image data is uint8, so convert to float32 using map().
d_train = d_train.map(lambda x, y: (tf.image.convert_image_dtype(x, tf.float32), y))

# Batch the data just like any other tf.data.Dataset.
d_train = d_train.batch(32)

# Build the model.
model = tf.keras.models.Sequential(
    [
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(512, activation=tf.nn.relu),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(10, activation=tf.nn.softmax),
    ]
)

# Compile the model.
model.compile(
    optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"]
)

# Fit the model.
model.fit(d_train, epochs=5, steps_per_epoch=200)
```

In the above MNIST example, the URLs of the dataset files are passed directly to the tfio.IODataset.from_mnist API call. This works because tensorflow-io has built-in support for the HTTP/HTTPS file system, eliminating the need to download and save datasets to a local directory.

NOTE: Since tensorflow-io can detect and decompress the MNIST dataset automatically when needed, we can pass the URLs of the compressed (gzip) files to the API call as is.
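For context, the MNIST files above use the simple IDX binary layout (a short header with a data-type code and dimension sizes, followed by raw bytes), gzip-compressed. The stdlib-only sketch below is not part of tensorflow-io; it just illustrates what a loader like from_mnist has to do under the hood. The helper name and the tiny in-memory file are illustrative:

```python
import gzip
import struct

def parse_idx(data: bytes):
    """Parse an IDX buffer (gzip-compressed or raw) into (dims, payload)."""
    if data[:2] == b"\x1f\x8b":  # gzip magic number: decompress first
        data = gzip.decompress(data)
    # Header: two zero bytes, a dtype code, and the number of dimensions.
    zero1, zero2, dtype_code, ndims = struct.unpack(">BBBB", data[:4])
    assert zero1 == 0 and zero2 == 0, "not an IDX file"
    # One big-endian uint32 per dimension follows the header.
    dims = struct.unpack(">" + "I" * ndims, data[4 : 4 + 4 * ndims])
    payload = data[4 + 4 * ndims :]
    return dims, payload

# Build a tiny fake "images" file: two 2x2 uint8 images, gzipped.
header = struct.pack(">BBBB", 0, 0, 0x08, 3) + struct.pack(">III", 2, 2, 2)
raw = header + bytes(range(8))
dims, payload = parse_idx(gzip.compress(raw))
print(dims, len(payload))  # (2, 2, 2) 8
```

A real loader would additionally reshape the payload into an image tensor; tensorflow-io takes care of all of this, including the decompression step, behind the from_mnist call.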

Please check the official documentation for more detailed and interesting usages of the package.

Installation

Python Package

The tensorflow-io Python package can be installed with pip directly using:

$ pip install tensorflow-io

People who are a little more adventurous can also try our nightly binaries:

$ pip install tensorflow-io-nightly

Docker Images

In addition to the pip packages, the docker images can be used to quickly get started.

For stable builds:

$ docker pull tfsigio/tfio:latest
$ docker run -it --rm --name tfio-latest tfsigio/tfio:latest

For nightly builds:

$ docker pull tfsigio/tfio:nightly
$ docker run -it --rm --name tfio-nightly tfsigio/tfio:nightly

R Package

Once the tensorflow-io Python package has been successfully installed, you can install the development version of the R package from GitHub via the following:

```r
if (!require("remotes")) install.packages("remotes")
remotes::install_github("tensorflow/io", subdir = "R-package")
```

TensorFlow Version Compatibility

To ensure compatibility with TensorFlow, it is recommended to install a matching version of TensorFlow I/O according to the table below. You can find the list of releases here.

| TensorFlow I/O Version | TensorFlow Compatibility | Release Date |
| --- | --- | --- |
| 0.21.0 | 2.6.x | Sep 12, 2021 |
| 0.20.0 | 2.6.x | Aug 11, 2021 |
| 0.19.1 | 2.5.x | Jul 25, 2021 |
| 0.19.0 | 2.5.x | Jun 25, 2021 |
| 0.18.0 | 2.5.x | May 13, 2021 |
| 0.17.1 | 2.4.x | Apr 16, 2021 |
| 0.17.0 | 2.4.x | Dec 14, 2020 |
| 0.16.0 | 2.3.x | Oct 23, 2020 |
| 0.15.0 | 2.3.x | Aug 03, 2020 |
| 0.14.0 | 2.2.x | Jul 08, 2020 |
| 0.13.0 | 2.2.x | May 10, 2020 |
| 0.12.0 | 2.1.x | Feb 28, 2020 |
| 0.11.0 | 2.1.x | Jan 10, 2020 |
| 0.10.0 | 2.0.x | Dec 05, 2019 |
| 0.9.1 | 2.0.x | Nov 15, 2019 |
| 0.9.0 | 2.0.x | Oct 18, 2019 |
| 0.8.1 | 1.15.x | Nov 15, 2019 |
| 0.8.0 | 1.15.x | Oct 17, 2019 |
| 0.7.2 | 1.14.x | Nov 15, 2019 |
| 0.7.1 | 1.14.x | Oct 18, 2019 |
| 0.7.0 | 1.14.x | Jul 14, 2019 |
| 0.6.0 | 1.13.x | May 29, 2019 |
| 0.5.0 | 1.13.x | Apr 12, 2019 |
| 0.4.0 | 1.13.x | Mar 01, 2019 |
| 0.3.0 | 1.12.0 | Feb 15, 2019 |
| 0.2.0 | 1.12.0 | Jan 29, 2019 |
| 0.1.0 | 1.12.0 | Dec 16, 2018 |
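As a convenience, the table's pairings can be distilled into a small lookup helper. The mapping below is transcribed from the rows above, and the function name is our own, not part of tensorflow-io:

```python
# TensorFlow I/O version -> compatible TensorFlow release series,
# transcribed from the compatibility table above.
TFIO_TO_TF = {
    "0.21.0": "2.6.x", "0.20.0": "2.6.x",
    "0.19.1": "2.5.x", "0.19.0": "2.5.x", "0.18.0": "2.5.x",
    "0.17.1": "2.4.x", "0.17.0": "2.4.x",
    "0.16.0": "2.3.x", "0.15.0": "2.3.x",
    "0.14.0": "2.2.x", "0.13.0": "2.2.x",
    "0.12.0": "2.1.x", "0.11.0": "2.1.x",
    "0.10.0": "2.0.x", "0.9.1": "2.0.x", "0.9.0": "2.0.x",
    "0.8.1": "1.15.x", "0.8.0": "1.15.x",
    "0.7.2": "1.14.x", "0.7.1": "1.14.x", "0.7.0": "1.14.x",
    "0.6.0": "1.13.x", "0.5.0": "1.13.x", "0.4.0": "1.13.x",
    "0.3.0": "1.12.0", "0.2.0": "1.12.0", "0.1.0": "1.12.0",
}

def compatible_tf(tfio_version: str) -> str:
    """Return the TensorFlow release series matching a tensorflow-io release."""
    return TFIO_TO_TF[tfio_version]

print(compatible_tf("0.21.0"))  # 2.6.x
```

Such a lookup can be used, for example, to pin both packages consistently in a requirements file.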

Performance Benchmarking

We use GitHub Pages to document the results of API performance benchmarks. The benchmark job is triggered on every commit to the master branch, which makes it possible to track performance with respect to commits.

Contributing

TensorFlow I/O is a community-led open source project. As such, the project depends on public contributions, bug fixes, and documentation. Please see the contributing guide for details.

Build Status and CI


Because of the manylinux2010 requirement, TensorFlow I/O is built with Ubuntu 16.04 + Developer Toolset 7 (GCC 7.3) on Linux. Configuring Ubuntu 16.04 with Developer Toolset 7 is not exactly straightforward, but if the system has Docker installed, the following command will automatically build a manylinux2010-compatible whl package:

```shell
#!/usr/bin/env bash

ls dist/*
for f in dist/*.whl; do
  docker run -i --rm -v "$PWD":/v -w /v --net=host \
    quay.io/pypa/manylinux2010_x86_64 \
    bash -x -e /v/tools/build/auditwheel repair --plat manylinux2010_x86_64 "$f"
done
sudo chown -R "$(id -nu):$(id -ng)" .
ls wheelhouse/*
```

It takes some time to build, but once complete, there will be Python 3.5, 3.6, and 3.7 compatible whl packages available in the wheelhouse directory.
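A wheel's compatibility is encoded in its filename (per PEP 427: name-version-pythontag-abitag-platformtag.whl, ignoring the optional build tag). The stdlib-only snippet below is not part of the project's tooling; it is one way to sanity-check that a built wheel carries the expected manylinux2010 platform tag. The example filename is illustrative:

```python
def wheel_tags(filename: str):
    """Split a PEP 427 wheel filename into its compatibility tags."""
    stem = filename[: -len(".whl")]
    # A wheel filename without a build tag has exactly five dash-separated
    # fields; underscores inside the package name are not separators.
    name, version, py_tag, abi_tag, plat_tag = stem.split("-")
    return {"name": name, "version": version,
            "python": py_tag, "abi": abi_tag, "platform": plat_tag}

tags = wheel_tags("tensorflow_io-0.21.0-cp37-cp37m-manylinux2010_x86_64.whl")
print(tags["python"], tags["platform"])  # cp37 manylinux2010_x86_64
```

Running this over everything in wheelhouse/ after the Docker build gives a quick confirmation that auditwheel repaired each wheel to the intended platform tag.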

On macOS, the same command can be used. However, the script expects python on the shell PATH and will only generate a whl package matching that interpreter's version. To build a whl package for a specific Python version, you have to alias that interpreter to python in the shell. See the Auditwheel step in .github/workflows/build.yml for instructions on how to do that.

Note the above command is also the command we use when releasing packages for Linux and macOS.
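One common way to make python resolve to a specific interpreter for the duration of a build is a throwaway virtual environment. This is a sketch, not the project's official procedure, and the paths are illustrative:

```shell
# Create a venv from the interpreter you want the whl built against;
# while it is activated, plain `python` resolves to that interpreter.
python3 -m venv /tmp/tfio-build-env
. /tmp/tfio-build-env/bin/activate
python --version   # confirm before running the build script
```

Deactivating the environment afterwards restores the shell's original python.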

TensorFlow I/O uses both GitHub Workflows and Google CI (Kokoro) for continuous integration. GitHub Workflows is used for the macOS build and test; Kokoro is used for the Linux build and test. Again, because of the manylinux2010 requirement, on Linux whl packages are always built with Ubuntu 16.04 + Developer Toolset 7. Tests are done on a variety of systems with different Python versions to ensure good coverage:

| Python | Ubuntu 18.04 | Ubuntu 20.04 | macOS + osx9 | Windows-2019 |
| --- | --- | --- | --- | --- |
| 2.7 | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | N/A |
| 3.7 | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |
| 3.8 | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |

TensorFlow I/O has integrations with many systems and cloud vendors, including Prometheus, Apache Kafka, Apache Ignite, Google Cloud PubSub, AWS Kinesis, Microsoft Azure Storage, and Alibaba Cloud OSS.

We try our best to test against those systems in our continuous integration whenever possible. Some tests, such as Prometheus, Kafka, and Ignite, are done with live systems, meaning we install Prometheus/Kafka/Ignite on the CI machine before the test is run. Other tests, such as Kinesis, PubSub, and Azure Storage, are done through official or non-official emulators. Offline tests are also performed whenever possible, though systems covered through offline tests may not have the same level of coverage as live systems or emulators.

| | Live System | Emulator | CI Integration | Offline |
| --- | --- | --- | --- | --- |
| Apache Kafka | :heavy_check_mark: | | :heavy_check_mark: | |
| Apache Ignite | :heavy_check_mark: | | :heavy_check_mark: | |
| Prometheus | :heavy_check_mark: | | :heavy_check_mark: | |
| Google PubSub | | :heavy_check_mark: | :heavy_check_mark: | |
| Azure Storage | | :heavy_check_mark: | :heavy_check_mark: | |
| AWS Kinesis | | :heavy_check_mark: | :heavy_check_mark: | |
| Alibaba Cloud OSS | | | | :heavy_check_mark: |
| Google BigTable/BigQuery | | | to be added | |
| Elasticsearch (experimental) | :heavy_check_mark: | | :heavy_check_mark: | |
| MongoDB (experimental) | :heavy_check_mark: | | :heavy_check_mark: | |


License

Apache License 2.0
