



TensorFlow I/O


TensorFlow I/O is a collection of file systems and file formats that are not available in TensorFlow's built-in support. A full list of the file systems and file formats supported by TensorFlow I/O can be found here.

Using tensorflow-io with Keras is straightforward. Below is the Get Started with TensorFlow example, with the data processing step replaced by tensorflow-io:

import tensorflow as tf
import tensorflow_io as tfio

# Read the MNIST data into the IODataset.
dataset_url = "https://storage.googleapis.com/cvdf-datasets/mnist/"
d_train = tfio.IODataset.from_mnist(
    dataset_url + "train-images-idx3-ubyte.gz",
    dataset_url + "train-labels-idx1-ubyte.gz",
)

# Shuffle the elements of the dataset.
d_train = d_train.shuffle(buffer_size=1024)

# By default image data is uint8, so convert to float32 using map().
d_train = d_train.map(lambda x, y: (tf.image.convert_image_dtype(x, tf.float32), y))

# Batch the data just like any other tf.data.Dataset.
d_train = d_train.batch(32)

# Build the model.
model = tf.keras.models.Sequential(
    [
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(512, activation=tf.nn.relu),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(10, activation=tf.nn.softmax),
    ]
)

# Compile the model.
model.compile(
    optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"]
)

# Fit the model.
model.fit(d_train, epochs=5, steps_per_epoch=200)

In the above MNIST example, the URLs of the dataset files are passed directly to the tfio.IODataset.from_mnist API call. This is possible because of the inherent support that tensorflow-io provides for the HTTP/HTTPS file system, which eliminates the need to download the dataset and save it to a local directory.

NOTE: Since tensorflow-io is able to detect and decompress the MNIST dataset automatically if needed, the URLs of the compressed (gzip) files can be passed to the API call as is.
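As a possible follow-up (not part of the original example), the same pattern of passing remote, gzip-compressed URLs works for evaluation as well. The sketch below continues from the snippet above (reusing dataset_url and model) and assumes the same mirror also hosts the standard MNIST test files (t10k-*):

# A sketch continuing the example above; assumes the mirror also hosts the
# standard MNIST test files (t10k-*). Reuses dataset_url, model, tf, tfio.
d_test = tfio.IODataset.from_mnist(
    dataset_url + "t10k-images-idx3-ubyte.gz",
    dataset_url + "t10k-labels-idx1-ubyte.gz",
)

# Apply the same preprocessing and batching as the training pipeline.
d_test = d_test.map(lambda x, y: (tf.image.convert_image_dtype(x, tf.float32), y))
d_test = d_test.batch(32)

# Evaluate the trained model on the remote test split.
loss, accuracy = model.evaluate(d_test)
print("test accuracy:", accuracy)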

Please check the official documentation for more detailed and interesting usages of the package.

Installation

Python Package

The tensorflow-io Python package can be installed with pip directly using:

$ pip install tensorflow-io

People who are a little more adventurous can also try our nightly binaries:

$ pip install tensorflow-io-nightly

To ensure you have a version of TensorFlow that is compatible with TensorFlow-IO, you can specify the tensorflow extra requirement during install:

$ pip install tensorflow-io[tensorflow]

Similar extras exist for the tensorflow-gpu, tensorflow-cpu and tensorflow-rocm packages.

Docker Images

In addition to the pip packages, Docker images can be used to get started quickly.

For stable builds:

$ docker pull tfsigio/tfio:latest
$ docker run -it --rm --name tfio-latest tfsigio/tfio:latest

For nightly builds:

$ docker pull tfsigio/tfio:nightly
$ docker run -it --rm --name tfio-nightly tfsigio/tfio:nightly

R Package

Once the tensorflow-io Python package has been successfully installed, you can install the development version of the R package from GitHub via the following:

if (!require("remotes")) install.packages("remotes")
remotes::install_github("tensorflow/io", subdir = "R-package")

TensorFlow Version Compatibility

To ensure compatibility with TensorFlow, it is recommended to install a matching version of TensorFlow I/O according to the table below (a quick way to check the installed versions is sketched after the table). You can find the list of releases here.

| TensorFlow I/O Version | TensorFlow Compatibility | Release Date |
| --- | --- | --- |
| 0.25.0 | 2.8.x | Apr 19, 2022 |
| 0.24.0 | 2.8.x | Feb 04, 2022 |
| 0.23.1 | 2.7.x | Dec 15, 2021 |
| 0.23.0 | 2.7.x | Dec 14, 2021 |
| 0.22.0 | 2.7.x | Nov 10, 2021 |
| 0.21.0 | 2.6.x | Sep 12, 2021 |
| 0.20.0 | 2.6.x | Aug 11, 2021 |
| 0.19.1 | 2.5.x | Jul 25, 2021 |
| 0.19.0 | 2.5.x | Jun 25, 2021 |
| 0.18.0 | 2.5.x | May 13, 2021 |
| 0.17.1 | 2.4.x | Apr 16, 2021 |
| 0.17.0 | 2.4.x | Dec 14, 2020 |
| 0.16.0 | 2.3.x | Oct 23, 2020 |
| 0.15.0 | 2.3.x | Aug 03, 2020 |
| 0.14.0 | 2.2.x | Jul 08, 2020 |
| 0.13.0 | 2.2.x | May 10, 2020 |
| 0.12.0 | 2.1.x | Feb 28, 2020 |
| 0.11.0 | 2.1.x | Jan 10, 2020 |
| 0.10.0 | 2.0.x | Dec 05, 2019 |
| 0.9.1 | 2.0.x | Nov 15, 2019 |
| 0.9.0 | 2.0.x | Oct 18, 2019 |
| 0.8.1 | 1.15.x | Nov 15, 2019 |
| 0.8.0 | 1.15.x | Oct 17, 2019 |
| 0.7.2 | 1.14.x | Nov 15, 2019 |
| 0.7.1 | 1.14.x | Oct 18, 2019 |
| 0.7.0 | 1.14.x | Jul 14, 2019 |
| 0.6.0 | 1.13.x | May 29, 2019 |
| 0.5.0 | 1.13.x | Apr 12, 2019 |
| 0.4.0 | 1.13.x | Mar 01, 2019 |
| 0.3.0 | 1.12.0 | Feb 15, 2019 |
| 0.2.0 | 1.12.0 | Jan 29, 2019 |
| 0.1.0 | 1.12.0 | Dec 16, 2018 |
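
For convenience, here is a minimal sketch (not part of the original documentation) for checking which versions are installed, so they can be compared against the table above. It uses only the standard library (Python 3.8+) plus TensorFlow itself:

from importlib import metadata

import tensorflow as tf

# Print the installed versions so they can be checked against the
# compatibility table above.
print("tensorflow:", tf.__version__)
print("tensorflow-io:", metadata.version("tensorflow-io"))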

Performance Benchmarking

We use GitHub Pages to document the results of API performance benchmarks. The benchmark job is triggered on every commit to the master branch, which makes it possible to track performance across commits.

Contributing

TensorFlow I/O is a community-led open source project. As such, it depends on public contributions, bug fixes, and documentation. Please see the project's contribution guidelines for how to get involved.

Build Status and CI


Because of the manylinux2010 requirement, TensorFlow I/O is built with Ubuntu 16.04 + Developer Toolset 7 (GCC 7.3) on Linux. Setting up Ubuntu 16.04 with Developer Toolset 7 is not exactly straightforward, but if the system has Docker installed, the following command will automatically build a manylinux2010-compatible whl package:

#!/usr/bin/env bash

# List the wheels produced by the build step.
ls dist/*

# Repair each wheel inside the manylinux2010 container.
for f in dist/*.whl; do
  docker run -i --rm -v "$PWD":/v -w /v --net=host quay.io/pypa/manylinux2010_x86_64 \
    bash -x -e /v/tools/build/auditwheel repair --plat manylinux2010_x86_64 "$f"
done

# The container runs as root, so restore ownership of the output.
sudo chown -R "$(id -nu):$(id -ng)" .
ls wheelhouse/*

It takes some time to build, but once complete, there will be Python 3.5-, 3.6-, and 3.7-compatible whl packages available in the wheelhouse directory.

On macOS, the same command can be used. However, the script expects python to be available in the shell and will only generate a whl package that matches the version of that python. If you want to build a whl package for a specific Python version, you have to alias that version to python in the shell. See the Auditwheel step in .github/workflows/build.yml for instructions on how to do that.

Note the above command is also the command we use when releasing packages for Linux and macOS.

TensorFlow I/O uses both GitHub Workflows and Google CI (Kokoro) for continuous integration. GitHub Workflows is used for macOS builds and tests; Kokoro is used for Linux builds and tests. Again, because of the manylinux2010 requirement, on Linux whl packages are always built with Ubuntu 16.04 + Developer Toolset 7. Tests are run on a variety of systems with different Python 3 versions to ensure good coverage:

| Python | Ubuntu 18.04 | Ubuntu 20.04 | macOS + osx9 | Windows-2019 |
| --- | --- | --- | --- | --- |
| 2.7 | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | N/A |
| 3.7 | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |
| 3.8 | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |

TensorFlow I/O has integrations with many systems and cloud vendors, such as Prometheus, Apache Kafka, Apache Ignite, Google Cloud PubSub, AWS Kinesis, Microsoft Azure Storage, and Alibaba Cloud OSS.
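
As an illustration of one such integration, below is a minimal, hedged sketch of reading messages from a Kafka topic with tfio.IODataset.from_kafka. The broker address and the topic name my-topic are placeholders; a broker is assumed to be running locally on the default port, and the exact element structure may vary between releases, so treat this as a sketch rather than a definitive recipe:

import tensorflow_io as tfio

# A sketch, not a definitive recipe: expose a Kafka topic as a
# tf.data-compatible dataset. "my-topic" is a placeholder topic name and a
# broker is assumed to be running locally on the default port.
kafka_ds = tfio.IODataset.from_kafka("my-topic")

# Peek at a few raw records.
for item in kafka_ds.take(5):
    print(item)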

We tried our best to test against those systems in our continuous integration whenever possible. Some tests, such as those for Prometheus, Kafka, and Ignite, are done with live systems, meaning we install Prometheus/Kafka/Ignite on the CI machine before the test is run. Others, such as those for Kinesis, PubSub, and Azure Storage, are done through official or unofficial emulators. Offline tests are also performed whenever possible, though systems covered only by offline tests may not have the same level of coverage as live systems or emulators.

| | Live System | Emulator | CI Integration | Offline |
| --- | --- | --- | --- | --- |
| Apache Kafka | :heavy_check_mark: | | :heavy_check_mark: | |
| Apache Ignite | :heavy_check_mark: | | :heavy_check_mark: | |
| Prometheus | :heavy_check_mark: | | :heavy_check_mark: | |
| Google PubSub | | :heavy_check_mark: | :heavy_check_mark: | |
| Azure Storage | | :heavy_check_mark: | :heavy_check_mark: | |
| AWS Kinesis | | :heavy_check_mark: | :heavy_check_mark: | |
| Alibaba Cloud OSS | | | | :heavy_check_mark: |
| Google BigTable/BigQuery | | | to be added | |
| Elasticsearch (experimental) | :heavy_check_mark: | | :heavy_check_mark: | |
| MongoDB (experimental) | :heavy_check_mark: | | :heavy_check_mark: | |


Community

Additional Information

License

Apache License 2.0


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files are available for this release. See the tutorial on generating distribution archives.

Built Distributions

File details

Details for the file tensorflow_io_gcs_filesystem_nightly-0.25.0.dev20220510023907-cp310-cp310-win_amd64.whl.


File hashes

Hashes for tensorflow_io_gcs_filesystem_nightly-0.25.0.dev20220510023907-cp310-cp310-win_amd64.whl
Algorithm Hash digest
SHA256 6101dfe9d441920f2bf57d5bfd3065ef8acbebb2cd23bc0b3e9bf15af03f9e89
MD5 6641d66d20dd9e57b919896ae2b897f4
BLAKE2b-256 452f27a55de8c5c94a4f58f934fe26c3e53f2ed38a34ce4a104aada17233934f


File details

Details for the file tensorflow_io_gcs_filesystem_nightly-0.25.0.dev20220510023907-cp310-cp310-manylinux_2_12_x86_64.manylinux2010_x86_64.whl.


File hashes

Hashes for tensorflow_io_gcs_filesystem_nightly-0.25.0.dev20220510023907-cp310-cp310-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 2afe5c5e4cfdd1ebdb2ede744cb01896c909a4da58f3989638197d880d679935
MD5 7a4796b95938ebf6f75b91e4ca09ed3d
BLAKE2b-256 aab4122488a2c069dbe930e5ceaea4b2c1e383da17611de0f47ab318fa299672


File details

Details for the file tensorflow_io_gcs_filesystem_nightly-0.25.0.dev20220510023907-cp310-cp310-macosx_10_14_x86_64.whl.


File hashes

Hashes for tensorflow_io_gcs_filesystem_nightly-0.25.0.dev20220510023907-cp310-cp310-macosx_10_14_x86_64.whl
Algorithm Hash digest
SHA256 dbe533e31a3768dae6ace8e24ed32ff831026a66626066c2469c93d5eba30527
MD5 53a9e7f29b5438b3d15011c23811e299
BLAKE2b-256 6a96215d458bf8d73d8906c45f9582af4fd7a81e3705b594290c80b68e258d42


File details

Details for the file tensorflow_io_gcs_filesystem_nightly-0.25.0.dev20220510023907-cp39-cp39-win_amd64.whl.


File hashes

Hashes for tensorflow_io_gcs_filesystem_nightly-0.25.0.dev20220510023907-cp39-cp39-win_amd64.whl
Algorithm Hash digest
SHA256 1133f5fab11d1eff55f22fcc54b3c8d27ac4d80142f8f9c9249b4ce07a6b466f
MD5 3bfea3c314ba8a4e8228a3076843c25c
BLAKE2b-256 4d7cef17a0fc613cf7be2b8a84af40f020e08892802089cb6ecffe50dfe6fa2f


File details

Details for the file tensorflow_io_gcs_filesystem_nightly-0.25.0.dev20220510023907-cp39-cp39-manylinux_2_12_x86_64.manylinux2010_x86_64.whl.


File hashes

Hashes for tensorflow_io_gcs_filesystem_nightly-0.25.0.dev20220510023907-cp39-cp39-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 0d2fe1a2ded339471641b948b46c24c37854914b0db94830d10ae93c28bfc4c1
MD5 519dc3595b456c8a79a3449b8133b20a
BLAKE2b-256 10dee296177076ec7ec2b4f33ac5fc5d91ae5fdf32520a74a8619ffb08fc8085


File details

Details for the file tensorflow_io_gcs_filesystem_nightly-0.25.0.dev20220510023907-cp39-cp39-macosx_10_14_x86_64.whl.


File hashes

Hashes for tensorflow_io_gcs_filesystem_nightly-0.25.0.dev20220510023907-cp39-cp39-macosx_10_14_x86_64.whl
Algorithm Hash digest
SHA256 45288c4dfdddcdac760c0008636b38ea67f4e7da70983926461bbb5413a26be4
MD5 cfdc42f77f457fe78ee6d7ec874157f7
BLAKE2b-256 43b73be7b733efdd3b271a88874139ad86016a23968f4323ee957676ed9f43ab


File details

Details for the file tensorflow_io_gcs_filesystem_nightly-0.25.0.dev20220510023907-cp38-cp38-win_amd64.whl.


File hashes

Hashes for tensorflow_io_gcs_filesystem_nightly-0.25.0.dev20220510023907-cp38-cp38-win_amd64.whl
Algorithm Hash digest
SHA256 1b2d25f7505befb292df57befb878a2709ed72bd4f74029368b036e758e59845
MD5 8cb6de22b00e07a67428f1292261d83f
BLAKE2b-256 7c89497933e8923b3fa2fe8432a9a432de66b5f0593bf01c610805b52f53a0bf


File details

Details for the file tensorflow_io_gcs_filesystem_nightly-0.25.0.dev20220510023907-cp38-cp38-manylinux_2_12_x86_64.manylinux2010_x86_64.whl.


File hashes

Hashes for tensorflow_io_gcs_filesystem_nightly-0.25.0.dev20220510023907-cp38-cp38-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 b71f7fc147d2de84d651b6e5481398b797113ce59fc92f1b87cd7f3785056ba5
MD5 1d48cad38f0b5ef9d70738b674d43edd
BLAKE2b-256 04a372dd9136137d713a5735df35193822d2106666a46f9fa7c9fa358e882959


File details

Details for the file tensorflow_io_gcs_filesystem_nightly-0.25.0.dev20220510023907-cp38-cp38-macosx_10_14_x86_64.whl.


File hashes

Hashes for tensorflow_io_gcs_filesystem_nightly-0.25.0.dev20220510023907-cp38-cp38-macosx_10_14_x86_64.whl
Algorithm Hash digest
SHA256 644cc1f74b26d0cf66ff05c41f25d487f5a9904eedf0fa856a9db9f36b60a4b5
MD5 c8d1b8bae3f3e342a75211792047d512
BLAKE2b-256 74a8362f8df03056b7db6c55cdd0bebe464dc4cda6878af55eef66fed9790ff3


File details

Details for the file tensorflow_io_gcs_filesystem_nightly-0.25.0.dev20220510023907-cp37-cp37m-win_amd64.whl.


File hashes

Hashes for tensorflow_io_gcs_filesystem_nightly-0.25.0.dev20220510023907-cp37-cp37m-win_amd64.whl
Algorithm Hash digest
SHA256 639f3aa9cad2220527b3afc0ce7ec1e38b465ec8b4781433da16f4c7796c4953
MD5 40e6b27cb887c29303ccd67da306fe50
BLAKE2b-256 ceebe1e3255b0a81f0d725cc518286d62768d7bfb03e4a036e27cc954a617124


File details

Details for the file tensorflow_io_gcs_filesystem_nightly-0.25.0.dev20220510023907-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl.


File hashes

Hashes for tensorflow_io_gcs_filesystem_nightly-0.25.0.dev20220510023907-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 2d3878d68acc062f86cf3eb05234065f8c5c56c0db9406f01eabcb09bfcf7419
MD5 90b0ba956c6ffb51883241ce7090716f
BLAKE2b-256 e21e795aab3c08539dcea3b97a3b67ff214b8119da3848235eca197017771cc7


File details

Details for the file tensorflow_io_gcs_filesystem_nightly-0.25.0.dev20220510023907-cp37-cp37m-macosx_10_14_x86_64.whl.


File hashes

Hashes for tensorflow_io_gcs_filesystem_nightly-0.25.0.dev20220510023907-cp37-cp37m-macosx_10_14_x86_64.whl
Algorithm Hash digest
SHA256 66f1f5c73ef3a9b1cdea9fab3d371b386b6527f9bc72d718c018da6b6ee8cfde
MD5 625e4b0bddd728e6ad6010930e2c48c8
BLAKE2b-256 35b2db37f1218e5059c7c48090a4f5e4d454c9cb8196043d75c78059d191a801

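The hashes above can be used to verify a downloaded wheel before installing it. Below is a minimal sketch (not part of the original listing) that uses only the Python standard library; the wheel name and expected digest are copied from the first entry above and should be substituted with the file you actually downloaded:

import hashlib

# A sketch: compare the SHA256 digest of a downloaded wheel against the
# published value from the listing above. Substitute the wheel you downloaded
# and its corresponding SHA256 hash.
wheel_path = "tensorflow_io_gcs_filesystem_nightly-0.25.0.dev20220510023907-cp310-cp310-win_amd64.whl"
expected_sha256 = "6101dfe9d441920f2bf57d5bfd3065ef8acbebb2cd23bc0b3e9bf15af03f9e89"

with open(wheel_path, "rb") as f:
    digest = hashlib.sha256(f.read()).hexdigest()

print("OK" if digest == expected_sha256 else "hash mismatch")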
