
TensorFlow IO

Project description




TensorFlow I/O


TensorFlow I/O is a collection of file systems and file formats that are not available in TensorFlow's built-in support. A full list of the file systems and file formats supported by TensorFlow I/O can be found here.

Using tensorflow-io with Keras is straightforward. Below is the Get Started with TensorFlow example, with the data-processing portion replaced by tensorflow-io:

import tensorflow as tf
import tensorflow_io as tfio

# Read the MNIST data into the IODataset.
dataset_url = "https://storage.googleapis.com/cvdf-datasets/mnist/"
d_train = tfio.IODataset.from_mnist(
    dataset_url + "train-images-idx3-ubyte.gz",
    dataset_url + "train-labels-idx1-ubyte.gz",
)

# Shuffle the elements of the dataset.
d_train = d_train.shuffle(buffer_size=1024)

# By default image data is uint8, so convert to float32 using map().
d_train = d_train.map(lambda x, y: (tf.image.convert_image_dtype(x, tf.float32), y))

# Prepare batches of the data, just like with any other tf.data.Dataset.
d_train = d_train.batch(32)

# Build the model.
model = tf.keras.models.Sequential(
    [
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(512, activation=tf.nn.relu),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(10, activation=tf.nn.softmax),
    ]
)

# Compile the model.
model.compile(
    optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"]
)

# Fit the model.
model.fit(d_train, epochs=5, steps_per_epoch=200)

In the above MNIST example, the URLs used to access the dataset files are passed directly to the tfio.IODataset.from_mnist API call. This is possible because of the inherent support that tensorflow-io provides for the HTTP/HTTPS file system, which eliminates the need to download and save datasets to a local directory.

NOTE: Since tensorflow-io is able to detect and decompress the MNIST dataset automatically if needed, we can pass the URLs for the compressed (gzip) files to the API call as-is.
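
To illustrate the file system support on its own: once tensorflow_io is imported, remote files can be read through the standard tf.io.gfile API. The snippet below is a minimal sketch, assuming the HTTP/HTTPS plugin is registered on import; the URL is the same MNIST labels file used above.

import tensorflow as tf
import tensorflow_io as tfio  # importing tensorflow_io registers the extra file systems

# Sketch: read a remote file directly over HTTPS, with no local download step.
url = "https://storage.googleapis.com/cvdf-datasets/mnist/train-labels-idx1-ubyte.gz"
with tf.io.gfile.GFile(url, "rb") as f:
    data = f.read()
print("fetched", len(data), "bytes over HTTPS")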

Please check the official documentation for more detailed and interesting usages of the package.

Installation

Python Package

The tensorflow-io Python package can be installed with pip directly using:

$ pip install tensorflow-io

People who are a little more adventurous can also try our nightly binaries:

$ pip install tensorflow-io-nightly

To ensure you have a version of TensorFlow that is compatible with TensorFlow-IO, you can specify the tensorflow extra requirement during install:

$ pip install tensorflow-io[tensorflow]

Similar extras exist for the tensorflow-gpu, tensorflow-cpu and tensorflow-rocm packages.

Docker Images

In addition to the pip packages, Docker images can be used to quickly get started.

For stable builds:

$ docker pull tfsigio/tfio:latest
$ docker run -it --rm --name tfio-latest tfsigio/tfio:latest

For nightly builds:

$ docker pull tfsigio/tfio:nightly
$ docker run -it --rm --name tfio-nightly tfsigio/tfio:nightly

R Package

Once the tensorflow-io Python package has been successfully installed, you can install the development version of the R package from GitHub via the following:

if (!require("remotes")) install.packages("remotes")
remotes::install_github("tensorflow/io", subdir = "R-package")

TensorFlow Version Compatibility

To ensure compatibility with TensorFlow, it is recommended to install a matching version of TensorFlow I/O according to the table below. You can find the list of releases here.

TensorFlow I/O Version   TensorFlow Compatibility   Release Date
0.23.0                   2.7.x                      Dec 14, 2021
0.22.0                   2.7.x                      Nov 10, 2021
0.21.0                   2.6.x                      Sep 12, 2021
0.20.0                   2.6.x                      Aug 11, 2021
0.19.1                   2.5.x                      Jul 25, 2021
0.19.0                   2.5.x                      Jun 25, 2021
0.18.0                   2.5.x                      May 13, 2021
0.17.1                   2.4.x                      Apr 16, 2021
0.17.0                   2.4.x                      Dec 14, 2020
0.16.0                   2.3.x                      Oct 23, 2020
0.15.0                   2.3.x                      Aug 03, 2020
0.14.0                   2.2.x                      Jul 08, 2020
0.13.0                   2.2.x                      May 10, 2020
0.12.0                   2.1.x                      Feb 28, 2020
0.11.0                   2.1.x                      Jan 10, 2020
0.10.0                   2.0.x                      Dec 05, 2019
0.9.1                    2.0.x                      Nov 15, 2019
0.9.0                    2.0.x                      Oct 18, 2019
0.8.1                    1.15.x                     Nov 15, 2019
0.8.0                    1.15.x                     Oct 17, 2019
0.7.2                    1.14.x                     Nov 15, 2019
0.7.1                    1.14.x                     Oct 18, 2019
0.7.0                    1.14.x                     Jul 14, 2019
0.6.0                    1.13.x                     May 29, 2019
0.5.0                    1.13.x                     Apr 12, 2019
0.4.0                    1.13.x                     Mar 01, 2019
0.3.0                    1.12.0                     Feb 15, 2019
0.2.0                    1.12.0                     Jan 29, 2019
0.1.0                    1.12.0                     Dec 16, 2018
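
A quick way to confirm that an installed pair matches the table above is to print both package versions. A minimal sketch, assuming both packages expose __version__ (for example, tensorflow-io 0.23.0 pairs with TensorFlow 2.7.x):

import tensorflow as tf
import tensorflow_io as tfio

# Compare the reported versions against the compatibility table above,
# e.g. tensorflow-io 0.23.0 expects TensorFlow 2.7.x.
print("tensorflow:", tf.__version__)
print("tensorflow-io:", tfio.__version__)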

Performance Benchmarking

We use GitHub Pages to document the results of API performance benchmarks. The benchmark job is triggered on every commit to the master branch and facilitates tracking performance with respect to commits.

Contributing

TensorFlow I/O is a community-led open source project. As such, it depends on public contributions, bug fixes, and documentation. Please see the project's contribution guidelines for details.

Build Status and CI

Build status badges are published for Linux CPU (Python 2 and Python 3) and Linux GPU (Python 2 and Python 3).

Because of the manylinux2010 requirement, TensorFlow I/O is built with Ubuntu 16.04 + Developer Toolset 7 (GCC 7.3) on Linux. Configuring Ubuntu 16.04 with Developer Toolset 7 is not exactly straightforward, but if the system has Docker installed, the following script will automatically build manylinux2010-compatible whl packages:

#!/usr/bin/env bash

# List the freshly built wheels, then repair each one inside the
# manylinux2010 Docker image so the result is manylinux2010-compatible.
ls dist/*
for f in dist/*.whl; do
  docker run -i --rm -v $PWD:/v -w /v --net=host quay.io/pypa/manylinux2010_x86_64 bash -x -e /v/tools/build/auditwheel repair --plat manylinux2010_x86_64 $f
done
# The container runs as root, so restore ownership of the output,
# then list the repaired wheels in the wheelhouse directory.
sudo chown -R $(id -nu):$(id -ng) .
ls wheelhouse/*

It takes some time to build, but once complete, there will be Python 3.5, 3.6, and 3.7 compatible whl packages available in the wheelhouse directory.

On macOS, the same commands can be used. However, the script expects python to be available in the shell and will only generate a whl package matching that python version. If you want to build a whl package for a specific Python version, you have to alias that version to python in the shell. See the Auditwheel step in .github/workflows/build.yml for instructions on how to do that.

Note that the above commands are also what we use when releasing packages for Linux and macOS.

TensorFlow I/O uses both GitHub Workflows and Google CI (Kokoro) for continuous integration. GitHub Workflows is used for the macOS build and test; Kokoro is used for the Linux build and test. Again, because of the manylinux2010 requirement, on Linux whl packages are always built with Ubuntu 16.04 + Developer Toolset 7. Tests are done on a variety of systems with different Python versions to ensure good coverage:

Python   Ubuntu 18.04   Ubuntu 20.04   macOS + osx9   Windows-2019
2.7      ✔              ✔              ✔              N/A
3.7      ✔              ✔              ✔              ✔
3.8      ✔              ✔              ✔              ✔

TensorFlow I/O has integrations with many systems and cloud vendors, such as Prometheus, Apache Kafka, Apache Ignite, Google Cloud PubSub, AWS Kinesis, Microsoft Azure Storage, and Alibaba Cloud OSS, among others.
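
As one illustration of these integrations, a Kafka topic can be consumed as a dataset. Below is a minimal sketch, assuming a broker running on the default localhost:9092 and a hypothetical topic named "my-topic"; see the official documentation for the full Kafka API.

import tensorflow_io as tfio

# Sketch: expose a Kafka topic as a tf.data-style dataset.
# "my-topic" is a placeholder topic; the broker address defaults to localhost:9092.
kafka_dataset = tfio.IODataset.from_kafka("my-topic", partition=0, offset=0)
for item in kafka_dataset.take(1):
    print(item)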

We try our best to test against those systems in our continuous integration whenever possible. Some tests, such as those for Prometheus, Kafka, and Ignite, are done with live systems, meaning we install Prometheus/Kafka/Ignite on the CI machine before the test is run. Other tests, such as those for Kinesis, PubSub, and Azure Storage, are done through official or unofficial emulators. Offline tests are also performed whenever possible, though systems covered by offline tests may not have the same level of coverage as live systems or emulators.

                               Live System   Emulator   CI Integration   Offline
Apache Kafka                   ✔                        ✔
Apache Ignite                  ✔                        ✔
Prometheus                     ✔                        ✔
Google PubSub                                ✔          ✔
Azure Storage                                ✔          ✔
AWS Kinesis                                  ✔          ✔
Alibaba Cloud OSS                                                        ✔
Google BigTable/BigQuery       (to be added)
Elasticsearch (experimental)   ✔                        ✔
MongoDB (experimental)         ✔                        ✔

References for emulators:

Community

Additional Information

License

Apache License 2.0


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files are available for this release. See the tutorial on generating distribution archives.

Built Distributions

File hashes for each built distribution are listed below.

tensorflow_io_gcs_filesystem_nightly-0.23.0.dev20211214032438-cp310-cp310-win_amd64.whl
  SHA256       d9a008658b09aa6e625c5846c617122facb790fb946bf15fbbe5411349d4e14c
  MD5          f418c05472f5ec80e20dd38c96f7715b
  BLAKE2b-256  193b544ec5267817bcaa75d2b30ea573162958f62ce9c625137c8a8d58c15ef6

tensorflow_io_gcs_filesystem_nightly-0.23.0.dev20211214032438-cp310-cp310-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
  SHA256       a6a5f176568ec74eeede1f1e57f2d08b4b8c9366f4a5f0680d7abff25c395d10
  MD5          128057827d9cd59980a499caca20e72d
  BLAKE2b-256  6151c2bc6244dc651f011112fc5b4c0bb412a2f637bbe0dd222ef7929469cc1e

tensorflow_io_gcs_filesystem_nightly-0.23.0.dev20211214032438-cp310-cp310-macosx_10_14_x86_64.whl
  SHA256       6190dedb74fa9e444a193cdffcebe20c0b66bcef713ab89e80f44cb0194b4625
  MD5          f62767fd712118d97a1923acce1ce598
  BLAKE2b-256  a189b334d110105c7e3e2bb72f344e5a6479d720c2756579e8713646ef4ac86c

tensorflow_io_gcs_filesystem_nightly-0.23.0.dev20211214032438-cp39-cp39-win_amd64.whl
  SHA256       a25c2b8297d68220ac37be3b8fad25b56cdb6a28a1849ef35163ff4d7d7c7ee9
  MD5          ee8001230780e8d1564ad72bd92153ee
  BLAKE2b-256  25b5eb627e98afd97352c293d8b2f17c43cfa96b8620b656bd17c66a94c8af2b

tensorflow_io_gcs_filesystem_nightly-0.23.0.dev20211214032438-cp39-cp39-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
  SHA256       1072acace2d3fa07b8f06e195a100baf974bbe0e1c8ec9002c870019f9181b18
  MD5          a649fb431c381b11b85423f643e053e5
  BLAKE2b-256  3c6c120840a1f145efa6daa37fa0a016273b2bd677e1bd2804db29044de1f773

tensorflow_io_gcs_filesystem_nightly-0.23.0.dev20211214032438-cp39-cp39-macosx_10_14_x86_64.whl
  SHA256       799af2cc70f898e7b735a778c87b5c30811710ea2d7d8417b7b6bc8fdb92d902
  MD5          a9fb53b09a5cbbf5c9472fe9707cead6
  BLAKE2b-256  5e06ceeb0525ba18fbb1a4c96237095168d23a569d7c06f8297d382b0a16f162

tensorflow_io_gcs_filesystem_nightly-0.23.0.dev20211214032438-cp38-cp38-win_amd64.whl
  SHA256       d8f6f5f918b674bf6a27111da9dd69cde9a90a31729799971177e02153b1e6b3
  MD5          e9770b92b882ce69b2d9b14a4e00cc68
  BLAKE2b-256  2ac5ca0d56a10cf2271ed8c26ff643f45b98f88c9e01d7b6db2ddc427446dd31

tensorflow_io_gcs_filesystem_nightly-0.23.0.dev20211214032438-cp38-cp38-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
  SHA256       745a0c56268e2a85e5a7a955db79457879619b54f4c24caf5f2da9f6a24eb8fe
  MD5          3c1f68e0f73062619d05d2221d09a88b
  BLAKE2b-256  a2397454922bbfcb2d41fd9f18c4f2a74edc6e47860e881ceb4b97bee7d0fa8f

tensorflow_io_gcs_filesystem_nightly-0.23.0.dev20211214032438-cp38-cp38-macosx_10_14_x86_64.whl
  SHA256       e2e359a1182625ba6d8b3a6fb3a6fbaf7021867bc03d333c9b6b88542cfc759b
  MD5          1ceed16fbfda1124f898e850ed01554f
  BLAKE2b-256  cc10c97a85b08b9cdb199df43af8ef65f05cd3100c4f5166b695cccddebb2f4e

tensorflow_io_gcs_filesystem_nightly-0.23.0.dev20211214032438-cp37-cp37m-win_amd64.whl
  SHA256       ca6deb23a99e1a2e9185d096262132cc272591bf51968e19d43341ef50364a68
  MD5          8e9ddbcb2ffe07029ef907bd570c3233
  BLAKE2b-256  518cca86c0bae4ac44ccda0963b7c25fd3451f8b14182fcd7acfdced21eb4e08

tensorflow_io_gcs_filesystem_nightly-0.23.0.dev20211214032438-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
  SHA256       69684f4d758b6f27db1d97688fa617d404da4119ea26e753a6a70db289614852
  MD5          46b27ed1329943426ce9b02f4b75d4c6
  BLAKE2b-256  13bab4636844006eb30a25ff284de51b7b5ab106554b32a9a4961f15521de2cd

tensorflow_io_gcs_filesystem_nightly-0.23.0.dev20211214032438-cp37-cp37m-macosx_10_14_x86_64.whl
  SHA256       490f3ec3a64b75b5481ce674fadc9b91bf0e522805cfd3b331f04bb526df2e90
  MD5          89e274fe5a703d6d6d0242f6724dfc88
  BLAKE2b-256  43fd72e7af6457a1db50c7c4f80ec2ac1f3766736044d560d27d11c41c2f0cc8

See more details on using hashes here.
