Skip to main content

TensorFlow IO

Project description




TensorFlow I/O

GitHub CI PyPI License Documentation

TensorFlow I/O is a collection of file systems and file formats that are not available in TensorFlow's built-in support. A full list of supported file systems and file formats by TensorFlow I/O can be found here.

The use of tensorflow-io is straightforward with keras. Below is an example to Get Started with TensorFlow with the data processing aspect replaced by tensorflow-io:

import tensorflow as tf
import tensorflow_io as tfio

# Read the MNIST data into the IODataset.
dataset_url = "https://storage.googleapis.com/cvdf-datasets/mnist/"
d_train = tfio.IODataset.from_mnist(
    dataset_url + "train-images-idx3-ubyte.gz",
    dataset_url + "train-labels-idx1-ubyte.gz",
)

# Shuffle the elements of the dataset.
d_train = d_train.shuffle(buffer_size=1024)

# By default image data is uint8, so convert to float32 using map().
d_train = d_train.map(lambda x, y: (tf.image.convert_image_dtype(x, tf.float32), y))

# prepare batches the data just like any other tf.data.Dataset
d_train = d_train.batch(32)

# Build the model.
model = tf.keras.models.Sequential(
    [
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(512, activation=tf.nn.relu),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(10, activation=tf.nn.softmax),
    ]
)

# Compile the model.
model.compile(
    optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"]
)

# Fit the model.
model.fit(d_train, epochs=5, steps_per_epoch=200)

In the above MNIST example, the URL's to access the dataset files are passed directly to the tfio.IODataset.from_mnist API call. This is due to the inherent support that tensorflow-io provides for HTTP/HTTPS file system, thus eliminating the need for downloading and saving datasets on a local directory.

NOTE: Since tensorflow-io is able to detect and uncompress the MNIST dataset automatically if needed, we can pass the URL's for the compressed files (gzip) to the API call as is.

Please check the official documentation for more detailed and interesting usages of the package.

Installation

Python Package

The tensorflow-io Python package can be installed with pip directly using:

$ pip install tensorflow-io

People who are a little more adventurous can also try our nightly binaries:

$ pip install tensorflow-io-nightly

Docker Images

In addition to the pip packages, the docker images can be used to quickly get started.

For stable builds:

$ docker pull tfsigio/tfio:latest
$ docker run -it --rm --name tfio-latest tfsigio/tfio:latest

For nightly builds:

$ docker pull tfsigio/tfio:nightly
$ docker run -it --rm --name tfio-nightly tfsigio/tfio:nightly

R Package

Once the tensorflow-io Python package has been successfully installed, you can install the development version of the R package from GitHub via the following:

if (!require("remotes")) install.packages("remotes")
remotes::install_github("tensorflow/io", subdir = "R-package")

TensorFlow Version Compatibility

To ensure compatibility with TensorFlow, it is recommended to install a matching version of TensorFlow I/O according to the table below. You can find the list of releases here.

TensorFlow I/O Version TensorFlow Compatibility Release Date
0.17.1 2.4.x Apr 16, 2021
0.17.0 2.4.x Dec 14, 2020
0.16.0 2.3.x Oct 23, 2020
0.15.0 2.3.x Aug 03, 2020
0.14.0 2.2.x Jul 08, 2020
0.13.0 2.2.x May 10, 2020
0.12.0 2.1.x Feb 28, 2020
0.11.0 2.1.x Jan 10, 2020
0.10.0 2.0.x Dec 05, 2019
0.9.1 2.0.x Nov 15, 2019
0.9.0 2.0.x Oct 18, 2019
0.8.1 1.15.x Nov 15, 2019
0.8.0 1.15.x Oct 17, 2019
0.7.2 1.14.x Nov 15, 2019
0.7.1 1.14.x Oct 18, 2019
0.7.0 1.14.x Jul 14, 2019
0.6.0 1.13.x May 29, 2019
0.5.0 1.13.x Apr 12, 2019
0.4.0 1.13.x Mar 01, 2019
0.3.0 1.12.0 Feb 15, 2019
0.2.0 1.12.0 Jan 29, 2019
0.1.0 1.12.0 Dec 16, 2018

Performance Benchmarking

We use github-pages to document the results of API performance benchmarks. The benchmark job is triggered on every commit to master branch and facilitates tracking performance w.r.t commits.

Contributing

Tensorflow I/O is a community led open source project. As such, the project depends on public contributions, bug-fixes, and documentation. Please see:

Build Status and CI

Build Status
Linux CPU Python 2 Status
Linux CPU Python 3 Status
Linux GPU Python 2 Status
Linux GPU Python 3 Status

Because of manylinux2010 requirement, TensorFlow I/O is built with Ubuntu:16.04 + Developer Toolset 7 (GCC 7.3) on Linux. Configuration with Ubuntu 16.04 with Developer Toolset 7 is not exactly straightforward. If the system have docker installed, then the following command will automatically build manylinux2010 compatible whl package:

#!/usr/bin/env bash

ls dist/*
for f in dist/*.whl; do
  docker run -i --rm -v $PWD:/v -w /v --net=host quay.io/pypa/manylinux2010_x86_64 bash -x -e /v/tools/build/auditwheel repair --plat manylinux2010_x86_64 $f
done
sudo chown -R $(id -nu):$(id -ng) .
ls wheelhouse/*

It takes some time to build, but once complete, there will be python 3.5, 3.6, 3.7 compatible whl packages available in wheelhouse directory.

On macOS, the same command could be used. However, the script expects python in shell and will only generate a whl package that matches the version of python in shell. If you want to build a whl package for a specific python then you have to alias this version of python to python in shell. See .github/workflows/build.yml Auditwheel step for instructions how to do that.

Note the above command is also the command we use when releasing packages for Linux and macOS.

TensorFlow I/O uses both GitHub Workflows and Google CI (Kokoro) for continuous integration. GitHub Workflows is used for macOS build and test. Kokoro is used for Linux build and test. Again, because of the manylinux2010 requirement, on Linux whl packages are always built with Ubuntu 16.04 + Developer Toolset 7. Tests are done on a variatiy of systems with different python3 versions to ensure a good coverage:

Python Ubuntu 18.04 Ubuntu 20.04 macOS + osx9 Windows-2019
2.7 :heavy_check_mark: :heavy_check_mark: :heavy_check_mark: N/A
3.7 :heavy_check_mark: :heavy_check_mark: :heavy_check_mark: :heavy_check_mark:
3.8 :heavy_check_mark: :heavy_check_mark: :heavy_check_mark: :heavy_check_mark:

TensorFlow I/O has integrations with many systems and cloud vendors such as Prometheus, Apache Kafka, Apache Ignite, Google Cloud PubSub, AWS Kinesis, Microsoft Azure Storage, Alibaba Cloud OSS etc.

We tried our best to test against those systems in our continuous integration whenever possible. Some tests such as Prometheus, Kafka, and Ignite are done with live systems, meaning we install Prometheus/Kafka/Ignite on CI machine before the test is run. Some tests such as Kinesis, PubSub, and Azure Storage are done through official or non-official emulators. Offline tests are also performed whenever possible, though systems covered through offine tests may not have the same level of coverage as live systems or emulators.

Live System Emulator CI Integration Offline
Apache Kafka :heavy_check_mark: :heavy_check_mark:
Apache Ignite :heavy_check_mark: :heavy_check_mark:
Prometheus :heavy_check_mark: :heavy_check_mark:
Google PubSub :heavy_check_mark: :heavy_check_mark:
Azure Storage :heavy_check_mark: :heavy_check_mark:
AWS Kinesis :heavy_check_mark: :heavy_check_mark:
Alibaba Cloud OSS :heavy_check_mark:
Google BigTable/BigQuery to be added
Elasticsearch (experimental) :heavy_check_mark: :heavy_check_mark:
MongoDB (experimental) :heavy_check_mark: :heavy_check_mark:

References for emulators:

Community

Additional Information

License

Apache License 2.0

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distributions

tensorflow_io_plugin_gs_nightly-0.18.0.dev20210505040519-cp39-cp39-manylinux2010_x86_64.whl (2.5 MB view details)

Uploaded CPython 3.9 manylinux: glibc 2.12+ x86-64

tensorflow_io_plugin_gs_nightly-0.18.0.dev20210505040519-cp38-cp38-manylinux2010_x86_64.whl (2.5 MB view details)

Uploaded CPython 3.8 manylinux: glibc 2.12+ x86-64

tensorflow_io_plugin_gs_nightly-0.18.0.dev20210505040519-cp37-cp37m-manylinux2010_x86_64.whl (2.5 MB view details)

Uploaded CPython 3.7m manylinux: glibc 2.12+ x86-64

tensorflow_io_plugin_gs_nightly-0.18.0.dev20210505040519-cp36-cp36m-manylinux2010_x86_64.whl (2.5 MB view details)

Uploaded CPython 3.6m manylinux: glibc 2.12+ x86-64

File details

Details for the file tensorflow_io_plugin_gs_nightly-0.18.0.dev20210505040519-cp39-cp39-win_amd64.whl.

File metadata

File hashes

Hashes for tensorflow_io_plugin_gs_nightly-0.18.0.dev20210505040519-cp39-cp39-win_amd64.whl
Algorithm Hash digest
SHA256 a14b6280b7b83c949edd273ad0190931d2bb0d4fc8f510b8e7c7db372e374b8a
MD5 7d7f304e0e7f4b438f8fa058a501eea8
BLAKE2b-256 0d328bcf584baba42b00657845e286442bb5e44a7962d57ca25c2a279348b2aa

See more details on using hashes here.

File details

Details for the file tensorflow_io_plugin_gs_nightly-0.18.0.dev20210505040519-cp39-cp39-manylinux2010_x86_64.whl.

File metadata

File hashes

Hashes for tensorflow_io_plugin_gs_nightly-0.18.0.dev20210505040519-cp39-cp39-manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 aad58eee873c3285a68a8eece469f4417285b901d1355584b0014313f103aae8
MD5 a8f88b7d742415d68c8e3c75d5a4764b
BLAKE2b-256 0c18d5aa6c4483fd844d8971529cee7ecb62addc8b5fc9015739ffd2dfb4539a

See more details on using hashes here.

File details

Details for the file tensorflow_io_plugin_gs_nightly-0.18.0.dev20210505040519-cp39-cp39-macosx_10_14_x86_64.whl.

File metadata

File hashes

Hashes for tensorflow_io_plugin_gs_nightly-0.18.0.dev20210505040519-cp39-cp39-macosx_10_14_x86_64.whl
Algorithm Hash digest
SHA256 1ae3d22383113f22734c1b589f3224d1cc03cfc6d3796a00d45a3d47e2dc59a0
MD5 e6c2f013eb429ccdfd77dc433f73a368
BLAKE2b-256 cd3f1395191c47ca286cb995a83b5c47ed1626ce831dfbcaa2f0681962314cce

See more details on using hashes here.

File details

Details for the file tensorflow_io_plugin_gs_nightly-0.18.0.dev20210505040519-cp38-cp38-win_amd64.whl.

File metadata

File hashes

Hashes for tensorflow_io_plugin_gs_nightly-0.18.0.dev20210505040519-cp38-cp38-win_amd64.whl
Algorithm Hash digest
SHA256 926ff04e93661ab0c567e028bd8184a7780e34f17d7a71eefe11a19576e3540f
MD5 0d5ed285576ea7b746f1f9a52b948517
BLAKE2b-256 e854df7b80f7e1e082d3c6cfcbf62610bba82e8efe903fa36a9fd53dee5e743f

See more details on using hashes here.

File details

Details for the file tensorflow_io_plugin_gs_nightly-0.18.0.dev20210505040519-cp38-cp38-manylinux2010_x86_64.whl.

File metadata

File hashes

Hashes for tensorflow_io_plugin_gs_nightly-0.18.0.dev20210505040519-cp38-cp38-manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 3abccc13c338bd75d3be5930e55f02fa545126b66e7f3fadb80df353e537e380
MD5 300849b607e8b6362abee18aca2841e0
BLAKE2b-256 fb36bd8daa28644931a8b24b6cdd7a18cd8c5453e6f050b9e7e6a0dd73dedb09

See more details on using hashes here.

File details

Details for the file tensorflow_io_plugin_gs_nightly-0.18.0.dev20210505040519-cp38-cp38-macosx_10_14_x86_64.whl.

File metadata

File hashes

Hashes for tensorflow_io_plugin_gs_nightly-0.18.0.dev20210505040519-cp38-cp38-macosx_10_14_x86_64.whl
Algorithm Hash digest
SHA256 ac84251b0924d60fba3f17c4576518b963edd833f33476466dcc7d0d46afbaeb
MD5 75f70c2389ddb2b757b6c566ed3c9c17
BLAKE2b-256 ebe93f6c4d4632f43909f4204f0af7dc8414ce7d482b74937ae99947495a85d1

See more details on using hashes here.

File details

Details for the file tensorflow_io_plugin_gs_nightly-0.18.0.dev20210505040519-cp37-cp37m-win_amd64.whl.

File metadata

File hashes

Hashes for tensorflow_io_plugin_gs_nightly-0.18.0.dev20210505040519-cp37-cp37m-win_amd64.whl
Algorithm Hash digest
SHA256 00fe60b5c5af84ed00017cf85fa05da0c52f4bcbf1daa4cb9c804d0f4fa1618c
MD5 6ebf2b546afd6b2ea052c4a3feeb61ab
BLAKE2b-256 7a6ac4315d4fb1546d6c5a6ffdf73200f50b68b15dc207dedd9f9830c95767c0

See more details on using hashes here.

File details

Details for the file tensorflow_io_plugin_gs_nightly-0.18.0.dev20210505040519-cp37-cp37m-manylinux2010_x86_64.whl.

File metadata

File hashes

Hashes for tensorflow_io_plugin_gs_nightly-0.18.0.dev20210505040519-cp37-cp37m-manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 ed48d458337cdfc57b71adbe04d112ceecc9ae21d38876e8090d0cbb01caa56d
MD5 4ab120d900a14fa877541f9eddaea12a
BLAKE2b-256 6d35c9da3872ed1aa9e108455c1fe0f48fff0470291d238e0595976133dcc6c8

See more details on using hashes here.

File details

Details for the file tensorflow_io_plugin_gs_nightly-0.18.0.dev20210505040519-cp37-cp37m-macosx_10_14_x86_64.whl.

File metadata

File hashes

Hashes for tensorflow_io_plugin_gs_nightly-0.18.0.dev20210505040519-cp37-cp37m-macosx_10_14_x86_64.whl
Algorithm Hash digest
SHA256 7478519cc06d2279cffad6872f663557f46fed39845d4ce7697d83900bf18524
MD5 9da1825b94d5ab674bada663abda1150
BLAKE2b-256 a08973b7f63d290296fb226e00aaca3dcec87b746a7d6edf4d7d853c584e65ef

See more details on using hashes here.

File details

Details for the file tensorflow_io_plugin_gs_nightly-0.18.0.dev20210505040519-cp36-cp36m-win_amd64.whl.

File metadata

File hashes

Hashes for tensorflow_io_plugin_gs_nightly-0.18.0.dev20210505040519-cp36-cp36m-win_amd64.whl
Algorithm Hash digest
SHA256 0da5c0f7e5e06db5379e40d5c06b4b58d846d6a930c5e0707f930a98e4f3bfa1
MD5 25f5277c2b3e5480be7cb0f7303ac53b
BLAKE2b-256 5fc8ecdbf07d4c8b35c3b7de8d039483b9b33d176f6d39b26fb2cbc5179b38f1

See more details on using hashes here.

File details

Details for the file tensorflow_io_plugin_gs_nightly-0.18.0.dev20210505040519-cp36-cp36m-manylinux2010_x86_64.whl.

File metadata

File hashes

Hashes for tensorflow_io_plugin_gs_nightly-0.18.0.dev20210505040519-cp36-cp36m-manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 4715413ead645ff4316e5a590c8a3285c715a1c0b9f4918f3fc6799d6db7af0d
MD5 878ca2442f17aa3b9fb5e13f4a5b2575
BLAKE2b-256 6f4e8b7f37fbfbc72c23c28c6c8177c978e2d3f63c3cfb767f197570286bdbd0

See more details on using hashes here.

File details

Details for the file tensorflow_io_plugin_gs_nightly-0.18.0.dev20210505040519-cp36-cp36m-macosx_10_14_x86_64.whl.

File metadata

File hashes

Hashes for tensorflow_io_plugin_gs_nightly-0.18.0.dev20210505040519-cp36-cp36m-macosx_10_14_x86_64.whl
Algorithm Hash digest
SHA256 874fbf813591a4003b8ee00e626a502d802eecb58fe0fcb2e8a2332909a29ea7
MD5 fed98dd8f2f6e9abab0dc70673770f47
BLAKE2b-256 18a2a7ed29ab1f404d3d11e8daf9b273dee8b350a4b729b64ecaa965b5e3071f

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page