TensorFlow I/O


TensorFlow I/O is a collection of file systems and file formats that are not available in TensorFlow's built-in support. A full list of the file systems and file formats supported by TensorFlow I/O can be found here.

Using tensorflow-io with Keras is straightforward. Below is the Get Started with TensorFlow example, with the data processing step replaced by tensorflow-io:

import tensorflow as tf
import tensorflow_io as tfio

# Read the MNIST data into the IODataset.
dataset_url = "https://storage.googleapis.com/cvdf-datasets/mnist/"
d_train = tfio.IODataset.from_mnist(
    dataset_url + "train-images-idx3-ubyte.gz",
    dataset_url + "train-labels-idx1-ubyte.gz",
)

# Shuffle the elements of the dataset.
d_train = d_train.shuffle(buffer_size=1024)

# By default image data is uint8, so convert to float32 using map().
d_train = d_train.map(lambda x, y: (tf.image.convert_image_dtype(x, tf.float32), y))

# Batch the data just like any other tf.data.Dataset.
d_train = d_train.batch(32)

# Build the model.
model = tf.keras.models.Sequential(
    [
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(512, activation=tf.nn.relu),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(10, activation=tf.nn.softmax),
    ]
)

# Compile the model.
model.compile(
    optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"]
)

# Fit the model.
model.fit(d_train, epochs=5, steps_per_epoch=200)

In the MNIST example above, the URLs of the dataset files are passed directly to the tfio.IODataset.from_mnist API call. This works because tensorflow-io has built-in support for the HTTP/HTTPS file system, which eliminates the need to download the datasets and save them to a local directory.

NOTE: Since tensorflow-io detects and uncompresses the MNIST dataset automatically when needed, the URLs of the compressed (gzip) files can be passed to the API call as is.
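To make the note above more concrete, the sketch below shows one common way to detect gzip-compressed input (by its two-byte magic number) and then read the header of an IDX image file, the format used by the MNIST files. It uses only the Python standard library and is not tensorflow-io's actual implementation; the helper names `maybe_decompress` and `parse_idx3_header` are hypothetical, and the payload is synthetic.

```python
import gzip
import struct

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def maybe_decompress(data: bytes) -> bytes:
    """Return decompressed bytes if data looks like gzip, else data unchanged."""
    if data[:2] == GZIP_MAGIC:
        return gzip.decompress(data)
    return data


def parse_idx3_header(data: bytes):
    """Parse the 16-byte IDX3 header: magic, image count, rows, columns (big-endian)."""
    magic, count, rows, cols = struct.unpack(">IIII", data[:16])
    if magic != 0x00000803:  # 2051 marks an unsigned-byte, 3-dimensional IDX file
        raise ValueError(f"not an idx3-ubyte file (magic={magic:#x})")
    return count, rows, cols


# Build a tiny synthetic IDX3 payload (2 images of 28x28 zeros) for illustration.
raw = struct.pack(">IIII", 0x00000803, 2, 28, 28) + bytes(2 * 28 * 28)
compressed = gzip.compress(raw)

# Both the compressed and the uncompressed form yield the same header.
print(parse_idx3_header(maybe_decompress(compressed)))
print(parse_idx3_header(maybe_decompress(raw)))
```

Checking the magic number rather than the file extension means the same code path handles both `.gz` and already-uncompressed inputs.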

Please check the official documentation for more detailed and interesting usages of the package.

Installation

Python Package

The tensorflow-io Python package can be installed with pip directly using:

$ pip install tensorflow-io

People who are a little more adventurous can also try our nightly binaries:

$ pip install tensorflow-io-nightly

Docker Images

In addition to the pip packages, Docker images can be used to get started quickly.

For stable builds:

$ docker pull tfsigio/tfio:latest
$ docker run -it --rm --name tfio-latest tfsigio/tfio:latest

For nightly builds:

$ docker pull tfsigio/tfio:nightly
$ docker run -it --rm --name tfio-nightly tfsigio/tfio:nightly

R Package

Once the tensorflow-io Python package has been successfully installed, you can install the development version of the R package from GitHub via the following:

if (!require("remotes")) install.packages("remotes")
remotes::install_github("tensorflow/io", subdir = "R-package")

TensorFlow Version Compatibility

To ensure compatibility with TensorFlow, it is recommended to install a matching version of TensorFlow I/O according to the table below. You can find the list of releases here.

| TensorFlow I/O Version | TensorFlow Compatibility | Release Date |
| --- | --- | --- |
| 0.18.0 | 2.5.x | May 13, 2021 |
| 0.17.1 | 2.4.x | Apr 16, 2021 |
| 0.17.0 | 2.4.x | Dec 14, 2020 |
| 0.16.0 | 2.3.x | Oct 23, 2020 |
| 0.15.0 | 2.3.x | Aug 03, 2020 |
| 0.14.0 | 2.2.x | Jul 08, 2020 |
| 0.13.0 | 2.2.x | May 10, 2020 |
| 0.12.0 | 2.1.x | Feb 28, 2020 |
| 0.11.0 | 2.1.x | Jan 10, 2020 |
| 0.10.0 | 2.0.x | Dec 05, 2019 |
| 0.9.1 | 2.0.x | Nov 15, 2019 |
| 0.9.0 | 2.0.x | Oct 18, 2019 |
| 0.8.1 | 1.15.x | Nov 15, 2019 |
| 0.8.0 | 1.15.x | Oct 17, 2019 |
| 0.7.2 | 1.14.x | Nov 15, 2019 |
| 0.7.1 | 1.14.x | Oct 18, 2019 |
| 0.7.0 | 1.14.x | Jul 14, 2019 |
| 0.6.0 | 1.13.x | May 29, 2019 |
| 0.5.0 | 1.13.x | Apr 12, 2019 |
| 0.4.0 | 1.13.x | Mar 01, 2019 |
| 0.3.0 | 1.12.0 | Feb 15, 2019 |
| 0.2.0 | 1.12.0 | Jan 29, 2019 |
| 0.1.0 | 1.12.0 | Dec 16, 2018 |
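
The version pairing above can also be checked programmatically. The following is a minimal sketch that encodes the table in a dictionary and tests whether a given TensorFlow version falls in the release series listed for a TensorFlow I/O version. The `COMPATIBILITY` mapping and the `is_compatible` helper are illustrative names, not part of tensorflow-io.

```python
# TensorFlow I/O version -> compatible TensorFlow release (from the table above).
# "2.5" stands for the 2.5.x series; "1.12.0" is an exact version.
COMPATIBILITY = {
    "0.18.0": "2.5", "0.17.1": "2.4", "0.17.0": "2.4",
    "0.16.0": "2.3", "0.15.0": "2.3", "0.14.0": "2.2",
    "0.13.0": "2.2", "0.12.0": "2.1", "0.11.0": "2.1",
    "0.10.0": "2.0", "0.9.1": "2.0", "0.9.0": "2.0",
    "0.8.1": "1.15", "0.8.0": "1.15", "0.7.2": "1.14",
    "0.7.1": "1.14", "0.7.0": "1.14", "0.6.0": "1.13",
    "0.5.0": "1.13", "0.4.0": "1.13", "0.3.0": "1.12.0",
    "0.2.0": "1.12.0", "0.1.0": "1.12.0",
}


def is_compatible(tfio_version: str, tf_version: str) -> bool:
    """Return True if tf_version matches the release listed for tfio_version."""
    if tfio_version not in COMPATIBILITY:
        raise KeyError(f"unknown TensorFlow I/O version: {tfio_version}")
    expected = COMPATIBILITY[tfio_version]
    # Exact match covers "1.12.0"; the prefix match covers series such as "2.5.x".
    return tf_version == expected or tf_version.startswith(expected + ".")


print(is_compatible("0.18.0", "2.5.0"))
print(is_compatible("0.18.0", "2.4.1"))
```

In a real environment, the two version strings could be read with `importlib.metadata.version("tensorflow-io")` and `importlib.metadata.version("tensorflow")` (Python 3.8 or later), assuming both packages are installed.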

Performance Benchmarking

We use GitHub Pages to document the results of API performance benchmarks. The benchmark job is triggered on every commit to the master branch, which makes it possible to track performance from commit to commit.

Contributing

TensorFlow I/O is a community-led open source project. As such, the project depends on public contributions, bug fixes, and documentation. Please see:

Build Status and CI

Build Status
Linux CPU Python 2 Status
Linux CPU Python 3 Status
Linux GPU Python 2 Status
Linux GPU Python 3 Status

Because of the manylinux2010 requirement, TensorFlow I/O is built with Ubuntu 16.04 + Developer Toolset 7 (GCC 7.3) on Linux. Configuring Ubuntu 16.04 with Developer Toolset 7 is not exactly straightforward. If the system has Docker installed, the following script will automatically build a manylinux2010-compatible whl package:

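#!/usr/bin/env bash

# List the wheels that were built into dist/.
ls dist/*
# Repair each wheel inside the manylinux2010 container so it carries the manylinux2010 platform tag.
for f in dist/*.whl; do
  docker run -i --rm -v "$PWD":/v -w /v --net=host quay.io/pypa/manylinux2010_x86_64 bash -x -e /v/tools/build/auditwheel repair --plat manylinux2010_x86_64 "$f"
done
# Files written by the container are owned by root; hand them back to the current user.
sudo chown -R "$(id -nu):$(id -ng)" .
# The repaired wheels are written to wheelhouse/.
ls wheelhouse/*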

The build takes some time, but once it completes, whl packages compatible with Python 3.5, 3.6, and 3.7 will be available in the wheelhouse directory.

On macOS, the same command can be used. However, the script expects python to be available in the shell and will only generate a whl package that matches the version of that python. To build a whl package for a specific Python version, alias that version to python in the shell. See the Auditwheel step in .github/workflows/build.yml for instructions on how to do that.
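
One way to confirm that a generated whl package matches the interpreter in the shell is to compare the Python tag in the wheel file name with the running interpreter. The sketch below relies on the standard wheel file name layout, in which the last three dash-separated fields are the Python tag, ABI tag, and platform tag. The helper names and the example file name are hypothetical, and `interpreter_tag` assumes CPython.

```python
import sys


def wheel_python_tag(filename: str) -> str:
    """Return the Python tag (e.g. 'cp38') from a wheel file name."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file: " + filename)
    parts = filename[: -len(".whl")].split("-")
    if len(parts) < 5:
        raise ValueError("unexpected wheel file name: " + filename)
    # The last three fields are always: python tag, ABI tag, platform tag.
    return parts[-3]


def interpreter_tag() -> str:
    """Return the CPython tag of the running interpreter, e.g. 'cp38'."""
    return f"cp{sys.version_info.major}{sys.version_info.minor}"


# Hypothetical file name, used only for illustration.
example = "tensorflow_io-0.18.0-cp38-cp38-macosx_10_14_x86_64.whl"
print(wheel_python_tag(example))  # cp38
print(wheel_python_tag(example) == interpreter_tag())
```

Indexing from the end of the name keeps the helper correct when an optional build tag is present between the version and the Python tag.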

Note that the above command is also the one we use when releasing packages for Linux and macOS.

TensorFlow I/O uses both GitHub Workflows and Google CI (Kokoro) for continuous integration. GitHub Workflows is used for macOS build and test. Kokoro is used for Linux build and test. Again, because of the manylinux2010 requirement, whl packages on Linux are always built with Ubuntu 16.04 + Developer Toolset 7. Tests are run on a variety of systems with different Python versions to ensure good coverage:

| Python | Ubuntu 18.04 | Ubuntu 20.04 | macOS + osx9 | Windows-2019 |
| --- | --- | --- | --- | --- |
| 2.7 | ✔ | ✔ | ✔ | N/A |
| 3.7 | ✔ | ✔ | ✔ | ✔ |
| 3.8 | ✔ | ✔ | ✔ | ✔ |

TensorFlow I/O has integrations with many systems and cloud vendors such as Prometheus, Apache Kafka, Apache Ignite, Google Cloud PubSub, AWS Kinesis, Microsoft Azure Storage, Alibaba Cloud OSS etc.

We try our best to test against those systems in our continuous integration whenever possible. Some tests, such as those for Prometheus, Kafka, and Ignite, are run against live systems, meaning we install Prometheus/Kafka/Ignite on the CI machine before the test is run. Other tests, such as those for Kinesis, PubSub, and Azure Storage, are run through official or unofficial emulators. Offline tests are also performed whenever possible, though systems covered only by offline tests may not have the same level of coverage as those tested with live systems or emulators.

| System | Live System | Emulator | CI Integration | Offline |
| --- | --- | --- | --- | --- |
| Apache Kafka | ✔ | | ✔ | |
| Apache Ignite | ✔ | | ✔ | |
| Prometheus | ✔ | | ✔ | |
| Google PubSub | | ✔ | ✔ | |
| Azure Storage | | ✔ | ✔ | |
| AWS Kinesis | | ✔ | ✔ | |
| Alibaba Cloud OSS | | | | ✔ |
| Google BigTable/BigQuery | | to be added | | |
| Elasticsearch (experimental) | ✔ | | ✔ | |
| MongoDB (experimental) | ✔ | | ✔ | |

References for emulators:

Community

Additional Information

License

Apache License 2.0
