
TensorFlow IO

Project description




TensorFlow I/O


TensorFlow I/O is a collection of file systems and file formats that are not available in TensorFlow's built-in support. A full list of the file systems and file formats supported by TensorFlow I/O can be found here.

Using tensorflow-io with Keras is straightforward. Below is the Get Started with TensorFlow example, with the data processing portion replaced by tensorflow-io:

import tensorflow as tf
import tensorflow_io as tfio

# Read the MNIST data into the IODataset.
dataset_url = "https://storage.googleapis.com/cvdf-datasets/mnist/"
d_train = tfio.IODataset.from_mnist(
    dataset_url + "train-images-idx3-ubyte.gz",
    dataset_url + "train-labels-idx1-ubyte.gz",
)

# Shuffle the elements of the dataset.
d_train = d_train.shuffle(buffer_size=1024)

# By default image data is uint8, so convert to float32 using map().
d_train = d_train.map(lambda x, y: (tf.image.convert_image_dtype(x, tf.float32), y))

# Batch the data, just like any other tf.data.Dataset.
d_train = d_train.batch(32)

# Build the model.
model = tf.keras.models.Sequential(
    [
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(512, activation=tf.nn.relu),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(10, activation=tf.nn.softmax),
    ]
)

# Compile the model.
model.compile(
    optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"]
)

# Fit the model.
model.fit(d_train, epochs=5, steps_per_epoch=200)

In the above MNIST example, the URLs used to access the dataset files are passed directly to the tfio.IODataset.from_mnist API call. This is possible because of the inherent support that tensorflow-io provides for the HTTP/HTTPS file system, which eliminates the need to download and save datasets to a local directory.

NOTE: Since tensorflow-io is able to detect and decompress the MNIST dataset automatically if needed, we can pass the URLs for the compressed (gzip) files to the API call as is.
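
Beyond MNIST, the same HTTP/HTTPS file system can be pointed at other remote files. Below is a minimal sketch that streams a remote text file through tf.data once tensorflow_io has been imported; the URL is a placeholder and the behavior of the registered file system is an assumption that may vary between releases:

import tensorflow as tf
import tensorflow_io as tfio  # importing is assumed to register the HTTP/HTTPS file system

# Placeholder URL, used purely for illustration.
text_url = "https://example.com/data/train.csv"

# With the HTTP/HTTPS file system available, tf.data can stream the remote
# file directly instead of downloading it to a local directory first.
dataset = tf.data.TextLineDataset(text_url)
for line in dataset.take(2):
    print(line)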

Please check the official documentation for more detailed and interesting usages of the package.

Installation

Python Package

The tensorflow-io Python package can be installed with pip directly using:

$ pip install tensorflow-io

People who are a little more adventurous can also try our nightly binaries:

$ pip install tensorflow-io-nightly
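
As a quick sanity check after installation, you can import the package and print its version (a minimal sketch; nightly builds carry a dev version tag):

import tensorflow_io as tfio

# Confirms that the package imports correctly.
print(tfio.__version__)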

Docker Images

In addition to the pip packages, Docker images can be used to get started quickly.

For stable builds:

$ docker pull tfsigio/tfio:latest
$ docker run -it --rm --name tfio-latest tfsigio/tfio:latest

For nightly builds:

$ docker pull tfsigio/tfio:nightly
$ docker run -it --rm --name tfio-nightly tfsigio/tfio:nightly

R Package

Once the tensorflow-io Python package has been successfully installed, you can install the development version of the R package from GitHub via the following:

if (!require("remotes")) install.packages("remotes")
remotes::install_github("tensorflow/io", subdir = "R-package")

TensorFlow Version Compatibility

To ensure compatibility with TensorFlow, it is recommended to install a matching version of TensorFlow I/O according to the table below. You can find the list of releases here.

TensorFlow I/O Version TensorFlow Compatibility Release Date
0.19.1 2.5.x Jul 25, 2021
0.19.0 2.5.x Jun 25, 2021
0.18.0 2.5.x May 13, 2021
0.17.1 2.4.x Apr 16, 2021
0.17.0 2.4.x Dec 14, 2020
0.16.0 2.3.x Oct 23, 2020
0.15.0 2.3.x Aug 03, 2020
0.14.0 2.2.x Jul 08, 2020
0.13.0 2.2.x May 10, 2020
0.12.0 2.1.x Feb 28, 2020
0.11.0 2.1.x Jan 10, 2020
0.10.0 2.0.x Dec 05, 2019
0.9.1 2.0.x Nov 15, 2019
0.9.0 2.0.x Oct 18, 2019
0.8.1 1.15.x Nov 15, 2019
0.8.0 1.15.x Oct 17, 2019
0.7.2 1.14.x Nov 15, 2019
0.7.1 1.14.x Oct 18, 2019
0.7.0 1.14.x Jul 14, 2019
0.6.0 1.13.x May 29, 2019
0.5.0 1.13.x Apr 12, 2019
0.4.0 1.13.x Mar 01, 2019
0.3.0 1.12.0 Feb 15, 2019
0.2.0 1.12.0 Jan 29, 2019
0.1.0 1.12.0 Dec 16, 2018
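
If you want to verify the pairing at runtime, the sketch below hard-codes a few rows of the table above and compares them against the installed versions; it is only an illustration, not an official API:

import tensorflow as tf
import tensorflow_io as tfio

# A few rows of the compatibility table, keyed by tensorflow-io minor version.
compat = {"0.19": "2.5", "0.18": "2.5", "0.17": "2.4", "0.16": "2.3"}

tfio_minor = ".".join(tfio.__version__.split(".")[:2])
tf_minor = ".".join(tf.__version__.split(".")[:2])

if compat.get(tfio_minor) != tf_minor:
    print(f"Warning: tensorflow-io {tfio.__version__} may not match tensorflow {tf.__version__}")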

Performance Benchmarking

We use GitHub Pages to document the results of API performance benchmarks. The benchmark job is triggered on every commit to the master branch and facilitates tracking performance with respect to commits.

Contributing

TensorFlow I/O is a community-led open source project. As such, the project depends on public contributions, bug fixes, and documentation. Please see the contributing guidelines.

Build Status and CI

Build Status
Linux CPU Python 2, Linux CPU Python 3, Linux GPU Python 2, Linux GPU Python 3

Because of the manylinux2010 requirement, TensorFlow I/O is built with Ubuntu 16.04 + Developer Toolset 7 (GCC 7.3) on Linux. Configuring Ubuntu 16.04 with Developer Toolset 7 is not exactly straightforward. If the system has Docker installed, then the following command will automatically build manylinux2010-compatible whl packages:

#!/usr/bin/env bash

# List the wheels produced by the build.
ls dist/*

# Repair each wheel inside the manylinux2010 image (via the repo's
# tools/build/auditwheel script) so it is tagged as manylinux2010 compatible.
for f in dist/*.whl; do
  docker run -i --rm -v $PWD:/v -w /v --net=host quay.io/pypa/manylinux2010_x86_64 bash -x -e /v/tools/build/auditwheel repair --plat manylinux2010_x86_64 $f
done

# Files created by the container may be owned by root; restore ownership to
# the current user, then list the repaired wheels.
sudo chown -R $(id -nu):$(id -ng) .
ls wheelhouse/*

It takes some time to build, but once complete, there will be Python 3.5, 3.6, and 3.7 compatible whl packages available in the wheelhouse directory.

On macOS, the same command can be used. However, the script expects python to be available in the shell and will only generate a whl package that matches that python's version. If you want to build a whl package for a specific Python version, you have to alias that version to python in the shell. See the Auditwheel step in .github/workflows/build.yml for instructions on how to do that.

Note that the above command is also the command we use when releasing packages for Linux and macOS.

TensorFlow I/O uses both GitHub Workflows and Google CI (Kokoro) for continuous integration. GitHub Workflows is used for macOS build and test; Kokoro is used for Linux build and test. Again, because of the manylinux2010 requirement, on Linux whl packages are always built with Ubuntu 16.04 + Developer Toolset 7. Tests are done on a variety of systems with different Python 3 versions to ensure good coverage:

Python   Ubuntu 18.04   Ubuntu 20.04   macOS + osx9   Windows-2019
2.7      ✓              ✓              ✓              N/A
3.7      ✓              ✓              ✓              ✓
3.8      ✓              ✓              ✓              ✓

TensorFlow I/O has integrations with many systems and cloud vendors such as Prometheus, Apache Kafka, Apache Ignite, Google Cloud PubSub, AWS Kinesis, Microsoft Azure Storage, and Alibaba Cloud OSS, among others.

We try our best to test against those systems in our continuous integration whenever possible. Some tests, such as Prometheus, Kafka, and Ignite, are done with live systems, meaning we install Prometheus/Kafka/Ignite on the CI machine before the test is run. Other tests, such as Kinesis, PubSub, and Azure Storage, are done through official or unofficial emulators. Offline tests are also performed whenever possible, though systems covered only by offline tests may not have the same level of coverage as live systems or emulators.

Apache Kafka: live system, CI integration
Apache Ignite: live system, CI integration
Prometheus: live system, CI integration
Google PubSub: emulator, CI integration
Azure Storage: emulator, CI integration
AWS Kinesis: emulator, CI integration
Alibaba Cloud OSS: offline
Google BigTable/BigQuery: to be added
Elasticsearch (experimental): live system, CI integration
MongoDB (experimental): live system, CI integration

References for emulators:
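
As one example of these integrations, the sketch below reads a few records from a Kafka topic with tfio.IODataset.from_kafka. The topic name is a placeholder, a locally running broker is assumed, and keyword arguments may differ between releases, so refer to the Kafka tutorial in the documentation for authoritative usage:

import tensorflow_io as tfio

# "demo-topic" is a placeholder topic on an assumed local broker.
dataset = tfio.IODataset.from_kafka("demo-topic", partition=0, offset=0)

# Each element is a raw Kafka record; decoding it into tensors is left to a
# map() step that depends on how the messages were serialized.
for item in dataset.take(3):
    print(item)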

Community

Additional Information

License

Apache License 2.0

Release history

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files are available for this release. See the tutorial on generating distribution archives.

Built Distributions

Hashes for tensorflow_io_gcs_filesystem_nightly-0.20.0.dev20210730202550-cp39-cp39-win_amd64.whl
Algorithm Hash digest
SHA256 7bc390b7ef9f57230a5427eaa5723433870224cbc048efbf5b438385af43de3f
MD5 8152199245242f8b2a4604c098e92262
BLAKE2b-256 e5eaee51b882980a13c1c1979aedce4884dd72f2c3c6ed18c89ef78ecd2f8d54

Hashes for tensorflow_io_gcs_filesystem_nightly-0.20.0.dev20210730202550-cp39-cp39-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 47d983ff4fba8cece17266b14d328636a7a488cc16d835e111a3413ad67825d1
MD5 bef784795deb89d941968a7103c342ec
BLAKE2b-256 43150efc6d9f9e287c08d6c631a60ea336465715447a4efb8bfb58cb93a2d9da

Hashes for tensorflow_io_gcs_filesystem_nightly-0.20.0.dev20210730202550-cp39-cp39-macosx_10_14_x86_64.whl
Algorithm Hash digest
SHA256 cc34bb9d32311088b63aaac9b9778f7c7f3f59f2329203b1b24f999c6b5ab17a
MD5 d321e8e4c150adce5994a75a78c4a171
BLAKE2b-256 e452b5498768437a3b4d29f4d67bd9efd35e6b00c09f89a8cd4cc0b0d7083b8d

Hashes for tensorflow_io_gcs_filesystem_nightly-0.20.0.dev20210730202550-cp38-cp38-win_amd64.whl
Algorithm Hash digest
SHA256 36d5ad89d2c04ebd5948a62abf17a4ce877102b752202127420802aa37469c3f
MD5 5e722e963788a7ddec684a2396d38502
BLAKE2b-256 81e63ef003c7a800c1717c48f09991dae221772aba67c453f1b13aba1126cf9b

Hashes for tensorflow_io_gcs_filesystem_nightly-0.20.0.dev20210730202550-cp38-cp38-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 113fd3148b874779b87b080996ab5432de7442ee46a639ffac64d359b9a0ffe7
MD5 bc4aee9f2d04891c67df4f79e2c113a5
BLAKE2b-256 e4a44c2c87c6d0ff583d58d17bc40151c8f0da04320c8ac72b6c521f310d1be1

Hashes for tensorflow_io_gcs_filesystem_nightly-0.20.0.dev20210730202550-cp38-cp38-macosx_10_14_x86_64.whl
Algorithm Hash digest
SHA256 54cdd0c65a86e6b2c2cf154f201b72c53229ca2d3f3e398b67c5e9fb688dfef9
MD5 79c36229929575c71f1c2075d7a4719a
BLAKE2b-256 f2ecb32950a47c4b1ab5815ea77250b00823c0e14629a5c164596fb382535a9d

Hashes for tensorflow_io_gcs_filesystem_nightly-0.20.0.dev20210730202550-cp37-cp37m-win_amd64.whl
Algorithm Hash digest
SHA256 348e8230d24bb572b4c3dc19b8dd96a6b4c1167fd589dab244438e073d092cdc
MD5 cca9ae3ce8957923a79cfe699f0610db
BLAKE2b-256 6fad2ee3c0cc9f64b536fa5ed00a098f60622c2c5cd75f51563fd7fa98e9279b

Hashes for tensorflow_io_gcs_filesystem_nightly-0.20.0.dev20210730202550-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 f64e9ce0beba54bbc10b87380ffedc66d72723460cc2da53e64359ca84d13d64
MD5 eee15953f41a61fab062a5e6d0cee689
BLAKE2b-256 201e3487832ca8fca2afd435122f2e8160de24ee8d786a6fafcf423b4ab73f89

Hashes for tensorflow_io_gcs_filesystem_nightly-0.20.0.dev20210730202550-cp37-cp37m-macosx_10_14_x86_64.whl
Algorithm Hash digest
SHA256 83f93fecab1579e81063e39692b62b0e3e6ec5bb864dbd2439ffb20c3623eed4
MD5 f280e2947b60e7d9f82f16543c690313
BLAKE2b-256 893adca138ee0bcd7d8f8af6112a1b3668fa9271bf2eae639b8295b4d450db6b

Hashes for tensorflow_io_gcs_filesystem_nightly-0.20.0.dev20210730202550-cp36-cp36m-win_amd64.whl
Algorithm Hash digest
SHA256 d101f6b169a6247f328039efcf1bb349aa4143a088916bd0688a93f124d208a0
MD5 81ca89fd68fd544e0e1d413d33e07df3
BLAKE2b-256 67ceceee3a7adcb41e1754d4abd58febdacd81298dd0d74ef2d8a8f093c34773

Hashes for tensorflow_io_gcs_filesystem_nightly-0.20.0.dev20210730202550-cp36-cp36m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 55476db013a075a183a186b0d1c1c9e73f96954db6aa8b95c1c46a08ad7c3954
MD5 9def6f0d7c5411da764d8e76c58b9aa3
BLAKE2b-256 73f991fb588ae678daac394fabc566494c0200c2c363e2ae13658bf813f46e7c

Hashes for tensorflow_io_gcs_filesystem_nightly-0.20.0.dev20210730202550-cp36-cp36m-macosx_10_14_x86_64.whl
Algorithm Hash digest
SHA256 949d9cfbacdf419e47150f790cbdf47ade736e539666ea417eaf8c3479cce254
MD5 3175541ae1ab38788088a31d30d54dc2
BLAKE2b-256 8d8c885cf7a492943b43154665e16aacd8b9205d1bb73253c30662e895c4070e

