Skip to main content

TensorFlow IO

Project description




TensorFlow I/O

GitHub CI PyPI License Documentation

TensorFlow I/O is a collection of file systems and file formats that are not available in TensorFlow's built-in support. A full list of supported file systems and file formats by TensorFlow I/O can be found here.

The use of tensorflow-io is straightforward with keras. Below is an example to Get Started with TensorFlow with the data processing aspect replaced by tensorflow-io:

import tensorflow as tf
import tensorflow_io as tfio

# Read the MNIST data into the IODataset.
dataset_url = "https://storage.googleapis.com/cvdf-datasets/mnist/"
d_train = tfio.IODataset.from_mnist(
    dataset_url + "train-images-idx3-ubyte.gz",
    dataset_url + "train-labels-idx1-ubyte.gz",
)

# Shuffle the elements of the dataset.
d_train = d_train.shuffle(buffer_size=1024)

# By default image data is uint8, so convert to float32 using map().
d_train = d_train.map(lambda x, y: (tf.image.convert_image_dtype(x, tf.float32), y))

# prepare batches the data just like any other tf.data.Dataset
d_train = d_train.batch(32)

# Build the model.
model = tf.keras.models.Sequential(
    [
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(512, activation=tf.nn.relu),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(10, activation=tf.nn.softmax),
    ]
)

# Compile the model.
model.compile(
    optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"]
)

# Fit the model.
model.fit(d_train, epochs=5, steps_per_epoch=200)

In the above MNIST example, the URL's to access the dataset files are passed directly to the tfio.IODataset.from_mnist API call. This is due to the inherent support that tensorflow-io provides for HTTP/HTTPS file system, thus eliminating the need for downloading and saving datasets on a local directory.

NOTE: Since tensorflow-io is able to detect and uncompress the MNIST dataset automatically if needed, we can pass the URL's for the compressed files (gzip) to the API call as is.

Please check the official documentation for more detailed and interesting usages of the package.

Installation

Python Package

The tensorflow-io Python package can be installed with pip directly using:

$ pip install tensorflow-io

People who are a little more adventurous can also try our nightly binaries:

$ pip install tensorflow-io-nightly

To ensure you have a version of TensorFlow that is compatible with TensorFlow-IO, you can specify the tensorflow extra requirement during install:

pip install tensorflow-io[tensorflow]

Similar extras exist for the tensorflow-gpu, tensorflow-cpu and tensorflow-rocm packages.

Docker Images

In addition to the pip packages, the docker images can be used to quickly get started.

For stable builds:

$ docker pull tfsigio/tfio:latest
$ docker run -it --rm --name tfio-latest tfsigio/tfio:latest

For nightly builds:

$ docker pull tfsigio/tfio:nightly
$ docker run -it --rm --name tfio-nightly tfsigio/tfio:nightly

R Package

Once the tensorflow-io Python package has been successfully installed, you can install the development version of the R package from GitHub via the following:

if (!require("remotes")) install.packages("remotes")
remotes::install_github("tensorflow/io", subdir = "R-package")

TensorFlow Version Compatibility

To ensure compatibility with TensorFlow, it is recommended to install a matching version of TensorFlow I/O according to the table below. You can find the list of releases here.

TensorFlow I/O Version TensorFlow Compatibility Release Date
0.24.0 2.8.x Feb 04, 2022
0.23.1 2.7.x Dec 15, 2021
0.23.0 2.7.x Dec 14, 2021
0.22.0 2.7.x Nov 10, 2021
0.21.0 2.6.x Sep 12, 2021
0.20.0 2.6.x Aug 11, 2021
0.19.1 2.5.x Jul 25, 2021
0.19.0 2.5.x Jun 25, 2021
0.18.0 2.5.x May 13, 2021
0.17.1 2.4.x Apr 16, 2021
0.17.0 2.4.x Dec 14, 2020
0.16.0 2.3.x Oct 23, 2020
0.15.0 2.3.x Aug 03, 2020
0.14.0 2.2.x Jul 08, 2020
0.13.0 2.2.x May 10, 2020
0.12.0 2.1.x Feb 28, 2020
0.11.0 2.1.x Jan 10, 2020
0.10.0 2.0.x Dec 05, 2019
0.9.1 2.0.x Nov 15, 2019
0.9.0 2.0.x Oct 18, 2019
0.8.1 1.15.x Nov 15, 2019
0.8.0 1.15.x Oct 17, 2019
0.7.2 1.14.x Nov 15, 2019
0.7.1 1.14.x Oct 18, 2019
0.7.0 1.14.x Jul 14, 2019
0.6.0 1.13.x May 29, 2019
0.5.0 1.13.x Apr 12, 2019
0.4.0 1.13.x Mar 01, 2019
0.3.0 1.12.0 Feb 15, 2019
0.2.0 1.12.0 Jan 29, 2019
0.1.0 1.12.0 Dec 16, 2018

Performance Benchmarking

We use github-pages to document the results of API performance benchmarks. The benchmark job is triggered on every commit to master branch and facilitates tracking performance w.r.t commits.

Contributing

Tensorflow I/O is a community led open source project. As such, the project depends on public contributions, bug-fixes, and documentation. Please see:

Build Status and CI

Build Status
Linux CPU Python 2 Status
Linux CPU Python 3 Status
Linux GPU Python 2 Status
Linux GPU Python 3 Status

Because of manylinux2010 requirement, TensorFlow I/O is built with Ubuntu:16.04 + Developer Toolset 7 (GCC 7.3) on Linux. Configuration with Ubuntu 16.04 with Developer Toolset 7 is not exactly straightforward. If the system have docker installed, then the following command will automatically build manylinux2010 compatible whl package:

#!/usr/bin/env bash

ls dist/*
for f in dist/*.whl; do
  docker run -i --rm -v $PWD:/v -w /v --net=host quay.io/pypa/manylinux2010_x86_64 bash -x -e /v/tools/build/auditwheel repair --plat manylinux2010_x86_64 $f
done
sudo chown -R $(id -nu):$(id -ng) .
ls wheelhouse/*

It takes some time to build, but once complete, there will be python 3.5, 3.6, 3.7 compatible whl packages available in wheelhouse directory.

On macOS, the same command could be used. However, the script expects python in shell and will only generate a whl package that matches the version of python in shell. If you want to build a whl package for a specific python then you have to alias this version of python to python in shell. See .github/workflows/build.yml Auditwheel step for instructions how to do that.

Note the above command is also the command we use when releasing packages for Linux and macOS.

TensorFlow I/O uses both GitHub Workflows and Google CI (Kokoro) for continuous integration. GitHub Workflows is used for macOS build and test. Kokoro is used for Linux build and test. Again, because of the manylinux2010 requirement, on Linux whl packages are always built with Ubuntu 16.04 + Developer Toolset 7. Tests are done on a variatiy of systems with different python3 versions to ensure a good coverage:

Python Ubuntu 18.04 Ubuntu 20.04 macOS + osx9 Windows-2019
2.7 :heavy_check_mark: :heavy_check_mark: :heavy_check_mark: N/A
3.7 :heavy_check_mark: :heavy_check_mark: :heavy_check_mark: :heavy_check_mark:
3.8 :heavy_check_mark: :heavy_check_mark: :heavy_check_mark: :heavy_check_mark:

TensorFlow I/O has integrations with many systems and cloud vendors such as Prometheus, Apache Kafka, Apache Ignite, Google Cloud PubSub, AWS Kinesis, Microsoft Azure Storage, Alibaba Cloud OSS etc.

We tried our best to test against those systems in our continuous integration whenever possible. Some tests such as Prometheus, Kafka, and Ignite are done with live systems, meaning we install Prometheus/Kafka/Ignite on CI machine before the test is run. Some tests such as Kinesis, PubSub, and Azure Storage are done through official or non-official emulators. Offline tests are also performed whenever possible, though systems covered through offine tests may not have the same level of coverage as live systems or emulators.

Live System Emulator CI Integration Offline
Apache Kafka :heavy_check_mark: :heavy_check_mark:
Apache Ignite :heavy_check_mark: :heavy_check_mark:
Prometheus :heavy_check_mark: :heavy_check_mark:
Google PubSub :heavy_check_mark: :heavy_check_mark:
Azure Storage :heavy_check_mark: :heavy_check_mark:
AWS Kinesis :heavy_check_mark: :heavy_check_mark:
Alibaba Cloud OSS :heavy_check_mark:
Google BigTable/BigQuery to be added
Elasticsearch (experimental) :heavy_check_mark: :heavy_check_mark:
MongoDB (experimental) :heavy_check_mark: :heavy_check_mark:

References for emulators:

Community

Additional Information

License

Apache License 2.0

Release history Release notifications | RSS feed

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distributions

File details

Details for the file tensorflow_io_gcs_filesystem_nightly-0.24.0.dev20220209175718-cp310-cp310-win_amd64.whl.

File metadata

File hashes

Hashes for tensorflow_io_gcs_filesystem_nightly-0.24.0.dev20220209175718-cp310-cp310-win_amd64.whl
Algorithm Hash digest
SHA256 b95a9e5ceb0fb1b9f6dba573939f091feb6748d458c127a681533c596ce3f81d
MD5 1e8088b8bb5746aa211763d14af79840
BLAKE2b-256 2b285305db7c4659e17b4f5f9ea6aae8f835c947744bd2406c1865ab09d0247a

See more details on using hashes here.

File details

Details for the file tensorflow_io_gcs_filesystem_nightly-0.24.0.dev20220209175718-cp310-cp310-manylinux_2_12_x86_64.manylinux2010_x86_64.whl.

File metadata

File hashes

Hashes for tensorflow_io_gcs_filesystem_nightly-0.24.0.dev20220209175718-cp310-cp310-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 2073e99104666ac6965492a70c0153918e4d89325f38357b0d588ca3dccc0f2c
MD5 f9d3f68a596ae76310e15ee2c44088fa
BLAKE2b-256 1c3ef00d871d6f13946ad5e4d4d6dea6173575e1b65be9ee2e17c54df4d4c8db

See more details on using hashes here.

File details

Details for the file tensorflow_io_gcs_filesystem_nightly-0.24.0.dev20220209175718-cp310-cp310-macosx_10_14_x86_64.whl.

File metadata

File hashes

Hashes for tensorflow_io_gcs_filesystem_nightly-0.24.0.dev20220209175718-cp310-cp310-macosx_10_14_x86_64.whl
Algorithm Hash digest
SHA256 e62475a66b34791ac05e46951051a88818cfbe241fd88f373ad62cfeeb8e0e7a
MD5 5c01cc34d9bad5aec2b6bcf4a08d419f
BLAKE2b-256 54de60b37b346c2c48f4687da452c83f4178c6eacd852e9f9996547545198afa

See more details on using hashes here.

File details

Details for the file tensorflow_io_gcs_filesystem_nightly-0.24.0.dev20220209175718-cp39-cp39-win_amd64.whl.

File metadata

File hashes

Hashes for tensorflow_io_gcs_filesystem_nightly-0.24.0.dev20220209175718-cp39-cp39-win_amd64.whl
Algorithm Hash digest
SHA256 e44f8c71c54610a6a5824de7df8b3f5b2cf688b1f0d4809402ccd341c117b4a3
MD5 6b2af6d46c558472c2450990e6e05a3c
BLAKE2b-256 aba0f179672b60a72a492f54005ea5ddfc9a334ccf94ffb2df2187d4c53e1744

See more details on using hashes here.

File details

Details for the file tensorflow_io_gcs_filesystem_nightly-0.24.0.dev20220209175718-cp39-cp39-manylinux_2_12_x86_64.manylinux2010_x86_64.whl.

File metadata

File hashes

Hashes for tensorflow_io_gcs_filesystem_nightly-0.24.0.dev20220209175718-cp39-cp39-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 b4cf7b0f83414fe9053727d1bc91cd279499004132b6b7d895ec3e0bec46a328
MD5 9c4f116a63bca1ab96fb9077384cfd83
BLAKE2b-256 efca9a554b7e3a0ea665e0b20c7360b357fa6a090650236eea93a7ea80a04e95

See more details on using hashes here.

File details

Details for the file tensorflow_io_gcs_filesystem_nightly-0.24.0.dev20220209175718-cp39-cp39-macosx_10_14_x86_64.whl.

File metadata

File hashes

Hashes for tensorflow_io_gcs_filesystem_nightly-0.24.0.dev20220209175718-cp39-cp39-macosx_10_14_x86_64.whl
Algorithm Hash digest
SHA256 08b07f5bd1dc7cb093979967d8bc1ad719fbf97c7ff44d1bdbb7952b8ca3e7ba
MD5 b4757d8df71c954aaf8c7643e3c76b4f
BLAKE2b-256 7ef09e30abee30568aa552d662d9de0ddcd6c4fdf724f55a12475b85f5ea11d9

See more details on using hashes here.

File details

Details for the file tensorflow_io_gcs_filesystem_nightly-0.24.0.dev20220209175718-cp38-cp38-win_amd64.whl.

File metadata

File hashes

Hashes for tensorflow_io_gcs_filesystem_nightly-0.24.0.dev20220209175718-cp38-cp38-win_amd64.whl
Algorithm Hash digest
SHA256 86481068b04af95676b24e356057e176791907c0111f5f8047d9f06496306894
MD5 d3bdcaf78da6a9f81cef126cb1af2743
BLAKE2b-256 db6f3c554fe9d44c727f7089888d092a54250a6c269f24b91c7a567738196611

See more details on using hashes here.

File details

Details for the file tensorflow_io_gcs_filesystem_nightly-0.24.0.dev20220209175718-cp38-cp38-manylinux_2_12_x86_64.manylinux2010_x86_64.whl.

File metadata

File hashes

Hashes for tensorflow_io_gcs_filesystem_nightly-0.24.0.dev20220209175718-cp38-cp38-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 aa691976eff58928e950c17d2b1d6b0dcd3669407aa8eb37783f93526a241e18
MD5 41ac79da5c898e7f657855fa9b01b88c
BLAKE2b-256 c2de5b8107b588a83cf161079508ce5eb0ac0b95c6a2aaa941ef9185238012db

See more details on using hashes here.

File details

Details for the file tensorflow_io_gcs_filesystem_nightly-0.24.0.dev20220209175718-cp38-cp38-macosx_10_14_x86_64.whl.

File metadata

File hashes

Hashes for tensorflow_io_gcs_filesystem_nightly-0.24.0.dev20220209175718-cp38-cp38-macosx_10_14_x86_64.whl
Algorithm Hash digest
SHA256 b0ee701df9935bf325700042be8a12ebb79485f0bc03e7165d94502661c85eb5
MD5 ca31be18c902d6e5e618db0a3e2d7254
BLAKE2b-256 82ea1e0b5c13acc63c8ea03e8749a92063ed2447cfdfb8dd5bb7a6c2f650e78f

See more details on using hashes here.

File details

Details for the file tensorflow_io_gcs_filesystem_nightly-0.24.0.dev20220209175718-cp37-cp37m-win_amd64.whl.

File metadata

File hashes

Hashes for tensorflow_io_gcs_filesystem_nightly-0.24.0.dev20220209175718-cp37-cp37m-win_amd64.whl
Algorithm Hash digest
SHA256 c26d823231b35c54458b9432d4c77932b0f9c522b79a74c4c82e60a978f45370
MD5 f3badbff49b1a5830413324504de6fc4
BLAKE2b-256 baad048275538ed933454cbae029b64434277941ff4cffee91a899168d199519

See more details on using hashes here.

File details

Details for the file tensorflow_io_gcs_filesystem_nightly-0.24.0.dev20220209175718-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl.

File metadata

File hashes

Hashes for tensorflow_io_gcs_filesystem_nightly-0.24.0.dev20220209175718-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 a400400ba453f13a07f0e7b3d5fb1b197854c99100604256c8c53aefd8addb6f
MD5 9c9cea3b810337f01d35c7093a177347
BLAKE2b-256 1beedebaccb7100c201cde539740476de8574ec45df0fd2ae33b14cae5b8a242

See more details on using hashes here.

File details

Details for the file tensorflow_io_gcs_filesystem_nightly-0.24.0.dev20220209175718-cp37-cp37m-macosx_10_14_x86_64.whl.

File metadata

File hashes

Hashes for tensorflow_io_gcs_filesystem_nightly-0.24.0.dev20220209175718-cp37-cp37m-macosx_10_14_x86_64.whl
Algorithm Hash digest
SHA256 7ce565af1d4b0dc5e0e21140d197ea1ed26e9c4788a16cdd2b1491e1586ab815
MD5 501130cc5aca3a324af2fe2be3da71d2
BLAKE2b-256 5195ebd9f15a54dc39a328da931df9a270aa63c44295b3b1146c3620b636794f

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page