Skip to main content

TensorFlow IO

Project description




TensorFlow I/O

GitHub CI PyPI License Documentation

TensorFlow I/O is a collection of file systems and file formats that are not available in TensorFlow's built-in support. A full list of supported file systems and file formats by TensorFlow I/O can be found here.

The use of tensorflow-io is straightforward with keras. Below is an example to Get Started with TensorFlow with the data processing aspect replaced by tensorflow-io:

import tensorflow as tf
import tensorflow_io as tfio

# Read the MNIST data into the IODataset.
dataset_url = "https://storage.googleapis.com/cvdf-datasets/mnist/"
d_train = tfio.IODataset.from_mnist(
    dataset_url + "train-images-idx3-ubyte.gz",
    dataset_url + "train-labels-idx1-ubyte.gz",
)

# Shuffle the elements of the dataset.
d_train = d_train.shuffle(buffer_size=1024)

# By default image data is uint8, so convert to float32 using map().
d_train = d_train.map(lambda x, y: (tf.image.convert_image_dtype(x, tf.float32), y))

# prepare batches the data just like any other tf.data.Dataset
d_train = d_train.batch(32)

# Build the model.
model = tf.keras.models.Sequential(
    [
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(512, activation=tf.nn.relu),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(10, activation=tf.nn.softmax),
    ]
)

# Compile the model.
model.compile(
    optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"]
)

# Fit the model.
model.fit(d_train, epochs=5, steps_per_epoch=200)

In the above MNIST example, the URL's to access the dataset files are passed directly to the tfio.IODataset.from_mnist API call. This is due to the inherent support that tensorflow-io provides for HTTP/HTTPS file system, thus eliminating the need for downloading and saving datasets on a local directory.

NOTE: Since tensorflow-io is able to detect and uncompress the MNIST dataset automatically if needed, we can pass the URL's for the compressed files (gzip) to the API call as is.

Please check the official documentation for more detailed and interesting usages of the package.

Installation

Python Package

The tensorflow-io Python package can be installed with pip directly using:

$ pip install tensorflow-io

People who are a little more adventurous can also try our nightly binaries:

$ pip install tensorflow-io-nightly

To ensure you have a version of TensorFlow that is compatible with TensorFlow-IO, you can specify the tensorflow extra requirement during install:

pip install tensorflow-io[tensorflow]

Similar extras exist for the tensorflow-gpu, tensorflow-cpu and tensorflow-rocm packages.

Docker Images

In addition to the pip packages, the docker images can be used to quickly get started.

For stable builds:

$ docker pull tfsigio/tfio:latest
$ docker run -it --rm --name tfio-latest tfsigio/tfio:latest

For nightly builds:

$ docker pull tfsigio/tfio:nightly
$ docker run -it --rm --name tfio-nightly tfsigio/tfio:nightly

R Package

Once the tensorflow-io Python package has been successfully installed, you can install the development version of the R package from GitHub via the following:

if (!require("remotes")) install.packages("remotes")
remotes::install_github("tensorflow/io", subdir = "R-package")

TensorFlow Version Compatibility

To ensure compatibility with TensorFlow, it is recommended to install a matching version of TensorFlow I/O according to the table below. You can find the list of releases here.

TensorFlow I/O Version TensorFlow Compatibility Release Date
0.24.0 2.8.x Feb 04, 2022
0.23.1 2.7.x Dec 15, 2021
0.23.0 2.7.x Dec 14, 2021
0.22.0 2.7.x Nov 10, 2021
0.21.0 2.6.x Sep 12, 2021
0.20.0 2.6.x Aug 11, 2021
0.19.1 2.5.x Jul 25, 2021
0.19.0 2.5.x Jun 25, 2021
0.18.0 2.5.x May 13, 2021
0.17.1 2.4.x Apr 16, 2021
0.17.0 2.4.x Dec 14, 2020
0.16.0 2.3.x Oct 23, 2020
0.15.0 2.3.x Aug 03, 2020
0.14.0 2.2.x Jul 08, 2020
0.13.0 2.2.x May 10, 2020
0.12.0 2.1.x Feb 28, 2020
0.11.0 2.1.x Jan 10, 2020
0.10.0 2.0.x Dec 05, 2019
0.9.1 2.0.x Nov 15, 2019
0.9.0 2.0.x Oct 18, 2019
0.8.1 1.15.x Nov 15, 2019
0.8.0 1.15.x Oct 17, 2019
0.7.2 1.14.x Nov 15, 2019
0.7.1 1.14.x Oct 18, 2019
0.7.0 1.14.x Jul 14, 2019
0.6.0 1.13.x May 29, 2019
0.5.0 1.13.x Apr 12, 2019
0.4.0 1.13.x Mar 01, 2019
0.3.0 1.12.0 Feb 15, 2019
0.2.0 1.12.0 Jan 29, 2019
0.1.0 1.12.0 Dec 16, 2018

Performance Benchmarking

We use github-pages to document the results of API performance benchmarks. The benchmark job is triggered on every commit to master branch and facilitates tracking performance w.r.t commits.

Contributing

Tensorflow I/O is a community led open source project. As such, the project depends on public contributions, bug-fixes, and documentation. Please see:

Build Status and CI

Build Status
Linux CPU Python 2 Status
Linux CPU Python 3 Status
Linux GPU Python 2 Status
Linux GPU Python 3 Status

Because of manylinux2010 requirement, TensorFlow I/O is built with Ubuntu:16.04 + Developer Toolset 7 (GCC 7.3) on Linux. Configuration with Ubuntu 16.04 with Developer Toolset 7 is not exactly straightforward. If the system have docker installed, then the following command will automatically build manylinux2010 compatible whl package:

#!/usr/bin/env bash

ls dist/*
for f in dist/*.whl; do
  docker run -i --rm -v $PWD:/v -w /v --net=host quay.io/pypa/manylinux2010_x86_64 bash -x -e /v/tools/build/auditwheel repair --plat manylinux2010_x86_64 $f
done
sudo chown -R $(id -nu):$(id -ng) .
ls wheelhouse/*

It takes some time to build, but once complete, there will be python 3.5, 3.6, 3.7 compatible whl packages available in wheelhouse directory.

On macOS, the same command could be used. However, the script expects python in shell and will only generate a whl package that matches the version of python in shell. If you want to build a whl package for a specific python then you have to alias this version of python to python in shell. See .github/workflows/build.yml Auditwheel step for instructions how to do that.

Note the above command is also the command we use when releasing packages for Linux and macOS.

TensorFlow I/O uses both GitHub Workflows and Google CI (Kokoro) for continuous integration. GitHub Workflows is used for macOS build and test. Kokoro is used for Linux build and test. Again, because of the manylinux2010 requirement, on Linux whl packages are always built with Ubuntu 16.04 + Developer Toolset 7. Tests are done on a variatiy of systems with different python3 versions to ensure a good coverage:

Python Ubuntu 18.04 Ubuntu 20.04 macOS + osx9 Windows-2019
2.7 :heavy_check_mark: :heavy_check_mark: :heavy_check_mark: N/A
3.7 :heavy_check_mark: :heavy_check_mark: :heavy_check_mark: :heavy_check_mark:
3.8 :heavy_check_mark: :heavy_check_mark: :heavy_check_mark: :heavy_check_mark:

TensorFlow I/O has integrations with many systems and cloud vendors such as Prometheus, Apache Kafka, Apache Ignite, Google Cloud PubSub, AWS Kinesis, Microsoft Azure Storage, Alibaba Cloud OSS etc.

We tried our best to test against those systems in our continuous integration whenever possible. Some tests such as Prometheus, Kafka, and Ignite are done with live systems, meaning we install Prometheus/Kafka/Ignite on CI machine before the test is run. Some tests such as Kinesis, PubSub, and Azure Storage are done through official or non-official emulators. Offline tests are also performed whenever possible, though systems covered through offine tests may not have the same level of coverage as live systems or emulators.

Live System Emulator CI Integration Offline
Apache Kafka :heavy_check_mark: :heavy_check_mark:
Apache Ignite :heavy_check_mark: :heavy_check_mark:
Prometheus :heavy_check_mark: :heavy_check_mark:
Google PubSub :heavy_check_mark: :heavy_check_mark:
Azure Storage :heavy_check_mark: :heavy_check_mark:
AWS Kinesis :heavy_check_mark: :heavy_check_mark:
Alibaba Cloud OSS :heavy_check_mark:
Google BigTable/BigQuery to be added
Elasticsearch (experimental) :heavy_check_mark: :heavy_check_mark:
MongoDB (experimental) :heavy_check_mark: :heavy_check_mark:

References for emulators:

Community

Additional Information

License

Apache License 2.0

Release history Release notifications | RSS feed

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distributions

File details

Details for the file tensorflow_io_gcs_filesystem_nightly-0.24.0.dev20220215061100-cp310-cp310-win_amd64.whl.

File metadata

File hashes

Hashes for tensorflow_io_gcs_filesystem_nightly-0.24.0.dev20220215061100-cp310-cp310-win_amd64.whl
Algorithm Hash digest
SHA256 963e300103b31a35de590a1f920f7cd4dfedd58f2571b3ff9087c96b5e402eb1
MD5 b1de2f64e7ef0e17c8846d21ec381b05
BLAKE2b-256 96a9f00972c96e00044e98c9857c77e235b3362a56e223b55b6bee4752b30134

See more details on using hashes here.

File details

Details for the file tensorflow_io_gcs_filesystem_nightly-0.24.0.dev20220215061100-cp310-cp310-manylinux_2_12_x86_64.manylinux2010_x86_64.whl.

File metadata

File hashes

Hashes for tensorflow_io_gcs_filesystem_nightly-0.24.0.dev20220215061100-cp310-cp310-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 85ef5d390e7cba2750d036a35b7d17d946fbb6915ea2f22f5621c20f76d31b83
MD5 5989bd6cacc0d4e650709ccfb356fa8d
BLAKE2b-256 c1c7d8eca0fc533db03a89217054c76b58078b6005e9b0be3a5b787cc66255bd

See more details on using hashes here.

File details

Details for the file tensorflow_io_gcs_filesystem_nightly-0.24.0.dev20220215061100-cp310-cp310-macosx_10_14_x86_64.whl.

File metadata

File hashes

Hashes for tensorflow_io_gcs_filesystem_nightly-0.24.0.dev20220215061100-cp310-cp310-macosx_10_14_x86_64.whl
Algorithm Hash digest
SHA256 61c27b3029f191f487d885fb150cd95c3b4568aec8e4698e33944123486cc2ae
MD5 ba7b3c166bb854519b0b3b8bddb4e823
BLAKE2b-256 69683398f9b94cf5fe2b88a1fb4dadf0a081c4c00b456190838a3041c23e0ca3

See more details on using hashes here.

File details

Details for the file tensorflow_io_gcs_filesystem_nightly-0.24.0.dev20220215061100-cp39-cp39-win_amd64.whl.

File metadata

File hashes

Hashes for tensorflow_io_gcs_filesystem_nightly-0.24.0.dev20220215061100-cp39-cp39-win_amd64.whl
Algorithm Hash digest
SHA256 36b9ca6ab72c96d0faff5e59911822e754e44a8e9b36081870f379c900993da1
MD5 06b2ed63e013b74296590cf99497b368
BLAKE2b-256 b6ac807d1dd2b4c3f507aed433f9488656420f40de8cccb863bf614f75dbd6ba

See more details on using hashes here.

File details

Details for the file tensorflow_io_gcs_filesystem_nightly-0.24.0.dev20220215061100-cp39-cp39-manylinux_2_12_x86_64.manylinux2010_x86_64.whl.

File metadata

File hashes

Hashes for tensorflow_io_gcs_filesystem_nightly-0.24.0.dev20220215061100-cp39-cp39-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 93da42770b95a3c6c9fcc915e0c3b7ed38171c92035413295dfe5c7dc8bb524c
MD5 bfa33a73cff60c476e6c7bf9bfba4aac
BLAKE2b-256 00ecbf2c48a29c7ebd0d4cb1abb59e8813588fe5fb467f68f7af872f86c812e1

See more details on using hashes here.

File details

Details for the file tensorflow_io_gcs_filesystem_nightly-0.24.0.dev20220215061100-cp39-cp39-macosx_10_14_x86_64.whl.

File metadata

File hashes

Hashes for tensorflow_io_gcs_filesystem_nightly-0.24.0.dev20220215061100-cp39-cp39-macosx_10_14_x86_64.whl
Algorithm Hash digest
SHA256 0e88b753af6b6dcbf957af83eed89302d8e031a32da51cb2738615f5bd23a8f2
MD5 14c043f08b601bbfb92a737d487b7750
BLAKE2b-256 9441e967e0ff0c7e9f79f2e0f855af5feb886fe216da1586d08d6e658024603e

See more details on using hashes here.

File details

Details for the file tensorflow_io_gcs_filesystem_nightly-0.24.0.dev20220215061100-cp38-cp38-win_amd64.whl.

File metadata

File hashes

Hashes for tensorflow_io_gcs_filesystem_nightly-0.24.0.dev20220215061100-cp38-cp38-win_amd64.whl
Algorithm Hash digest
SHA256 998641d9fe013f4f81b67f635bfdc863383599ce5d60fce72a8ab7c6cc47a9cd
MD5 d975b91e438584c653dd1c6c688b3e00
BLAKE2b-256 eab65704fd40aaeeb6ba8e95f7300ba2cc3ea313f382ac2ea417c022fd1b3b1c

See more details on using hashes here.

File details

Details for the file tensorflow_io_gcs_filesystem_nightly-0.24.0.dev20220215061100-cp38-cp38-manylinux_2_12_x86_64.manylinux2010_x86_64.whl.

File metadata

File hashes

Hashes for tensorflow_io_gcs_filesystem_nightly-0.24.0.dev20220215061100-cp38-cp38-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 1e7bd563fcbf36a35a3cc1793fdbd7c59f1b143a4e50f614e252e33e2d10023e
MD5 61e3eaea9689e4e7205f5d2a3715a429
BLAKE2b-256 a00419d05a770898cace41ec49b7a447fe86e5dd8e6aac41a10979b8785ceb2b

See more details on using hashes here.

File details

Details for the file tensorflow_io_gcs_filesystem_nightly-0.24.0.dev20220215061100-cp38-cp38-macosx_10_14_x86_64.whl.

File metadata

File hashes

Hashes for tensorflow_io_gcs_filesystem_nightly-0.24.0.dev20220215061100-cp38-cp38-macosx_10_14_x86_64.whl
Algorithm Hash digest
SHA256 aab53465ec7f81fbc8b8d481b19412e408792da07010aa7db9f1a30bc97dc35c
MD5 a2cd6aac4d5d4e944e2bfbf77c42a8bc
BLAKE2b-256 798692b22f59ee95e0f623cc51d04b0d89b827e6a5d89b72f7128eac2316d5ec

See more details on using hashes here.

File details

Details for the file tensorflow_io_gcs_filesystem_nightly-0.24.0.dev20220215061100-cp37-cp37m-win_amd64.whl.

File metadata

File hashes

Hashes for tensorflow_io_gcs_filesystem_nightly-0.24.0.dev20220215061100-cp37-cp37m-win_amd64.whl
Algorithm Hash digest
SHA256 4f2573bab1b5f479a82e44571e1804f7aabe937f6dbc87f91e7cc56bbcd0d777
MD5 39ef69c79f8ce2d9234f7eb1c52c918d
BLAKE2b-256 5af57f3ffb9c404885941ba1692b180863b4ba41a8d326482b730bd58bba2813

See more details on using hashes here.

File details

Details for the file tensorflow_io_gcs_filesystem_nightly-0.24.0.dev20220215061100-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl.

File metadata

File hashes

Hashes for tensorflow_io_gcs_filesystem_nightly-0.24.0.dev20220215061100-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 f3559ca694d4a26cb289697a3cf270c391c47e714fa16d60808ef7bf16591a6a
MD5 667ad7b61988def0cc0f8431eeb49f9f
BLAKE2b-256 ed1ee737a34b068f4a013065bf7f59fe142421ffa5139ea82704643d4c0ccb45

See more details on using hashes here.

File details

Details for the file tensorflow_io_gcs_filesystem_nightly-0.24.0.dev20220215061100-cp37-cp37m-macosx_10_14_x86_64.whl.

File metadata

File hashes

Hashes for tensorflow_io_gcs_filesystem_nightly-0.24.0.dev20220215061100-cp37-cp37m-macosx_10_14_x86_64.whl
Algorithm Hash digest
SHA256 30e2a78e0e588f634952b16b20e3acec79ddbfee2e49a5219f3ea0f3f3e82dd7
MD5 82fde381338bc081520d7f90fd4fd76b
BLAKE2b-256 91792e8ff8714c17ca028eb148c68fc913a652d007493afb014b49aa94d0a4dc

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page