TensorFlow I/O

TensorFlow I/O is a collection of file systems and file formats that are not available in TensorFlow's built-in support. A full list of supported file systems and file formats by TensorFlow I/O can be found here.

Using tensorflow-io with Keras is straightforward. Below is the Get Started with TensorFlow example with the data processing step replaced by tensorflow-io:

import tensorflow as tf
import tensorflow_io as tfio

# Read the MNIST data into the IODataset.
dataset_url = "https://storage.googleapis.com/cvdf-datasets/mnist/"
d_train = tfio.IODataset.from_mnist(
    dataset_url + "train-images-idx3-ubyte.gz",
    dataset_url + "train-labels-idx1-ubyte.gz",
)

# Shuffle the elements of the dataset.
d_train = d_train.shuffle(buffer_size=1024)

# By default image data is uint8, so convert to float32 using map().
d_train = d_train.map(lambda x, y: (tf.image.convert_image_dtype(x, tf.float32), y))

# Batch the data just like any other tf.data.Dataset.
d_train = d_train.batch(32)

# Build the model.
model = tf.keras.models.Sequential(
    [
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(512, activation=tf.nn.relu),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(10, activation=tf.nn.softmax),
    ]
)

# Compile the model.
model.compile(
    optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"]
)

# Fit the model.
model.fit(d_train, epochs=5, steps_per_epoch=200)

In the MNIST example above, the URLs of the dataset files are passed directly to the tfio.IODataset.from_mnist API call. This works because tensorflow-io has built-in support for the HTTP/HTTPS file system, which eliminates the need to download the datasets and save them to a local directory.

NOTE: Since tensorflow-io detects and decompresses the MNIST dataset automatically when needed, we can pass the URLs of the compressed (gzip) files to the API call as is.
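To make the automatic decompression more concrete, here is a minimal standard-library sketch of how gzip data can be detected from its leading magic bytes before the MNIST IDX header is parsed. This is not the tensorflow-io implementation; the helper names open_maybe_gzip and read_idx_image_header are hypothetical and exist only for this illustration.

```python
import gzip
import io
import struct

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def open_maybe_gzip(raw: bytes) -> io.BufferedIOBase:
    """Return a binary file object, decompressing only if raw is gzip data."""
    if raw[:2] == GZIP_MAGIC:
        return gzip.GzipFile(fileobj=io.BytesIO(raw))
    return io.BytesIO(raw)


def read_idx_image_header(raw: bytes):
    """Parse the 16-byte header of an MNIST IDX3 image file."""
    with open_maybe_gzip(raw) as f:
        magic, count, rows, cols = struct.unpack(">IIII", f.read(16))
    if magic != 0x00000803:  # 2051 marks an unsigned-byte, 3-D IDX file
        raise ValueError(f"unexpected IDX magic number: {magic:#x}")
    return count, rows, cols


# Build a tiny fake IDX3 payload (2 images of 28x28) to exercise both paths.
payload = struct.pack(">IIII", 0x00000803, 2, 28, 28) + bytes(2 * 28 * 28)
print(read_idx_image_header(payload))
print(read_idx_image_header(gzip.compress(payload)))
```

Both calls report the same header, which is the behavior the note describes: the caller does not need to know whether the file was compressed.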

Please check the official documentation for more detailed and interesting usages of the package.

Installation

Python Package

The tensorflow-io Python package can be installed with pip directly using:

$ pip install tensorflow-io

People who are a little more adventurous can also try our nightly binaries:

$ pip install tensorflow-io-nightly

To ensure you have a version of TensorFlow that is compatible with TensorFlow-IO, you can specify the tensorflow extra requirement during install:

$ pip install tensorflow-io[tensorflow]

Similar extras exist for the tensorflow-gpu, tensorflow-cpu and tensorflow-rocm packages.
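After installing, it can be useful to confirm which versions actually ended up in the environment. The following is a small sketch using only the standard library (importlib.metadata, available in Python 3.8 and later); the helper name installed_version is hypothetical.

```python
from importlib import metadata


def installed_version(package: str):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report the versions of the two packages that need to match.
for name in ("tensorflow", "tensorflow-io"):
    print(name, installed_version(name) or "not installed")
```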

Docker Images

In addition to the pip packages, the docker images can be used to quickly get started.

For stable builds:

$ docker pull tfsigio/tfio:latest
$ docker run -it --rm --name tfio-latest tfsigio/tfio:latest

For nightly builds:

$ docker pull tfsigio/tfio:nightly
$ docker run -it --rm --name tfio-nightly tfsigio/tfio:nightly

R Package

Once the tensorflow-io Python package has been successfully installed, you can install the development version of the R package from GitHub via the following:

if (!require("remotes")) install.packages("remotes")
remotes::install_github("tensorflow/io", subdir = "R-package")

TensorFlow Version Compatibility

To ensure compatibility with TensorFlow, it is recommended to install a matching version of TensorFlow I/O according to the table below. You can find the list of releases here.

TensorFlow I/O Version | TensorFlow Compatibility | Release Date
0.24.0 | 2.8.x | Feb 04, 2022
0.23.1 | 2.7.x | Dec 15, 2021
0.23.0 | 2.7.x | Dec 14, 2021
0.22.0 | 2.7.x | Nov 10, 2021
0.21.0 | 2.6.x | Sep 12, 2021
0.20.0 | 2.6.x | Aug 11, 2021
0.19.1 | 2.5.x | Jul 25, 2021
0.19.0 | 2.5.x | Jun 25, 2021
0.18.0 | 2.5.x | May 13, 2021
0.17.1 | 2.4.x | Apr 16, 2021
0.17.0 | 2.4.x | Dec 14, 2020
0.16.0 | 2.3.x | Oct 23, 2020
0.15.0 | 2.3.x | Aug 03, 2020
0.14.0 | 2.2.x | Jul 08, 2020
0.13.0 | 2.2.x | May 10, 2020
0.12.0 | 2.1.x | Feb 28, 2020
0.11.0 | 2.1.x | Jan 10, 2020
0.10.0 | 2.0.x | Dec 05, 2019
0.9.1 | 2.0.x | Nov 15, 2019
0.9.0 | 2.0.x | Oct 18, 2019
0.8.1 | 1.15.x | Nov 15, 2019
0.8.0 | 1.15.x | Oct 17, 2019
0.7.2 | 1.14.x | Nov 15, 2019
0.7.1 | 1.14.x | Oct 18, 2019
0.7.0 | 1.14.x | Jul 14, 2019
0.6.0 | 1.13.x | May 29, 2019
0.5.0 | 1.13.x | Apr 12, 2019
0.4.0 | 1.13.x | Mar 01, 2019
0.3.0 | 1.12.0 | Feb 15, 2019
0.2.0 | 1.12.0 | Jan 29, 2019
0.1.0 | 1.12.0 | Dec 16, 2018
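The table can also be consulted programmatically. The sketch below copies only the most recent rows into a dictionary and checks whether a given TensorFlow version falls in the listed release series; the names COMPATIBLE_TF and is_compatible are hypothetical, and the mapping would need to be extended with further rows to cover older releases.

```python
# Compatibility data copied from the first rows of the table above;
# extend the mapping with further rows as needed.
COMPATIBLE_TF = {
    "0.24.0": "2.8",
    "0.23.1": "2.7",
    "0.23.0": "2.7",
    "0.22.0": "2.7",
    "0.21.0": "2.6",
    "0.20.0": "2.6",
}


def is_compatible(tfio_version: str, tf_version: str) -> bool:
    """Return True if tf_version falls in the series listed for tfio_version."""
    series = COMPATIBLE_TF.get(tfio_version)
    if series is None:
        raise KeyError(f"no compatibility entry for tensorflow-io {tfio_version}")
    # "2.8.x" in the table means any patch release of 2.8.
    return tf_version.split(".")[:2] == series.split(".")


print(is_compatible("0.24.0", "2.8.0"))  # True
print(is_compatible("0.24.0", "2.7.1"))  # False
```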

Performance Benchmarking

We use GitHub Pages to document the results of API performance benchmarks. The benchmark job is triggered on every commit to the master branch, which makes it possible to track performance across commits.

Contributing

TensorFlow I/O is a community-led open source project. As such, the project depends on public contributions, bug fixes, and documentation. Please see:

Build Status and CI

Build Status
Linux CPU Python 2 Status
Linux CPU Python 3 Status
Linux GPU Python 2 Status
Linux GPU Python 3 Status

Because of the manylinux2010 requirement, TensorFlow I/O is built with Ubuntu 16.04 + Developer Toolset 7 (GCC 7.3) on Linux. Configuring Ubuntu 16.04 with Developer Toolset 7 is not exactly straightforward. If the system has Docker installed, the following script will automatically build a manylinux2010-compatible whl package:

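#!/usr/bin/env bash

# List the wheels that were built.
ls dist/*
# Repair each wheel inside the manylinux2010 container.
for f in dist/*.whl; do
  docker run -i --rm -v "$PWD":/v -w /v --net=host quay.io/pypa/manylinux2010_x86_64 bash -x -e /v/tools/build/auditwheel repair --plat manylinux2010_x86_64 "$f"
done
# Files written by the container belong to root; hand them back to the current user.
sudo chown -R "$(id -nu):$(id -ng)" .
ls wheelhouse/*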

The build takes some time, but once it completes, whl packages compatible with Python 3.5, 3.6, and 3.7 will be available in the wheelhouse directory.

On macOS, the same command can be used. However, the script expects python to be available in the shell and will only generate a whl package that matches that version of Python. To build a whl package for a specific Python version, alias that version to python in the shell. See the Auditwheel step in .github/workflows/build.yml for instructions on how to do that.
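To check that a generated wheel matches the interpreter currently reachable as python, one can compare the Python tag embedded in the wheel filename with the running interpreter's version. The following is a minimal sketch; the helper names and the example filename are hypothetical and used only for illustration.

```python
import sys


def cpython_tag(version_info=sys.version_info) -> str:
    """Return the CPython wheel tag, e.g. 'cp38' for Python 3.8."""
    return f"cp{version_info[0]}{version_info[1]}"


def wheel_python_tag(filename: str) -> str:
    """Extract the Python tag from a wheel filename.

    Wheel names end in ``-{python tag}-{abi tag}-{platform tag}.whl``.
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    return filename[: -len(".whl")].split("-")[-3]


# Hypothetical wheel name, used only for illustration.
name = "tensorflow_io-0.24.0-cp38-cp38-macosx_10_14_x86_64.whl"
print(wheel_python_tag(name))
print(wheel_python_tag(name) == cpython_tag())  # True only when run on Python 3.8
```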

Note that the above command is also the one we use when releasing packages for Linux and macOS.

TensorFlow I/O uses both GitHub Workflows and Google CI (Kokoro) for continuous integration. GitHub Workflows is used for macOS build and test. Kokoro is used for Linux build and test. Again, because of the manylinux2010 requirement, whl packages on Linux are always built with Ubuntu 16.04 + Developer Toolset 7. Tests are run on a variety of systems with different Python versions to ensure good coverage:

Python | Ubuntu 18.04 | Ubuntu 20.04 | macOS + osx9 | Windows-2019
2.7 | ✓ | ✓ | ✓ | N/A
3.7 | ✓ | ✓ | ✓ | ✓
3.8 | ✓ | ✓ | ✓ | ✓
TensorFlow I/O has integrations with many systems and cloud vendors, such as Prometheus, Apache Kafka, Apache Ignite, Google Cloud PubSub, AWS Kinesis, Microsoft Azure Storage, Alibaba Cloud OSS, etc.

We try our best to test against those systems in our continuous integration whenever possible. Some tests, such as those for Prometheus, Kafka, and Ignite, are run against live systems, meaning we install Prometheus/Kafka/Ignite on the CI machine before the test is run. Some tests, such as those for Kinesis, PubSub, and Azure Storage, are run through official or non-official emulators. Offline tests are also performed whenever possible, though systems covered by offline tests may not have the same level of coverage as live systems or emulators.

Live System Emulator CI Integration Offline
Apache Kafka :heavy_check_mark: :heavy_check_mark:
Apache Ignite :heavy_check_mark: :heavy_check_mark:
Prometheus :heavy_check_mark: :heavy_check_mark:
Google PubSub :heavy_check_mark: :heavy_check_mark:
Azure Storage :heavy_check_mark: :heavy_check_mark:
AWS Kinesis :heavy_check_mark: :heavy_check_mark:
Alibaba Cloud OSS :heavy_check_mark:
Google BigTable/BigQuery to be added
Elasticsearch (experimental) :heavy_check_mark: :heavy_check_mark:
MongoDB (experimental) :heavy_check_mark: :heavy_check_mark:

References for emulators:

Community

Additional Information

License

Apache License 2.0


