TensorFlow I/O

TensorFlow I/O is a collection of file systems and file formats that are not available in TensorFlow's built-in support. A full list of supported file systems and file formats by TensorFlow I/O can be found here.

Using tensorflow-io with Keras is straightforward. Below is the Get Started with TensorFlow example with the data processing replaced by tensorflow-io:

import tensorflow as tf
import tensorflow_io as tfio

# Read the MNIST data into the IODataset.
dataset_url = "https://storage.googleapis.com/cvdf-datasets/mnist/"
d_train = tfio.IODataset.from_mnist(
    dataset_url + "train-images-idx3-ubyte.gz",
    dataset_url + "train-labels-idx1-ubyte.gz",
)

# Shuffle the elements of the dataset.
d_train = d_train.shuffle(buffer_size=1024)

# By default image data is uint8, so convert to float32 using map().
d_train = d_train.map(lambda x, y: (tf.image.convert_image_dtype(x, tf.float32), y))

# Batch the data just like any other tf.data.Dataset.
d_train = d_train.batch(32)

# Build the model.
model = tf.keras.models.Sequential(
    [
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(512, activation=tf.nn.relu),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(10, activation=tf.nn.softmax),
    ]
)

# Compile the model.
model.compile(
    optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"]
)

# Fit the model.
model.fit(d_train, epochs=5, steps_per_epoch=200)

In the MNIST example above, the URLs of the dataset files are passed directly to the tfio.IODataset.from_mnist API call. This works because tensorflow-io has built-in support for the HTTP/HTTPS file system, which eliminates the need to download and save datasets to a local directory.

NOTE: Because tensorflow-io detects and decompresses the MNIST dataset automatically when needed, the URLs of the compressed (gzip) files can be passed to the API call as is.
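
As a rough illustration of what automatic detection can look like, the sketch below checks for the two-byte gzip magic number and then parses the header of an idx3-ubyte file, the format used by the MNIST image files. It is a standard-library approximation written for illustration, not the code tensorflow-io actually runs; the helper names maybe_gunzip and parse_idx3_header are hypothetical, and the payload is synthetic.

```python
import gzip
import struct

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def maybe_gunzip(raw: bytes) -> bytes:
    """Return decompressed bytes if `raw` looks like gzip, else `raw` unchanged."""
    if raw[:2] == GZIP_MAGIC:
        return gzip.decompress(raw)
    return raw


def parse_idx3_header(data: bytes):
    """Parse the 16-byte idx3 header: magic number, image count, rows, columns."""
    magic, count, rows, cols = struct.unpack(">IIII", data[:16])
    if magic != 0x00000803:  # 0x08 = unsigned byte data, 0x03 = three dimensions
        raise ValueError(f"not an idx3-ubyte file (magic={magic:#010x})")
    return count, rows, cols


# Synthetic in-memory payload: a header followed by 2 blank 28x28 images.
payload = struct.pack(">IIII", 0x00000803, 2, 28, 28) + bytes(2 * 28 * 28)
compressed = gzip.compress(payload)

# The compressed and the plain payload both yield the same header fields.
print(parse_idx3_header(maybe_gunzip(compressed)))
print(parse_idx3_header(maybe_gunzip(payload)))
```

Checking the magic bytes instead of the file extension means the same code path handles both the .gz files and already-decompressed copies.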

Please check the official documentation for more detailed and interesting usages of the package.

Installation

Python Package

The tensorflow-io Python package can be installed with pip directly using:

$ pip install tensorflow-io

People who are a little more adventurous can also try our nightly binaries:

$ pip install tensorflow-io-nightly

To ensure you have a version of TensorFlow that is compatible with TensorFlow-IO, you can specify the tensorflow extra requirement during install:

$ pip install "tensorflow-io[tensorflow]"

Similar extras exist for the tensorflow-gpu, tensorflow-cpu and tensorflow-rocm packages.

Docker Images

In addition to the pip packages, Docker images can be used to get started quickly.

For stable builds:

$ docker pull tfsigio/tfio:latest
$ docker run -it --rm --name tfio-latest tfsigio/tfio:latest

For nightly builds:

$ docker pull tfsigio/tfio:nightly
$ docker run -it --rm --name tfio-nightly tfsigio/tfio:nightly

R Package

Once the tensorflow-io Python package has been successfully installed, you can install the development version of the R package from GitHub via the following:

if (!require("remotes")) install.packages("remotes")
remotes::install_github("tensorflow/io", subdir = "R-package")

TensorFlow Version Compatibility

To ensure compatibility with TensorFlow, it is recommended to install a matching version of TensorFlow I/O according to the table below. You can find the list of releases here.

TensorFlow I/O Version    TensorFlow Compatibility    Release Date
0.24.0                    2.8.x                       Feb 04, 2022
0.23.1                    2.7.x                       Dec 15, 2021
0.23.0                    2.7.x                       Dec 14, 2021
0.22.0                    2.7.x                       Nov 10, 2021
0.21.0                    2.6.x                       Sep 12, 2021
0.20.0                    2.6.x                       Aug 11, 2021
0.19.1                    2.5.x                       Jul 25, 2021
0.19.0                    2.5.x                       Jun 25, 2021
0.18.0                    2.5.x                       May 13, 2021
0.17.1                    2.4.x                       Apr 16, 2021
0.17.0                    2.4.x                       Dec 14, 2020
0.16.0                    2.3.x                       Oct 23, 2020
0.15.0                    2.3.x                       Aug 03, 2020
0.14.0                    2.2.x                       Jul 08, 2020
0.13.0                    2.2.x                       May 10, 2020
0.12.0                    2.1.x                       Feb 28, 2020
0.11.0                    2.1.x                       Jan 10, 2020
0.10.0                    2.0.x                       Dec 05, 2019
0.9.1                     2.0.x                       Nov 15, 2019
0.9.0                     2.0.x                       Oct 18, 2019
0.8.1                     1.15.x                      Nov 15, 2019
0.8.0                     1.15.x                      Oct 17, 2019
0.7.2                     1.14.x                      Nov 15, 2019
0.7.1                     1.14.x                      Oct 18, 2019
0.7.0                     1.14.x                      Jul 14, 2019
0.6.0                     1.13.x                      May 29, 2019
0.5.0                     1.13.x                      Apr 12, 2019
0.4.0                     1.13.x                      Mar 01, 2019
0.3.0                     1.12.0                      Feb 15, 2019
0.2.0                     1.12.0                      Jan 29, 2019
0.1.0                     1.12.0                      Dec 16, 2018
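
The table can also be checked programmatically. The following is a minimal sketch that hard-codes a subset of the rows above and compares only the major.minor part of a TensorFlow version; the COMPATIBILITY dictionary and the functions is_compatible and latest_for are hypothetical helpers for illustration and are not part of tensorflow-io.

```python
# Subset of the compatibility table: TensorFlow I/O release -> TensorFlow series.
COMPATIBILITY = {
    "0.24.0": "2.8",
    "0.23.1": "2.7",
    "0.23.0": "2.7",
    "0.22.0": "2.7",
    "0.21.0": "2.6",
    "0.20.0": "2.6",
    "0.19.1": "2.5",
    "0.19.0": "2.5",
    "0.18.0": "2.5",
    "0.17.1": "2.4",
    "0.17.0": "2.4",
}


def _series(tf_version: str) -> str:
    """Reduce a full version such as '2.8.0' to its 'major.minor' series."""
    return ".".join(tf_version.split(".")[:2])


def is_compatible(tfio_version: str, tf_version: str) -> bool:
    """Return True if `tf_version` is in the series listed for `tfio_version`."""
    if tfio_version not in COMPATIBILITY:
        raise KeyError(f"tensorflow-io {tfio_version} is not in the table")
    return _series(tf_version) == COMPATIBILITY[tfio_version]


def latest_for(tf_version: str) -> str:
    """Return the newest listed tensorflow-io release for a TensorFlow version."""
    matches = [v for v, s in COMPATIBILITY.items() if s == _series(tf_version)]
    if not matches:
        raise KeyError(f"no tensorflow-io release listed for TensorFlow {tf_version}")
    # Compare versions numerically so that, e.g., 0.23.1 ranks above 0.23.0.
    return max(matches, key=lambda v: tuple(int(p) for p in v.split(".")))


print(is_compatible("0.24.0", "2.8.0"))
print(latest_for("2.7.0"))
```

In practice the tensorflow extra described in the Installation section achieves the same goal at install time; a lookup like this is mainly useful for validating an existing environment.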

Performance Benchmarking

We use GitHub Pages to document the results of API performance benchmarks. The benchmark job is triggered on every commit to the master branch, which makes it possible to track performance across commits.

Contributing

TensorFlow I/O is a community-led open source project. As such, the project depends on public contributions, bug fixes, and documentation. Please see:

Build Status and CI

Build Status
Linux CPU Python 2 Status
Linux CPU Python 3 Status
Linux GPU Python 2 Status
Linux GPU Python 3 Status

Because of the manylinux2010 requirement, TensorFlow I/O is built with Ubuntu 16.04 + Developer Toolset 7 (GCC 7.3) on Linux. Configuring Ubuntu 16.04 with Developer Toolset 7 is not exactly straightforward. If the system has Docker installed, the following script will automatically build a manylinux2010 compatible whl package:

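#!/usr/bin/env bash

# List the built wheels, then repair each one inside the manylinux2010 container.
ls dist/*
for f in dist/*.whl; do
  docker run -i --rm -v "$PWD":/v -w /v --net=host quay.io/pypa/manylinux2010_x86_64 bash -x -e /v/tools/build/auditwheel repair --plat manylinux2010_x86_64 "$f"
done
# Files written by the container are owned by root; hand them back to the current user.
sudo chown -R "$(id -nu):$(id -ng)" .
ls wheelhouse/*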

It takes some time to build, but once complete, Python 3.5, 3.6, and 3.7 compatible whl packages will be available in the wheelhouse directory.

On macOS, the same command can be used. However, the script expects python to be available in the shell and only generates a whl package that matches that Python version. To build a whl package for a specific Python version, alias that version to python in the shell. See the Auditwheel step in .github/workflows/build.yml for instructions on how to do that.

Note that the above command is also the command we use when releasing packages for Linux and macOS.
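
Because a macOS build only produces a wheel for the Python found in the shell, it can be useful to confirm that a wheel's Python tag matches the interpreter before installing it. The sketch below is one way to do that with the standard library; it assumes CPython, the function names are hypothetical, and the wheel filename is a made-up example rather than a real release file.

```python
import sys


def interpreter_tag() -> str:
    """Return the CPython tag of the running interpreter, e.g. 'cp38'."""
    return f"cp{sys.version_info.major}{sys.version_info.minor}"


def wheel_python_tag(filename: str) -> str:
    """Extract the Python tag from a wheel filename.

    Wheel filenames end with '-{python tag}-{abi tag}-{platform tag}.whl',
    so the Python tag is the third component from the end.
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) < 5:
        raise ValueError(f"unexpected wheel filename: {filename}")
    return parts[-3]


# Hypothetical wheel filename, used only to demonstrate the parsing.
example = "example_pkg-1.0.0-cp38-cp38-macosx_10_14_x86_64.whl"

print(wheel_python_tag(example))
print(wheel_python_tag(example) == interpreter_tag())
```

The second line of output depends on which interpreter runs the script, which is exactly the mismatch the alias workaround is meant to avoid.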

TensorFlow I/O uses both GitHub Workflows and Google CI (Kokoro) for continuous integration. GitHub Workflows is used for macOS build and test. Kokoro is used for Linux build and test. Again, because of the manylinux2010 requirement, whl packages on Linux are always built with Ubuntu 16.04 + Developer Toolset 7. Tests are run on a variety of systems with different Python versions to ensure good coverage:

Python    Ubuntu 18.04          Ubuntu 20.04          macOS + osx9          Windows-2019
2.7       :heavy_check_mark:    :heavy_check_mark:    :heavy_check_mark:    N/A
3.7       :heavy_check_mark:    :heavy_check_mark:    :heavy_check_mark:    :heavy_check_mark:
3.8       :heavy_check_mark:    :heavy_check_mark:    :heavy_check_mark:    :heavy_check_mark:

TensorFlow I/O has integrations with many systems and cloud vendors such as Prometheus, Apache Kafka, Apache Ignite, Google Cloud PubSub, AWS Kinesis, Microsoft Azure Storage, Alibaba Cloud OSS, etc.

We try our best to test against those systems in our continuous integration whenever possible. Some tests, such as those for Prometheus, Kafka, and Ignite, are run against live systems, meaning we install Prometheus/Kafka/Ignite on the CI machine before the test is run. Other tests, such as those for Kinesis, PubSub, and Azure Storage, are run through official or unofficial emulators. Offline tests are also performed whenever possible, though systems covered only by offline tests may not have the same level of coverage as those tested with live systems or emulators.

Live System Emulator CI Integration Offline
Apache Kafka :heavy_check_mark: :heavy_check_mark:
Apache Ignite :heavy_check_mark: :heavy_check_mark:
Prometheus :heavy_check_mark: :heavy_check_mark:
Google PubSub :heavy_check_mark: :heavy_check_mark:
Azure Storage :heavy_check_mark: :heavy_check_mark:
AWS Kinesis :heavy_check_mark: :heavy_check_mark:
Alibaba Cloud OSS :heavy_check_mark:
Google BigTable/BigQuery to be added
Elasticsearch (experimental) :heavy_check_mark: :heavy_check_mark:
MongoDB (experimental) :heavy_check_mark: :heavy_check_mark:

References for emulators:

Community

Additional Information

License

Apache License 2.0
