

Project description




TensorFlow I/O


TensorFlow I/O is a collection of file systems and file formats that are not available in TensorFlow's built-in support. The full list of file systems and file formats supported by TensorFlow I/O can be found here.

Using tensorflow-io with Keras is straightforward. Below is the Get Started with TensorFlow example, with the data processing step replaced by tensorflow-io:

import tensorflow as tf
import tensorflow_io as tfio

# Read the MNIST data into the IODataset.
dataset_url = "https://storage.googleapis.com/cvdf-datasets/mnist/"
d_train = tfio.IODataset.from_mnist(
    dataset_url + "train-images-idx3-ubyte.gz",
    dataset_url + "train-labels-idx1-ubyte.gz",
)

# Shuffle the elements of the dataset.
d_train = d_train.shuffle(buffer_size=1024)

# By default image data is uint8, so convert to float32 using map().
d_train = d_train.map(lambda x, y: (tf.image.convert_image_dtype(x, tf.float32), y))

# Batch the data just like any other tf.data.Dataset.
d_train = d_train.batch(32)

# Build the model.
model = tf.keras.models.Sequential(
    [
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(512, activation=tf.nn.relu),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(10, activation=tf.nn.softmax),
    ]
)

# Compile the model.
model.compile(
    optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"]
)

# Fit the model.
model.fit(d_train, epochs=5, steps_per_epoch=200)
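
The trained model can then be evaluated the same way. The sketch below assumes the test split is available at the same URL under the standard MNIST file names (t10k-images-idx3-ubyte.gz and t10k-labels-idx1-ubyte.gz):

# Read the test split through the same tfio.IODataset API (file names are
# assumed to follow the standard MNIST naming at the same URL).
d_test = tfio.IODataset.from_mnist(
    dataset_url + "t10k-images-idx3-ubyte.gz",
    dataset_url + "t10k-labels-idx1-ubyte.gz",
)
d_test = d_test.map(lambda x, y: (tf.image.convert_image_dtype(x, tf.float32), y))
d_test = d_test.batch(32)

# Evaluate the trained model on the test set.
model.evaluate(d_test)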

In the above MNIST example, the URLs to access the dataset files are passed directly to the tfio.IODataset.from_mnist API call. This is possible because of the inherent support that tensorflow-io provides for the HTTP/HTTPS file system, which eliminates the need for downloading and saving datasets to a local directory.

NOTE: Since tensorflow-io is able to detect and uncompress the MNIST dataset automatically if needed, we can pass the URLs for the compressed files (gzip) to the API call as is.
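
Because the HTTP/HTTPS file system support comes from importing tensorflow_io, regular TensorFlow file ops can also read these URLs directly. A minimal sketch (assuming the https:// scheme is handled by the tensorflow-io file system plugin once the package is imported):

import tensorflow as tf
import tensorflow_io as tfio  # assumption: importing registers the HTTP/HTTPS file system

# Read the raw (still gzip-compressed) bytes of one MNIST file over HTTPS.
url = "https://storage.googleapis.com/cvdf-datasets/mnist/train-labels-idx1-ubyte.gz"
raw = tf.io.read_file(url)
print(raw.dtype)  # tf.string scalar holding the downloaded bytes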

Please check the official documentation for more detailed and interesting uses of the package.

Installation

Python Package

The tensorflow-io Python package can be installed with pip directly using:

$ pip install tensorflow-io

People who are a little more adventurous can also try our nightly binaries:

$ pip install tensorflow-io-nightly
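
After either install, a quick sanity check confirms that the package imports and shows which TensorFlow / TensorFlow I/O pair is present (a minimal sketch; the __version__ attributes are assumed to be exposed by both packages):

import tensorflow as tf
import tensorflow_io as tfio

# Print the installed versions; the pair should match the compatibility table below.
print("tensorflow:", tf.__version__)
print("tensorflow-io:", tfio.__version__)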

Docker Images

In addition to the pip packages, Docker images can be used to get started quickly.

For stable builds:

$ docker pull tfsigio/tfio:latest
$ docker run -it --rm --name tfio-latest tfsigio/tfio:latest

For nightly builds:

$ docker pull tfsigio/tfio:nightly
$ docker run -it --rm --name tfio-nightly tfsigio/tfio:nightly

R Package

Once the tensorflow-io Python package has been successfully installed, you can install the development version of the R package from GitHub via the following:

if (!require("remotes")) install.packages("remotes")
remotes::install_github("tensorflow/io", subdir = "R-package")

TensorFlow Version Compatibility

To ensure compatibility with TensorFlow, it is recommended to install a matching version of TensorFlow I/O according to the table below. You can find the list of releases here.

| TensorFlow I/O Version | TensorFlow Compatibility | Release Date |
| --- | --- | --- |
| 0.21.0 | 2.6.x | Sep 12, 2021 |
| 0.20.0 | 2.6.x | Aug 11, 2021 |
| 0.19.1 | 2.5.x | Jul 25, 2021 |
| 0.19.0 | 2.5.x | Jun 25, 2021 |
| 0.18.0 | 2.5.x | May 13, 2021 |
| 0.17.1 | 2.4.x | Apr 16, 2021 |
| 0.17.0 | 2.4.x | Dec 14, 2020 |
| 0.16.0 | 2.3.x | Oct 23, 2020 |
| 0.15.0 | 2.3.x | Aug 03, 2020 |
| 0.14.0 | 2.2.x | Jul 08, 2020 |
| 0.13.0 | 2.2.x | May 10, 2020 |
| 0.12.0 | 2.1.x | Feb 28, 2020 |
| 0.11.0 | 2.1.x | Jan 10, 2020 |
| 0.10.0 | 2.0.x | Dec 05, 2019 |
| 0.9.1 | 2.0.x | Nov 15, 2019 |
| 0.9.0 | 2.0.x | Oct 18, 2019 |
| 0.8.1 | 1.15.x | Nov 15, 2019 |
| 0.8.0 | 1.15.x | Oct 17, 2019 |
| 0.7.2 | 1.14.x | Nov 15, 2019 |
| 0.7.1 | 1.14.x | Oct 18, 2019 |
| 0.7.0 | 1.14.x | Jul 14, 2019 |
| 0.6.0 | 1.13.x | May 29, 2019 |
| 0.5.0 | 1.13.x | Apr 12, 2019 |
| 0.4.0 | 1.13.x | Mar 01, 2019 |
| 0.3.0 | 1.12.0 | Feb 15, 2019 |
| 0.2.0 | 1.12.0 | Jan 29, 2019 |
| 0.1.0 | 1.12.0 | Dec 16, 2018 |

Performance Benchmarking

We use GitHub Pages to document the results of API performance benchmarks. The benchmark job is triggered on every commit to the master branch and facilitates tracking performance with respect to commits.

Contributing

TensorFlow I/O is a community-led open source project. As such, the project depends on public contributions, bug fixes, and documentation. Please see the contribution guidelines in the repository for details.

Build Status and CI

Build status badges are maintained for the following configurations:

- Linux CPU, Python 2
- Linux CPU, Python 3
- Linux GPU, Python 2
- Linux GPU, Python 3

Because of the manylinux2010 requirement, TensorFlow I/O is built with Ubuntu:16.04 + Developer Toolset 7 (GCC 7.3) on Linux. Configuring Ubuntu 16.04 with Developer Toolset 7 is not exactly straightforward. If the system has Docker installed, then the following command will automatically build a manylinux2010-compatible whl package:

#!/usr/bin/env bash

# Repair every wheel in dist/ inside the manylinux2010 container.
ls dist/*
for f in dist/*.whl; do
  docker run -i --rm -v "$PWD":/v -w /v --net=host \
    quay.io/pypa/manylinux2010_x86_64 \
    bash -x -e /v/tools/build/auditwheel repair --plat manylinux2010_x86_64 "$f"
done
# The container runs as root, so restore ownership of the generated files.
sudo chown -R "$(id -nu):$(id -ng)" .
ls wheelhouse/*

It takes some time to build, but once complete, there will be Python 3.5, 3.6, and 3.7 compatible whl packages available in the wheelhouse directory.

On macOS, the same command can be used. However, the script expects python to be available in the shell and will only generate a whl package that matches the version of that python. If you want to build a whl package for a specific Python version, you have to alias that version to python in the shell. See the Auditwheel step in .github/workflows/build.yml for instructions on how to do that.

Note that the above command is also the command we use when releasing packages for Linux and macOS.

TensorFlow I/O uses both GitHub Workflows and Google CI (Kokoro) for continuous integration. GitHub Workflows is used for the macOS build and test. Kokoro is used for the Linux build and test. Again, because of the manylinux2010 requirement, on Linux whl packages are always built with Ubuntu 16.04 + Developer Toolset 7. Tests are done on a variety of systems with different Python 3 versions to ensure good coverage:

| Python | Ubuntu 18.04 | Ubuntu 20.04 | macOS + osx9 | Windows-2019 |
| --- | --- | --- | --- | --- |
| 2.7 | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | N/A |
| 3.7 | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |
| 3.8 | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |

TensorFlow I/O has integrations with many systems and cloud vendors, such as Prometheus, Apache Kafka, Apache Ignite, Google Cloud PubSub, AWS Kinesis, Microsoft Azure Storage, and Alibaba Cloud OSS.
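
As one illustration, a Kafka topic can be exposed as a dataset through tfio.IODataset.from_kafka. The sketch below is hypothetical: the topic name is a placeholder, a broker is assumed to be reachable at the client's default address, and the exact keyword arguments and element structure should be checked against the API documentation:

import tensorflow_io as tfio

# Hypothetical topic name; assumes a running Kafka broker at the default address.
kafka_ds = tfio.IODataset.from_kafka("demo-topic")

# Inspect the first few elements streamed from the topic.
for elem in kafka_ds.take(5):
    print(elem)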

We try our best to test against those systems in our continuous integration whenever possible. Some tests, such as those for Prometheus, Kafka, and Ignite, are done with live systems, meaning we install Prometheus/Kafka/Ignite on the CI machine before the test is run. Some tests, such as those for Kinesis, PubSub, and Azure Storage, are done through official or non-official emulators. Offline tests are also performed whenever possible, though systems covered through offline tests may not have the same level of coverage as live systems or emulators.

| | Live System | Emulator | CI Integration | Offline |
| --- | --- | --- | --- | --- |
| Apache Kafka | :heavy_check_mark: | | :heavy_check_mark: | |
| Apache Ignite | :heavy_check_mark: | | :heavy_check_mark: | |
| Prometheus | :heavy_check_mark: | | :heavy_check_mark: | |
| Google PubSub | | :heavy_check_mark: | :heavy_check_mark: | |
| Azure Storage | | :heavy_check_mark: | :heavy_check_mark: | |
| AWS Kinesis | | :heavy_check_mark: | :heavy_check_mark: | |
| Alibaba Cloud OSS | | | | :heavy_check_mark: |
| Google BigTable/BigQuery | | | to be added | |
| Elasticsearch (experimental) | :heavy_check_mark: | | :heavy_check_mark: | |
| MongoDB (experimental) | :heavy_check_mark: | | :heavy_check_mark: | |

References for emulators:

Community

Additional Information

License

Apache License 2.0


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files are available for this release. See the tutorial on generating distribution archives.

Built Distributions

File details

Hashes for each built distribution of tensorflow_io_gcs_filesystem_nightly 0.21.0.dev20211107074504:

tensorflow_io_gcs_filesystem_nightly-0.21.0.dev20211107074504-cp39-cp39-win_amd64.whl
SHA256: afe3fa6b82d7d4e2ad82ee51515a95f2bed18414b16268e42aafc89b503f64fa
MD5: ac669f87b664978c09e9633a88655d5e
BLAKE2b-256: aa58712f66ca66886587498ae5c68a391b27699b5d1bcccd006d20005f154aa4

tensorflow_io_gcs_filesystem_nightly-0.21.0.dev20211107074504-cp39-cp39-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
SHA256: f3dd61dfcbd492f7916dc691ca3589eeedede986322ea82e270b03f3c06a8eb4
MD5: 9550fcae8dbb458ff95637f02268792b
BLAKE2b-256: a7ebb390a3121aed041a924a7069c90f65523e9f3a58fa66875b8556cade940c

tensorflow_io_gcs_filesystem_nightly-0.21.0.dev20211107074504-cp39-cp39-macosx_10_14_x86_64.whl
SHA256: 0a162f7edc0cecb1a21fd91829f2ddb6757a3129477fbb1b93f270f7f391865a
MD5: f00cf877388ee90bbfaeee0d7ad9fa67
BLAKE2b-256: 35f1757a336bdc026c5d84b103a306aeeb0c9512f3ee4fd0ae0597ff9d1cb0e4

tensorflow_io_gcs_filesystem_nightly-0.21.0.dev20211107074504-cp38-cp38-win_amd64.whl
SHA256: edcd20cee06acbd8f5a3940b02a758bbbbcd3a560a7ebaab7cd205b4cde7fb96
MD5: 7f08559b60e3bfe86e5592a0b60de618
BLAKE2b-256: 68f3211ff280aa4a75f5c3286e6398002d6de15f2d0b5cc22f0e78eb4d9f326b

tensorflow_io_gcs_filesystem_nightly-0.21.0.dev20211107074504-cp38-cp38-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
SHA256: 2e8ba8039a2c27a3c2a8cd2d02ce434571d29267f893ad2f597d6aef47692875
MD5: 576a7c42b968e722fe6047c4e5883514
BLAKE2b-256: 34175750ae56119524df674f7ce5f8bd060cb5fd883089fbd934e00232125b80

tensorflow_io_gcs_filesystem_nightly-0.21.0.dev20211107074504-cp38-cp38-macosx_10_14_x86_64.whl
SHA256: 0ae8614570e1e6b2989cc938d391d0db40731d404a57d5181f91fd3f26d4790e
MD5: 0f8be2692769a57fba2d5ef4883176c6
BLAKE2b-256: 7125a86e4deebb7cf12a2a919662498e839f758dbc6515f044b1db9cb2e3c0ca

tensorflow_io_gcs_filesystem_nightly-0.21.0.dev20211107074504-cp37-cp37m-win_amd64.whl
SHA256: e1957669f702575aec3b0b622eaee4fe9926ec7b504fbb50eac956324ddd48ff
MD5: 3f491587af09dec23e9e4d902ac667c2
BLAKE2b-256: 511dff6c6b63da907b467b150018e93998280500ef9e31ec4825a918746aac2f

tensorflow_io_gcs_filesystem_nightly-0.21.0.dev20211107074504-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
SHA256: 7942da7be83561ca67e7fd355e51b7deda8ce406145f0c65da0ae4b8f0659c49
MD5: fb666084cac54ddd99dc14e83c5ab2df
BLAKE2b-256: 2ee2ce74664cade1819ec4230539d44415d7cebbeb3f7049575151d83d7a1439

tensorflow_io_gcs_filesystem_nightly-0.21.0.dev20211107074504-cp37-cp37m-macosx_10_14_x86_64.whl
SHA256: 5d06fa9725a3762bd548f69476c0eaea755186e263d7a6acaca1b191b2bcb4c6
MD5: e8a17dbe9a2297684f45c34e01e19fd3
BLAKE2b-256: 69c2a744a10b7bf9f888a16ae3b98115dca510779540299603390fadd3df502c

tensorflow_io_gcs_filesystem_nightly-0.21.0.dev20211107074504-cp36-cp36m-win_amd64.whl
SHA256: 3dd99e34197fe1d75a7648e0baacccb22278afcd9257b188c4f6035145b9d4b0
MD5: 275dd2a5501daf31d44fff8cfa3d1ad8
BLAKE2b-256: b40dcecdfa3cd56ca21b615295eec23ebad3270cc204de14d3d01032c91fa3ea

tensorflow_io_gcs_filesystem_nightly-0.21.0.dev20211107074504-cp36-cp36m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
SHA256: 00df45b88200a1057f1f6ce8ef3599d7aa5cd0f5ef4ac9d21d4e7fe916cc4134
MD5: bed46b29ac747915d3ca18e191bc7f66
BLAKE2b-256: 243d96fcfd21198104e7a81a6e4434377e3c4b9e4d03972945ec35feb6a21bee

tensorflow_io_gcs_filesystem_nightly-0.21.0.dev20211107074504-cp36-cp36m-macosx_10_14_x86_64.whl
SHA256: 9b4f9842804698d50939c970d2ba600b531017d3d27fd457055ab5cef8c96a71
MD5: 6c3e378e57dcef9fd3d6eafa3a66e667
BLAKE2b-256: 856957c6070859648e1c73a86580ce01569c82da481d3845d1575d15a80a720d

See more details on using hashes here.
