Skip to main content

TensorFlow IO

Project description




TensorFlow I/O

GitHub CI PyPI License Documentation

TensorFlow I/O is a collection of file systems and file formats that are not available in TensorFlow's built-in support. A full list of supported file systems and file formats by TensorFlow I/O can be found here.

The use of tensorflow-io is straightforward with keras. Below is an example to Get Started with TensorFlow with the data processing aspect replaced by tensorflow-io:

import tensorflow as tf
import tensorflow_io as tfio

# Read the MNIST data into the IODataset.
dataset_url = "https://storage.googleapis.com/cvdf-datasets/mnist/"
d_train = tfio.IODataset.from_mnist(
    dataset_url + "train-images-idx3-ubyte.gz",
    dataset_url + "train-labels-idx1-ubyte.gz",
)

# Shuffle the elements of the dataset.
d_train = d_train.shuffle(buffer_size=1024)

# By default image data is uint8, so convert to float32 using map().
d_train = d_train.map(lambda x, y: (tf.image.convert_image_dtype(x, tf.float32), y))

# prepare batches the data just like any other tf.data.Dataset
d_train = d_train.batch(32)

# Build the model.
model = tf.keras.models.Sequential(
    [
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(512, activation=tf.nn.relu),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(10, activation=tf.nn.softmax),
    ]
)

# Compile the model.
model.compile(
    optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"]
)

# Fit the model.
model.fit(d_train, epochs=5, steps_per_epoch=200)

In the above MNIST example, the URL's to access the dataset files are passed directly to the tfio.IODataset.from_mnist API call. This is due to the inherent support that tensorflow-io provides for HTTP/HTTPS file system, thus eliminating the need for downloading and saving datasets on a local directory.

NOTE: Since tensorflow-io is able to detect and uncompress the MNIST dataset automatically if needed, we can pass the URL's for the compressed files (gzip) to the API call as is.

Please check the official documentation for more detailed and interesting usages of the package.

Installation

Python Package

The tensorflow-io Python package can be installed with pip directly using:

$ pip install tensorflow-io

People who are a little more adventurous can also try our nightly binaries:

$ pip install tensorflow-io-nightly

Docker Images

In addition to the pip packages, the docker images can be used to quickly get started.

For stable builds:

$ docker pull tfsigio/tfio:latest
$ docker run -it --rm --name tfio-latest tfsigio/tfio:latest

For nightly builds:

$ docker pull tfsigio/tfio:nightly
$ docker run -it --rm --name tfio-nightly tfsigio/tfio:nightly

R Package

Once the tensorflow-io Python package has been successfully installed, you can install the development version of the R package from GitHub via the following:

if (!require("remotes")) install.packages("remotes")
remotes::install_github("tensorflow/io", subdir = "R-package")

TensorFlow Version Compatibility

To ensure compatibility with TensorFlow, it is recommended to install a matching version of TensorFlow I/O according to the table below. You can find the list of releases here.

TensorFlow I/O Version TensorFlow Compatibility Release Date
0.20.0 2.6.x Aug 11, 2021
0.19.1 2.5.x Jul 25, 2021
0.19.0 2.5.x Jun 25, 2021
0.18.0 2.5.x May 13, 2021
0.17.1 2.4.x Apr 16, 2021
0.17.0 2.4.x Dec 14, 2020
0.16.0 2.3.x Oct 23, 2020
0.15.0 2.3.x Aug 03, 2020
0.14.0 2.2.x Jul 08, 2020
0.13.0 2.2.x May 10, 2020
0.12.0 2.1.x Feb 28, 2020
0.11.0 2.1.x Jan 10, 2020
0.10.0 2.0.x Dec 05, 2019
0.9.1 2.0.x Nov 15, 2019
0.9.0 2.0.x Oct 18, 2019
0.8.1 1.15.x Nov 15, 2019
0.8.0 1.15.x Oct 17, 2019
0.7.2 1.14.x Nov 15, 2019
0.7.1 1.14.x Oct 18, 2019
0.7.0 1.14.x Jul 14, 2019
0.6.0 1.13.x May 29, 2019
0.5.0 1.13.x Apr 12, 2019
0.4.0 1.13.x Mar 01, 2019
0.3.0 1.12.0 Feb 15, 2019
0.2.0 1.12.0 Jan 29, 2019
0.1.0 1.12.0 Dec 16, 2018

Performance Benchmarking

We use github-pages to document the results of API performance benchmarks. The benchmark job is triggered on every commit to master branch and facilitates tracking performance w.r.t commits.

Contributing

Tensorflow I/O is a community led open source project. As such, the project depends on public contributions, bug-fixes, and documentation. Please see:

Build Status and CI

Build Status
Linux CPU Python 2 Status
Linux CPU Python 3 Status
Linux GPU Python 2 Status
Linux GPU Python 3 Status

Because of manylinux2010 requirement, TensorFlow I/O is built with Ubuntu:16.04 + Developer Toolset 7 (GCC 7.3) on Linux. Configuration with Ubuntu 16.04 with Developer Toolset 7 is not exactly straightforward. If the system have docker installed, then the following command will automatically build manylinux2010 compatible whl package:

#!/usr/bin/env bash

ls dist/*
for f in dist/*.whl; do
  docker run -i --rm -v $PWD:/v -w /v --net=host quay.io/pypa/manylinux2010_x86_64 bash -x -e /v/tools/build/auditwheel repair --plat manylinux2010_x86_64 $f
done
sudo chown -R $(id -nu):$(id -ng) .
ls wheelhouse/*

It takes some time to build, but once complete, there will be python 3.5, 3.6, 3.7 compatible whl packages available in wheelhouse directory.

On macOS, the same command could be used. However, the script expects python in shell and will only generate a whl package that matches the version of python in shell. If you want to build a whl package for a specific python then you have to alias this version of python to python in shell. See .github/workflows/build.yml Auditwheel step for instructions how to do that.

Note the above command is also the command we use when releasing packages for Linux and macOS.

TensorFlow I/O uses both GitHub Workflows and Google CI (Kokoro) for continuous integration. GitHub Workflows is used for macOS build and test. Kokoro is used for Linux build and test. Again, because of the manylinux2010 requirement, on Linux whl packages are always built with Ubuntu 16.04 + Developer Toolset 7. Tests are done on a variatiy of systems with different python3 versions to ensure a good coverage:

Python Ubuntu 18.04 Ubuntu 20.04 macOS + osx9 Windows-2019
2.7 :heavy_check_mark: :heavy_check_mark: :heavy_check_mark: N/A
3.7 :heavy_check_mark: :heavy_check_mark: :heavy_check_mark: :heavy_check_mark:
3.8 :heavy_check_mark: :heavy_check_mark: :heavy_check_mark: :heavy_check_mark:

TensorFlow I/O has integrations with many systems and cloud vendors such as Prometheus, Apache Kafka, Apache Ignite, Google Cloud PubSub, AWS Kinesis, Microsoft Azure Storage, Alibaba Cloud OSS etc.

We tried our best to test against those systems in our continuous integration whenever possible. Some tests such as Prometheus, Kafka, and Ignite are done with live systems, meaning we install Prometheus/Kafka/Ignite on CI machine before the test is run. Some tests such as Kinesis, PubSub, and Azure Storage are done through official or non-official emulators. Offline tests are also performed whenever possible, though systems covered through offine tests may not have the same level of coverage as live systems or emulators.

Live System Emulator CI Integration Offline
Apache Kafka :heavy_check_mark: :heavy_check_mark:
Apache Ignite :heavy_check_mark: :heavy_check_mark:
Prometheus :heavy_check_mark: :heavy_check_mark:
Google PubSub :heavy_check_mark: :heavy_check_mark:
Azure Storage :heavy_check_mark: :heavy_check_mark:
AWS Kinesis :heavy_check_mark: :heavy_check_mark:
Alibaba Cloud OSS :heavy_check_mark:
Google BigTable/BigQuery to be added
Elasticsearch (experimental) :heavy_check_mark: :heavy_check_mark:
MongoDB (experimental) :heavy_check_mark: :heavy_check_mark:

References for emulators:

Community

Additional Information

License

Apache License 2.0

Release history Release notifications | RSS feed

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distributions

File details

Details for the file tensorflow_io_gcs_filesystem_nightly-0.20.0.dev20210813191543-cp39-cp39-win_amd64.whl.

File metadata

File hashes

Hashes for tensorflow_io_gcs_filesystem_nightly-0.20.0.dev20210813191543-cp39-cp39-win_amd64.whl
Algorithm Hash digest
SHA256 c8031ecbb33bb6a1b432619b1be6f6f63e19a8ae5ad6f66012e780442ee39d99
MD5 6e6716469ba69d8fda4438bdc84b0929
BLAKE2b-256 c4a80976b7155d84612af8607e6ef01a46761b949c4603f4182a39f4732b339a

See more details on using hashes here.

File details

Details for the file tensorflow_io_gcs_filesystem_nightly-0.20.0.dev20210813191543-cp39-cp39-manylinux_2_12_x86_64.manylinux2010_x86_64.whl.

File metadata

File hashes

Hashes for tensorflow_io_gcs_filesystem_nightly-0.20.0.dev20210813191543-cp39-cp39-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 0cac4d9094c4e929095a5cac51cc8f26f1a6ffdd936ed3d362bcb1e1f533bfd5
MD5 9398a300ce9007fc5f038f53c267251b
BLAKE2b-256 1db630afda7be04cc0e1db5d361b9332c02d25cf845f713ddd168a450c3384f8

See more details on using hashes here.

File details

Details for the file tensorflow_io_gcs_filesystem_nightly-0.20.0.dev20210813191543-cp39-cp39-macosx_10_14_x86_64.whl.

File metadata

File hashes

Hashes for tensorflow_io_gcs_filesystem_nightly-0.20.0.dev20210813191543-cp39-cp39-macosx_10_14_x86_64.whl
Algorithm Hash digest
SHA256 eee98fde72e80f0204ccf60a358c63fb345d3168829c7a277c0235e9574c28ec
MD5 993c52bf5d47b486d96d13742d8aea46
BLAKE2b-256 92f5048e12c39696bf877e7c37b2f0070fcf744b4ba24c965a0ff77ef0c1cfe3

See more details on using hashes here.

File details

Details for the file tensorflow_io_gcs_filesystem_nightly-0.20.0.dev20210813191543-cp38-cp38-win_amd64.whl.

File metadata

File hashes

Hashes for tensorflow_io_gcs_filesystem_nightly-0.20.0.dev20210813191543-cp38-cp38-win_amd64.whl
Algorithm Hash digest
SHA256 9642ef66acadc9151e4e859851370640d36dbe8439035abf6cc2dfd99e539f25
MD5 8f2659d77c06bcea56aeb6aa190608f7
BLAKE2b-256 f3a43415c9b259aa1725c73614bbdff090d9431e7e1dcc055ef4ebc1b96ccd18

See more details on using hashes here.

File details

Details for the file tensorflow_io_gcs_filesystem_nightly-0.20.0.dev20210813191543-cp38-cp38-manylinux_2_12_x86_64.manylinux2010_x86_64.whl.

File metadata

File hashes

Hashes for tensorflow_io_gcs_filesystem_nightly-0.20.0.dev20210813191543-cp38-cp38-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 4afc42cedcd620fd8a7c0a843b9ec0487b7b62ad202da49da61af44ce2f7d6ea
MD5 045518ea489b7209e59117eaed54f1bb
BLAKE2b-256 d00efc82c7081174bee49b69f8bd6253239ca8534769b7a958686b414c88a0a7

See more details on using hashes here.

File details

Details for the file tensorflow_io_gcs_filesystem_nightly-0.20.0.dev20210813191543-cp38-cp38-macosx_10_14_x86_64.whl.

File metadata

File hashes

Hashes for tensorflow_io_gcs_filesystem_nightly-0.20.0.dev20210813191543-cp38-cp38-macosx_10_14_x86_64.whl
Algorithm Hash digest
SHA256 4fcfe52a137f77543dd8e777ce4ab92120b53ba9a04f855d88f73399fde2742b
MD5 f9fcacb41d8a350dc03ef7e4d0913b9d
BLAKE2b-256 10447a42f2b6dbb73e13057348854abc29dcaa1ddd35a39f09107ad7ff77de6e

See more details on using hashes here.

File details

Details for the file tensorflow_io_gcs_filesystem_nightly-0.20.0.dev20210813191543-cp37-cp37m-win_amd64.whl.

File metadata

File hashes

Hashes for tensorflow_io_gcs_filesystem_nightly-0.20.0.dev20210813191543-cp37-cp37m-win_amd64.whl
Algorithm Hash digest
SHA256 bd1dff927dda248dced7a96e9d35ccddab431ea4bfa44ff134aa268bd4c8d9c0
MD5 dd2c24c87dbc994dc87d39cc05c36bad
BLAKE2b-256 d44a7215ff263b64514b573cc876c2cb1875ccd4e6a56566d3ee1286d66bddac

See more details on using hashes here.

File details

Details for the file tensorflow_io_gcs_filesystem_nightly-0.20.0.dev20210813191543-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl.

File metadata

File hashes

Hashes for tensorflow_io_gcs_filesystem_nightly-0.20.0.dev20210813191543-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 121d3123c10f85dda010591dd3ddad3a031a1af28fe3581e1238d78fcdbff8b3
MD5 412480f9422f7bf43f0652127b0b8525
BLAKE2b-256 6a426b9e295608c5666c86db739ff4a55cb51dd82a0bf977efba732fc72f2cea

See more details on using hashes here.

File details

Details for the file tensorflow_io_gcs_filesystem_nightly-0.20.0.dev20210813191543-cp37-cp37m-macosx_10_14_x86_64.whl.

File metadata

File hashes

Hashes for tensorflow_io_gcs_filesystem_nightly-0.20.0.dev20210813191543-cp37-cp37m-macosx_10_14_x86_64.whl
Algorithm Hash digest
SHA256 0a5b74f22607db4c49a45de25c7372aac32e0af3a96b5909cfd3ca1002647903
MD5 e9675ea24af18a86628b747bfa2389c3
BLAKE2b-256 8e3ad45641d2575a30a03f216906381cbc00a628e89134769aa241cc8097ece1

See more details on using hashes here.

File details

Details for the file tensorflow_io_gcs_filesystem_nightly-0.20.0.dev20210813191543-cp36-cp36m-win_amd64.whl.

File metadata

File hashes

Hashes for tensorflow_io_gcs_filesystem_nightly-0.20.0.dev20210813191543-cp36-cp36m-win_amd64.whl
Algorithm Hash digest
SHA256 26d9838006bf6e7d68d73948be26aee756985ee71f50f5263c538e8ca876a84e
MD5 93e528888ad69614ccaf470059dd1986
BLAKE2b-256 7a5e9f847c8c2393d6c25df68c032c847b5028b5b7b7a433d4ea00e3378539c1

See more details on using hashes here.

File details

Details for the file tensorflow_io_gcs_filesystem_nightly-0.20.0.dev20210813191543-cp36-cp36m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl.

File metadata

File hashes

Hashes for tensorflow_io_gcs_filesystem_nightly-0.20.0.dev20210813191543-cp36-cp36m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 5d81808e1ff7cb80cec6c40d8bbee0d292c9fe321041d231e7c80aacb0131521
MD5 4e3a4408530e31fda1bdcc64758d82ac
BLAKE2b-256 5e5c14a26cc9da78a6b71d579d34d4156d89e4ac9caec3f5a45bae9356dbb9bd

See more details on using hashes here.

File details

Details for the file tensorflow_io_gcs_filesystem_nightly-0.20.0.dev20210813191543-cp36-cp36m-macosx_10_14_x86_64.whl.

File metadata

File hashes

Hashes for tensorflow_io_gcs_filesystem_nightly-0.20.0.dev20210813191543-cp36-cp36m-macosx_10_14_x86_64.whl
Algorithm Hash digest
SHA256 9848aba2de3392def7eaddfc2d461f26c6f67c3529ec69b22d61100042d3b335
MD5 8f33db752b2adbccd190e3ef9e2fb1c9
BLAKE2b-256 6ad618df8adc10a84539d0f5f4cf0cd64ffc8b4a5dc526adc61e8d4667796f08

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page