TensorFlow I/O


TensorFlow I/O is a collection of file systems and file formats that are not available in TensorFlow's built-in support. A full list of the file systems and file formats supported by TensorFlow I/O can be found here.

Using tensorflow-io with Keras is straightforward. Below is the Get Started with TensorFlow example, with the data processing portion replaced by tensorflow-io:

import tensorflow as tf
import tensorflow_io as tfio

# Read the MNIST data into the IODataset.
dataset_url = "https://storage.googleapis.com/cvdf-datasets/mnist/"
d_train = tfio.IODataset.from_mnist(
    dataset_url + "train-images-idx3-ubyte.gz",
    dataset_url + "train-labels-idx1-ubyte.gz",
)

# Shuffle the elements of the dataset.
d_train = d_train.shuffle(buffer_size=1024)

# By default image data is uint8, so convert to float32 using map().
d_train = d_train.map(lambda x, y: (tf.image.convert_image_dtype(x, tf.float32), y))

# Batch the data just like any other tf.data.Dataset.
d_train = d_train.batch(32)

# Build the model.
model = tf.keras.models.Sequential(
    [
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(512, activation=tf.nn.relu),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(10, activation=tf.nn.softmax),
    ]
)

# Compile the model.
model.compile(
    optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"]
)

# Fit the model.
model.fit(d_train, epochs=5, steps_per_epoch=200)

In the above MNIST example, the URLs used to access the dataset files are passed directly to the tfio.IODataset.from_mnist API call. This is possible because of the inherent support that tensorflow-io provides for the HTTP/HTTPS file system, which eliminates the need to download and save datasets to a local directory.

NOTE: Since tensorflow-io is able to detect and decompress the MNIST dataset automatically if needed, we can pass the URLs of the compressed (gzip) files to the API call as-is.
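The same API can also be used for the held-out test split. The following is a minimal sketch, not part of the original guide: it continues from the example above (reusing tf, tfio, dataset_url, and model) and assumes the standard MNIST test file names (t10k-images-idx3-ubyte.gz and t10k-labels-idx1-ubyte.gz) are available under the same URL.

# Sketch: evaluate on the MNIST test split, assuming the standard t10k
# file names exist under the same dataset_url used above.
d_test = tfio.IODataset.from_mnist(
    dataset_url + "t10k-images-idx3-ubyte.gz",
    dataset_url + "t10k-labels-idx1-ubyte.gz",
)
d_test = d_test.map(lambda x, y: (tf.image.convert_image_dtype(x, tf.float32), y))
d_test = d_test.batch(32)

# Report loss and accuracy on the test split.
model.evaluate(d_test)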

Please check the official documentation for more detailed and advanced usage of the package.

Installation

Python Package

The tensorflow-io Python package can be installed with pip directly using:

$ pip install tensorflow-io

People who are a little more adventurous can also try our nightly binaries:

$ pip install tensorflow-io-nightly

To ensure you have a version of TensorFlow that is compatible with TensorFlow I/O, you can specify the tensorflow extra requirement during install:

$ pip install tensorflow-io[tensorflow]

Similar extras exist for the tensorflow-gpu, tensorflow-cpu and tensorflow-rocm packages.
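For example (illustrative commands using the extras named above; pick the one that matches your TensorFlow variant):

$ pip install tensorflow-io[tensorflow-gpu]
$ pip install tensorflow-io[tensorflow-cpu]
$ pip install tensorflow-io[tensorflow-rocm]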

Docker Images

In addition to the pip packages, Docker images can be used to quickly get started.

For stable builds:

$ docker pull tfsigio/tfio:latest
$ docker run -it --rm --name tfio-latest tfsigio/tfio:latest

For nightly builds:

$ docker pull tfsigio/tfio:nightly
$ docker run -it --rm --name tfio-nightly tfsigio/tfio:nightly
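As an illustrative (not official) usage, you can mount the current directory into the container and run a local script with the bundled TensorFlow and tensorflow-io; my_script.py below is a hypothetical placeholder, and the example assumes the image's default python is on PATH:

$ docker run -it --rm -v "$PWD":/work -w /work tfsigio/tfio:latest python my_script.py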

R Package

Once the tensorflow-io Python package has been successfully installed, you can install the development version of the R package from GitHub via the following:

if (!require("remotes")) install.packages("remotes")
remotes::install_github("tensorflow/io", subdir = "R-package")

TensorFlow Version Compatibility

To ensure compatibility with TensorFlow, it is recommended to install a matching version of TensorFlow I/O according to the table below; an example pip command follows the table. You can find the list of releases here.

| TensorFlow I/O Version | TensorFlow Compatibility | Release Date |
|---|---|---|
| 0.27.0 | 2.10.x | Sep 08, 2022 |
| 0.26.0 | 2.9.x | May 17, 2022 |
| 0.25.0 | 2.8.x | Apr 19, 2022 |
| 0.24.0 | 2.8.x | Feb 04, 2022 |
| 0.23.1 | 2.7.x | Dec 15, 2021 |
| 0.23.0 | 2.7.x | Dec 14, 2021 |
| 0.22.0 | 2.7.x | Nov 10, 2021 |
| 0.21.0 | 2.6.x | Sep 12, 2021 |
| 0.20.0 | 2.6.x | Aug 11, 2021 |
| 0.19.1 | 2.5.x | Jul 25, 2021 |
| 0.19.0 | 2.5.x | Jun 25, 2021 |
| 0.18.0 | 2.5.x | May 13, 2021 |
| 0.17.1 | 2.4.x | Apr 16, 2021 |
| 0.17.0 | 2.4.x | Dec 14, 2020 |
| 0.16.0 | 2.3.x | Oct 23, 2020 |
| 0.15.0 | 2.3.x | Aug 03, 2020 |
| 0.14.0 | 2.2.x | Jul 08, 2020 |
| 0.13.0 | 2.2.x | May 10, 2020 |
| 0.12.0 | 2.1.x | Feb 28, 2020 |
| 0.11.0 | 2.1.x | Jan 10, 2020 |
| 0.10.0 | 2.0.x | Dec 05, 2019 |
| 0.9.1 | 2.0.x | Nov 15, 2019 |
| 0.9.0 | 2.0.x | Oct 18, 2019 |
| 0.8.1 | 1.15.x | Nov 15, 2019 |
| 0.8.0 | 1.15.x | Oct 17, 2019 |
| 0.7.2 | 1.14.x | Nov 15, 2019 |
| 0.7.1 | 1.14.x | Oct 18, 2019 |
| 0.7.0 | 1.14.x | Jul 14, 2019 |
| 0.6.0 | 1.13.x | May 29, 2019 |
| 0.5.0 | 1.13.x | Apr 12, 2019 |
| 0.4.0 | 1.13.x | Mar 01, 2019 |
| 0.3.0 | 1.12.0 | Feb 15, 2019 |
| 0.2.0 | 1.12.0 | Jan 29, 2019 |
| 0.1.0 | 1.12.0 | Dec 16, 2018 |
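For example, to pair the 0.27.0 release with a compatible TensorFlow, a minimal sketch based on the table above (adjust the versions to the row you need):

$ pip install "tensorflow-io==0.27.0" "tensorflow>=2.10.0,<2.11.0"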

Performance Benchmarking

We use GitHub Pages to document the results of API performance benchmarks. The benchmark job is triggered on every commit to the master branch and facilitates tracking performance with respect to commits.

Contributing

TensorFlow I/O is a community-led open source project. As such, the project depends on public contributions, bug fixes, and documentation. Please see the project's contribution guidelines for how to get involved.

Build Status and CI

Build status badges: Linux CPU (Python 2 and Python 3), Linux GPU (Python 2 and Python 3).

Because of the manylinux2010 requirement, TensorFlow I/O is built with Ubuntu 16.04 + Developer Toolset 7 (GCC 7.3) on Linux. Configuring Ubuntu 16.04 with Developer Toolset 7 is not exactly straightforward. If the system has Docker installed, then the following command will automatically build a manylinux2010-compatible whl package:

#!/usr/bin/env bash

# Repair each built wheel in dist/ into a manylinux2010-compatible wheel
# using the official manylinux2010 image; results land in wheelhouse/.
ls dist/*
for f in dist/*.whl; do
  docker run -i --rm -v "$PWD":/v -w /v --net=host quay.io/pypa/manylinux2010_x86_64 bash -x -e /v/tools/build/auditwheel repair --plat manylinux2010_x86_64 "$f"
done
# Restore ownership of files created by the container, then list the results.
sudo chown -R "$(id -nu)":"$(id -ng)" .
ls wheelhouse/*

It takes some time to build, but once complete, there will be Python 3.5-, 3.6-, and 3.7-compatible whl packages available in the wheelhouse directory.

On macOS, the same command can be used. However, the script expects python in the shell and will only generate a whl package that matches the version of python in the shell. If you want to build a whl package for a specific Python version, you have to alias that version of Python to python in the shell. See the Auditwheel step in .github/workflows/build.yml for instructions on how to do that.
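One illustrative way to do this (a sketch, not the project's required workflow) is to create a virtual environment with the desired interpreter so that python resolves to it in the shell, e.g. for Python 3.8:

$ python3.8 -m venv build-env
$ source build-env/bin/activate
$ python --version   # should now report Python 3.8.x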

Note that the auditwheel repair command above is also the command we use when releasing packages for Linux and macOS.

TensorFlow I/O uses both GitHub Workflows and Google CI (Kokoro) for continuous integration. GitHub Workflows is used for the macOS build and test. Kokoro is used for the Linux build and test. Again, because of the manylinux2010 requirement, on Linux whl packages are always built with Ubuntu 16.04 + Developer Toolset 7. Tests are run on a variety of systems with different Python 3 versions to ensure good coverage:

| Python | Ubuntu 18.04 | Ubuntu 20.04 | macOS + osx9 | Windows-2019 |
|---|---|---|---|---|
| 2.7 | ✔ | ✔ | ✔ | N/A |
| 3.7 | ✔ | ✔ | ✔ | ✔ |
| 3.8 | ✔ | ✔ | ✔ | ✔ |

TensorFlow I/O has integrations with many systems and cloud vendors such as Prometheus, Apache Kafka, Apache Ignite, Google Cloud PubSub, AWS Kinesis, Microsoft Azure Storage, and Alibaba Cloud OSS.

We try our best to test against those systems in our continuous integration whenever possible. Some tests, such as those for Prometheus, Kafka, and Ignite, are done with live systems, meaning we install Prometheus/Kafka/Ignite on the CI machine before the test is run. Some tests, such as those for Kinesis, PubSub, and Azure Storage, are done through official or unofficial emulators. Offline tests are also performed whenever possible, though systems covered by offline tests may not have the same level of coverage as live systems or emulators.

| | Live System | Emulator | CI Integration | Offline |
|---|---|---|---|---|
| Apache Kafka | ✔ | | ✔ | |
| Apache Ignite | ✔ | | ✔ | |
| Prometheus | ✔ | | ✔ | |
| Google PubSub | | ✔ | ✔ | |
| Azure Storage | | ✔ | ✔ | |
| AWS Kinesis | | ✔ | ✔ | |
| Alibaba Cloud OSS | | | | ✔ |
| Google BigTable/BigQuery | | | to be added | |
| Elasticsearch (experimental) | ✔ | | ✔ | |
| MongoDB (experimental) | ✔ | | ✔ | |
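As one concrete illustration of these integrations, the sketch below reads records from a locally running Kafka broker via tfio.IODataset.from_kafka. This is a hedged example, not an official recipe: the topic name my-topic is a placeholder, a broker is assumed on the default localhost:9092, and the exact keyword arguments may vary between tensorflow-io versions, so please consult the API documentation for your release.

import tensorflow_io as tfio

# Sketch: stream records from a local Kafka broker (assumes a broker on
# localhost:9092 and an existing topic named "my-topic").
kafka_ds = tfio.IODataset.from_kafka("my-topic", partition=0, offset=0)
for item in kafka_ds.take(5):
    print(item)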

References for emulators:

Community

Additional Information

License

Apache License 2.0



