A library for exploring and validating machine learning data.

TensorFlow Data Validation

TensorFlow Data Validation (TFDV) is a library for exploring and validating machine learning data. It is designed to be highly scalable and to work well with TensorFlow and TensorFlow Extended (TFX).

TF Data Validation includes:

  • Scalable calculation of summary statistics of training and test data.
  • Integration with a viewer for data distributions and statistics, as well as faceted comparison of pairs of features (Facets).
  • Automated data-schema generation to describe expectations about data, such as required values, ranges, and vocabularies.
  • A schema viewer to help you inspect the schema.
  • Anomaly detection to identify issues such as missing features, out-of-range values, or wrong feature types.
  • An anomalies viewer that shows which features have anomalies and gives you the details you need to correct them.

For instructions on using TFDV, see the get started guide and try out the example notebook.
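
As a rough sketch of how the features listed above fit together (statistics, schema inference, anomaly detection), the following assumes TFDV is installed and that both inputs are CSV files with a header row. The function name `find_anomalies` and its arguments are illustrative and not part of TFDV; the import is done lazily only so the snippet can be loaded on a machine without TFDV.

```python
def find_anomalies(train_csv, eval_csv):
    """Return {feature_name: short_description} for anomalies found in eval_csv."""
    # Imported lazily so this module loads even where TFDV is not installed.
    import tensorflow_data_validation as tfdv

    # Compute summary statistics over the training data.
    train_stats = tfdv.generate_statistics_from_csv(data_location=train_csv)
    # Infer a schema (types, domains, presence) from those statistics.
    schema = tfdv.infer_schema(statistics=train_stats)
    # Compute statistics for the evaluation data and check them against the schema.
    eval_stats = tfdv.generate_statistics_from_csv(data_location=eval_csv)
    anomalies = tfdv.validate_statistics(statistics=eval_stats, schema=schema)
    # anomaly_info maps each anomalous feature name to a description of the problem.
    return {name: info.short_description
            for name, info in anomalies.anomaly_info.items()}
```

In a notebook, `tfdv.visualize_statistics`, `tfdv.display_schema`, and `tfdv.display_anomalies` correspond to the viewers mentioned in the list; the get started guide remains the authoritative walkthrough.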

Caution: TFDV releases before version 1.0 may introduce backwards-incompatible changes.

Installing from PyPI

The recommended way to install TFDV is using the PyPI package:

pip install tensorflow-data-validation

Installing from source

1. Prerequisites

To compile and use TFDV, you need to set up some prerequisites.

Install NumPy

If NumPy is not installed on your system, install it now by following these directions.

Install Bazel

If Bazel is not installed on your system, install it now by following these directions.

2. Clone the TFDV repository

git clone https://github.com/tensorflow/data-validation
cd data-validation

Note that these instructions install the latest master branch of TensorFlow Data Validation. To install a specific branch (such as a release branch), pass -b <branchname> to the git clone command.

3. Build the pip package

TFDV uses Bazel to build the pip package from source:

bazel run -c opt tensorflow_data_validation:build_pip_package

You can find the generated .whl file in the dist subdirectory.

4. Install the pip package

pip install dist/*.whl

Supported platforms

TFDV is built and tested on the following 64-bit operating systems:

  • macOS 10.12.6 (Sierra) or later.
  • Ubuntu 14.04 or later.

Dependencies

TFDV requires TensorFlow but does not declare the tensorflow PyPI package as a dependency, so TensorFlow must be installed separately. See the TensorFlow install guides for instructions on how to get started with TensorFlow.

Apache Beam is required; TFDV uses it to run efficient distributed computation. By default, Apache Beam runs in local mode, but it can also run in distributed mode using Google Cloud Dataflow. TFDV is designed to be extensible for other Apache Beam runners.
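
The choice between local and distributed execution is made through Apache Beam pipeline options. The sketch below is illustrative only: `beam_flags` and `generate_stats` are hypothetical helper names, the project and bucket values are placeholders, and a real Dataflow job typically needs further options (for example, making TFDV available on the workers). It assumes `generate_statistics_from_csv` accepts a `pipeline_options` argument, as it does in the releases we are aware of.

```python
def beam_flags(runner="DirectRunner", project=None, temp_location=None):
    """Build Apache Beam pipeline flags for local or Dataflow execution."""
    flags = ["--runner=" + runner]
    if runner == "DataflowRunner":
        # Dataflow needs a GCP project and a Cloud Storage scratch location.
        if not (project and temp_location):
            raise ValueError("DataflowRunner requires project and temp_location")
        flags += ["--project=" + project, "--temp_location=" + temp_location]
    return flags


def generate_stats(data_location, flags):
    """Compute statistics with the runner selected by `flags`."""
    # Lazy imports: requires apache-beam and tensorflow-data-validation.
    from apache_beam.options.pipeline_options import PipelineOptions
    import tensorflow_data_validation as tfdv

    return tfdv.generate_statistics_from_csv(
        data_location=data_location,
        pipeline_options=PipelineOptions(flags),
    )
```

Calling `beam_flags()` with no arguments yields the local (DirectRunner) configuration, which matches Beam's default mode described above.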

Compatible versions

The following table shows the package versions that are compatible with each other. These combinations are determined by our testing framework; other untested combinations may also work.

tensorflow-data-validation   tensorflow      apache-beam[gcp]
GitHub master                nightly (1.x)   2.11.0
0.13.0                       1.13            2.11.0
0.12.0                       1.12            2.10.0
0.11.0                       1.11            2.8.0
0.9.0                        1.9             2.6.0
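
To pin a tested combination, the released rows of the table can be turned into pip requirement strings. This is a small sketch, not an official tool: `COMPATIBLE_VERSIONS` and `pip_requirements` are hypothetical names, and reading a TensorFlow entry such as 1.13 as the whole 1.13.x series is an assumption.

```python
# Released rows copied from the compatibility table:
# TFDV release -> (tensorflow, apache-beam[gcp]).
COMPATIBLE_VERSIONS = {
    "0.13.0": ("1.13", "2.11.0"),
    "0.12.0": ("1.12", "2.10.0"),
    "0.11.0": ("1.11", "2.8.0"),
    "0.9.0": ("1.9", "2.6.0"),
}


def pip_requirements(tfdv_version):
    """Return pip requirement strings for a tested TFDV release."""
    tf_version, beam_version = COMPATIBLE_VERSIONS[tfdv_version]
    return [
        "tensorflow-data-validation==" + tfdv_version,
        # Assumption: "1.13" in the table means any 1.13.x patch release.
        "tensorflow==" + tf_version + ".*",
        "apache-beam[gcp]==" + beam_version,
    ]
```

The resulting strings can be written to a requirements file or passed to pip directly; an unknown release raises `KeyError`, which signals that the combination is untested here.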

Questions

Please direct any questions about working with TF Data Validation to Stack Overflow using the tensorflow-data-validation tag.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release. See tutorial on generating distribution archives.

Built Distributions

tensorflow_data_validation-0.13.0-cp37-cp37m-macosx_10_9_x86_64.whl (2.6 MB)

Uploaded CPython 3.7m macOS 10.9+ x86-64

tensorflow_data_validation-0.13.0-cp36-cp36m-macosx_10_9_x86_64.whl (2.6 MB)

Uploaded CPython 3.6m macOS 10.9+ x86-64

tensorflow_data_validation-0.13.0-cp35-cp35m-macosx_10_6_intel.whl (2.6 MB)

Uploaded CPython 3.5m macOS 10.6+ intel

tensorflow_data_validation-0.13.0-cp27-cp27m-macosx_10_9_x86_64.whl (2.6 MB)

Uploaded CPython 2.7m macOS 10.9+ x86-64

File details

Details for the file tensorflow_data_validation-0.13.0-cp37-cp37m-manylinux1_x86_64.whl.

File hashes

Hashes for tensorflow_data_validation-0.13.0-cp37-cp37m-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 fbcdfaad4efb78e3f0c0d2b375600f5d91fe1e9dfb1bde44f71aa029fa831185
MD5 ffa1ad311480b2388272c75ae7c72074
BLAKE2b-256 30e667477cbf336f52fe210ee01cf6a34c3320b875e3b854e966081cad81dfab

See more details on using hashes here.
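
A downloaded wheel can be checked against the published SHA256 digest with Python's standard `hashlib` module. The helper names below (`sha256_of`, `matches_published_hash`) are illustrative; the expected digest is whatever is listed for the file you downloaded.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large wheels are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_sha256):
    """Compare a file's SHA256 digest with the published hex digest."""
    return sha256_of(path) == expected_sha256.strip().lower()
```

For example, `matches_published_hash(wheel_path, published_digest)` should return `True` for an intact download, where `wheel_path` is the local file and `published_digest` is the SHA256 value listed for it.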

File details

Details for the file tensorflow_data_validation-0.13.0-cp37-cp37m-macosx_10_9_x86_64.whl.

File hashes

Hashes for tensorflow_data_validation-0.13.0-cp37-cp37m-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 3504c6303a25d82658c00509b2581a17b553db63d74fea2697c988e163db0ce1
MD5 532da16ddfc96fa6f01ed294d919c27b
BLAKE2b-256 e435930b46fdba9817e0de28eee41c62b96b5f27b6f1024a7d1feb6d2d4720fb

File details

Details for the file tensorflow_data_validation-0.13.0-cp36-cp36m-manylinux1_x86_64.whl.

File hashes

Hashes for tensorflow_data_validation-0.13.0-cp36-cp36m-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 fc1e4e15b1160ef5f0694a325a4472a781e53a66044855dadd86d0f915a91550
MD5 de0091cd2d0f407b12f54b687eb8b21f
BLAKE2b-256 a3a92f729b8500f6a7ee2437f68b376c156f18fca7fdb5e4036604a5d706cc86

File details

Details for the file tensorflow_data_validation-0.13.0-cp36-cp36m-macosx_10_9_x86_64.whl.

File hashes

Hashes for tensorflow_data_validation-0.13.0-cp36-cp36m-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 d4f9167b10518efd8ba2a3d059f79b656b2a91631076f72e9ee01638d22e561f
MD5 b8c4a10683ae73ce6110f13dc4f9d32b
BLAKE2b-256 83af5c513f8047c4a721569a045787dfd700eae89492eb06c5f72be9f493153f

File details

Details for the file tensorflow_data_validation-0.13.0-cp35-cp35m-manylinux1_x86_64.whl.

File hashes

Hashes for tensorflow_data_validation-0.13.0-cp35-cp35m-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 05fe285c289d2f6a6c41471f32a8ed26b55605f6a46d22279c6486c2e3bf8f1f
MD5 b90e97050749d5f12721509e1fe356ca
BLAKE2b-256 aba8ffb1b9d588e83db7fe6b31a7e8f13e3ec9c2b684214e3608102822241f32

File details

Details for the file tensorflow_data_validation-0.13.0-cp35-cp35m-macosx_10_6_intel.whl.

File hashes

Hashes for tensorflow_data_validation-0.13.0-cp35-cp35m-macosx_10_6_intel.whl
Algorithm Hash digest
SHA256 3076b07fc54ecb6464a0c0619c8a9b8e97073c61615ea04f2a7fdcfdb1ad6132
MD5 02a001fd8986ce6de52d227c54261ad9
BLAKE2b-256 9202ac416f8861cba47bfa30ae9fc588c6b0a6bd0e5831fa93f658098d774680

File details

Details for the file tensorflow_data_validation-0.13.0-cp27-cp27mu-manylinux1_x86_64.whl.

File hashes

Hashes for tensorflow_data_validation-0.13.0-cp27-cp27mu-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 d6eda0acffeb3e650e6f7aa35c0820e5a4825075b31984a8bb3f7379bf7d0246
MD5 d5360918c69288015340cbfe8a60e3b8
BLAKE2b-256 3bcebcd246ea12416e14ad249fe17bff7dd93ce06e47d3ae2d744583ce505131

File details

Details for the file tensorflow_data_validation-0.13.0-cp27-cp27m-macosx_10_9_x86_64.whl.

File hashes

Hashes for tensorflow_data_validation-0.13.0-cp27-cp27m-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 fa2022f1c6748140681fe017ae8ace334bcc1ba89f8f87b96283ea79db5a41d8
MD5 8889eda0ddbc112ff2d9ef0f2db3e725
BLAKE2b-256 0dc00c20eb50df330e686b4a7a7d9dace2caf9d08733162e59cce709ea808458
