TensorFlow Data Validation

TensorFlow Data Validation (TFDV) is a library for exploring and validating machine learning data. It is designed to be highly scalable and to work well with TensorFlow and TensorFlow Extended (TFX).

TF Data Validation includes:

  • Scalable calculation of summary statistics of training and test data.
  • Integration with a viewer for data distributions and statistics, as well as faceted comparison of pairs of features (Facets).
  • Automated data-schema generation to describe expectations about data, such as required values, ranges, and vocabularies.
  • A schema viewer to help you inspect the schema.
  • Anomaly detection to identify problems such as missing features, out-of-range values, or wrong feature types.
  • An anomalies viewer so that you can see which features have anomalies and learn how to correct them.
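
The pieces listed above are typically chained together. The following is a minimal sketch, assuming CSV inputs and the `tfdv.generate_statistics_from_csv`, `tfdv.infer_schema`, and `tfdv.validate_statistics` functions covered in the get started guide. The helper names `validate_csv` and `anomalous_features` and any file paths are hypothetical, chosen only for illustration.

```python
def validate_csv(train_path, eval_path):
    """Infer a schema from training data and check evaluation data against it."""
    # Imported lazily so this module still loads where TFDV is not installed.
    import tensorflow_data_validation as tfdv

    # Summary statistics over the training data.
    train_stats = tfdv.generate_statistics_from_csv(data_location=train_path)
    # Schema describing the expectations derived from those statistics.
    schema = tfdv.infer_schema(statistics=train_stats)
    # Statistics over the evaluation data, checked against the schema.
    eval_stats = tfdv.generate_statistics_from_csv(data_location=eval_path)
    anomalies = tfdv.validate_statistics(statistics=eval_stats, schema=schema)
    return schema, anomalies


def anomalous_features(anomalies):
    """Return the sorted names of features that have at least one anomaly.

    Relies on the `anomaly_info` map of the anomalies result, which is keyed
    by feature name.
    """
    return sorted(anomalies.anomaly_info.keys())
```

In a notebook, `tfdv.visualize_statistics`, `tfdv.display_schema`, and `tfdv.display_anomalies` can render the statistics, schema, and anomalies viewers for the objects produced here.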

For instructions on using TFDV, see the get started guide and try out the example notebook.

Caution: TFDV may be backwards incompatible before version 1.0.

Installing from PyPI

The recommended way to install TFDV is using the PyPI package:

pip install tensorflow-data-validation

Installing from source

1. Prerequisites

To compile and use TFDV, you need to set up some prerequisites.

Install NumPy

If NumPy is not installed on your system, install it now by following these directions.

Install Bazel

If Bazel is not installed on your system, install it now by following these directions.

2. Clone the TFDV repository

git clone https://github.com/tensorflow/data-validation
cd data-validation

Note that these instructions will install the latest master branch of TensorFlow Data Validation. If you want to install a specific branch (such as a release branch), pass -b <branchname> to the git clone command.

3. Build the pip package

TFDV uses Bazel to build the pip package from source:

bazel run -c opt tensorflow_data_validation:build_pip_package

You can find the generated .whl file in the dist subdirectory.

4. Install the pip package

pip install dist/*.whl
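
After either installation route, it can be useful to confirm which version ended up in the active environment. This is a small sketch that uses only the standard library; it assumes Python 3.8 or later for `importlib.metadata`, and the function name `installed_version` is hypothetical.

```python
from importlib import metadata


def installed_version(distribution="tensorflow-data-validation"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    print(version if version is not None else "tensorflow-data-validation is not installed")
```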

Supported platforms

TFDV is built and tested on the following 64-bit operating systems:

  • macOS 10.12.6 (Sierra) or later.
  • Ubuntu 14.04 or later.

Dependencies

TFDV requires TensorFlow but does not depend on the tensorflow PyPI package. See the TensorFlow install guides for instructions on how to get started with TensorFlow.

Apache Beam is required; TFDV relies on it for efficient distributed computation. By default, Apache Beam runs in local mode, but it can also run in distributed mode using Google Cloud Dataflow. TFDV is designed to be extensible for other Apache Beam runners.
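
As an illustration of switching from local mode to Dataflow, the sketch below builds Beam pipeline flags and passes them to `tfdv.generate_statistics_from_tfrecord` through its `pipeline_options` argument. The project ID, bucket paths, and job name are placeholders, and the helper names `dataflow_flags` and `generate_stats_on_dataflow` are hypothetical. A real Dataflow job may need further options (for example a region, or a way to make TFDV available on the workers), so treat this as a starting point rather than a complete configuration.

```python
def dataflow_flags(project, temp_location, job_name):
    """Build command-line style flags that select the Dataflow runner."""
    return [
        "--runner=DataflowRunner",
        "--project={}".format(project),
        "--temp_location={}".format(temp_location),
        "--job_name={}".format(job_name),
    ]


def generate_stats_on_dataflow(data_location, output_path, flags):
    """Compute statistics over TFRecord data using the given pipeline flags."""
    # Imported lazily so this module loads without Beam or TFDV installed.
    from apache_beam.options.pipeline_options import PipelineOptions
    import tensorflow_data_validation as tfdv

    options = PipelineOptions(flags=flags)
    # output_path is where the resulting statistics are written.
    return tfdv.generate_statistics_from_tfrecord(
        data_location=data_location,
        output_path=output_path,
        pipeline_options=options,
    )
```

Omitting `pipeline_options` leaves Beam in its default local mode, which is usually enough for small datasets.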

Compatible versions

The following table shows the package versions that are compatible with each other. This is determined by our testing framework, but other untested combinations may also work.

tensorflow-data-validation   tensorflow      apache-beam[gcp]
GitHub master                nightly (1.x)   2.11.0
0.13.1                       1.13            2.11.0
0.13.0                       1.13            2.11.0
0.12.0                       1.12            2.10.0
0.11.0                       1.11            2.8.0
0.9.0                        1.9             2.6.0
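
For scripted environment setup, the released rows of the table can be encoded as a lookup. This is only a sketch: the dictionary mirrors the table above, while the names `COMPATIBLE_VERSIONS` and `compatible_versions` are hypothetical and not part of TFDV.

```python
# Tested combinations for released TFDV versions, copied from the table above.
COMPATIBLE_VERSIONS = {
    "0.13.1": {"tensorflow": "1.13", "apache-beam[gcp]": "2.11.0"},
    "0.13.0": {"tensorflow": "1.13", "apache-beam[gcp]": "2.11.0"},
    "0.12.0": {"tensorflow": "1.12", "apache-beam[gcp]": "2.10.0"},
    "0.11.0": {"tensorflow": "1.11", "apache-beam[gcp]": "2.8.0"},
    "0.9.0": {"tensorflow": "1.9", "apache-beam[gcp]": "2.6.0"},
}


def compatible_versions(tfdv_version):
    """Return the tested dependency versions for a released TFDV version.

    Raises KeyError for versions that are not in the table; such combinations
    are untested, not necessarily broken.
    """
    return dict(COMPATIBLE_VERSIONS[tfdv_version])


if __name__ == "__main__":
    print(compatible_versions("0.13.1"))
```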

Questions

Please direct any questions about working with TF Data Validation to Stack Overflow using the tensorflow-data-validation tag.
