A library for exploring and validating machine learning data.

TensorFlow Data Validation

TensorFlow Data Validation (TFDV) is a library for exploring and validating machine learning data. It is designed to be highly scalable and to work well with TensorFlow and TensorFlow Extended (TFX).

TF Data Validation includes:

  • Scalable calculation of summary statistics of training and test data.
  • Integration with a viewer for data distributions and statistics, as well as faceted comparison of pairs of features (Facets).
  • Automated data-schema generation to describe expectations about data, such as required values, ranges, and vocabularies.
  • A schema viewer to help you inspect the schema.
  • Anomaly detection to identify problems such as missing features, out-of-range values, or wrong feature types.
  • An anomalies viewer so that you can see which features have anomalies and learn more in order to correct them.

For instructions on using TFDV, see the get started guide and try out the example notebook.
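
The workflow in the feature list (compute statistics, infer a schema, check new data against it) can be sketched as follows. This is a minimal sketch rather than the official tutorial: the helper name `validate_eval_data` and its CSV path arguments are hypothetical, and the imports are deferred so that the module loads even where TFDV is not installed.

```python
def validate_eval_data(train_csv, eval_csv):
    """Return anomalies in eval_csv relative to a schema inferred from train_csv.

    Hypothetical helper for illustration; both arguments are paths to CSV
    files with a header row.
    """
    # Deferred import: TFDV is only needed when the function is called.
    import tensorflow_data_validation as tfdv

    # 1. Compute summary statistics over the training data.
    train_stats = tfdv.generate_statistics_from_csv(data_location=train_csv)

    # 2. Infer a schema (types, domains, presence) from those statistics.
    schema = tfdv.infer_schema(statistics=train_stats)

    # 3. Compute statistics for the evaluation data and check them
    #    against the schema inferred from the training data.
    eval_stats = tfdv.generate_statistics_from_csv(data_location=eval_csv)
    anomalies = tfdv.validate_statistics(statistics=eval_stats, schema=schema)

    # anomalies.anomaly_info maps feature names to a description of the problem.
    return anomalies
```

In a notebook, `tfdv.visualize_statistics`, `tfdv.display_schema`, and `tfdv.display_anomalies` render the statistics, schema, and anomalies viewers for the objects computed above.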

Caution: TFDV may introduce backwards-incompatible changes before version 1.0.

Installing from PyPI

The recommended way to install TFDV is using the PyPI package:

pip install tensorflow-data-validation

Installing from source

1. Prerequisites

To build TFDV from source, you first need to install the following prerequisites.

Install NumPy

If NumPy is not installed on your system, install it now by following these directions.

Install Bazel

If Bazel is not installed on your system, install it now by following these directions.

2. Clone the TFDV repository

git clone https://github.com/tensorflow/data-validation
cd data-validation

Note that these instructions will install the latest master branch of TensorFlow Data Validation. If you want to install a specific branch (such as a release branch), pass -b <branchname> to the git clone command.

3. Build the pip package

TFDV uses Bazel to build the pip package from source:

bazel run -c opt tensorflow_data_validation:build_pip_package

You can find the generated .whl file in the dist subdirectory.

4. Install the pip package

pip install dist/*.whl
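
After installing, a quick way to confirm that the package is importable is a check along these lines. It is a sketch only: the helper name `installed_tfdv_version` is hypothetical, and the `__version__` attribute is assumed to be exposed by the package, so the code falls back to a placeholder if it is not.

```python
def installed_tfdv_version():
    """Return the installed TFDV version string, or None if TFDV cannot be imported."""
    try:
        import tensorflow_data_validation as tfdv
    except ImportError:
        return None
    # Assumption: the package exposes __version__; use a placeholder otherwise.
    return getattr(tfdv, "__version__", "unknown")


version = installed_tfdv_version()
if version is None:
    print("TFDV is not importable in this environment")
else:
    print("TFDV version: {}".format(version))
```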

Supported platforms

Note: TFDV currently requires Python 2.7. Support for Python 3 is coming very soon (tracked here).

TFDV is built and tested on the following 64-bit operating systems:

  • macOS 10.12.6 (Sierra) or later.
  • Ubuntu 14.04 or later.

Dependencies

TFDV requires TensorFlow but does not depend on the tensorflow PyPI package. See the TensorFlow install guides for instructions on how to get started with TensorFlow.

TFDV requires Apache Beam, which it uses to run efficient distributed computation. By default, Apache Beam runs in local mode, but it can also run in distributed mode using Google Cloud Dataflow. TFDV is designed to be extensible to other Apache Beam runners.
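
Switching from local mode to Google Cloud Dataflow is done through Beam pipeline options. The sketch below shows one way to assemble them. The helper names, the `gcs_root` layout (`/staging` and `/tmp` subdirectories), and all argument values are assumptions for illustration; the imports are deferred so the file loads without Beam or TFDV installed.

```python
def dataflow_flags(project, job_name, gcs_root, tfdv_wheel):
    """Build Apache Beam command-line flags for running on Google Cloud Dataflow.

    Hypothetical helper; gcs_root is a gs:// path used for staging and temp files.
    """
    return [
        "--runner=DataflowRunner",
        "--project={}".format(project),
        "--job_name={}".format(job_name),
        "--staging_location={}/staging".format(gcs_root),
        "--temp_location={}/tmp".format(gcs_root),
        # Ship the TFDV wheel to the workers so they can import TFDV.
        "--extra_package={}".format(tfdv_wheel),
    ]


def generate_stats_on_dataflow(tfrecord_pattern, stats_output_path, flags):
    """Compute statistics over TFRecord files using the given Beam flags."""
    # Deferred imports: only needed when the pipeline is actually run.
    from apache_beam.options.pipeline_options import PipelineOptions
    import tensorflow_data_validation as tfdv

    return tfdv.generate_statistics_from_tfrecord(
        data_location=tfrecord_pattern,
        output_path=stats_output_path,
        pipeline_options=PipelineOptions(flags=flags),
    )
```

Leaving out `pipeline_options` keeps the default local mode, which is usually the easier starting point for small datasets.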

Compatible versions

The following table shows the package versions that are compatible with each other. These are the combinations verified by our testing framework; other, untested combinations may also work.

tensorflow-data-validation  tensorflow     apache-beam[gcp]
GitHub master               nightly (1.x)  2.10.0
0.12.0                      1.12           2.10.0
0.11.0                      1.11           2.8.0
0.9.0                       1.9            2.6.0
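
If you want to pin a tested combination, the released rows of the table can be turned into pip requirement strings. This is an illustrative sketch: the helper name `pip_requirements` is hypothetical, and the wildcard pin for TensorFlow is an assumption made because the table lists only a major.minor version.

```python
# Tested combinations for released versions, copied from the table above.
COMPATIBLE_VERSIONS = {
    "0.12.0": {"tensorflow": "1.12", "apache-beam[gcp]": "2.10.0"},
    "0.11.0": {"tensorflow": "1.11", "apache-beam[gcp]": "2.8.0"},
    "0.9.0": {"tensorflow": "1.9", "apache-beam[gcp]": "2.6.0"},
}


def pip_requirements(tfdv_version):
    """Return pip requirement strings for a tested TFDV release.

    Raises KeyError for versions that are not in the table.
    """
    deps = COMPATIBLE_VERSIONS[tfdv_version]
    return [
        "tensorflow-data-validation=={}".format(tfdv_version),
        # Only major.minor is listed for TensorFlow, so accept any patch release.
        "tensorflow=={}.*".format(deps["tensorflow"]),
        "apache-beam[gcp]=={}".format(deps["apache-beam[gcp]"]),
    ]
```

The resulting strings can be written to a requirements file, one per line.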

Questions

Please direct any questions about working with TF Data Validation to Stack Overflow using the tensorflow-data-validation tag.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release. See tutorial on generating distribution archives.

Built Distributions

tensorflow_data_validation-0.12.0-cp27-cp27m-macosx_10_12_x86_64.whl (2.5 MB)

Uploaded CPython 2.7m macOS 10.12+ x86-64

tensorflow_data_validation-0.12.0-cp27-cp27m-macosx_10_11_x86_64.whl (2.5 MB)

Uploaded CPython 2.7m macOS 10.11+ x86-64

File details

Details for the file tensorflow_data_validation-0.12.0-cp27-cp27mu-manylinux1_x86_64.whl.

File hashes

Hashes for tensorflow_data_validation-0.12.0-cp27-cp27mu-manylinux1_x86_64.whl
Algorithm    Hash digest
SHA256       e9838f29cac08f190090cf7a411b89fb708dbf1bb93bdc6e08b998447f18689d
MD5          d0b8e321aa234da74d102fb4d9118743
BLAKE2b-256  9731242c157fc63fd0d2ae1e74266bd42718ae47a1136eda80565d153aad41ab

See more details on using hashes here.
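
To check a downloaded wheel against a published SHA256 digest, a small standard-library script is enough. The sketch below is illustrative: the helper names are hypothetical, and the file is read in chunks so that large wheels do not need to fit in memory.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of the file at path, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_sha256):
    """Return True if the file's SHA256 digest equals the published one."""
    return sha256_of_file(path) == expected_sha256.strip().lower()
```

Call `matches_published_hash` with the path to the downloaded `.whl` file and the SHA256 value listed for that file; a `False` result means the download should not be installed.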

File details

Details for the file tensorflow_data_validation-0.12.0-cp27-cp27m-macosx_10_12_x86_64.whl.

File hashes

Hashes for tensorflow_data_validation-0.12.0-cp27-cp27m-macosx_10_12_x86_64.whl
Algorithm    Hash digest
SHA256       6c7b05b74ef29db3c4d04dd101a58d6046a0bb44667d0fb301592fea991c2bff
MD5          d767579b5d8d543dee5103c12132f319
BLAKE2b-256  bcd5a191c58b6dfcf606cd5fcd7b250827adaf564d51353df9af2b6f1613c965

File details

Details for the file tensorflow_data_validation-0.12.0-cp27-cp27m-macosx_10_11_x86_64.whl.

File hashes

Hashes for tensorflow_data_validation-0.12.0-cp27-cp27m-macosx_10_11_x86_64.whl
Algorithm    Hash digest
SHA256       56e707ee534a1c1ddb0cc0f36551216ea8a33bc6c0bdf9ef0306714a3129a8b4
MD5          e5dbf10d84e474523f2bda6222377d9b
BLAKE2b-256  9ea61359d22bf24866630c3f94093a24d31a7d5fe467d9c36134f3bb5ed6e001
