Visualize data quality

vizdataquality

This Python package for visualizing data quality has two main parts. The first is software that helps you profile and investigate data quality comprehensively, using this six-step workflow:

  1. Look at your data (is anything obviously wrong?)
  2. Watch out for special values
  3. Is any data missing?
  4. Check each variable
  5. Check combinations of variables
  6. Profile the cleaned data
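
The first few steps of the workflow can be approximated with plain pandas, as in the minimal sketch below. This is not the vizdataquality API (see the package documentation for that). The DataFrame, the column names, and the use of -999 as a special value are all made up for illustration.

```python
import pandas as pd

# Hypothetical example data; -999 stands in for a "special value"
# that really means "unknown".
df = pd.DataFrame(
    {
        "age": [34, -999, 51, None, 28],
        "city": ["Leeds", "York", None, "Leeds", "Hull"],
    }
)

# Step 1: look at the data.
print(df.head())

# Step 2: count occurrences of the assumed special value in one column.
special_count = int((df["age"] == -999).sum())

# Step 3: count missing values in each column.
missing_counts = df.isna().sum()

# Step 4: summarise each variable (count, unique values, min, max, ...).
summary = df.describe(include="all")

print(special_count)
print(missing_counts)
print(summary)
```

Special values are counted separately from missing values here because a placeholder such as -999 is not reported by `isna()` and would otherwise distort the summary statistics.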

The second part is software for investigating patterns and structures of missing values in your data. When a pattern of missing values is found to be associated with other factors or attributes of the data, it becomes a "structure of missingness". Patterns and structures of missing values belong to Step 5 of the workflow because they involve combinations of variables.
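
To make the distinction concrete, the sketch below uses plain pandas (not the vizdataquality functions) to label each row with its pattern of missing values and then checks whether one pattern is associated with another attribute. The data, the column names, and the helper `missing_pattern` are hypothetical.

```python
import pandas as pd

# Hypothetical data with missing values in several columns.
df = pd.DataFrame(
    {
        "height": [1.7, None, 1.8, None, 1.6],
        "weight": [65.0, None, 80.0, None, None],
        "site": ["A", "B", "A", "B", "A"],
    }
)


def missing_pattern(row):
    """Label a row by the names of the columns that are missing in it."""
    cols = [col for col in row.index if pd.isna(row[col])]
    return ", ".join(cols) if cols else "(complete)"


# A pattern is a combination of variables that are missing together.
patterns = df.apply(missing_pattern, axis=1)
pattern_counts = patterns.value_counts()
print(pattern_counts)

# If a pattern is associated with another attribute, it is a candidate
# "structure of missingness". Here: the share of rows at each (made-up)
# site that are missing both height and weight.
both_missing = patterns == "height, weight"
rate_by_site = both_missing.groupby(df["site"]).mean()
print(rate_by_site)
```

In this toy data every row from site B is missing both height and weight, which is the kind of association that would turn a pattern into a structure.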

Documentation

The vizdataquality documentation is hosted on Read the Docs.

Installation

We recommend installing vizdataquality in a Python virtual environment or a Conda environment.

To install vizdataquality, most users should run:

pip install 'vizdataquality'
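
One way to confirm that the install worked is to ask Python for the installed version. The sketch below uses only the standard library (`importlib.metadata`); the helper name `installed_version` is an illustration, not part of vizdataquality.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version of a package, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints a version string if the package is installed in the active
# environment, otherwise None.
print(installed_version("vizdataquality"))
```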

Tutorials

The package includes tutorial notebooks.

To follow these tutorials interactively after installing vizdataquality, clone or download this repository, then start Jupyter from within it:

python -m jupyter notebook notebooks

Development

  • Documentation is built on Read the Docs from the main branch.
  • A new version is published to PyPI when a release is created in the project repository on GitHub.

Notice

The vizdataquality software is released under the Apache Licence, version 2.0. See LICENCE for details.

The file missing_data_functions.py contains some code that has been derived from setvis, which uses the same licence as vizdataquality. The same person leads the development of both packages.

Acknowledgements

The development of the vizdataquality software was supported by funding from the Engineering and Physical Sciences Research Council (EP/N013980/1; EP/R511717/1) and the Alan Turing Institute.

Download files

Source Distribution

vizdataquality-1.1.0.tar.gz (54.4 kB)

Uploaded Source

Built Distribution

vizdataquality-1.1.0-py3-none-any.whl (59.8 kB)

Uploaded Python 3

File details

Details for the file vizdataquality-1.1.0.tar.gz.

File metadata

  • Download URL: vizdataquality-1.1.0.tar.gz
  • Size: 54.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.9.19

File hashes

Hashes for vizdataquality-1.1.0.tar.gz
Algorithm Hash digest
SHA256 72548bd54020908707c64b94b11c2a9abdfe3fe1c3658f0183c6d2c2728a12d9
MD5 ce6cc003dac384b8e73b9a00e7cd3068
BLAKE2b-256 a59cd8ee16cbe59b399f32456be29242311ee1b88ca81afdad189658318b1d55
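
A downloaded file can be checked against the published SHA256 digest with the standard library, as in the sketch below. The local file path is an assumption (adjust it to wherever the archive was saved), and `sha256_of` is a hypothetical helper name.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for vizdataquality-1.1.0.tar.gz.
EXPECTED_SHA256 = "72548bd54020908707c64b94b11c2a9abdfe3fe1c3658f0183c6d2c2728a12d9"


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Assumed download location; nothing is checked if the file is absent.
archive = Path("vizdataquality-1.1.0.tar.gz")
if archive.exists():
    print("match" if sha256_of(archive) == EXPECTED_SHA256 else "MISMATCH")
```

pip can perform the same check itself when a requirements file pins hashes and is installed with `--require-hashes`.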

File details

Details for the file vizdataquality-1.1.0-py3-none-any.whl.

File hashes

Hashes for vizdataquality-1.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 d0d1e21727f6e253850b6e0a65db4f2ff65c4abbc3f24f14ba6cd9d1cbc8fcfc
MD5 f6041970446de3ef8e8a5cdf48d73d6c
BLAKE2b-256 57d165f093e4bc1dcc274b513bd0a237c8ded7e03ad5b0e2dd19df08b23d0c6f
