Visualize data quality

vizdataquality

This Python package for visualizing data quality has two main parts. The first is software that helps you comprehensively profile and investigate data quality, following this workflow from the 6-step Data Quality Method:

  1. Is anything obviously wrong (look at your data and any documentation)?
  2. Watch out for special values
  3. Is any data missing?
  4. Check each variable
  5. Check combinations of variables
  6. Profile the cleaned data
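
As a rough illustration of Steps 2 and 3, the sketch below counts special values and missing values per variable using only the Python standard library. It does not use the vizdataquality API; the sample records, the sentinel codes in `SPECIAL_VALUES` and the `profile` helper are all hypothetical and exist only for this example.

```python
from collections import Counter

# Hypothetical sample records; None marks a missing value.
rows = [
    {"age": 34, "height_cm": 172, "city": "Leeds"},
    {"age": -1, "height_cm": None, "city": "York"},
    {"age": 999, "height_cm": 165, "city": None},
    {"age": 41, "height_cm": None, "city": "Leeds"},
]

# Assumed sentinel codes that stand in for "unknown" in this made-up dataset.
SPECIAL_VALUES = {-1, 999}


def profile(rows, special_values):
    """Count missing and special values for each variable."""
    missing = Counter()
    special = Counter()
    for row in rows:
        for name, value in row.items():
            if value is None:
                missing[name] += 1
            elif value in special_values:
                special[name] += 1
    return dict(missing), dict(special)


missing_counts, special_counts = profile(rows, SPECIAL_VALUES)
print("missing:", missing_counts)
print("special:", special_counts)
```

Counting special values separately from missing ones keeps the two steps distinct: a sentinel such as 999 is present in the data but still should not be treated as a real measurement.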

The second part is software for investigating patterns and structures of missing values in your data. When a given pattern of missing values is found to be associated with other factors or attributes of the data, it becomes a "structure of missingness". Patterns and structures of missing values are part of Step 5 of the workflow, because they involve combinations of variables.
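
To make the distinction concrete, here is a minimal sketch in plain Python, not the vizdataquality API. It treats a pattern as the set of variables missing together in a row, counts how often each pattern occurs, and then checks whether one pattern lines up with another attribute. The records, the variable names and the `missingness_patterns` helper are hypothetical.

```python
from collections import Counter

# Hypothetical records; None marks a missing value.
records = [
    {"id": 1, "bp": 120, "bmi": 24.1, "site": "A"},
    {"id": 2, "bp": None, "bmi": None, "site": "B"},
    {"id": 3, "bp": None, "bmi": None, "site": "B"},
    {"id": 4, "bp": 118, "bmi": None, "site": "A"},
]


def missing_set(record):
    """Return the set of variable names that are missing in one record."""
    return frozenset(name for name, value in record.items() if value is None)


def missingness_patterns(records):
    """Map each combination of missing variables to the number of rows showing it."""
    return Counter(missing_set(rec) for rec in records)


patterns = missingness_patterns(records)
for pattern, count in patterns.most_common():
    print(sorted(pattern) or ["(complete)"], count)

# Cross-tabulate one pattern against another attribute to look for structure.
both = frozenset({"bp", "bmi"})
sites = Counter(rec["site"] for rec in records if missing_set(rec) == both)
print(dict(sites))
```

In this made-up data, every row missing both `bp` and `bmi` comes from site B, which is the kind of association that would turn a pattern into a structure of missingness.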

Documentation

The vizdataquality documentation is hosted on Read the Docs.

Installation

We recommend installing vizdataquality in a Python virtual environment or Conda environment.

To install vizdataquality, most users should run:

pip install 'vizdataquality'
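
One way to confirm that the install landed in the active environment is to query the package metadata from Python. This is a small sketch using the standard library's `importlib.metadata`; the `installed_version` helper is a hypothetical name introduced for this example.

```python
from importlib import metadata


def installed_version(distribution):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("vizdataquality")
if version is None:
    print("vizdataquality is not installed in this environment")
else:
    print("vizdataquality", version)
```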

Tutorials

The package includes tutorial notebooks.

After installing vizdataquality, clone or download this repository to follow these tutorials interactively. Then start Jupyter from within it:

python -m jupyter notebook notebooks

Development

  • Documentation is built on Read the Docs from the main branch.
  • A new version is published to PyPI when a release is created in the project repository on GitHub.

Notice

The vizdataquality software is released under the Apache Licence, version 2.0. See LICENCE for details.

The file missing_data_functions.py contains some code that has been derived from setvis, which uses the same licence as vizdataquality. The same person leads the development of both packages.

Acknowledgements

The development of the vizdataquality software was supported by funding from the Engineering and Physical Sciences Research Council (EP/N013980/1; EP/R511717/1; EP/X029689/1), the Alan Turing Institute and Higher Education Innovation Funding (HEIF) from UK Research and Innovation (UKRI).

Download files

Source Distribution

vizdataquality-1.1.5.tar.gz (62.5 kB view details)

Uploaded Source

Built Distribution

vizdataquality-1.1.5-py3-none-any.whl (68.0 kB view details)

Uploaded Python 3

File details

Details for the file vizdataquality-1.1.5.tar.gz.

File metadata

  • Download URL: vizdataquality-1.1.5.tar.gz
  • Upload date:
  • Size: 62.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.9.25

File hashes

Hashes for vizdataquality-1.1.5.tar.gz
Algorithm    Hash digest
SHA256       4dda458c2d516f41414ab17a260aa3206addfbce7a4810b48c410e8a470e7f66
MD5          8e8ffd47a3410271b5e84541df2192da
BLAKE2b-256  581d34d8c660f05dd0b5659e6f6017f3630ff85e22fb5d62af40224548634e1a
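
A downloaded file can be checked against its published digest before installing. The sketch below uses the standard library's `hashlib` and the SHA256 value listed above for the source distribution. The helper names `sha256_of` and `matches_digest` are hypothetical, and the demonstration runs on a throwaway file so that it works without the real download being present.

```python
import hashlib
import os
import tempfile

# SHA256 digest listed above for vizdataquality-1.1.5.tar.gz.
PUBLISHED_SHA256 = "4dda458c2d516f41414ab17a260aa3206addfbce7a4810b48c410e8a470e7f66"


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path, expected_hex):
    """Return True if the file's SHA256 digest equals the expected hex string."""
    return sha256_of(path) == expected_hex.strip().lower()


# Demonstration on a temporary file with known contents.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.bin")
    with open(demo_path, "wb") as fh:
        fh.write(b"example bytes")
    demo_ok = matches_digest(demo_path, hashlib.sha256(b"example bytes").hexdigest())
    demo_bad = matches_digest(demo_path, PUBLISHED_SHA256)

print(demo_ok, demo_bad)
```

For the real archive, the same check would be `matches_digest("vizdataquality-1.1.5.tar.gz", PUBLISHED_SHA256)`, run from the directory that holds the download. The wheel can be checked the same way against its own SHA256 value.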

File details

Details for the file vizdataquality-1.1.5-py3-none-any.whl.

File metadata

  • Download URL: vizdataquality-1.1.5-py3-none-any.whl
  • Upload date:
  • Size: 68.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.9.25

File hashes

Hashes for vizdataquality-1.1.5-py3-none-any.whl
Algorithm    Hash digest
SHA256       5fc27de734ff59c20915747cc16bd7a4b462ae6c423c36aef555717668d1da18
MD5          687ed6b042d8a6545e02a80381c52a22
BLAKE2b-256  13d55bd6fb95e730f7ded7ea399533ec37d54b2003e21ef7f46d77ec8a1b26d6
