Data-Quality-Kit

Functional Description

A library of functions for managing and improving data quality in Datasets

Owner

For bug reports or questions, please reach out to Dante Pedrozo.

Branching Methodology

This project follows a simplified Git Flow branching methodology:

  • Master Branch: production code
  • Develop Branch: main integration branch for ongoing development. Features and fixes are merged into this branch before reaching master
  • Feature Branch: created from develop branch to work on new features

Prerequisites

This project uses:

  • Language: Python 3.10
  • Libraries:
    • pandas
    • pytest
    • assertpy

How to use it

Install the library

pip install data-quality-kit

Then import the function you need, for example:

from data_quality_quick.validate_formats import check_type_format

Functionalities

  • Completeness
    • assert_that_dataframe_is_empty: Checks whether a DataFrame is empty.
  • Validity
    • assert_that_there_are_not_nulls: Checks for null values in a specified column of a DataFrame.
  • Consistency
    • assert_that_there_are_not_duplicates: Checks for duplicate values in the specified primary key column of a DataFrame.
    • assert_that_columns_values_match: Checks whether all values in column2 of df2 are present in column1 of df1.
  • Accuracy
    • assert_that_type_value: Checks whether all non-null entries in a specified column of a DataFrame are of the specified data type.
    • assert_that_values_in_catalog: Checks whether all values in the specified column of a DataFrame are present in a catalog (list of values).
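
The checks listed above can be illustrated with plain pandas. The sketch below is not the library's own code and does not call it: the helper names (is_empty, has_nulls, has_duplicates, values_match, all_of_type, all_in_catalog), their parameters, and the sample data are all hypothetical, chosen only to show what each kind of check looks at. The signatures of the real functions may differ.

```python
import pandas as pd

# Hypothetical helpers written in plain pandas.
# They mirror the checks described above but are NOT the library's functions.


def is_empty(df: pd.DataFrame) -> bool:
    """Completeness: True when the DataFrame holds no data."""
    return df.empty


def has_nulls(df: pd.DataFrame, column: str) -> bool:
    """Validity: True when the column contains at least one null value."""
    return bool(df[column].isna().any())


def has_duplicates(df: pd.DataFrame, key_column: str) -> bool:
    """Consistency: True when the key column repeats a value."""
    return bool(df[key_column].duplicated().any())


def values_match(df1: pd.DataFrame, column1: str,
                 df2: pd.DataFrame, column2: str) -> bool:
    """Consistency: True when every value in df2[column2] appears in df1[column1]."""
    return bool(df2[column2].isin(df1[column1]).all())


def all_of_type(df: pd.DataFrame, column: str, expected_type: type) -> bool:
    """Accuracy: True when every non-null entry is an instance of expected_type."""
    return all(isinstance(value, expected_type) for value in df[column].dropna())


def all_in_catalog(df: pd.DataFrame, column: str, catalog: list) -> bool:
    """Accuracy: True when every value in the column is listed in the catalog."""
    return bool(df[column].isin(catalog).all())


# Made-up sample data for illustration only.
orders = pd.DataFrame({
    "order_id": [1, 2, 3],
    "customer_id": ["a", "b", "a"],
    "status": ["open", "closed", None],
})
customers = pd.DataFrame({"customer_id": ["a", "b", "c"]})

report = {
    "empty": is_empty(orders),
    "nulls_in_status": has_nulls(orders, "status"),
    "duplicate_order_ids": has_duplicates(orders, "order_id"),
    "customers_known": values_match(customers, "customer_id", orders, "customer_id"),
    "customer_ids_are_str": all_of_type(orders, "customer_id", str),
    "customer_ids_in_catalog": all_in_catalog(orders, "customer_id", ["a", "b"]),
}
print(report)
```

Each helper returns a boolean rather than raising, so the results can be collected into a single report. Inside a pytest test they could be wrapped in assertions instead.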
