library of functions for managing and improving data quality in Datasets
Project description
Data-Quality-Kit
Functional Description
A library of functions for managing and improving data quality in Datasets
Owner
For any bugs or questions, please reach out to Dante Pedrozo
Branching Methodology
This project follows a simplified Git Flow branching methodology:
- Master Branch: production code.
- Develop Branch: main integration branch for ongoing development. Features and fixes are merged into this branch before reaching master.
- Feature Branch: created from the develop branch to work on new features.
Prerequisites
This project uses:
- Language: Python 3.10
- Libraries:
  - pandas
  - pytest
  - assertpy
How to use it
Install the library:
pip install data-quality-kit
Then import the function you need, for example:
from data_quality_quick.validate_formats import check_type_format
Functionalities
- Completeness
  - assert_that_dataframe_is_empty: Checks whether a DataFrame is empty.
- Validity
  - assert_that_there_are_not_nulls: Checks for null values in a specified column of a DataFrame.
- Consistency
  - assert_that_there_are_not_duplicates: Checks for duplicate values in the specified primary key column of a DataFrame.
  - assert_that_columns_values_match: Checks whether all values in column2 of df2 are present in column1 of df1.
- Accuracy
  - assert_that_type_value: Checks whether all non-null entries in a specified column of a DataFrame are of the specified data type.
  - assert_that_values_in_catalog: Checks whether all values in the specified column of a DataFrame are present in a catalog (list of values).
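The checks described above can be approximated directly in pandas. The sketch below only illustrates what each description implies. It is not the library's implementation, and the helper names (`is_empty`, `has_nulls`, `has_duplicates`, `values_match`, `values_have_type`, `values_in_catalog`), their signatures, and the sample data are hypothetical and not part of the data-quality-kit API.

```python
import pandas as pd

# Hypothetical helpers that mirror the descriptions in the list above.
# They are NOT the data-quality-kit API; names and signatures are assumptions.


def is_empty(df: pd.DataFrame) -> bool:
    """Completeness: True when the DataFrame has no rows or no columns."""
    return df.empty


def has_nulls(df: pd.DataFrame, column: str) -> bool:
    """Validity: True when the column contains at least one null value."""
    return bool(df[column].isna().any())


def has_duplicates(df: pd.DataFrame, key_column: str) -> bool:
    """Consistency: True when the key column repeats a value."""
    return bool(df[key_column].duplicated().any())


def values_match(df1: pd.DataFrame, column1: str,
                 df2: pd.DataFrame, column2: str) -> bool:
    """Consistency: True when every df2[column2] value appears in df1[column1]."""
    return bool(df2[column2].isin(df1[column1]).all())


def values_have_type(df: pd.DataFrame, column: str, expected_type: type) -> bool:
    """Accuracy: True when every non-null entry is an instance of expected_type."""
    return all(isinstance(value, expected_type) for value in df[column].dropna())


def values_in_catalog(df: pd.DataFrame, column: str, catalog: list) -> bool:
    """Accuracy: True when every value in the column is in the catalog."""
    return bool(df[column].isin(catalog).all())


# Sample data, invented for illustration only.
customers = pd.DataFrame({"customer_id": [1, 2, 3]})
orders = pd.DataFrame({
    "order_id": [10, 11, 11],            # 11 is repeated
    "customer_id": [1, 2, 2],            # all present in customers
    "status": ["open", "closed", None],  # one null
})

print(is_empty(orders))
print(has_nulls(orders, "status"))
print(has_duplicates(orders, "order_id"))
print(values_match(customers, "customer_id", orders, "customer_id"))
print(values_have_type(orders, "status", str))
print(values_in_catalog(orders, "customer_id", [1, 2, 3]))
```

These helpers return booleans so the logic is easy to inspect. The library's functions are named as assertions, so they presumably raise or report on failure rather than return a flag, which is also an assumption here.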
File details
Details for the file data_quality_kit-0.7.0.tar.gz.
File metadata
- Download URL: data_quality_kit-0.7.0.tar.gz
- Upload date:
- Size: 8.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.10.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | 3a493c86a6cd3b8115db336011cc5fc049b6f189504e749fbe1108a9d213aa83
MD5 | 322558ff07421e4bc35ae8e1144aee0f
BLAKE2b-256 | 9a79e6c9d51d0d0a499b03fed654e509449d2917328b93f580db4286207c490e
File details
Details for the file data_quality_kit-0.7.0-py3-none-any.whl.
File metadata
- Download URL: data_quality_kit-0.7.0-py3-none-any.whl
- Upload date:
- Size: 16.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.10.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | 26d60f7448d9dbd382b4edf5fe9a04d03d586cc046f568fee68f7acd2cbb3b93
MD5 | 493e78cc29979e2f5f7bf19c85244bf6
BLAKE2b-256 | 7560790abb382afab93910d406664ed671fbb3587e27929e106fa6e850ba6d89