Data-Quality-Kit
Functional Description
A library of functions for managing and improving data quality in Datasets
Owner
For any bugs or questions, please reach out to Dante Pedrozo
Branching Methodology
This project follows a simplified Git Flow branching methodology:
- Master Branch: production code
- Develop Branch: main integration branch for ongoing development. Features and fixes are merged into this branch before reaching master
- Feature Branch: created from develop branch to work on new features
Prerequisites
This project uses:
- Language: Python 3.10
- Libraries:
  - pandas
  - pytest
  - assertpy
How to use it
Install the library:
pip install data-quality-kit
Then import the function you need, for example:
from data_quality_quick.validate_formats import check_type_format
Functionalities
- Completeness
  - assert_that_dataframe_is_empty: Checks whether a DataFrame is empty.
- Validity
  - assert_that_there_are_not_nulls: Checks for null values in a specified column of a DataFrame.
- Consistency
  - assert_that_there_are_not_duplicates: Checks for duplicate values in the specified primary key column of a DataFrame.
  - assert_that_columns_values_match: Checks whether all values in column2 of df2 are present in column1 of df1.
- Accuracy
  - assert_that_type_value: Checks whether all non-null entries in a specified column of a DataFrame are of the specified data type.
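
The checks listed above can be illustrated with plain pandas. The sketch below is not the library's implementation and does not use its API: the helper names (`is_empty`, `has_no_nulls`, `has_no_duplicates`, `values_are_contained`, `values_have_type`), their parameters, and the sample DataFrames are hypothetical, chosen only to show what each kind of check looks for.

```python
import pandas as pd

# Hypothetical helpers that mirror the checks described above.
# They are NOT the data-quality-kit functions; they only show the idea.


def is_empty(df: pd.DataFrame) -> bool:
    """Completeness: True when the DataFrame has no rows or no columns."""
    return df.empty


def has_no_nulls(df: pd.DataFrame, column: str) -> bool:
    """Validity: True when the column contains no null values."""
    return not df[column].isna().any()


def has_no_duplicates(df: pd.DataFrame, key_column: str) -> bool:
    """Consistency: True when every value in the key column is unique."""
    return not df[key_column].duplicated().any()


def values_are_contained(
    df1: pd.DataFrame, column1: str, df2: pd.DataFrame, column2: str
) -> bool:
    """Consistency: True when all values in df2[column2] appear in df1[column1]."""
    return bool(df2[column2].isin(df1[column1]).all())


def values_have_type(df: pd.DataFrame, column: str, expected_type: type) -> bool:
    """Accuracy: True when every non-null entry is an instance of expected_type."""
    return all(isinstance(value, expected_type) for value in df[column].dropna())


# Sample data (made up for this example).
customers = pd.DataFrame({"id": [1, 2, 3], "name": ["Ana", "Luis", None]})
orders = pd.DataFrame({"customer_id": [1, 3, 3]})

print(is_empty(customers))                                         # False
print(has_no_nulls(customers, "name"))                             # False
print(has_no_duplicates(customers, "id"))                          # True
print(values_are_contained(customers, "id", orders, "customer_id"))  # True
print(values_have_type(customers, "name", str))                    # True
```

These helpers return booleans to keep the example short. The library's functions are named `assert_that_...`, which suggests they are meant to be used as assertions in tests; check the package itself for their actual signatures and behavior.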