eazyml-data-quality, part of the EazyML family, provides comprehensive data quality assessment, including bias detection, outlier identification, and data drift analysis.
EazyML Responsible-AI: Data Quality Assessment
Overview
eazyml-data-quality is a Python utility that evaluates dataset quality through checks such as data shape, emptiness, outlier detection, balance, and correlation. It helps users identify potential issues in their datasets and gives detailed feedback so that the data is ready for downstream processes.
It offers APIs for data quality assessment across the dimensions listed below.
Features
- Missing Value Analysis: Detect and impute missing values.
- Bias Detection: Uncover and mitigate bias in datasets.
- Data Drift and Model Drift Analysis: Monitor changes in data distributions over time.
- Data Shape Quality: Validates dataset dimensions and checks if the number of rows is sufficient relative to the number of columns.
- Data Emptiness Check: Identifies and reports missing values in the dataset.
- Outlier Detection: Detects and removes outliers based on statistical analysis.
- Data Balance Check: Analyzes the balance of the dataset and computes a balance score.
- Correlation Analysis: Identifies multicollinearity and relationships between features, and provides alerts for highly correlated features.
- Summary Alerts: Consolidates key quality issues into a single summary for quick review.
With eazyml-data-quality, you can ensure that your training data is clean, balanced, and ready for machine learning.
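To make a few of the checks listed above concrete, here is a minimal standard-library sketch of a shape ratio, a missing-value fraction, an IQR-based outlier test, and a class balance score. It is an illustration only and not the package's implementation: the function names, the 1.5 IQR multiplier, and the sample rows are all assumptions made for this example.

```python
from collections import Counter
from statistics import quantiles


def rows_per_column(rows):
    """Number of rows divided by the number of columns in the first row."""
    return len(rows) / len(rows[0])


def missing_fraction(rows, column):
    """Fraction of rows where `column` is None or an empty string."""
    missing = sum(1 for row in rows if row.get(column) in (None, ""))
    return missing / len(rows)


def iqr_outliers(values, k=1.5):
    """Values lying more than k * IQR below the first or above the third quartile."""
    q1, _, q3 = quantiles(values, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [v for v in values if v < low or v > high]


def balance_score(labels):
    """Count of the rarest class divided by the count of the most common class."""
    counts = Counter(labels)
    return min(counts.values()) / max(counts.values())


# Hypothetical sample data, used only to exercise the helpers.
rows = [
    {"age": 31, "income": 52000, "target": "yes"},
    {"age": 45, "income": None, "target": "yes"},
    {"age": 29, "income": 48000, "target": "yes"},
    {"age": 52, "income": 61000, "target": "no"},
]

print(rows_per_column(rows))
print(missing_fraction(rows, "income"))
print(iqr_outliers([10, 12, 11, 13, 12, 11, 95]))
print(balance_score([row["target"] for row in rows]))
```

A balance score near 1.0 indicates evenly represented classes, while a score near 0 indicates a heavily skewed outcome column.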
Installation
To use the Data Quality Checker, ensure you have Python installed on your system.
User installation
The easiest way to install eazyml-data-quality is with pip:
pip install -U eazyml-data-quality
Dependencies
This package requires:
- pandas
- scikit-learn
- numpy
- openpyxl
- eazyml-insight
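If it helps to confirm that the listed dependencies are present before running the package, the following sketch looks up each distribution with the standard library's `importlib.metadata`. The helper name `installed_versions` is hypothetical; the distribution names are taken from the list above.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as listed in the Dependencies section.
REQUIRED = ["pandas", "scikit-learn", "numpy", "openpyxl", "eazyml-insight"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions(REQUIRED).items():
    print(f"{name}: {ver or 'not installed'}")
```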
Usage
Here's an example of how you can use the APIs from this package.
Imports
# ez_init is called below; it is assumed to be exported by this package as well.
from eazyml_data_quality import ez_init, ez_data_quality
Initialize and Read Data
# Initialize the EazyML library.
_ = ez_init()
# Define training data (Replace with the correct data path).
train_data_path = "path_to_your_training_data.csv"
# Define test data (Replace with the correct data path).
test_data_path = "path_to_your_test_data.csv"
Perform Data Quality
# Define the outcome (target variable)
outcome = "target" # Replace with your target variable name
# Customize options to perform data quality
dqa_options = {
"data_shape": "yes",
"data_balance": "yes",
"data_emptiness": "yes",
"data_outliers": "yes",
"remove_outliers": "yes",
"outcome_correlation": "yes",
"data_drift": "yes",
"model_drift": "yes",
"prediction_data": test_data_path,
"data_completeness": "yes",
"data_correctness": "yes"
}
# Call the EazyML APIs to perform data quality
dqa_response = ez_data_quality(train_data_path, outcome, options=dqa_options)
# dqa_response is a dictionary object with the following keys.
# print(dqa_response.keys())
# dict_keys(['success', 'message', 'data_shape_quality', 'data_emptiness_quality', 'data_outliers_quality', 'data_balance_quality', 'data_correlation_quality', 'data_completeness_quality', 'data_correctness_quality', 'drift_quality', 'data_bad_quality_alerts'])
# The response holds the results of every data quality check, along with the data quality alerts selected by the user.
You can find more information in the documentation.
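As a rough sketch of how the response might be inspected, the helper below reads the `success` and `message` keys and reports which quality sections are present. The key names come from the `dict_keys` listing in the usage example; the helper name, the assumption that `success` is truthy on success, and the stand-in dictionary are illustrative only, since the structure of each section is not described here.

```python
# Section keys taken from the dict_keys listing in the usage example.
QUALITY_KEYS = [
    "data_shape_quality",
    "data_emptiness_quality",
    "data_outliers_quality",
    "data_balance_quality",
    "data_correlation_quality",
    "data_completeness_quality",
    "data_correctness_quality",
    "drift_quality",
    "data_bad_quality_alerts",
]


def summarize_response(response):
    """Return (ok, message, present_sections) for a dqa_response-like dict."""
    ok = bool(response.get("success"))
    message = response.get("message", "")
    present = [key for key in QUALITY_KEYS if key in response]
    return ok, message, present


# Stand-in dictionary; a real dqa_response comes from ez_data_quality(...).
sample = {
    "success": True,
    "message": "done",
    "data_shape_quality": {},
    "drift_quality": {},
}
ok, message, sections = summarize_response(sample)
print(ok, message, sections)
```

Which sections appear presumably depends on the options that were enabled, so checking for a key before reading it avoids a `KeyError`.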
Useful links, other packages from EazyML family
If you have questions or would like to discuss a use case, please contact us here
Here are the other packages from the EazyML suite:
- eazyml-automl: eazyml-automl provides a suite of APIs for training, optimizing and validating machine learning models with built-in AutoML capabilities, hyperparameter tuning, and cross-validation.
- eazyml-data-quality: eazyml-data-quality provides APIs for comprehensive data quality assessment, including bias detection, outlier identification, and drift analysis for both data and models.
- eazyml-counterfactual: eazyml-counterfactual provides APIs for optimal prescriptive analytics, counterfactual explanations, and actionable insights to optimize predictive outcomes to align with your objectives.
- eazyml-insight: eazyml-insight provides APIs to discover patterns, generate insights, and mine rules from your datasets.
- eazyml-xai: eazyml-xai provides APIs for explainable AI (XAI), offering human-readable explanations, feature importance, and predictive reasoning.
- eazyml-xai-image: eazyml-xai-image provides APIs for image explainable AI (XAI).
License
This project is licensed under the Proprietary License.
Maintained by EazyML
© 2025 EazyML. All rights reserved.
File details
Details for the file eazyml-data-quality-0.0.39.tar.gz.
File metadata
- Download URL: eazyml-data-quality-0.0.39.tar.gz
- Upload date:
- Size: 21.7 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.10.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | dac4015e8073657dc70a3d7e6051d2b6f125d1955aad7d7b995f209bb3cb3c7b |
| MD5 | 34eecc3aa9cffe8a4bd03243db730cda |
| BLAKE2b-256 | d1ff66659e2bac6d858c2beeabcd0a2e64aebc0ace448b82a9874bd166bf59cd |
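A downloaded archive can be checked against the SHA256 digest listed above. The sketch below uses only `hashlib`; the helper names and the local file path are hypothetical, and the expected digest is the one published for eazyml-data-quality-0.0.39.tar.gz.

```python
import hashlib

# SHA256 digest published above for eazyml-data-quality-0.0.39.tar.gz.
EXPECTED_SHA256 = "dac4015e8073657dc70a3d7e6051d2b6f125d1955aad7d7b995f209bb3cb3c7b"


def sha256_of_file(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected=EXPECTED_SHA256):
    """True when the file's SHA256 digest equals the expected value."""
    return sha256_of_file(path) == expected.lower()


# Example call, assuming the archive was saved in the current directory:
# print(matches_published_digest("eazyml-data-quality-0.0.39.tar.gz"))
```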
File details
Details for the file eazyml_data_quality-0.0.39-py2.py3-none-any.whl.
File metadata
- Download URL: eazyml_data_quality-0.0.39-py2.py3-none-any.whl
- Upload date:
- Size: 22.3 MB
- Tags: Python 2, Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.10.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 200fcefd65125e42e3938d848291d9c575a0488566e2a6075e43d23f8f5c53f8 |
| MD5 | 679f9b678f814c9bd2aa76dd81a5a132 |
| BLAKE2b-256 | d1504f6ed9337d41a8497ce1784b8ea644f1ee1f0e0eaf76901ede808b2e2871 |