
eazyml-data-quality, from the EazyML family, provides comprehensive data quality assessment, including bias detection, outlier identification, and data drift analysis.

EazyML Responsible-AI: Data Quality Assessment

Overview

eazyml-data-quality is a Python utility that evaluates dataset quality by checking data shape, emptiness, outliers, balance, and correlation. It helps users identify potential issues in their datasets and provides detailed feedback to ensure data readiness for downstream processes. Its APIs cover the data quality dimensions listed below.

Features

  • Missing Value Analysis: Detects and imputes missing values.
  • Bias Detection: Uncovers and helps mitigate bias in datasets.
  • Data Drift and Model Drift Analysis: Monitors changes in data distributions over time.
  • Data Shape Quality: Validates dataset dimensions and checks whether the number of rows is sufficient relative to the number of columns.
  • Data Emptiness Check: Identifies and reports missing values in the dataset.
  • Outlier Detection: Detects and optionally removes outliers based on statistical analysis.
  • Data Balance Check: Analyzes the balance of the dataset and computes a balance score.
  • Correlation Analysis: Identifies multicollinearity and relationships between features, and raises alerts for highly correlated features.
  • Summary Alerts: Consolidates key quality issues into a single summary for quick review.

With eazyml-data-quality, you can ensure that your training data is clean, balanced, and ready for machine learning.
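
To make a few of these checks concrete, the sketch below computes rough equivalents with plain pandas: the fraction of missing values per column, an IQR-based outlier count, and a simple class-balance ratio. It is an illustration only and does not show how eazyml-data-quality computes its scores. The function name `quick_quality_report`, the 1.5 IQR multiplier, and the definition of the balance ratio are all assumptions chosen for this example.

```python
import pandas as pd


def quick_quality_report(df: pd.DataFrame, outcome: str) -> dict:
    """Illustrative checks only; not how eazyml-data-quality computes them."""
    n_rows, n_cols = df.shape

    # Emptiness: fraction of missing values in each column.
    missing_fraction = df.isna().mean().to_dict()

    # Outliers: count values outside 1.5 * IQR for each numeric feature.
    outlier_counts = {}
    numeric = df.drop(columns=[outcome]).select_dtypes(include="number")
    for col in numeric.columns:
        q1, q3 = numeric[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        mask = (numeric[col] < q1 - 1.5 * iqr) | (numeric[col] > q3 + 1.5 * iqr)
        outlier_counts[col] = int(mask.sum())

    # Balance: size of the rarest class divided by size of the most common one.
    class_counts = df[outcome].value_counts()
    balance_ratio = float(class_counts.min() / class_counts.max())

    return {
        "shape": (n_rows, n_cols),
        "missing_fraction": missing_fraction,
        "outlier_counts": outlier_counts,
        "balance_ratio": balance_ratio,
    }


# Small made-up dataset, used only to exercise the function.
example = pd.DataFrame(
    {
        "age": [25, 30, 28, 27, 200, None],
        "income": [40, 42, 41, 39, 43, 40],
        "target": [0, 0, 0, 0, 1, 1],
    }
)
report = quick_quality_report(example, outcome="target")
print(report)
```

A balance ratio near 1 means the classes are roughly even, while a value near 0 points to a heavily skewed outcome column.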

Installation

To use eazyml-data-quality, ensure that Python is installed on your system.

User installation

The easiest way to install eazyml-data-quality is with pip:

pip install -U eazyml-data-quality

Dependencies

This package requires:

  • pandas
  • scikit-learn
  • numpy
  • openpyxl
  • eazyml-insight
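
After installation, one way to confirm that the package and its dependencies are present is to query their installed versions through the standard library. The sketch below uses `importlib.metadata` with the distribution names listed above; the helper name `installed_versions` is hypothetical and not part of any EazyML package.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as listed above, plus the package itself.
REQUIRED = [
    "eazyml-data-quality",
    "pandas",
    "scikit-learn",
    "numpy",
    "openpyxl",
    "eazyml-insight",
]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions(REQUIRED).items():
    print(f"{name}: {ver or 'NOT INSTALLED'}")
```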

Usage

Here's an example of how you can use the APIs from this package.

Imports

# ez_init is called below; it is assumed to be exported by the same package.
from eazyml_data_quality import ez_init, ez_data_quality

Initialize and Read Data

# Initialize the EazyML library.
_ = ez_init()

# Define training data (Replace with the correct data path).
train_data_path = "path_to_your_training_data.csv"

# Define test data (Replace with the correct data path).
test_data_path = "path_to_your_test_data.csv"
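
Before running the checks, it can help to confirm that both CSV files exist and that the target column appears in the training file's header. The sketch below does this with the standard library only. The helper `check_csv_inputs` is hypothetical and not part of eazyml-data-quality, and it assumes the files are comma-separated with a header row.

```python
import csv
from pathlib import Path


def check_csv_inputs(train_path, test_path, outcome):
    """Return a list of problems found; an empty list means the inputs look usable."""
    problems = []
    for label, path in (("training", train_path), ("test", test_path)):
        if not Path(path).is_file():
            problems.append(f"{label} file not found: {path}")

    if Path(train_path).is_file():
        # Read only the header row to confirm the outcome column exists.
        with open(train_path, newline="", encoding="utf-8") as handle:
            header = next(csv.reader(handle), [])
        if outcome not in header:
            problems.append(f"outcome column '{outcome}' not in training header")
    return problems


# The placeholder paths are reported as missing until they are replaced.
problems = check_csv_inputs(
    "path_to_your_training_data.csv", "path_to_your_test_data.csv", "target"
)
for problem in problems:
    print(problem)
```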

Perform Data Quality

# Define the outcome (target variable)
outcome = "target"  # Replace with your target variable name

# Customize options to perform data quality
dqa_options = {
    "data_shape": "yes",
    "data_balance": "yes",
    "data_emptiness": "yes",
    "data_outliers": "yes",
    "remove_outliers": "yes",
    "outcome_correlation": "yes",
    "data_drift": "yes",
    "model_drift": "yes",
    "prediction_data": test_data_path,
    "data_completeness": "yes",
    "data_correctness": "yes",
}

# Call the EazyML APIs to perform data quality
dqa_response = ez_data_quality(train_data_path, outcome, options=dqa_options)

# dqa_response is a dictionary with the following keys:
# print(dqa_response.keys())
# dict_keys(['success', 'message', 'data_shape_quality',
#            'data_emptiness_quality', 'data_outliers_quality',
#            'data_balance_quality', 'data_correlation_quality',
#            'data_completeness_quality', 'data_correctness_quality',
#            'drift_quality', 'data_bad_quality_alerts'])

# The response dictionary holds the results of all data quality checks, along with the data quality alerts selected by the user.
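
Once the call returns, a reasonable first step is to look at `success` and `message` before reading the individual results. The sketch below shows one way to do that. It assumes that `success` is truthy when the assessment ran correctly and that each remaining key holds the result of one check; neither assumption is confirmed by the text above. The helper `summarize_response` and the mock values are hypothetical.

```python
def summarize_response(response):
    """Print a short summary of a data quality response dictionary."""
    if not response.get("success"):
        # On failure, 'message' is expected to explain what went wrong.
        print("Data quality assessment failed:", response.get("message"))
        return []

    # Every key other than the status fields is treated as one check's result.
    status_keys = {"success", "message"}
    check_keys = [key for key in response if key not in status_keys]
    for key in check_keys:
        print(f"{key}: {response[key]}")
    return check_keys


# Stand-in response with made-up values, only to show the helper running.
mock_response = {
    "success": True,
    "message": "placeholder message",
    "data_shape_quality": {"placeholder": "value"},
    "data_bad_quality_alerts": {"placeholder": "value"},
}
checked = summarize_response(mock_response)
```

In real use, pass `dqa_response` in place of `mock_response`.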

You can find more information in the documentation.

Useful links and other packages from the EazyML family

  • Documentation

  • Homepage

  • If you have questions or would like to discuss a use case, please contact us here

  • Here are the other packages from the EazyML suite:

    • eazyml-automl: provides a suite of APIs for training, optimizing, and validating machine learning models with built-in AutoML capabilities, hyperparameter tuning, and cross-validation.
    • eazyml-data-quality: provides APIs for comprehensive data quality assessment, including bias detection, outlier identification, and drift analysis for both data and models.
    • eazyml-counterfactual: provides APIs for optimal prescriptive analytics, counterfactual explanations, and actionable insights to optimize predictive outcomes to align with your objectives.
    • eazyml-insight: provides APIs to discover patterns, generate insights, and mine rules from your datasets.
    • eazyml-xai: provides APIs for explainable AI (XAI), offering human-readable explanations, feature importance, and predictive reasoning.
    • eazyml-xai-image: provides APIs for image explainable AI (XAI).

License

This project is licensed under the Proprietary License.


Maintained by EazyML
© 2025 EazyML. All rights reserved.
