YData open-source tools for Data Quality.

These details have been verified by PyPI

Project links

Homepage

GitHub Statistics

Maintainers

ydata

These details have not been verified by PyPI

Project description

Data Quality

data_quality is an open-source Python library for assessing data quality throughout the multiple stages of data pipeline development.

A holistic view of the data requires examining it along multiple dimensions. data_quality evaluates each dimension in a separate module, and the modules are wrapped into a single Data Quality engine. This repository contains the core Python source code and walkthrough tutorials.

Quickstart

The source code is currently hosted on GitHub at: https://github.com/Data-Centric-AI-Community/fg-data-quality

Binary installers for the latest released version are available at the Python Package Index (PyPI).

pip install fg-data-quality

Comprehensive quality check in a few lines of code

from data_quality import DataQuality
import pandas as pd

# load the data
df = pd.read_csv('./datasets/transformed/census_10k.csv')

# create a DataQuality object from the main class that holds all quality modules
dq = DataQuality(df=df)

# run the tests and output a summary of the results
results = dq.evaluate()

Warnings:
	TOTAL: 5 warning(s)
	Priority 1: 1 warning(s)
	Priority 2: 4 warning(s)

Priority 1 - heavy impact expected:
	* [DUPLICATES - DUPLICATE COLUMNS] Found 1 columns with exactly the same feature values as other columns.
Priority 2 - usage allowed, limited human intelligibility:
	* [DATA RELATIONS - HIGH COLLINEARITY - NUMERICAL] Found 3 numerical variables with high Variance Inflation Factor (VIF>5.0). The variables listed in results are highly collinear with other variables in the dataset. These will make model explainability harder and potentially give way to issues like overfitting. Depending on your end goal you might want to remove the highest VIF variables.
	* [ERRONEOUS DATA - PREDEFINED ERRONEOUS DATA] Found 1960 ED values in the dataset.
	* [DATA RELATIONS - HIGH COLLINEARITY - CATEGORICAL] Found 10 categorical variables with significant collinearity (p-value < 0.05). The variables listed in results are highly collinear with other variables in the dataset and sorted descending according to propensity. These will make model explainability harder and potentially give way to issues like overfitting. Depending on your end goal you might want to remove variables following the provided order.
	* [DUPLICATES - EXACT DUPLICATES] Found 3 instances with exact duplicate feature values.
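The collinearity warning above refers to the Variance Inflation Factor. As a rough illustration of what that number measures, here is a minimal NumPy sketch that computes VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j on all other columns. The function name and the synthetic data are assumptions made for this example, and the library's own implementation may differ; only the 5.0 threshold is taken from the output above.

```python
import numpy as np


def variance_inflation_factors(X: np.ndarray) -> np.ndarray:
    """Return the VIF of each column of X (rows are observations)."""
    X = np.asarray(X, dtype=float)
    n_rows, n_cols = X.shape
    vifs = np.empty(n_cols)
    for j in range(n_cols):
        y = X[:, j]
        # Regress column j on all other columns plus an intercept.
        others = np.delete(X, j, axis=1)
        design = np.column_stack([np.ones(n_rows), others])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        residuals = y - design @ coef
        ss_res = float(residuals @ residuals)
        ss_tot = float(((y - y.mean()) ** 2).sum())
        r_squared = 1.0 - ss_res / ss_tot
        # VIF_j = 1 / (1 - R_j^2); a perfectly collinear column gives infinity.
        vifs[j] = np.inf if np.isclose(r_squared, 1.0) else 1.0 / (1.0 - r_squared)
    return vifs


# Synthetic data (hypothetical): c is almost a linear combination of a and b.
rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = rng.normal(size=500)
c = a + b + rng.normal(scale=0.1, size=500)

vifs = variance_inflation_factors(np.column_stack([a, b, c]))
flagged = [name for name, v in zip("abc", vifs) if v > 5.0]
print(dict(zip("abc", vifs.round(1))), flagged)
```

Because c is built from a and b, each of the three columns is well predicted by the other two, so all three exceed the threshold. The sketch assumes no column is constant; a constant column would make the total sum of squares zero.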

On top of the summary, you can retrieve a list of detected warnings for detailed inspection.

# retrieve a list of data quality warnings 
warnings = dq.get_warnings()
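When following up on a warning such as the DUPLICATES ones, plain pandas can show which rows and columns are involved. The sketch below is independent of data_quality and makes no assumption about the structure of the returned warning objects; the small DataFrame and its column names are made up for illustration.

```python
import pandas as pd

# Hypothetical data: row 2 repeats row 0, and hours_copy repeats hours.
df = pd.DataFrame({
    "age": [39, 50, 39, 28],
    "hours": [40, 13, 40, 40],
    "hours_copy": [40, 13, 40, 40],
})

# Rows whose values repeat an earlier row in every column.
exact_duplicate_rows = df[df.duplicated(keep="first")]

# Columns whose values repeat an earlier column: transpose, then reuse duplicated().
duplicate_columns = df.columns[df.T.duplicated(keep="first").to_numpy()].tolist()

print(exact_duplicate_rows.index.tolist())  # prints [2]
print(duplicate_columns)                    # prints ['hours_copy']
```

Transposing a wide or mixed-type DataFrame can be slow, so on large datasets it may be preferable to compare columns pairwise instead.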

Migration Guide

1. Uninstall the old package

pip uninstall ydata-quality

2. Install the new package

pip install fg-data-quality

3. Update your imports

Find and replace all occurrences of the old import in your codebase:

# Before
import ydata_quality
from ydata_quality import DataQuality

# After
import data_quality
from data_quality import DataQuality

You can use this one-liner to find all affected files:

grep -r "ydata_quality" . --include="*.py"
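If you would rather script the find-and-replace step, the following is a minimal Python sketch. The function name `migrate_imports` and its `dry_run` parameter are hypothetical helpers for this example, not part of the package. It replaces every whole-word occurrence of ydata_quality in .py files, including mentions in comments and strings, so review the result before committing.

```python
import re
from pathlib import Path

# Match the old module name only as a whole identifier, so names such as
# my_ydata_quality_helper are left alone.
OLD_IMPORT = re.compile(r"\bydata_quality\b")


def migrate_imports(root: str, dry_run: bool = True) -> list[Path]:
    """Return the .py files under root that mention ydata_quality.

    With dry_run=False the files are also rewritten in place.
    """
    changed = []
    for path in sorted(Path(root).rglob("*.py")):
        text = path.read_text(encoding="utf-8")
        new_text = OLD_IMPORT.sub("data_quality", text)
        if new_text != text:
            changed.append(path)
            if not dry_run:
                path.write_text(new_text, encoding="utf-8")
    return changed
```

Calling `migrate_imports(".")` lists the affected files without touching them, which mirrors the grep command; `migrate_imports(".", dry_run=False)` applies the change.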

Examples

Here you can find walkthrough tutorials and examples to familiarize yourself with the different modules of data_quality.

Start Here for Quick and Overall Walkthrough

To dive into a specific module and understand how it works, see the tutorial notebooks in the repository.

Contributing

We are open to collaboration! To start contributing, you only need to:

1. Search for an issue you would like to work on. Issues for newcomers are labeled with good first issue.
2. Create a PR solving the issue.
3. We will review every PR and either accept it or ask for revisions.

You can also join the discussions on our Discord Community and request features/bug fixes by opening issues on our repository.

Support

For support in using this library, please join our Discord server. The community is friendly and quick to answer questions about the use and development of the library.

License

GNU General Public License v3.0

About

With ♥️ from the YData Development team

Release history

This version

0.2.0

Apr 23, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

fg_data_quality-0.2.0.tar.gz (49.0 kB)

Uploaded Apr 23, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

fg_data_quality-0.2.0-py2.py3-none-any.whl (56.1 kB)

Uploaded Apr 23, 2026 Python 2, Python 3
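The wheel file name encodes its compatibility tags in the form distribution-version(-build)-python-abi-platform.whl. The sketch below splits the name listed above into those parts; `parse_wheel_filename` here is a hypothetical helper written for illustration, and it handles only well-formed names.

```python
def parse_wheel_filename(filename: str) -> dict:
    """Split a wheel file name into its dash-separated components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected number of components in {filename!r}")
    python_tag, abi_tag, platform_tag = parts[-3:]
    return {
        "distribution": parts[0],
        "version": parts[1],
        # The build tag is optional and only present in six-part names.
        "build": parts[2] if len(parts) == 6 else None,
        # A dot inside a tag separates alternatives, e.g. py2.py3.
        "python_tags": python_tag.split("."),
        "abi_tags": abi_tag.split("."),
        "platform_tags": platform_tag.split("."),
    }


info = parse_wheel_filename("fg_data_quality-0.2.0-py2.py3-none-any.whl")
print(info["python_tags"], info["abi_tags"], info["platform_tags"])
# prints ['py2', 'py3'] ['none'] ['any']
```

For real tooling, the `packaging` library's `packaging.utils.parse_wheel_filename` is a maintained parser that also validates the name.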

File details

Details for the file fg_data_quality-0.2.0.tar.gz.

File metadata

Download URL: fg_data_quality-0.2.0.tar.gz
Upload date: Apr 23, 2026
Size: 49.0 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for fg_data_quality-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`e9ac2c84302cfac428c8b467ec28f4bd995f3a19fa19ff231bd22a2f2a1ca49a`
MD5	`d0fc2254f5816a7324b1c1186cf4d596`
BLAKE2b-256	`053bc27c7bb1189b8859571259aa60c2d9947b68a73af793fd0f2ac2d52a2804`

See more details on using hashes here.
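To check a downloaded file against the published digest, one option is to compute the SHA256 locally and compare. This is a minimal sketch using only the standard library; the helper names are hypothetical, and the file is assumed to be in the current directory.

```python
import hashlib
import hmac


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_sha256(path: str, expected_hex: str) -> bool:
    """Compare the file's digest with the published one, ignoring case and whitespace."""
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())


# Example usage with the digest from the table above (file must exist locally):
# EXPECTED = "e9ac2c84302cfac428c8b467ec28f4bd995f3a19fa19ff231bd22a2f2a1ca49a"
# print(matches_published_sha256("fg_data_quality-0.2.0.tar.gz", EXPECTED))
```

A mismatch means the file differs from the one uploaded to PyPI and should not be installed.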

Provenance

The following attestation bundles were made for fg_data_quality-0.2.0.tar.gz:

Publisher: release.yml on Data-Centric-AI-Community/fg-data-quality

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: fg_data_quality-0.2.0.tar.gz
- Subject digest: e9ac2c84302cfac428c8b467ec28f4bd995f3a19fa19ff231bd22a2f2a1ca49a
- Sigstore transparency entry: 1361636322
- Sigstore integration time: Apr 23, 2026
Source repository:
- Permalink: Data-Centric-AI-Community/fg-data-quality@41e21a8810bc035aedf6a1180bad9f0fe421407e
- Branch / Tag: refs/tags/0.2.0
- Owner: https://github.com/Data-Centric-AI-Community
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@41e21a8810bc035aedf6a1180bad9f0fe421407e
- Trigger Event: release

File details

Details for the file fg_data_quality-0.2.0-py2.py3-none-any.whl.

File metadata

Download URL: fg_data_quality-0.2.0-py2.py3-none-any.whl
Upload date: Apr 23, 2026
Size: 56.1 kB
Tags: Python 2, Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for fg_data_quality-0.2.0-py2.py3-none-any.whl
Algorithm	Hash digest
SHA256	`700e63f5cd4ea1241275ef9a35c7cf20e6c480449281cdc7677cea44ce35814a`
MD5	`e1307b043f00bdf74ca74f928b670c0c`
BLAKE2b-256	`16d39a10a6aa1b05a3da7d0b00effd3c975f7f145dbf569e89575a21e2dbbec4`

See more details on using hashes here.

Provenance

The following attestation bundles were made for fg_data_quality-0.2.0-py2.py3-none-any.whl:

Publisher: release.yml on Data-Centric-AI-Community/fg-data-quality

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: fg_data_quality-0.2.0-py2.py3-none-any.whl
- Subject digest: 700e63f5cd4ea1241275ef9a35c7cf20e6c480449281cdc7677cea44ce35814a
- Sigstore transparency entry: 1361636356
- Sigstore integration time: Apr 23, 2026
Source repository:
- Permalink: Data-Centric-AI-Community/fg-data-quality@41e21a8810bc035aedf6a1180bad9f0fe421407e
- Branch / Tag: refs/tags/0.2.0
- Owner: https://github.com/Data-Centric-AI-Community
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@41e21a8810bc035aedf6a1180bad9f0fe421407e
- Trigger Event: release
