Profile tabular datasets, manage automatic validation for new datasets, and handle quality issues automatically.
qprofiler
qprofiler is a Python package that creates a data quality profile for your development (train) dataset(s) and saves it as a reference. The saved profile is then used to build quality check tests and automatic handling cases for production (test) datasets.
Installation
The source code is currently hosted on GitHub at: dprofiler-github
Binary installers for the latest released version are available on PyPI.
# PyPI
pip install qprofiler
Dependencies
- Polars (>=0.19.0, <0.20.0)
- PyYAML (>=6.0.1, <7.0.0)
- Pathlib (>=1.0, <2.0)
- ruamel.yaml (>=0.17.32, <0.18.0)
Usage
See the notebook that covers how to use the DataProfiler module to profile datasets and the QTest module to create quality check tests.
See the notebook that covers how to use QPipeline.
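The profile-then-check idea described above can be illustrated with a small, dependency-free sketch. This is not qprofiler's API: the function names `build_profile` and `check_against_profile`, the profile layout, and the sample rows are all hypothetical and chosen only to show the concept of recording a reference profile from a train dataset and validating a production dataset against it. Refer to the notebooks for the actual DataProfiler, QTest, and QPipeline usage.

```python
from typing import Any

Row = dict[str, Any]


def build_profile(rows: list[Row]) -> dict[str, dict[str, Any]]:
    """Summarise each column of a reference dataset (hypothetical layout)."""
    profile: dict[str, dict[str, Any]] = {}
    columns = list(rows[0].keys()) if rows else []
    for col in columns:
        values = [r.get(col) for r in rows]
        non_null = [v for v in values if v is not None]
        profile[col] = {
            # Type name of the first non-null value seen in the column
            "dtype": type(non_null[0]).__name__ if non_null else "unknown",
            # True if the reference data contains at least one null
            "nullable": len(non_null) < len(values),
            # True if every non-null reference value is distinct
            "unique": len(set(non_null)) == len(non_null),
        }
    return profile


def check_against_profile(
    rows: list[Row], profile: dict[str, dict[str, Any]]
) -> list[str]:
    """Return human-readable issues where rows deviate from the profile."""
    issues: list[str] = []
    for col, expected in profile.items():
        values = [r.get(col) for r in rows]
        non_null = [v for v in values if v is not None]
        if not expected["nullable"] and len(non_null) < len(values):
            issues.append(f"{col}: unexpected nulls")
        if any(type(v).__name__ != expected["dtype"] for v in non_null):
            issues.append(f"{col}: type mismatch")
        if expected["unique"] and len(set(non_null)) != len(non_null):
            issues.append(f"{col}: duplicate values")
    return issues


# Illustrative data only: a train dataset and a production dataset.
train = [{"id": 1, "age": 34}, {"id": 2, "age": 51}, {"id": 3, "age": 34}]
prod = [{"id": 1, "age": None}, {"id": 1, "age": 40}]

reference = build_profile(train)
issues = check_against_profile(prod, reference)
print(issues)
```

In this sketch the reference profile is an in-memory dictionary; a real tool would typically persist it (for example as YAML) so that later production runs can be checked against the same reference.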
Contributing
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
Please make sure to update tests as appropriate.
Licence
New in v0.2.3
- Create Quality Pipeline (v0.2.2).
- Enhance documentation.
- Add a utility method in DataProfiler for the .dprofiler structure tree.
New in v0.3.0
- Modify the file structure of .dprofiler.
- Add a Metadata module for .dprofiler.
- Modify test cases.
Hashes for qprofiler-0.3.0-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | cb7c7eacf5ad7cb84aa3547738f0c65332512ecfad4aa4b4ed3ce4ddac617ce8
MD5 | 5c9b56e1661785d167a9c72de7d2d472
BLAKE2b-256 | d333eb13281565f62de21cf612dd6d9333f396f3872e1983789ed787ea02c423