megaprofiler is a highly customizable and extensible data profiling library designed to help data scientists and engineers understand their datasets before performing analysis or building models.
When working with large datasets, it’s often necessary to understand data types, distributions, and potential issues (e.g., missing values, outliers) before analysis. While libraries like pandas-profiling exist, there is still room for an extensible, easy-to-use, and highly customizable profiler that integrates data validation.
Key Features:
- Automatic Data Summaries: Provide insights like distribution, unique values, missing values, and more for each column.
- Anomaly Detection: Automatically flag columns or rows with unusual distributions, outliers, or inconsistent data.
- Data Validation: Set validation rules (e.g., no missing values in specific columns, data type constraints) and get alerts if the data violates these rules.
- Custom Reports: Generate visual reports (e.g., HTML, PDF) with configurable thresholds for what counts as an anomaly.
- Data Drift Detection: Track changes in data distributions over time to identify shifts in data quality or content.
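The description above does not document megaprofiler's own API, so the sketch below does not use it. It is a dependency-free illustration of what the listed checks typically compute, written against plain Python lists. All function names (`summarize`, `iqr_outliers`, `columns_with_missing`, `mean_shift`) and the sample data are hypothetical, and the outlier rule (Tukey's IQR fence) and drift score (mean shift in baseline standard deviations) are assumptions chosen for simplicity, not necessarily what the library implements.

```python
from statistics import mean, quantiles, stdev


def summarize(values):
    """Per-column summary: row count, missing count, unique count, min and max."""
    present = [v for v in values if v is not None]
    numeric = [v for v in present if isinstance(v, (int, float))]
    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "unique": len(set(present)),
        "min": min(numeric) if numeric else None,
        "max": max(numeric) if numeric else None,
    }


def iqr_outliers(values, k=1.5):
    """Return numeric values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule)."""
    numeric = [v for v in values if isinstance(v, (int, float))]
    if len(numeric) < 2:
        return []
    q1, _, q3 = quantiles(numeric, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in numeric if v < low or v > high]


def columns_with_missing(table, required):
    """Validation rule: list the required columns that contain a missing value."""
    return [col for col in required if any(v is None for v in table[col])]


def mean_shift(baseline, current):
    """Crude drift score: shift in the mean, in baseline standard deviations."""
    spread = stdev(baseline)
    if spread == 0:
        return 0.0
    return abs(mean(current) - mean(baseline)) / spread


# Hypothetical sample data: a table stored as a dict of equal-length columns.
table = {
    "id": [1, 2, 3, 4, 5, 6, 7, 8],
    "age": [31, 29, None, 30, 28, 32, 30, 250],
    "city": ["Oslo", "Lima", "Oslo", "Pune", None, "Lima", "Oslo", "Pune"],
}

age_summary = summarize(table["age"])
outliers = iqr_outliers(table["age"])                        # [250]
violations = columns_with_missing(table, ["id", "city"])     # ["city"]
drift = mean_shift([28, 29, 30, 30, 31, 32], [33, 34, 35, 35, 36, 37])

print(age_summary)
print("outliers:", outliers)
print("columns violating the no-missing rule:", violations)
print("drift score:", round(drift, 2))
```

In this sketch the anomaly threshold `k` plays the role of the "configurable threshold" mentioned in the feature list: raising it flags fewer values, lowering it flags more.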
Benefits: megaprofiler is intended for data scientists and engineers working on exploratory data analysis, data quality checks, and ETL pipelines. It aims to reduce manual data investigation and improve data quality oversight.
Built Distribution
Hashes for megaprofiler-0.1.0-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | ff532e19b19cdc9d0dd86f6b8b6690d54b257d87646882464b5e966ef7784e27
MD5 | 28b5aa25b318f250515f4c620ecd5908
BLAKE2b-256 | 6449482b475983fef09c4eb4c0341818ac39986a2eec5452aff53b26df41dc37