
megaprofiler is a highly customizable and extensible data profiling library designed to help data scientists and engineers understand their datasets before performing analysis or building models.

Project description

When working with large datasets, you often need to understand data types, distributions, and potential issues (e.g., missing values, outliers) before starting analysis. Libraries such as pandas-profiling (since renamed ydata-profiling) already exist, but there is still room for an extensible, easy-to-use, and highly customizable profiler that also integrates data validation.

Key Features:

  • Automatic Data Summaries: Provide insights like distribution, unique values, missing values, and more for each column.
  • Anomaly Detection: Automatically flag columns or rows with unusual distributions, outliers, or inconsistent data.
  • Data Validation: Set validation rules (e.g., no missing values in specific columns, data type constraints) and get alerts if the data violates these rules.
  • Custom Reports: Generate visual reports (e.g., HTML, PDF) with configurable thresholds for what counts as an anomaly.
  • Data Drift Detection: Track changes in data distributions over time to identify shifts in data quality or content.
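To make the first three features concrete, here is a minimal sketch of what a column summary, an outlier check, and a "no missing values" rule could look like. It uses only the Python standard library and is not megaprofiler's API: the function names (`summarize`, `iqr_outliers`, `validate_no_missing`), the dict-of-lists data layout, and the 1.5 × IQR outlier rule are all assumptions chosen for illustration.

```python
from statistics import quantiles


def summarize(columns):
    """Per-column summary: missing count and number of distinct non-missing values."""
    return {
        name: {
            "missing": sum(v is None for v in values),
            "unique": len({v for v in values if v is not None}),
        }
        for name, values in columns.items()
    }


def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR] (hypothetical rule)."""
    present = [v for v in values if v is not None]
    q1, _, q3 = quantiles(present, n=4, method="inclusive")
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in present if v < low or v > high]


def validate_no_missing(columns, required):
    """Return the names of required columns that contain missing values."""
    return [name for name in required if any(v is None for v in columns[name])]


# Toy dataset: None marks a missing value.
data = {
    "age": [23, 25, 24, 26, 120],
    "city": ["Oslo", "Rome", None, "Oslo", "Lima"],
}

print(summarize(data))
print(iqr_outliers(data["age"]))
print(validate_no_missing(data, ["age", "city"]))
```

A real profiler would more likely operate on DataFrames and expose configurable thresholds. The point here is only that each feature reduces to a small, testable check per column.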

Benefits: megaprofiler is aimed at data scientists and engineers working on exploratory data analysis, data quality checks, and ETL pipelines, with the goal of reducing manual data investigation and improving oversight of data quality.
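In an ETL setting, the drift detection feature listed above amounts to comparing a new batch of a column against a baseline batch. One common way to do that is the two-sample Kolmogorov-Smirnov statistic, sketched below with the standard library only. This is an illustration, not megaprofiler's implementation: the function name `ks_statistic` and the 0.3 alert threshold are hypothetical.

```python
from bisect import bisect_right


def ks_statistic(baseline, current):
    """Largest gap between the two empirical CDFs (0 = identical, 1 = disjoint)."""
    a, b = sorted(baseline), sorted(current)
    points = sorted(set(a) | set(b))
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in points
    )


DRIFT_THRESHOLD = 0.3  # hypothetical cutoff; tune per column in practice

last_week = [1, 2, 3, 4, 5]
this_week = [3, 4, 5, 6, 7]

score = ks_statistic(last_week, this_week)
print(f"KS statistic: {score:.2f}, drift flagged: {score > DRIFT_THRESHOLD}")
```

The statistic ignores how the distribution changed and reports only how far apart the two samples are, which makes it a reasonable first alert to run on every numeric column between pipeline runs.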

Download files

Source Distribution

megaprofiler-0.2.0.tar.gz (8.4 kB)

Uploaded Source

Built Distribution

megaprofiler-0.2.0-py3-none-any.whl (10.4 kB)

Uploaded Python 3
