megaprofiler is a highly customizable and extensible data profiling library designed to help data scientists and engineers understand their datasets before performing analysis or building models.

Project description

When working with large datasets, it’s often necessary to understand data types, distributions, and potential issues (e.g., missing values, outliers) before analysis. Libraries like pandas-profiling (since renamed ydata-profiling) already exist, but there is still room for an extensible, easy-to-use, and highly customizable profiler that also integrates data validation.

Key Features:

  • Automatic Data Summaries: Provide insights like distribution, unique values, missing values, and more for each column.
  • Anomaly Detection: Automatically flag columns or rows with unusual distributions, outliers, or inconsistent data.
  • Data Validation: Set validation rules (e.g., no missing values in specific columns, data type constraints) and get alerts if the data violates these rules.
  • Custom Reports: Generate visual reports (e.g., HTML, PDF) with configurable thresholds for what counts as an anomaly.
  • Data Drift Detection: Track changes in data distributions over time to identify shifts in data quality or content.
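
To make the first three features concrete, here is a minimal sketch in plain Python (standard library only). It is not megaprofiler's API: the helper names `summarize_column`, `iqr_outliers`, and `validate`, the rule keys `no_missing` and `dtype`, and the sample table are all hypothetical, chosen only to illustrate a column summary, an IQR-based outlier check, and rule-based validation.

```python
from collections import Counter
from statistics import mean, quantiles

# Hypothetical helpers for illustration only; these are NOT megaprofiler's API.


def summarize_column(values):
    """Return basic profile statistics for one column (None marks a missing value)."""
    present = [v for v in values if v is not None]
    summary = {
        "count": len(values),
        "missing": len(values) - len(present),
        "unique": len(set(present)),
        "types": dict(Counter(type(v).__name__ for v in present)),
    }
    # Numeric statistics only make sense when every present value is a number.
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
    )
    if present and numeric:
        summary["min"] = min(present)
        summary["max"] = max(present)
        summary["mean"] = mean(present)
    return summary


def iqr_outliers(values, k=1.5):
    """Return numeric values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    present = [v for v in values if v is not None]
    if len(present) < 2:
        return []
    q1, _, q3 = quantiles(present, n=4, method="inclusive")
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in present if v < low or v > high]


def validate(table, rules):
    """Check each column against its rules and return a list of violation messages."""
    violations = []
    for column, rule in rules.items():
        values = table[column]
        if rule.get("no_missing") and any(v is None for v in values):
            violations.append(f"{column}: contains missing values")
        expected = rule.get("dtype")
        if expected and any(
            v is not None and not isinstance(v, expected) for v in values
        ):
            violations.append(f"{column}: expected {expected.__name__}")
    return violations


# Made-up sample data: one missing age, one implausible age, one missing city.
table = {
    "age": [34, 29, None, 41, 38, 250],
    "city": ["Oslo", "Lima", "Oslo", None, "Pune", "Lima"],
}
rules = {"age": {"no_missing": True, "dtype": int}, "city": {"dtype": str}}

for name, column in table.items():
    print(name, summarize_column(column))
print("outliers in age:", iqr_outliers(table["age"]))
print("violations:", validate(table, rules))
```

With this sample data the outlier check flags the age of 250, and validation reports that `age` contains missing values. Report generation and drift detection are left out of the sketch; a real profiler would also need to handle NaN values, dates, and mixed-type columns.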

Benefits: megaprofiler is aimed at data scientists and engineers working on exploratory data analysis, data quality checks, and ETL pipelines. It is intended to reduce manual data investigation and improve data quality oversight.

Download files

Source Distribution

megaprofiler-0.2.1.tar.gz (8.6 kB)

Uploaded Source

Built Distribution

megaprofiler-0.2.1-py3-none-any.whl (9.2 kB)

Uploaded Python 3

File details

Details for the file megaprofiler-0.2.1.tar.gz.

File metadata

  • Download URL: megaprofiler-0.2.1.tar.gz
  • Size: 8.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.11.6

File hashes

Hashes for megaprofiler-0.2.1.tar.gz
Algorithm Hash digest
SHA256 b512e3306b96887b746a31eb048f3f8dc5a762a36ca32565cd6b7306fb61fa82
MD5 da8605713fc5d8b219af70ab967ab224
BLAKE2b-256 8c5fda075d2752ac0ff1106e67854ea5fe4fa6e12076a3fd3747455af41298b0
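
A downloaded file can be checked against the SHA256 digest listed above. The following is a minimal sketch using only Python's standard `hashlib` module; the helper names `sha256_of_file` and `matches_published_hash` are hypothetical, and the local file path is an assumption about where the archive was saved.

```python
import hashlib

# SHA256 digest published above for megaprofiler-0.2.1.tar.gz.
EXPECTED_SHA256 = "b512e3306b96887b746a31eb048f3f8dc5a762a36ca32565cd6b7306fb61fa82"


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected=EXPECTED_SHA256):
    """Return True when the file's SHA256 digest equals the expected digest."""
    return sha256_of_file(path) == expected.lower()


# Example call, assuming the archive was saved in the current directory:
# matches_published_hash("megaprofiler-0.2.1.tar.gz")
```

A mismatch means the file differs from the one that was uploaded, for example because the download was truncated or altered.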

File details

Details for the file megaprofiler-0.2.1-py3-none-any.whl.

File metadata

  • Download URL: megaprofiler-0.2.1-py3-none-any.whl
  • Size: 9.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.11.6

File hashes

Hashes for megaprofiler-0.2.1-py3-none-any.whl
Algorithm Hash digest
SHA256 da0034d7ea4ddc8453fe0e6cf96c3e6b080b8e030c33f00e6a258baba7d35b6d
MD5 2eb816f4d64d8fbe1c4d297dc5f70984
BLAKE2b-256 1a66b6921c312026e665081e354a5f4fd19f62108efe7fd517110df1fd28720b
