megaprofiler is a highly customizable and extensible data profiling library designed to help data scientists and engineers understand their datasets before performing analysis or building models.

Project description

When working with large datasets, it’s often necessary to understand data types, distributions, and potential issues (e.g., missing values, outliers) before analysis. While libraries like pandas-profiling exist, there is still room for an extensible, easy-to-use, and highly customizable profiler that integrates data validation.

Key Features:

  • Automatic Data Summaries: Provide insights such as distribution, unique values, and missing values for each column.
  • Anomaly Detection: Automatically flag columns or rows with unusual distributions, outliers, or inconsistent data.
  • Data Validation: Set validation rules (e.g., no missing values in specific columns, data type constraints) and get alerts when the data violates them.
  • Custom Reports: Generate visual reports (e.g., HTML, PDF) with configurable thresholds for what counts as an anomaly.
  • Data Drift Detection: Track changes in data distributions over time to identify shifts in data quality or content.

Benefits: megaprofiler is aimed at data scientists and engineers working on exploratory data analysis, data quality checks, and ETL pipelines, and is meant to reduce manual data investigation.
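To make the first three features concrete, here is a minimal standard-library sketch of what a column summary, an outlier flag, and a "no missing values" rule can look like. It does not use megaprofiler and does not reflect its API: the function names (summarize_column, iqr_outliers, validate_no_missing), the dict-of-lists table layout, and the choice of the 1.5 × IQR rule for outliers are all assumptions made for illustration.

```python
import statistics


def summarize_column(values):
    """Summarize one column: row count, missing count, unique count,
    plus mean and median when every present value is numeric."""
    present = [v for v in values if v is not None]
    summary = {
        "count": len(values),
        "missing": len(values) - len(present),
        "unique": len(set(present)),
    }
    numeric = [
        v for v in present
        if isinstance(v, (int, float)) and not isinstance(v, bool)
    ]
    if numeric and len(numeric) == len(present):
        summary["mean"] = statistics.fmean(numeric)
        summary["median"] = statistics.median(numeric)
    return summary


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (assumed rule)."""
    numeric = sorted(v for v in values if v is not None)
    if len(numeric) < 4:
        return []  # too few points for meaningful quartiles
    q1, _, q3 = statistics.quantiles(numeric, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in numeric if v < low or v > high]


def validate_no_missing(table, columns):
    """Return the names of the given columns that contain a missing value."""
    return [c for c in columns if any(v is None for v in table[c])]


# Hypothetical sample data: None marks a missing value.
table = {
    "age": [23, 25, 24, None, 26, 25, 24, 120],
    "city": ["A", "B", None, "A", "B", "A", "A", "B"],
}

if __name__ == "__main__":
    for name, column in table.items():
        print(name, summarize_column(column))
    print("age outliers:", iqr_outliers(table["age"]))
    print("columns violating no-missing rule:",
          validate_no_missing(table, ["age", "city"]))
```

A real profiler would add per-type handling, configurable thresholds, and reporting on top of checks like these; the sketch only shows the underlying idea.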

To use:

    from megaprofiler import MegaProfiler
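The description does not document any methods of MegaProfiler beyond the import itself. Rather than guess at method names, the sketch below uses Python's standard introspection tools to list whatever public attributes the class exposes once the package is installed. The helper names public_members and describe_megaprofiler are hypothetical, and the code assumes only that the package can be imported as megaprofiler and exports a MegaProfiler attribute, as the import line states.

```python
import importlib
import inspect


def public_members(obj):
    """Return the sorted names of obj's attributes that do not start with '_'."""
    return sorted(
        name for name, _ in inspect.getmembers(obj)
        if not name.startswith("_")
    )


def describe_megaprofiler():
    """List MegaProfiler's public attributes, or return None if the
    package (or the class) is not available in this environment."""
    try:
        module = importlib.import_module("megaprofiler")
    except ImportError:
        return None
    cls = getattr(module, "MegaProfiler", None)
    return None if cls is None else public_members(cls)


if __name__ == "__main__":
    names = describe_megaprofiler()
    if names is None:
        print("megaprofiler is not installed; try: pip install megaprofiler")
    else:
        print("Public attributes of MegaProfiler:")
        for name in names:
            print("  ", name)
```

Calling help(MegaProfiler) in an interactive session gives the same information together with any docstrings the author provided.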

Download files

Source Distribution

megaprofiler-0.2.2.tar.gz (8.5 kB view details)

Built Distribution

megaprofiler-0.2.2-py3-none-any.whl (9.2 kB view details)

File details

Details for the file megaprofiler-0.2.2.tar.gz.

File metadata

  • Download URL: megaprofiler-0.2.2.tar.gz
  • Size: 8.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.11.6

File hashes

Hashes for megaprofiler-0.2.2.tar.gz

Algorithm    Hash digest
SHA256       d01f686d2b3efc5732cead40c2d002bf5c36d766f3547aabe6e3d749908ab1bf
MD5          973c9178c382d410c6bc145cb1020c8c
BLAKE2b-256  5b663045c9e497aba5d8afaad37e9cab94a142132dd0850107135e0cad89618a

File details

Details for the file megaprofiler-0.2.2-py3-none-any.whl.

File metadata

  • Download URL: megaprofiler-0.2.2-py3-none-any.whl
  • Size: 9.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.11.6

File hashes

Hashes for megaprofiler-0.2.2-py3-none-any.whl

Algorithm    Hash digest
SHA256       4fa2b5820a0e05a1672da72d3fe6d0aa358ca32b656bca504220818fe46e7987
MD5          a19611cb37e7c045ed29b080062a8edb
BLAKE2b-256  dfef49c32fc40aeb4b017e8b3cae82cdc851a652580c747a3e2aed0328e425fc
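
The SHA256 digests listed for the two release files can be checked against a downloaded copy with Python's hashlib. The following is a minimal sketch: the digests are copied from the listings above, while the helper names sha256_of and verify, and the assumption that the downloaded file keeps its original name, are illustrative.

```python
import hashlib
from pathlib import Path

# SHA256 digests as published for the 0.2.2 release files.
EXPECTED_SHA256 = {
    "megaprofiler-0.2.2.tar.gz":
        "d01f686d2b3efc5732cead40c2d002bf5c36d766f3547aabe6e3d749908ab1bf",
    "megaprofiler-0.2.2-py3-none-any.whl":
        "4fa2b5820a0e05a1672da72d3fe6d0aa358ca32b656bca504220818fe46e7987",
}


def sha256_of(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path):
    """Return True only if the file name is known and its digest matches."""
    path = Path(path)
    expected = EXPECTED_SHA256.get(path.name)
    return expected is not None and sha256_of(path) == expected


if __name__ == "__main__":
    import sys

    for arg in sys.argv[1:]:
        print(arg, "OK" if verify(arg) else "MISMATCH or unknown file")
```

pip can perform the same check at install time when a requirements file pins hashes and the --require-hashes option is used.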
