
A robust, dataset-agnostic loader, cleaner, and automated interactive visual pairplot dashboard engine.


✨ ezclean-data


ezclean-data is a dataset-agnostic Python library that automates the tedious parts of data loading, cleaning, and exploration. It loads common structured file formats, handles outliers and null values, and produces interactive visualization dashboards or multi-variable pairplot matrices.


🚀 Features

  • 📥 Smart Data Loader: Auto-detects extensions (CSV, Excel, Parquet, JSON, etc.) and routes them to optimized Pandas engines, streaming directly over HTTP or loading local paths.
  • 🧼 Intelligent Data Cleaner: Standardizes column names to snake_case, replaces structural garbage strings with NaN, handles outliers via IQR boundary thresholds, and fills null values using type-specific heuristics (e.g. median for numeric columns).
  • 📊 Universal Plot Grid: Renders an interactive Plotly pairplot matrix showing univariate distributions (diagonal) and bivariate relationships (off-diagonal) for the selected columns of any dataset.
  • 🎨 Standalone HTML Dashboard: Generates a fully interactive, lightweight dashboard with statistics cards, a column definitions table, and a dynamic JavaScript plot builder that works fully offline!

📦 Installation

Install ezclean-data directly from PyPI:

pip install ezclean-data

⚡ Quick Start

from ezclean import Smart_loader, Cleaner, colname, plot, plot_dashboard

# 1. Load your dataset from a local file or a URL
df = Smart_loader("tested.csv")

# 2. Run the unified cleaning pipeline
df_cleaned = Cleaner(df)

# 3. Print column statistics
colname(df_cleaned)

# 4. Plot a single column (auto-detects types)
plot(df_cleaned, "survived")

# 5. Plot the generalized pairplot matrix (all combinations)
plot(df_cleaned)

# 6. Generate and open a gorgeous standalone HTML Dashboard
plot_dashboard(df_cleaned, filename="my_dashboard.html")

🛠️ Module API Overview

1. Smart_loader(file_path, **kwargs)

Routes a local path or remote URL to the matching pandas reader based on its file extension. Supported formats: csv, tsv, txt, json, jsonl, ndjson, excel (xlsx, xls, ods), parquet, feather, arrow, orc, xml, html, pickle, stata, spss, sas, hdf.
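
To illustrate the routing idea, here is a minimal sketch. It is not ezclean's actual implementation: the READERS table, resolve_reader, and load are hypothetical names, and the table covers only a subset of the formats listed above. The pandas reader functions it points to (read_csv, read_json, and so on) are real.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical extension table: maps a suffix to a pandas reader name plus
# default keyword arguments. This mapping is an assumption for illustration.
READERS = {
    ".csv": ("read_csv", {}),
    ".tsv": ("read_csv", {"sep": "\t"}),
    ".json": ("read_json", {}),
    ".jsonl": ("read_json", {"lines": True}),
    ".ndjson": ("read_json", {"lines": True}),
    ".xlsx": ("read_excel", {}),
    ".xls": ("read_excel", {}),
    ".parquet": ("read_parquet", {}),
    ".feather": ("read_feather", {}),
    ".orc": ("read_orc", {}),
    ".xml": ("read_xml", {}),
    ".pkl": ("read_pickle", {}),
    ".dta": ("read_stata", {}),
}


def resolve_reader(source: str):
    """Return (pandas_function_name, default_kwargs) for a path or URL."""
    # For URLs, inspect only the path component so query strings are ignored.
    path = urlparse(source).path if "://" in source else source
    suffix = PurePosixPath(path.replace("\\", "/")).suffix.lower()
    if suffix not in READERS:
        raise ValueError(f"Unsupported file extension: {suffix!r}")
    return READERS[suffix]


def load(source: str, **kwargs):
    """Load a file with the reader chosen by resolve_reader."""
    import pandas as pd  # imported lazily so resolve_reader has no dependencies

    name, defaults = resolve_reader(source)
    # Caller-supplied kwargs override the table's defaults.
    return getattr(pd, name)(source, **{**defaults, **kwargs})
```

With this sketch, load("tested.csv") would call pandas.read_csv, and any extra keyword arguments are forwarded to the reader, in the spirit of the **kwargs in the Smart_loader signature.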

2. Cleaner(df, ...)

High-level cleaning pipeline wrapping:

  • column_name_sanity(): Cleans symbols, deduplicates underscores, and converts names to snake_case.
  • sanitize_data(): Replaces structural garbage (?, NULL, nil) with NumPy NaNs.
  • text_normalization(): Trims whitespace and normalizes string fields.
  • auto_type_correction(): Converts a column to a numeric dtype if more than 50% of its values parse as numbers.
  • intelligent_null_filling(): Imputes missing numeric values with the column median; fills missing categorical values with "Unknown".
  • handle_outliers(): IQR-based outlier trimming.
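
The steps above can be approximated with plain pandas. The sketch below is an illustration under stated assumptions, not the library's code: to_snake_case and clean are hypothetical names, the 50% threshold is measured against non-null values, and outliers are clipped to the IQR fences rather than dropped (the list above does not say which of the two the library does).

```python
import re

import numpy as np
import pandas as pd
from pandas.api import types as ptypes

GARBAGE = ["?", "NULL", "nil"]  # placeholder tokens named in the list above


def to_snake_case(name) -> str:
    """Normalize one column label to snake_case."""
    name = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", str(name).strip())  # split camelCase
    # Runs of symbols, spaces, or underscores collapse to a single underscore.
    name = re.sub(r"[^0-9A-Za-z]+", "_", name)
    return name.strip("_").lower()


def clean(df: pd.DataFrame, numeric_ratio: float = 0.5, k: float = 1.5) -> pd.DataFrame:
    """Assumed pipeline: rename, sanitize, normalize, retype, impute, clip."""
    out = df.copy()
    out.columns = [to_snake_case(c) for c in out.columns]
    out = out.replace(GARBAGE, np.nan)  # garbage tokens become NaN

    for col in out.columns:
        s = out[col]
        if ptypes.is_numeric_dtype(s) or ptypes.is_datetime64_any_dtype(s):
            continue
        # Trim whitespace on string values only; leave NaN and others alone.
        s = s.map(lambda v: v.strip() if isinstance(v, str) else v)
        # Retype when more than numeric_ratio of the present values parse.
        as_number = pd.to_numeric(s, errors="coerce")
        present = s.notna().sum()
        if present and as_number.notna().sum() / present > numeric_ratio:
            s = as_number
        out[col] = s

    for col in out.columns:
        if ptypes.is_numeric_dtype(out[col]):
            out[col] = out[col].fillna(out[col].median())
        elif not ptypes.is_datetime64_any_dtype(out[col]):
            out[col] = out[col].fillna("Unknown")

    # Assumption: clip to [Q1 - k*IQR, Q3 + k*IQR] instead of dropping rows.
    for col in out.select_dtypes(include="number").columns:
        q1, q3 = out[col].quantile([0.25, 0.75])
        spread = k * (q3 - q1)
        out[col] = out[col].clip(lower=q1 - spread, upper=q3 + spread)
    return out
```

Clipping keeps the row count unchanged, which is convenient for plotting; filtering rows outside the fences is the other common reading of "trimming".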

3. plot(df, target_column=None, columns=None)

  • If target_column is provided, renders a single visual (numerical gets Box+Histogram; categorical gets Donut+Bar; datetime gets Line Trend).
  • If target_column is None, renders a plot_matrix containing all univariate and bivariate subplots for the selected columns (default: top 5).
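
The type-based dispatch described above could look roughly like the following. This is a sketch of the general technique, not the library's code: chart_kind and its return labels are hypothetical, and treating boolean columns as categorical is an assumption.

```python
import pandas as pd
from pandas.api import types as ptypes


def chart_kind(series: pd.Series) -> str:
    """Pick a chart family for one column based on its dtype."""
    # Check bool first: pandas counts bool as numeric, but True/False
    # values read more naturally as two categories.
    if ptypes.is_bool_dtype(series):
        return "donut+bar"
    if ptypes.is_datetime64_any_dtype(series):
        return "line"
    if ptypes.is_numeric_dtype(series):
        return "box+histogram"
    # Everything else (strings, categoricals, mixed objects) is categorical.
    return "donut+bar"
```

The order of the checks matters: the boolean and datetime tests run before the numeric one so that neither kind of column falls through to the box-and-histogram branch.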

4. plot_dashboard(df, filename="ezclean_dashboard.html", show=True)

Writes a self-contained, interactive HTML dashboard containing:

  1. Column completeness summary tables.
  2. A dynamic client-side Plotly visualizer where users can build custom X vs. Y plots.
  3. An embedded pairplot relation matrix.
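
As a rough illustration of the first item only, a completeness table can be computed with pandas and written into a standalone HTML page. The names completeness_table, render_summary_page, and write_summary_page are hypothetical, and the chosen columns (dtype, non-null count, missing percentage, unique count) are an assumption about what such a summary might contain. The Plotly visualizer and pairplot matrix are not reproduced here.

```python
from pathlib import Path

import pandas as pd


def completeness_table(df: pd.DataFrame) -> pd.DataFrame:
    """One row per column: dtype, non-null count, missing %, unique count."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "non_null": df.notna().sum(),
        "missing_pct": (df.isna().mean() * 100).round(1),
        "unique": df.nunique(),
    })


def render_summary_page(df: pd.DataFrame, title: str = "Column summary") -> str:
    """Return a self-contained HTML document holding the completeness table."""
    table_html = completeness_table(df).to_html(border=0)
    return (
        "<!doctype html>\n"
        f"<html><head><meta charset='utf-8'><title>{title}</title></head>\n"
        f"<body><h1>{title}</h1>\n{table_html}\n</body></html>\n"
    )


def write_summary_page(df: pd.DataFrame, filename: str = "summary.html") -> str:
    """Write the page to disk and return the path that was written."""
    Path(filename).write_text(render_summary_page(df), encoding="utf-8")
    return filename
```

Because the page embeds the table markup directly and references no external assets, it opens in a browser without a network connection.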

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.
