A robust, dataset-agnostic loader, cleaner, and automated interactive visual pairplot dashboard engine.
✨ ezclean-data
A dataset-agnostic Python library that automates the painful parts of data loading, cleaning, and exploration. With ezclean-data, you can load common structured file formats, handle outliers and null values, and produce interactive visualization dashboards or multi-variable pairplot matrices.
🚀 Features
- 📥 Smart Data Loader: Auto-detects extensions (CSV, Excel, Parquet, JSON, etc.) and routes them to optimized Pandas engines, streaming directly over HTTP or loading local paths.
- 🧼 Intelligent Data Cleaner: Standardizes columns to snake_case, handles structural garbage strings, handles outliers via IQR boundary thresholds, and fills null values using type-specific heuristics (e.g. median for numbers).
- 📊 Universal Plot Grid: Renders a generalized interactive Plotly pairplot matrix of subplots showing all possible univariate distributions (diagonal) and bivariate relationships (off-diagonal) for any dataset.
- 🎨 Standalone HTML Dashboard: Generates a fully interactive, lightweight dashboard with statistics cards, a column definitions table, and a dynamic JavaScript plot builder that works fully offline!
📦 Installation
Install ezclean-data directly from PyPI:
pip install ezclean-data
⚡ Quick Start
from ezclean import Smart_loader, Cleaner, colname, plot, plot_dashboard
# 1. Load your dataset from a file or url
df = Smart_loader("tested.csv")
# 2. Run the unified cleaning pipeline
df_cleaned = Cleaner(df)
# 3. Print column statistics
colname(df_cleaned)
# 4. Plot a single column (auto-detects types)
plot(df_cleaned, "survived")
# 5. Plot the generalized pairplot matrix (all combinations)
plot(df_cleaned)
# 6. Generate and open a gorgeous standalone HTML Dashboard
plot_dashboard(df_cleaned, filename="my_dashboard.html")
🛠️ Module API Overview
1. Smart_loader(file_path, **kwargs)
Routes a local path or remote URL to the matching Pandas reader based on the file extension. Supported formats:
csv, tsv, txt, json, jsonl, ndjson, excel (xlsx, xls, ods), parquet, feather, arrow, orc, xml, html, pickle, stata, spss, sas, hdf.
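The extension-based routing described above can be sketched roughly as follows. This is an illustration, not the library's actual implementation: the names `READERS`, `route`, and `load_sketch` are hypothetical, only a subset of the listed formats is covered, and the `.pkl` and `.dta` suffixes are assumed spellings for the pickle and Stata formats. The pandas reader functions named in the table do exist.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical routing table: file suffix -> (pandas reader name, default kwargs).
READERS = {
    ".csv": ("read_csv", {}),
    ".tsv": ("read_csv", {"sep": "\t"}),
    ".json": ("read_json", {}),
    ".jsonl": ("read_json", {"lines": True}),
    ".ndjson": ("read_json", {"lines": True}),
    ".xlsx": ("read_excel", {}),
    ".parquet": ("read_parquet", {}),
    ".feather": ("read_feather", {}),
    ".pkl": ("read_pickle", {}),   # assumed suffix for pickle files
    ".dta": ("read_stata", {}),    # assumed suffix for Stata files
}


def route(path_or_url):
    """Return (reader_name, default_kwargs) for a local path or URL."""
    # urlparse drops any query string or fragment, so "x.csv?raw=1" still works.
    path = urlparse(str(path_or_url)).path
    suffix = PurePosixPath(path).suffix.lower()
    if suffix not in READERS:
        raise ValueError(f"Unsupported file extension: {suffix!r}")
    name, defaults = READERS[suffix]
    return name, dict(defaults)


def load_sketch(path_or_url, **kwargs):
    """Load a file with the routed pandas reader; caller kwargs win over defaults."""
    import pandas as pd  # imported lazily so route() stays dependency-free

    name, defaults = route(path_or_url)
    return getattr(pd, name)(path_or_url, **{**defaults, **kwargs})
```

Because pandas readers generally accept URLs as well as local paths, the same call path serves both cases in this sketch.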
2. Cleaner(df, ...)
High-level cleaning pipeline wrapping:
- column_name_sanity(): Clean symbols, deduplicate underscores, convert to snake_case.
- sanitize_data(): Replaces structural garbage (?, NULL, nil) with NumPy NaNs.
- text_normalization(): Trims whitespace and normalizes string fields.
- auto_type_correction(): Converts column dtypes to numeric if >50% of values match.
- intelligent_null_filling(): Median imputes numeric fields; fills categorical values with "Unknown".
- handle_outliers(): IQR-based outlier trimming.
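The steps above might look something like the following when written directly against pandas. This is a minimal sketch under stated assumptions, not the code inside Cleaner: `to_snake_case` and `clean_sketch` are hypothetical names, the >50% rule is measured against all rows, and "trimming" is taken to mean dropping rows outside the usual 1.5 × IQR fences (the library may clip instead).

```python
import re

import pandas as pd

GARBAGE = ["?", "NULL", "nil"]  # garbage tokens named in the README


def to_snake_case(name):
    """Replace symbol runs with one underscore, trim the ends, lowercase."""
    name = re.sub(r"[^0-9a-zA-Z]+", "_", str(name).strip())
    return re.sub(r"_+", "_", name).strip("_").lower()


def _is_number(series):
    types = pd.api.types
    return types.is_numeric_dtype(series) and not types.is_bool_dtype(series)


def clean_sketch(df, iqr_factor=1.5):
    out = df.copy()
    out.columns = [to_snake_case(c) for c in out.columns]

    for col in out.columns:
        if pd.api.types.is_numeric_dtype(out[col]):
            continue
        # Trim whitespace, then turn garbage tokens into NaN.
        text = out[col].map(lambda v: v.strip() if isinstance(v, str) else v)
        text = text.where(~text.isin(GARBAGE))
        # Assumption: convert when more than half of all rows parse as numbers.
        as_num = pd.to_numeric(text, errors="coerce")
        out[col] = as_num if as_num.notna().mean() > 0.5 else text

    # Median for numeric columns, "Unknown" for everything else.
    for col in out.columns:
        fill = out[col].median() if _is_number(out[col]) else "Unknown"
        out[col] = out[col].fillna(fill)

    # Assumption: drop rows that fall outside the IQR fences in any numeric column.
    keep = pd.Series(True, index=out.index)
    for col in out.columns:
        if _is_number(out[col]):
            q1, q3 = out[col].quantile([0.25, 0.75])
            spread = iqr_factor * (q3 - q1)
            keep &= out[col].between(q1 - spread, q3 + spread)
    return out[keep].reset_index(drop=True)
```

Filling nulls before the outlier pass means imputed medians always sit inside the fences, so imputation by itself never causes a row to be dropped.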
3. plot(df, target_column=None, columns=None)
- If target_column is provided, renders a single visual (numerical gets Box+Histogram; categorical gets Donut+Bar; datetime gets Line Trend).
- If target_column=None, renders plot_matrix containing all univariate and bivariate subplots for selected columns (default: top 5).
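The type-based dispatch described above can be illustrated without any plotting code. In this sketch, `chart_kind` and `matrix_cells` are hypothetical helpers, the returned labels are placeholders rather than the library's internal names, and "top 5" is assumed to mean the first five columns.

```python
import pandas as pd


def chart_kind(series):
    """Pick a chart family for one column based on its dtype."""
    types = pd.api.types
    if types.is_datetime64_any_dtype(series):
        return "line_trend"
    if types.is_numeric_dtype(series) and not types.is_bool_dtype(series):
        return "box_histogram"
    return "donut_bar"  # everything else is treated as categorical


def matrix_cells(columns, limit=5):
    """List (row, col, kind) cells of a pairplot grid for the first `limit` columns."""
    cols = list(columns)[:limit]
    return [
        (row, col, "univariate" if row == col else "bivariate")
        for row in cols
        for col in cols
    ]
```

With five columns the grid has 25 cells: 5 univariate plots on the diagonal and 20 bivariate plots off it.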
4. plot_dashboard(df, filename="ezclean_dashboard.html", show=True)
Writes a self-contained, interactive HTML dashboard containing:
- Summary tables of per-column completeness.
- Dynamic Plotly client visualizer where users can build custom X vs Y plots.
- Embedded pairplot relation matrix.
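As a rough idea of what a column completeness table might contain, the sketch below computes one with plain pandas. The function name `completeness_summary` and the output column names are assumptions made for illustration; the dashboard's real table may differ.

```python
import pandas as pd


def completeness_summary(df):
    """Per-column dtype, non-null count, missing count, and percent complete."""
    total = len(df)
    summary = pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "non_null": df.notna().sum(),
        "missing": df.isna().sum(),
    })
    # Guard against division by zero on an empty frame.
    if total:
        summary["complete_pct"] = (summary["non_null"] / total * 100).round(1)
    else:
        summary["complete_pct"] = 0.0
    return summary
```

The result is indexed by column name, so `completeness_summary(df).loc["age", "missing"]` would give the number of missing values in a hypothetical `age` column.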
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
File details
Details for the file ezclean_data-0.1.0.tar.gz.
File metadata
- Download URL: ezclean_data-0.1.0.tar.gz
- Upload date:
- Size: 21.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 19e1be804e359fad04a50426f0d19348a296f0cfc217438470886fe08fffbb3c |
| MD5 | 8c6ce5555cb767989fbfeb07bf46a77a |
| BLAKE2b-256 | 8d959938277b2c09d81551ecd2f8e2d64398eb63002654344e08db6fcbfe8586 |
File details
Details for the file ezclean_data-0.1.0-py3-none-any.whl.
File metadata
- Download URL: ezclean_data-0.1.0-py3-none-any.whl
- Upload date:
- Size: 20.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ee5a0e2bc5b3477e54574c5ed65203a761f73d85c8dd49326230e18965f23f3c |
| MD5 | 2f864d2cdc3c2f54a2030c97939e17de |
| BLAKE2b-256 | 80924583af8ca3b34add963278f1ed12620375ab220e1195d222e223f2794e3c |
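A downloaded file can be checked against the SHA256 digests listed above with the standard library alone. The helper names `sha256_of` and `matches` are hypothetical, and the sketch assumes the file has already been saved locally.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Hash a file in chunks so large downloads do not need to fit in memory."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `matches("ezclean_data-0.1.0.tar.gz", "19e1be804e359fad04a50426f0d19348a296f0cfc217438470886fe08fffbb3c")` should return True for an intact copy of the source distribution.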