A robust, dataset-agnostic data loader, cleaner, and automated interactive pairplot dashboard engine.
ezclean-data
A comprehensive, dataset-agnostic Python library for automated data ingestion, cleaning, preprocessing, and exploratory data analysis.
ezclean-data streamlines the repetitive tasks involved in preparing datasets by providing intelligent loading, automated sanitization, statistical summaries, advanced visualizations, and standalone interactive dashboards.
Overview
Data preparation often consumes a significant portion of the data science workflow. ezclean-data provides a unified interface that automatically loads structured datasets, cleans inconsistencies, handles missing values and outliers, standardizes column names, and generates insightful visualizations with minimal code.
The library is designed to work across a wide variety of dataset formats and structures, making it suitable for students, researchers, analysts, and machine learning practitioners.
Features
Smart Data Loading
- Automatically detects file formats and selects the appropriate pandas engine.
- Supports both local files and remote URLs.
- Handles a broad range of structured data formats.
Automated Data Cleaning
- Standardizes column names into a consistent `snake_case` format.
- Detects and replaces common invalid placeholder values.
- Performs automatic type correction where appropriate.
- Handles missing values using intelligent, data-type-aware strategies.
- Detects and removes outliers using the Interquartile Range (IQR) method.
Exploratory Data Analysis
- Generates detailed column summaries and completeness statistics.
- Provides automatic visualizations based on column data types.
- Creates generalized pairplot-style relationship matrices for rapid exploration.
Interactive Dashboard Generation
- Produces self-contained HTML dashboards.
- Includes summary statistics and data quality metrics.
- Provides interactive Plotly-based visualizations.
- Works entirely offline once generated.
Installation
Install directly from PyPI:
```bash
pip install ezclean-data
```
Quick Start
```python
from ezclean import Smart_loader, Cleaner, colname, plot, plot_dashboard

# Load dataset
df = Smart_loader("tested.csv")

# Execute cleaning pipeline
df_cleaned = Cleaner(df)

# Display column statistics
colname(df_cleaned)

# Visualize a single column
plot(df_cleaned, "survived")

# Generate a relationship matrix
plot(df_cleaned)

# Create an interactive dashboard
plot_dashboard(
    df_cleaned,
    filename="my_dashboard.html"
)
```
API Reference
Smart_loader()
```python
Smart_loader(file_path, **kwargs)
```
Automatically loads structured datasets from local storage or remote URLs.
Supported Formats
| Category | Formats |
|---|---|
| Text Files | CSV, TSV, TXT |
| JSON Formats | JSON, JSONL, NDJSON |
| Spreadsheet Files | XLSX, XLS, ODS |
| Columnar Formats | Parquet, Feather, Arrow, ORC |
| Statistical Formats | SPSS, SAS, Stata |
| Other Formats | XML, HTML, Pickle, HDF |
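Format detection of this kind can be sketched as a lookup from file extension to a pandas reader. This is a hypothetical illustration, not ezclean's code: the `READERS` table and the `pick_reader` helper are assumptions, while the reader names (`read_csv`, `read_json`, `read_excel`, and so on) are real pandas functions. For remote sources only the URL path is inspected, so query strings do not confuse the extension check.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical extension -> (pandas reader name, extra keyword arguments).
# The reader names are real pandas functions; the mapping is illustrative.
READERS = {
    ".csv": ("read_csv", {}),
    ".tsv": ("read_csv", {"sep": "\t"}),
    ".txt": ("read_csv", {"sep": None, "engine": "python"}),  # sniff delimiter
    ".json": ("read_json", {}),
    ".jsonl": ("read_json", {"lines": True}),
    ".ndjson": ("read_json", {"lines": True}),
    ".xlsx": ("read_excel", {}),
    ".xls": ("read_excel", {}),
    ".ods": ("read_excel", {"engine": "odf"}),
    ".parquet": ("read_parquet", {}),
    ".feather": ("read_feather", {}),
    ".arrow": ("read_feather", {}),  # Feather v2 is the Arrow IPC file format
    ".orc": ("read_orc", {}),
    ".sav": ("read_spss", {}),
    ".sas7bdat": ("read_sas", {}),
    ".xpt": ("read_sas", {}),
    ".dta": ("read_stata", {}),
    ".xml": ("read_xml", {}),
    ".html": ("read_html", {}),  # returns a list of DataFrames
    ".pkl": ("read_pickle", {}),
    ".h5": ("read_hdf", {}),
}


def pick_reader(source: str) -> tuple[str, dict]:
    """Return (reader name, kwargs) for a local path or a remote URL."""
    # For URLs, inspect only the path so "?raw=1" and similar are ignored.
    path = urlparse(source).path if "://" in source else source
    suffix = PurePosixPath(path).suffix.lower()
    if suffix not in READERS:
        raise ValueError(f"unsupported file extension: {suffix!r}")
    name, kwargs = READERS[suffix]
    return name, dict(kwargs)  # copy so callers cannot mutate the table
```

With pandas installed, the returned pair could be applied as `getattr(pd, name)(source, **kwargs)`. Several readers rely on optional dependencies, such as `pyarrow` for Parquet, Feather and ORC, `odfpy` for ODS, and `pyreadstat` for SPSS.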
Cleaner()
```python
Cleaner(df, ...)
```
Executes a complete data-cleaning pipeline.
Included Operations
Column Name Standardization
- Converts names to `snake_case`
- Removes special characters
- Eliminates duplicate separators
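A minimal standard-library sketch of these three rules follows. The regular expressions are an assumption about how such a conversion could work; ezclean's exact rules (for example, how it treats acronyms such as `HTTPStatus`) are not documented here, and `to_snake_case` is a hypothetical helper name.

```python
import re


def to_snake_case(name: str) -> str:
    """Convert a column name to snake_case (illustrative rules only)."""
    name = name.strip()
    # Insert "_" at lower/digit -> upper boundaries: "PassengerId" -> "Passenger_Id"
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name)
    # Collapse every run of non-alphanumeric characters into one "_";
    # this both removes special characters and eliminates duplicate separators.
    name = re.sub(r"[^0-9A-Za-z]+", "_", name)
    # Drop leading/trailing separators and lowercase the result.
    return name.strip("_").lower()
```

Assuming string column labels, this could be applied to a DataFrame as `df.columns = [to_snake_case(c) for c in df.columns]`.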
Data Sanitization
Replaces common placeholder values such as `?`, `NULL`, `null`, `nil`, `N/A`, and `NaN` with proper missing-value representations.
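The replacement step amounts to matching each cell against a fixed set of tokens. This sketch works on a plain Python list, uses `None` as the missing marker, and matches the listed tokens exactly after trimming whitespace; whether ezclean also catches case variants such as `n/a` is an open assumption, and `sanitize` is a hypothetical name.

```python
# Placeholder tokens listed in the README. Matching is exact after trimming;
# whether ezclean also catches case variants like "n/a" is not documented.
PLACEHOLDERS = {"?", "NULL", "null", "nil", "N/A", "NaN"}


def sanitize(values: list) -> list:
    """Replace placeholder strings with None, the missing marker in this sketch."""
    return [
        None if isinstance(v, str) and v.strip() in PLACEHOLDERS else v
        for v in values
    ]
```

On a DataFrame, a comparable step is `df.replace(["?", "NULL", "null", "nil", "N/A", "NaN"], np.nan)`, which matches whole cell values exactly and does not trim whitespace first.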
Text Normalization
- Trims whitespace
- Standardizes string formatting
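The README does not define "standardizes string formatting" further. One plausible reading, shown here purely as an assumption, is collapsing runs of internal whitespace in addition to trimming the ends; `normalize_text` is a hypothetical helper.

```python
import re


def normalize_text(value):
    """Trim a string and collapse internal whitespace; pass other values through."""
    if not isinstance(value, str):
        return value
    # Any run of spaces, tabs, or newlines becomes a single space.
    return re.sub(r"\s+", " ", value).strip()
```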
Automatic Type Detection
- Converts columns to numeric types when appropriate
- Preserves incompatible values
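One way to read "convert when appropriate, otherwise preserve" is an all-or-nothing check: a column becomes numeric only if every non-missing value parses as a number, and is otherwise left untouched. The `try_numeric` helper is hypothetical. In pandas, `pd.to_numeric(series)` raises `ValueError` on unparseable strings by default, so wrapping it in `try`/`except` gives the same behavior.

```python
def try_numeric(values: list) -> list:
    """Return floats if every non-missing value parses as a number, else the input unchanged."""
    converted = []
    for v in values:
        if v is None:  # keep missing values missing
            converted.append(None)
            continue
        try:
            converted.append(float(v))
        except (TypeError, ValueError):
            # One incompatible value: preserve the original column as-is.
            return list(values)
    return converted
```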
Missing Value Handling
- Numerical columns → median imputation
- Categorical columns → replacement with `"Unknown"`
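The two fill rules can be sketched on a single list, with the caller deciding whether the column is numeric. The `impute` helper is an illustrative assumption; in pandas the numeric case is commonly written `s.fillna(s.median())` and the categorical case `s.fillna("Unknown")`. The median is a natural default here because, unlike the mean, it is not pulled by a few extreme values.

```python
from statistics import median


def impute(column: list, numeric: bool) -> list:
    """Fill None values: the median for numeric columns, "Unknown" otherwise."""
    if numeric:
        observed = [v for v in column if v is not None]
        # An all-missing column has no median, so it is left missing.
        fill = median(observed) if observed else None
    else:
        fill = "Unknown"
    return [fill if v is None else v for v in column]
```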
Outlier Treatment
- Uses Interquartile Range (IQR) thresholds
- Removes extreme observations automatically
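The IQR rule keeps values inside the fences [Q1 - k*IQR, Q3 + k*IQR], where IQR = Q3 - Q1. The multiplier k = 1.5 is Tukey's conventional choice and an assumption here, since the README does not state the value ezclean uses. `statistics.quantiles(..., method="inclusive")` interpolates linearly, which matches the default of pandas `Series.quantile`. The helper names are hypothetical.

```python
from statistics import quantiles


def iqr_bounds(values: list, k: float = 1.5) -> tuple[float, float]:
    """Return the fences (Q1 - k*IQR, Q3 + k*IQR)."""
    # method="inclusive" interpolates linearly, like pandas' default quantile.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def drop_outliers(rows: list[dict], column: str, k: float = 1.5) -> list[dict]:
    """Keep only rows whose value in `column` lies inside the IQR fences."""
    low, high = iqr_bounds([r[column] for r in rows], k)
    return [r for r in rows if low <= r[column] <= high]
```

Removing whole rows (rather than clipping values to the fences) matches the description above. When the rule is applied to many numeric columns in turn, the removals compound and can discard a sizeable share of the data.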
colname()
```python
colname(df)
```
Displays detailed metadata for each column, including:
- Data type
- Missing value count
- Completeness percentage
- Unique value count
- Statistical summaries
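The per-column metadata can be approximated as below. The `column_summary` helper and its field names are hypothetical, and the type label is simply the Python type name of the non-missing values. In pandas, the same quantities come from `df.dtypes`, `df.isna().sum()`, `df.nunique()`, and `df.describe()`.

```python
from statistics import mean


def column_summary(name: str, values: list) -> dict:
    """Type, missing count, completeness %, unique count, and basic stats for one column."""
    present = [v for v in values if v is not None]
    kinds = {type(v).__name__ for v in present}
    dtype = kinds.pop() if len(kinds) == 1 else ("mixed" if kinds else "empty")
    summary = {
        "column": name,
        "dtype": dtype,
        "missing": len(values) - len(present),
        "completeness_pct": round(100 * len(present) / len(values), 2) if values else 0.0,
        "unique": len(set(present)),
    }
    if dtype in ("int", "float"):  # numeric columns get min/mean/max
        summary.update(min=min(present), mean=mean(present), max=max(present))
    return summary
```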
plot()
```python
plot(df, target_column=None, columns=None)
```
Single-Column Visualization
When a target column is specified, the visualization is selected automatically based on data type.
| Data Type | Visualization |
|---|---|
| Numeric | Histogram + Box Plot |
| Categorical | Bar Chart + Donut Chart |
| Datetime | Trend Line |
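The dtype-to-chart table can be expressed as a simple type check. This sketch is an assumption about the selection logic, written over plain Python values with a hypothetical `choose_chart` helper; on a pandas Series one would instead test the dtype with `pandas.api.types.is_numeric_dtype` and `is_datetime64_any_dtype`. Booleans are deliberately treated as categorical here, since a histogram of two values is rarely informative.

```python
from datetime import date


def choose_chart(values: list) -> str:
    """Pick a chart type from a column's non-missing values."""
    present = [v for v in values if v is not None]
    if present and all(isinstance(v, date) for v in present):  # datetime is a date subclass
        return "trend line"
    if present and all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
    ):
        return "histogram + box plot"
    return "bar chart + donut chart"  # everything else is treated as categorical
```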
Relationship Matrix
```python
plot(df)
```
Generates a generalized pairplot matrix displaying:
- Univariate distributions
- Correlation patterns
- Relationships between variables
plot_dashboard()
```python
plot_dashboard(
    df,
    filename="ezclean_dashboard.html",
    show=True
)
```
Creates a standalone interactive dashboard containing:
Dataset Summary
- Dataset dimensions
- Completeness metrics
- Data quality statistics
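At the dataset level, completeness is typically the share of non-missing cells: completeness = 100 × (rows × columns − missing cells) / (rows × columns). That definition is an assumption about what the dashboard reports; the hypothetical `dataset_summary` helper below computes it for a column-oriented table with `None` marking missing values.

```python
def dataset_summary(table: dict[str, list]) -> dict:
    """Rows, columns, missing cells, and overall completeness (% non-missing cells)."""
    n_cols = len(table)
    n_rows = len(next(iter(table.values()))) if table else 0
    total = n_rows * n_cols
    # None marks a missing cell in this column-oriented sketch.
    missing = sum(v is None for col in table.values() for v in col)
    # An empty table has nothing missing, so it is reported as 100% complete.
    completeness = 100 * (total - missing) / total if total else 100.0
    return {
        "rows": n_rows,
        "columns": n_cols,
        "missing_cells": missing,
        "completeness_pct": round(completeness, 2),
    }
```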
Column Analysis
- Data types
- Missing values
- Unique value counts
Interactive Visualization Builder
Users can dynamically select the following without writing additional code:
- X-axis variables
- Y-axis variables
- Plot types
Relationship Matrix
Embedded Plotly-based exploratory visualization for multivariate analysis.
Example Workflow
```python
from ezclean import Smart_loader, Cleaner, colname, plot, plot_dashboard

df = Smart_loader("data.csv")
df = Cleaner(df)

colname(df)
plot(df, "age")
plot(df)

plot_dashboard(
    df,
    filename="dashboard.html"
)
```
Use Cases
- Data Science Projects
- Machine Learning Preprocessing
- Academic Research
- Exploratory Data Analysis
- Business Intelligence Reporting
- Rapid Dataset Validation
- Educational Applications
Why ezclean-data?
Most data analysis projects begin with repetitive preprocessing tasks such as loading files, cleaning columns, handling missing values, detecting outliers, and creating exploratory visualizations.
ezclean-data consolidates these operations into a simple and consistent workflow, allowing users to focus on analysis and model development rather than boilerplate data preparation code.
License
This project is licensed under the MIT License.
See the LICENSE file for complete licensing information.
Author
Developed and maintained by Thilac Ramesh.
Contributions, feature requests, and issue reports are welcome.
Download files
File details
Details for the file ezclean-0.1.0.tar.gz.
File metadata
- Download URL: ezclean-0.1.0.tar.gz
- Upload date:
- Size: 23.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d47e6ef57859487444305fcd79f4f8240d79de4cb273bcd2276d13c33006bedc |
| MD5 | 4b5a451baeed25d5807cdf5f0723c88f |
| BLAKE2b-256 | f41bb3d7f70b5bdf5dc0f647b686c50c58c59643faebabb83790bbf3afe4e7d5 |
File details
Details for the file ezclean-0.1.0-py3-none-any.whl.
File metadata
- Download URL: ezclean-0.1.0-py3-none-any.whl
- Upload date:
- Size: 19.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6adc98e1a6b0530c42abc776db6e1f6d471a1b6a33feb86fa47ae4c90a66f39d |
| MD5 | 63f69391267a933ce6d9d1643b2810ac |
| BLAKE2b-256 | e15d57e99638dc2c891f1e1dd9c9be6df0eafd317b50e911998d00f3b443a976 |