
A robust, dataset-agnostic data loader and cleaner with an automated, interactive pairplot dashboard engine.


ezclean-data


A comprehensive, dataset-agnostic Python library for automated data ingestion, cleaning, preprocessing, and exploratory data analysis.

ezclean-data streamlines the repetitive tasks involved in preparing datasets by providing intelligent loading, automated sanitization, statistical summaries, advanced visualizations, and standalone interactive dashboards.


Overview

Data preparation often consumes a significant portion of the data science workflow. ezclean-data provides a unified interface that automatically loads structured datasets, cleans inconsistencies, handles missing values and outliers, standardizes column names, and generates insightful visualizations with minimal code.

The library is designed to work across a wide variety of dataset formats and structures, making it suitable for students, researchers, analysts, and machine learning practitioners.


Features

Smart Data Loading

  • Automatically detects the file format and selects the appropriate pandas reader.
  • Supports both local files and remote URLs.
  • Handles a broad range of structured formats, from CSV and JSON to Excel, Parquet, and Stata.

Automated Data Cleaning

  • Standardizes column names into consistent snake_case format.
  • Detects and replaces common invalid placeholder values.
  • Performs automatic type correction where appropriate.
  • Fills missing values using data-type-aware strategies (median for numeric columns, "Unknown" for categorical columns).
  • Detects and removes outliers using the Interquartile Range (IQR) method.

Exploratory Data Analysis

  • Generates detailed column summaries and completeness statistics.
  • Provides automatic visualizations based on column data types.
  • Creates generalized pairplot-style relationship matrices for rapid exploration.

Interactive Dashboard Generation

  • Produces self-contained HTML dashboards.
  • Includes summary statistics and data quality metrics.
  • Provides interactive Plotly-based visualizations.
  • Works entirely offline once generated.

Installation

Install directly from PyPI:

pip install ezclean-data

Quick Start

from ezclean import Smart_loader, Cleaner, colname, plot, plot_dashboard

# Load dataset
df = Smart_loader("tested.csv")

# Execute cleaning pipeline
df_cleaned = Cleaner(df)

# Display column statistics
colname(df_cleaned)

# Visualize a single column
plot(df_cleaned, "survived")

# Generate a relationship matrix
plot(df_cleaned)

# Create an interactive dashboard
plot_dashboard(
    df_cleaned,
    filename="my_dashboard.html"
)

API Reference

Smart_loader()

Smart_loader(file_path, **kwargs)

Automatically loads structured datasets from local storage or remote URLs.

Supported Formats

Category              Formats
Text Files            CSV, TSV, TXT
JSON Formats          JSON, JSONL, NDJSON
Spreadsheet Files     XLSX, XLS, ODS
Columnar Formats      Parquet, Feather, Arrow, ORC
Statistical Formats   SPSS, SAS, Stata
Other Formats         XML, HTML, Pickle, HDF
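Extension-based format detection can be sketched in a few lines of pandas. This is a minimal illustration, not ezclean's implementation: the READERS table, pick_reader, and smart_load are hypothetical names, and only a subset of the formats above is mapped. Because most pandas readers accept URLs as well as paths, the same dispatch covers remote files; for URLs only the path component is inspected, so query strings do not hide the extension.

```python
from pathlib import Path
from urllib.parse import urlparse

import pandas as pd

# Hypothetical extension-to-reader table covering a subset of the formats above.
# ezclean's real mapping is not documented here and may differ.
READERS = {
    ".csv": pd.read_csv,
    ".tsv": lambda src, **kw: pd.read_csv(src, sep="\t", **kw),
    # sep=None with the Python engine lets pandas sniff the delimiter
    ".txt": lambda src, **kw: pd.read_csv(src, sep=None, engine="python", **kw),
    ".json": pd.read_json,
    ".jsonl": lambda src, **kw: pd.read_json(src, lines=True, **kw),
    ".ndjson": lambda src, **kw: pd.read_json(src, lines=True, **kw),
    ".xlsx": pd.read_excel,
    ".xls": pd.read_excel,
    ".ods": pd.read_excel,
    ".parquet": pd.read_parquet,
    ".feather": pd.read_feather,
    ".pkl": pd.read_pickle,
}


def pick_reader(source):
    """Return the pandas reader for a local path or URL, based on its extension."""
    text = str(source)
    # For URLs, ignore the query string and look only at the path component
    path = urlparse(text).path if "://" in text else text
    suffix = Path(path).suffix.lower()
    if suffix not in READERS:
        raise ValueError(f"Unsupported file extension: {suffix!r}")
    return READERS[suffix]


def smart_load(source, **kwargs):
    """Hypothetical loader: dispatch on extension, forward extra options to pandas."""
    return pick_reader(source)(source, **kwargs)
```

Keyword arguments are passed straight through to the chosen reader, which mirrors the **kwargs in the documented Smart_loader signature.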

Cleaner()

Cleaner(df, ...)

Executes a complete data-cleaning pipeline.

Included Operations

Column Name Standardization

  • Converts names to snake_case
  • Removes special characters
  • Eliminates duplicate separators
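The three renaming rules above can be approximated with two regular expressions. The sketch below is an assumption about how such standardization might work, not ezclean's code; to_snake_case is a hypothetical helper.

```python
import re


def to_snake_case(name):
    """Hypothetical helper: convert one column label to snake_case."""
    name = str(name).strip()
    # Insert "_" at lower/digit -> upper boundaries: "FirstName" -> "First_Name"
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name)
    # Collapse every run of special characters or separators into one "_"
    name = re.sub(r"[^0-9A-Za-z]+", "_", name)
    return name.strip("_").lower()
```

Applied to a whole frame with `df = df.rename(columns=to_snake_case)`. Note that this sketch does not resolve two labels that normalize to the same name.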

Data Sanitization

Replaces common placeholder values with proper missing values. Recognized placeholders include:

?
NULL
null
nil
N/A
NaN
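A minimal pandas sketch of placeholder sanitization, assuming exact whole-cell matches are replaced with NaN; replace_placeholders and PLACEHOLDERS are hypothetical names, and ezclean's actual list and matching rules may differ.

```python
import numpy as np
import pandas as pd

# The placeholder strings listed above; ezclean may recognise more.
PLACEHOLDERS = ["?", "NULL", "null", "nil", "N/A", "NaN"]


def replace_placeholders(df, placeholders=PLACEHOLDERS):
    """Return a copy of df with whole-cell placeholder strings set to NaN."""
    # replace() matches entire cell values literally (regex=False by default),
    # so "N/A" is replaced but "N/A-123" is left alone.
    return df.replace(placeholders, np.nan)
```

Once placeholders are real NaN values, pandas methods such as `isna()` and `fillna()` treat them as missing.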

Text Normalization

  • Trims whitespace
  • Standardizes string formatting
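Whitespace trimming can be sketched as below. Interpreting "standardizes string formatting" as also collapsing repeated internal spaces is an assumption; normalize_text is a hypothetical name.

```python
import pandas as pd


def normalize_text(df):
    """Return a copy of df with whitespace trimmed and collapsed in text cells."""
    out = df.copy()
    for col in out.select_dtypes(include=["object", "string"]).columns:
        # Only touch real strings; missing values and mixed-type cells pass through
        out[col] = out[col].map(lambda v: " ".join(v.split()) if isinstance(v, str) else v)
    return out
```

Running this before placeholder replacement lets padded values such as " N/A " match exactly.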

Automatic Type Detection

  • Converts columns to numeric types when appropriate
  • Preserves incompatible values
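One plausible reading of "converts when appropriate, preserves incompatible values" is an all-or-nothing rule per column: convert only if no real value would be lost. The sketch below assumes that rule; coerce_numeric is a hypothetical name.

```python
import pandas as pd


def coerce_numeric(df):
    """Convert text columns to numbers only when every non-missing value parses."""
    out = df.copy()
    for col in out.columns:
        s = out[col]
        # Leave numeric, boolean, datetime and categorical columns untouched
        if (
            pd.api.types.is_numeric_dtype(s)
            or pd.api.types.is_datetime64_any_dtype(s)
            or isinstance(s.dtype, pd.CategoricalDtype)
        ):
            continue
        converted = pd.to_numeric(s, errors="coerce")
        # If coercion turned any real value into NaN, the column is not purely
        # numeric, so keep its original values instead of losing data
        if converted.notna().sum() == s.notna().sum():
            out[col] = converted
    return out
```

A ticket column mixing "113803" and "PC 17599" therefore stays text, while an age column stored as strings becomes numeric.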

Missing Value Handling

  • Numerical columns → Median Imputation
  • Categorical columns → "Unknown" Replacement
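The two imputation rules can be sketched with `fillna`. This is an illustration under the assumption that "categorical" means text columns; impute_missing is a hypothetical name, and other dtypes are left unchanged here.

```python
import pandas as pd


def impute_missing(df):
    """Median for numeric columns, "Unknown" for text columns (other dtypes untouched)."""
    out = df.copy()
    for col in out.columns:
        s = out[col]
        if pd.api.types.is_bool_dtype(s):
            continue  # boolean columns cannot hold NaN
        if pd.api.types.is_numeric_dtype(s):
            # median() skips NaN by default
            out[col] = s.fillna(s.median())
        elif pd.api.types.is_object_dtype(s) or pd.api.types.is_string_dtype(s):
            out[col] = s.fillna("Unknown")
    return out
```

The median is preferred over the mean here because it is not pulled by the extreme values that the next step targets.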

Outlier Treatment

  • Uses Interquartile Range (IQR) thresholds
  • Removes extreme observations automatically
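A sketch of IQR-based row removal. The conventional Tukey multiplier k = 1.5 is an assumption, since the threshold ezclean uses is not stated; remove_outliers_iqr is a hypothetical name.

```python
import pandas as pd


def remove_outliers_iqr(df, k=1.5):
    """Drop rows where any numeric value falls outside [Q1 - k*IQR, Q3 + k*IQR]."""
    numeric = df.select_dtypes(include="number")
    q1 = numeric.quantile(0.25)
    q3 = numeric.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # Compare column-wise against the per-column bounds; missing values never
    # count as outliers
    inside = (numeric.ge(lower) & numeric.le(upper)) | numeric.isna()
    return df[inside.all(axis=1)]
```

For fares of 10, 11, 12, 13 and 500, Q1 = 11 and Q3 = 13, so the bounds are 8 and 16 and only the 500 row is dropped. A larger k keeps more rows.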

colname()

colname(df)

Displays detailed metadata for each column, including:

  • Data type
  • Missing value count
  • Completeness percentage
  • Unique value count
  • Statistical summaries
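The per-column metadata listed above can be approximated with plain pandas. This sketch is not colname's actual output format; column_summary is a hypothetical name.

```python
import pandas as pd


def column_summary(df):
    """One row per column: dtype, missing count, completeness %, unique count."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": df.isna().sum(),
        # mean of a boolean mask = share of non-missing cells
        "complete_pct": (100 * df.notna().mean()).round(1),
        "unique": df.nunique(),  # NaN is not counted as a value
    })
```

For the statistical summaries, pandas' own `df.describe(include="all")` reports count, mean, quartiles, and most frequent values.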

plot()

plot(df, target_column=None, columns=None)

Single-Column Visualization

When a target column is specified, the visualization is selected automatically based on data type.

Data Type     Visualization
Numeric       Histogram + Box Plot
Categorical   Bar Chart + Donut Chart
Datetime      Trend Line
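The type-to-chart mapping in the table can be expressed as a small dispatch function. This sketch only chooses chart names and does no plotting; choose_charts is hypothetical, and treating boolean columns as categorical is an assumption.

```python
import pandas as pd


def choose_charts(series):
    """Return the chart kinds the table above assigns to a column's dtype."""
    if pd.api.types.is_datetime64_any_dtype(series):
        return ["line"]
    # bool counts as numeric in pandas, so check it before the numeric branch
    if pd.api.types.is_bool_dtype(series) or not pd.api.types.is_numeric_dtype(series):
        return ["bar", "donut"]
    return ["histogram", "box"]
```

Note that an integer-coded label such as survived (0/1) falls in the numeric branch unless it is converted to a categorical or boolean type first.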

Relationship Matrix

plot(df)

Generates a generalized pairplot matrix displaying:

  • Univariate distributions
  • Correlation patterns
  • Relationships between variables
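A pairplot is a grid of pairwise scatter plots (Plotly, for example, provides `plotly.express.scatter_matrix`). As a numeric companion to such a grid, the sketch below ranks column pairs by correlation strength; strongest_pairs is a hypothetical helper, not part of ezclean.

```python
import pandas as pd


def strongest_pairs(df, top=3, method="pearson"):
    """Rank numeric column pairs by absolute correlation, strongest first."""
    corr = df.select_dtypes(include="number").corr(method=method)
    cols = list(corr.columns)
    pairs = []
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            r = corr.loc[a, b]
            if pd.notna(r):  # constant columns yield NaN correlations
                pairs.append((a, b, float(r)))
    pairs.sort(key=lambda p: abs(p[2]), reverse=True)
    return pairs[:top]
```

Passing `method="spearman"` ranks monotonic but non-linear relationships as well.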

plot_dashboard()

plot_dashboard(
    df,
    filename="ezclean_dashboard.html",
    show=True
)

Creates a standalone interactive dashboard containing:

Dataset Summary

  • Dataset dimensions
  • Completeness metrics
  • Data quality statistics

Column Analysis

  • Data types
  • Missing values
  • Unique value counts
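The summary and column-analysis panels can be approximated as a static, self-contained HTML page using only pandas. This sketch omits the interactive Plotly charts; if added, Plotly's `Figure.write_html(..., include_plotlyjs=True)` embeds the plotting library in the file so it still works offline. write_summary_html is a hypothetical name.

```python
from pathlib import Path

import pandas as pd


def write_summary_html(df, path=None):
    """Build a static, dependency-free HTML summary; optionally save it to path."""
    rows, cols = df.shape
    completeness = 100 * df.notna().to_numpy().mean() if df.size else 100.0
    columns = pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": df.isna().sum(),
        "unique": df.nunique(),
    })
    html = (
        "<!DOCTYPE html>\n<html><head><meta charset='utf-8'>"
        "<title>Dataset summary</title></head><body>\n"
        f"<h1>Dataset summary</h1>\n<p>{rows} rows, {cols} columns, "
        f"{completeness:.1f}% complete</p>\n"
        + columns.to_html()
        + "\n</body></html>\n"
    )
    if path is not None:
        # No scripts or external assets, so the file opens offline
        Path(path).write_text(html, encoding="utf-8")
    return html
```

Completeness here is the share of non-missing cells across the whole table, which is one reasonable definition of the "completeness metrics" listed above.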

Interactive Visualization Builder

Users can dynamically select:

  • X-axis variables
  • Y-axis variables
  • Plot types

without writing additional code.

Relationship Matrix

Embedded Plotly-based exploratory visualization for multivariate analysis.


Example Workflow

from ezclean import Smart_loader, Cleaner, colname, plot, plot_dashboard

df = Smart_loader("data.csv")

df = Cleaner(df)

colname(df)

plot(df, "age")

plot(df)

plot_dashboard(
    df,
    filename="dashboard.html"
)

Use Cases

  • Data Science Projects
  • Machine Learning Preprocessing
  • Academic Research
  • Exploratory Data Analysis
  • Business Intelligence Reporting
  • Rapid Dataset Validation
  • Educational Applications

Why ezclean-data?

Most data analysis projects begin with repetitive preprocessing tasks such as loading files, cleaning columns, handling missing values, detecting outliers, and creating exploratory visualizations.

ezclean-data consolidates these operations into a simple and consistent workflow, allowing users to focus on analysis and model development rather than boilerplate data preparation code.


License

This project is licensed under the MIT License.

See the LICENSE file for complete licensing information.


Author

Developed and maintained by Thilac Ramesh.

Contributions, feature requests, and issue reports are welcome.



Download files


Source Distribution

ezclean-0.1.0.tar.gz (23.1 kB)

Uploaded Source

Built Distribution


ezclean-0.1.0-py3-none-any.whl (19.1 kB)

Uploaded Python 3

File details

Details for the file ezclean-0.1.0.tar.gz.

File metadata

  • Download URL: ezclean-0.1.0.tar.gz
  • Upload date:
  • Size: 23.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.4

File hashes

Hashes for ezclean-0.1.0.tar.gz
Algorithm     Hash digest
SHA256        d47e6ef57859487444305fcd79f4f8240d79de4cb273bcd2276d13c33006bedc
MD5           4b5a451baeed25d5807cdf5f0723c88f
BLAKE2b-256   f41bb3d7f70b5bdf5dc0f647b686c50c58c59643faebabb83790bbf3afe4e7d5


File details

Details for the file ezclean-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: ezclean-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 19.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.4

File hashes

Hashes for ezclean-0.1.0-py3-none-any.whl
Algorithm     Hash digest
SHA256        6adc98e1a6b0530c42abc776db6e1f6d471a1b6a33feb86fa47ae4c90a66f39d
MD5           63f69391267a933ce6d9d1643b2810ac
BLAKE2b-256   e15d57e99638dc2c891f1e1dd9c9be6df0eafd317b50e911998d00f3b443a976

