Automated data cleaning tool
📦 NullPy
An Intelligent, Data-Aware Pandas DataFrame Cleaner 🚀
NullPy is a Python library for automatic, intelligent, and target-aware cleaning of pandas DataFrames.
It handles missing values, outliers, and predictive imputations using ML models when needed.
NullPy chooses an imputation and outlier strategy for each column automatically, so you can skip repetitive manual cleaning.
✨ Features
- 🧹 Automatic Missing Value Handling
  - Detects missing values.
  - Decides best imputation strategy (`mean`, `median`, `mode`, `predictive`).
  - Can train ML models (`RandomForest`, `LinearRegression`, `LogisticRegression`) for predictive imputation.
- 📊 Outlier Detection & Handling
  - Detects outliers using IQR method.
  - Handles them via clip, drop, or predictive imputation.
- 🎯 Target-Aware Cleaning
  - Uses correlation & chi-squared tests to decide when predictive cleaning is useful.
- ⚡ Highly Customizable
  - Parameters for imputation strategy, outlier strategy, verbosity, and difference reporting.
- 🖥️ Beautiful Console Output (powered by Rich)
  - Colorful progress bars.
  - Summary cleaning reports.
  - Difference reports (before vs after cleaning).
- 🔮 Demo Reports Included
  - Quick one-call demo for showing cleaning in action.
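To make the IQR-based outlier handling listed above more concrete, here is a minimal sketch written with plain pandas. It is not NullPy's internal code: the helper name `iqr_bounds` and the conventional multiplier `k=1.5` are assumptions made for illustration.

```python
import pandas as pd


def iqr_bounds(series: pd.Series, k: float = 1.5) -> tuple[float, float]:
    """Return (lower, upper) fences; k=1.5 is the usual convention (assumed here)."""
    q1 = series.quantile(0.25)  # quantile() skips missing values
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# The 'Age' column from the Quick Usage example.
age = pd.Series([25, 30, None, 22, 35, 45, 28, 33, None, 50, 150], name="Age")

lower, upper = iqr_bounds(age)
outliers = age[(age < lower) | (age > upper)]  # values outside the fences
clipped = age.clip(lower=lower, upper=upper)   # "clip" strategy: cap at the fences

print(f"fences: {lower} .. {upper}")
print(f"outlier rows: {list(outliers.index)}")
```

On this column the fences are 2.5 and 70.5, so only the value 150 is flagged and capped. Missing values are left untouched by `clip`, which is why imputation is a separate step.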
🚀 Installation
pip install pandas numpy scikit-learn rich
(Or clone the repo and drop nullpy.py into your project.)
⚡ Quick Usage
1️⃣ Basic Cleaning
import pandas as pd
from nullpy import SmartDFCleaner

# Example data
data = {
    'Age': [25, 30, None, 22, 35, 45, 28, 33, None, 50, 150],
    'Income': [50000, 60000, 58000, None, 72000, 68000, 52000, 61000, 59000, 75000, 80000],
    'Gender': ['Male', 'Female', 'Male', 'Female', 'Male', None, 'Female', 'Male', 'Male', 'Female', 'Male'],
    'City': ['NY', 'LA', 'NY', 'SF', 'LA', 'NY', 'SF', 'LA', 'NY', 'SF', 'LA'],
    'Purchased': [0, 1, 1, 0, 1, 1, 0, 1, None, 1, 0]
}
df = pd.DataFrame(data)

# Run cleaner
cleaner = SmartDFCleaner(target_column="Purchased", show_difference=True)
cleaned_df = cleaner.fit_transform(df)
print(cleaned_df)
2️⃣ One-Call Full Demo (with auto + predictive cleaning reports)
from nullpy import SmartDFCleaner
newdf = SmartDFCleaner().clean_it(df, target_column="Purchased")
This will:
- Show original data.
- Show auto-cleaned DataFrame.
- Show predictive-cleaned DataFrame.
- Print summary reports + null counts.
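For readers wondering what the predictive-cleaned step involves, the sketch below shows the general idea of predictive imputation: fit a model on rows where the column is known, then predict the missing entries. It is an illustration under assumptions, not NullPy's implementation. The function `predictive_impute` is hypothetical, and a plain NumPy least-squares fit on a single predictor stands in for the scikit-learn models the library lists.

```python
import numpy as np
import pandas as pd


def predictive_impute(df: pd.DataFrame, column: str, predictor: str) -> pd.Series:
    """Fill gaps in `column` from a straight-line fit on `predictor` (hypothetical helper)."""
    known = df[column].notna() & df[predictor].notna()
    missing = df[column].isna() & df[predictor].notna()

    # Design matrix [x, 1] so the fit returns a slope and an intercept.
    x_known = df.loc[known, predictor].to_numpy(dtype=float)
    X = np.column_stack([x_known, np.ones(len(x_known))])
    y = df.loc[known, column].to_numpy(dtype=float)
    (slope, intercept), *_ = np.linalg.lstsq(X, y, rcond=None)

    filled = df[column].astype(float).copy()
    filled.loc[missing] = slope * df.loc[missing, predictor].astype(float) + intercept
    return filled


# Toy data where y = 2 * x exactly, with one gap to fill.
toy = pd.DataFrame({"x": [1, 2, 3, 4], "y": [2.0, 4.0, None, 8.0]})
print(predictive_impute(toy, "y", "x"))
```

Rows where the predictor itself is missing stay unfilled in this sketch; a real cleaner would need a fallback such as median imputation for those.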
⚙️ Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `target_column` | str | None | Target variable for correlation/predictive imputation |
| `impute_strategy` | str | auto | auto, mean, median, mode, predictive |
| `outlier_strategy` | str | auto | auto, clip, drop, predictive |
| `verbose` | bool | True | Print logs and progress |
| `show_difference` | bool | False | Show before/after difference report |
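The `target_column` parameter ties into the correlation and chi-squared checks mentioned under Features. As a rough sketch of what such checks measure (the helper names and the use of raw statistics are assumptions, and NullPy's actual thresholds are not documented here), a numeric feature can be compared with the target via Pearson correlation and a categorical feature via the chi-squared statistic of their contingency table:

```python
import pandas as pd


def pearson_with_target(feature: pd.Series, target: pd.Series) -> float:
    """Pearson correlation; pandas ignores rows where either value is missing."""
    return float(feature.corr(target))


def chi2_statistic(feature: pd.Series, target: pd.Series) -> float:
    """Chi-squared statistic of the feature/target contingency table (no p-value)."""
    mask = feature.notna() & target.notna()
    observed = pd.crosstab(feature[mask], target[mask]).to_numpy(dtype=float)
    row_totals = observed.sum(axis=1, keepdims=True)
    col_totals = observed.sum(axis=0, keepdims=True)
    expected = row_totals @ col_totals / observed.sum()  # counts under independence
    return float(((observed - expected) ** 2 / expected).sum())


# Columns taken from the Quick Usage example.
df = pd.DataFrame({
    "Age": [25, 30, None, 22, 35, 45, 28, 33, None, 50, 150],
    "Gender": ["Male", "Female", "Male", "Female", "Male", None,
               "Female", "Male", "Male", "Female", "Male"],
    "Purchased": [0, 1, 1, 0, 1, 1, 0, 1, None, 1, 0],
})

print(pearson_with_target(df["Age"], df["Purchased"]))
print(chi2_statistic(df["Gender"], df["Purchased"]))
```

A larger absolute correlation or chi-squared value suggests the feature carries information about the target, which is the situation where predictive cleaning is more likely to pay off.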
📊 Example Console Output
> Identified 2 numerical and 2 categorical features.
> 'Age' has high missing data (18.0%). Using simple imputation.
> Applied Median Imputation to column 'Age'.
> Clipped 1 outliers in column 'Age'.
> Cleaning process completed successfully!
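The missing-data and imputation lines in the log above can be approximated by hand with plain pandas. The helpers below (`missing_percent`, `simple_impute`) are hypothetical names for illustration. The rule "median for numeric columns, mode for everything else" is an assumption about what simple imputation means here, not a statement of NullPy's exact logic.

```python
import pandas as pd


def missing_percent(series: pd.Series) -> float:
    """Share of missing entries in the column, as a percentage."""
    return float(series.isna().mean() * 100)


def simple_impute(series: pd.Series) -> pd.Series:
    """Median for numeric columns, most frequent value otherwise (assumed rule)."""
    if pd.api.types.is_numeric_dtype(series):
        return series.fillna(series.median())
    return series.fillna(series.mode().iloc[0])


# Two columns from the Quick Usage example.
age = pd.Series([25, 30, None, 22, 35, 45, 28, 33, None, 50, 150], name="Age")
gender = pd.Series(["Male", "Female", "Male", "Female", "Male", None,
                    "Female", "Male", "Male", "Female", "Male"], name="Gender")

print(f"Age missing: {missing_percent(age):.1f}%")
age_filled = simple_impute(age)
gender_filled = simple_impute(gender)
```

Two of the eleven `Age` values are missing, which is roughly 18%, in line with the figure in the log. The median is a reasonable default for `Age` because the outlier 150 would pull the mean upward.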
🛠️ Methods
- `fit_transform(df)` → Returns cleaned DataFrame.
- `demo_report(df, target_column, ...)` → Runs auto + predictive cleaning demo.
- `clean_it(df, target_column, ...)` → One-call shortcut for full demo + final cleaned DF.
📌 Roadmap
- 🔜 Add support for time-series cleaning.
- 🔜 Add advanced outlier detection (Isolation Forest, Z-score).
- 🔜 Export cleaning logs to JSON/CSV.
👨‍💻 Author
Made with ❤️ and ☕ by Foresty (India 🇮🇳)