
Automated data cleaning tool


📦 NullPy

An Intelligent, Data-Aware Pandas DataFrame Cleaner 🚀

NullPy is a Python library for automatic, target-aware cleaning of pandas DataFrames. It handles missing values and outliers, and uses predictive imputation with ML models when needed. Instead of repeating the same manual cleaning steps, you let NullPy pick a strategy for each column.


✨ Features

  • 🧹 Automatic Missing Value Handling

    • Detects missing values.
    • Decides best imputation strategy (mean, median, mode, predictive).
    • Can train ML models (RandomForest, LinearRegression, LogisticRegression) for predictive imputation.
  • 📊 Outlier Detection & Handling

    • Detects outliers using IQR method.
    • Handles them via clip, drop, or predictive imputation.
  • 🎯 Target-Aware Cleaning

    • Uses correlation & chi-squared tests to decide when predictive cleaning is useful.
  • Highly Customizable

    • Parameters for imputation strategy, outlier strategy, verbosity, and difference reporting.
  • 🖥️ Beautiful Console Output (powered by Rich)

    • Colorful progress bars.
    • Summary cleaning reports.
    • Difference reports (before vs after cleaning).
  • 🔮 Demo Reports Included

    • Quick one-call demo for showing cleaning in action.
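The target-aware feature above mentions correlation and chi-squared tests without showing how such a check might look. The following is a minimal sketch of the general idea, not NullPy's actual implementation: the function names and both thresholds are hypothetical, and it assumes SciPy is available (it is installed as a dependency of scikit-learn).

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical thresholds, chosen only for illustration.
CORR_THRESHOLD = 0.3
P_THRESHOLD = 0.05

def numeric_target_correlation(df, column, target):
    """Absolute Pearson correlation between a numeric column and the target."""
    pair = df[[column, target]].dropna()
    return abs(pair[column].corr(pair[target]))

def categorical_target_pvalue(df, column, target):
    """p-value of a chi-squared independence test between a column and the target."""
    pair = df[[column, target]].dropna()
    table = pd.crosstab(pair[column], pair[target])
    _, p_value, _, _ = chi2_contingency(table)
    return p_value

def predictive_cleaning_looks_useful(df, column, target):
    """Return True when the column appears related to the target."""
    if pd.api.types.is_numeric_dtype(df[column]):
        return numeric_target_correlation(df, column, target) >= CORR_THRESHOLD
    return categorical_target_pvalue(df, column, target) <= P_THRESHOLD

df = pd.DataFrame({
    "Income": [50, 60, 58, 72, 68, 52, 61, 75],
    "City": ["NY", "LA", "NY", "LA", "NY", "SF", "LA", "SF"],
    "Purchased": [0, 1, 1, 1, 1, 0, 1, 1],
})

corr = numeric_target_correlation(df, "Income", "Purchased")
p = categorical_target_pvalue(df, "City", "Purchased")
print(f"Income vs Purchased: |r| = {corr:.2f}")
print(f"City vs Purchased: chi-squared p = {p:.2f}")
```

With so few rows the chi-squared test has little power, so treat the printed values as a demonstration of the mechanics only.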

🚀 Installation

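pip install nullpy

NullPy depends on pandas, numpy, scikit-learn, and rich. If they are not already present in your environment, install them as well:

pip install pandas numpy scikit-learn rich

(Or clone the repo and drop nullpy.py into your project.)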


⚡ Quick Usage

1️⃣ Basic Cleaning

import pandas as pd
from nullpy import SmartDFCleaner

# Example Data
data = {
    'Age': [25, 30, None, 22, 35, 45, 28, 33, None, 50, 150],
    'Income': [50000, 60000, 58000, None, 72000, 68000, 52000, 61000, 59000, 75000, 80000],
    'Gender': ['Male', 'Female', 'Male', 'Female', 'Male', None, 'Female', 'Male', 'Male', 'Female', 'Male'],
    'City': ['NY', 'LA', 'NY', 'SF', 'LA', 'NY', 'SF', 'LA', 'NY', 'SF', 'LA'],
    'Purchased': [0, 1, 1, 0, 1, 1, 0, 1, None, 1, 0]
}
df = pd.DataFrame(data)

# Run cleaner
cleaner = SmartDFCleaner(target_column="Purchased", show_difference=True)
cleaned_df = cleaner.fit_transform(df)

print(cleaned_df)

2️⃣ One-Call Full Demo (with auto + predictive cleaning reports)

from nullpy import SmartDFCleaner

newdf = SmartDFCleaner().clean_it(df, target_column="Purchased")

This will:

  • Show original data.
  • Show auto-cleaned DataFrame.
  • Show predictive-cleaned DataFrame.
  • Print summary reports + null counts.

⚙️ Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| target_column | str | None | Target variable for correlation/predictive imputation |
| impute_strategy | str | auto | auto, mean, median, mode, predictive |
| outlier_strategy | str | auto | auto, clip, drop, predictive |
| verbose | bool | True | Print logs and progress |
| show_difference | bool | False | Show before/after difference report |

📊 Example Console Output

> Identified 2 numerical and 2 categorical features.
> 'Age' has high missing data (18.0%). Using simple imputation.
> Applied Median Imputation to column 'Age'.
> Clipped 1 outliers in column 'Age'.
> Cleaning process completed successfully!
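The median imputation and outlier clipping reported in this log can be reproduced by hand with plain pandas. The sketch below uses the 'Age' values from the Basic Cleaning example and the conventional 1.5 × IQR fences. It assumes the fences are computed after imputation, which may differ from what NullPy does internally.

```python
import pandas as pd

age = pd.Series([25, 30, None, 22, 35, 45, 28, 33, None, 50, 150], name="Age")

# Median imputation: replace missing values with the median of observed ones.
filled = age.fillna(age.median())

# IQR fences: values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] count as outliers.
q1, q3 = filled.quantile(0.25), filled.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

n_outliers = int(((filled < lower) | (filled > upper)).sum())

# Clipping pulls outliers back to the nearest fence instead of dropping rows.
clipped = filled.clip(lower=lower, upper=upper)

print(f"Clipped {n_outliers} outlier(s); 'Age' now ranges "
      f"from {clipped.min()} to {clipped.max()}")
```

Here the only value outside the fences is 150, which is pulled down to the upper fence of 56.5.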

🛠️ Methods

  • fit_transform(df) → Returns cleaned DataFrame.
  • demo_report(df, target_column, ...) → Runs auto + predictive cleaning demo.
  • clean_it(df, target_column, ...) → One-call shortcut for full demo + final cleaned DF.

📌 Roadmap

  • 🔜 Add support for time-series cleaning.
  • 🔜 Add advanced outlier detection (Isolation Forest, Z-score).
  • 🔜 Export cleaning logs to JSON/CSV.

👨‍💻 Author

Made with ❤️ and ☕ by Foresty (India 🇮🇳)


Download files

Source distribution: nullpy-0.0.2.tar.gz (10.6 kB). Uploaded via twine/6.1.0 CPython/3.10.11, without Trusted Publishing.

  • SHA256: 6e66f3e347f2cfb027d4c04d01cb8b912e0ff93e9254c8396a5e3a83ce8061de
  • MD5: 486f60074ec4d356529d31db49ab1133
  • BLAKE2b-256: 0b087b740aee0ab24eba88573107593f39a4d6750a5f44bfe64b65618ca54e23

Built distribution: nullpy-0.0.2-py3-none-any.whl (9.3 kB, Python 3). Uploaded via twine/6.1.0 CPython/3.10.11, without Trusted Publishing.

  • SHA256: a28f1f87e39db6efdf35f2292aee2fa7454ffd53ddb821c41c0099018bb3b477
  • MD5: 4d8dc5c01a3ff7183ac3856c49c08735
  • BLAKE2b-256: 009a06a8d2a3a72fba946ac7efd10ee876524fa5a888ec707b67ab9ab68099be