Automated data cleaning tool

📦 NullPy

An Intelligent, Data-Aware Pandas DataFrame Cleaner 🚀

NullPy is a Python library for automatic, target-aware cleaning of pandas DataFrames. It handles missing values and outliers, and uses predictive imputation with ML models when needed. Instead of repetitive manual cleaning, NullPy chooses a strategy for each column automatically.


✨ Features

  • 🧹 Automatic Missing Value Handling

    • Detects missing values.
    • Chooses an imputation strategy (mean, median, mode, or predictive).
    • Can train ML models (RandomForest, LinearRegression, LogisticRegression) for predictive imputation.
  • 📊 Outlier Detection & Handling

    • Detects outliers using the IQR method.
    • Handles them via clip, drop, or predictive imputation.
  • 🎯 Target-Aware Cleaning

    • Uses correlation & chi-squared tests to decide when predictive cleaning is useful.
  • Highly Customizable

    • Parameters for imputation strategy, outlier strategy, verbosity, and difference reporting.
  • 🖥️ Beautiful Console Output (powered by Rich)

    • Colorful progress bars.
    • Summary cleaning reports.
    • Difference reports (before vs after cleaning).
  • 🔮 Demo Reports Included

    • Quick one-call demo for showing cleaning in action.
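
The predictive imputation idea listed above can be sketched independently of NullPy. This is a minimal illustration, not NullPy's actual implementation: `predictive_impute` is a hypothetical helper, the choice of RandomForestRegressor and its settings are assumptions, and the feature columns are assumed to contain no missing values themselves.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor


def predictive_impute(df: pd.DataFrame, column: str, features: list[str]) -> pd.DataFrame:
    """Fill missing values in `column` with model predictions.

    Hypothetical helper for illustration; not part of NullPy's API.
    Assumes the columns in `features` are numeric and fully populated.
    """
    out = df.copy()
    missing = out[column].isna()
    if not missing.any() or missing.all():
        return out  # nothing to fill, or nothing to learn from

    # Train only on rows where the column to impute is known.
    model = RandomForestRegressor(n_estimators=50, random_state=0)
    model.fit(out.loc[~missing, features], out.loc[~missing, column])

    # Predict the missing entries from the same feature columns.
    out.loc[missing, column] = model.predict(out.loc[missing, features])
    return out


df = pd.DataFrame({
    "Income": [50000, 60000, 58000, 72000, 68000, 52000],
    "Age": [25, 30, None, 35, 45, None],
})
filled = predictive_impute(df, column="Age", features=["Income"])
print(filled)
```

The model is trained only on complete rows, so the imputed values can be no better than the relationship between the features and the column being filled.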

🚀 Installation

pip install nullpy

To use a cloned copy instead, install the dependencies and drop nullpy.py into your project:

pip install pandas numpy scikit-learn rich


⚡ Quick Usage

1️⃣ Basic Cleaning

import pandas as pd
from nullpy import SmartDFCleaner

# Example Data
data = {
    'Age': [25, 30, None, 22, 35, 45, 28, 33, None, 50, 150],
    'Income': [50000, 60000, 58000, None, 72000, 68000, 52000, 61000, 59000, 75000, 80000],
    'Gender': ['Male', 'Female', 'Male', 'Female', 'Male', None, 'Female', 'Male', 'Male', 'Female', 'Male'],
    'City': ['NY', 'LA', 'NY', 'SF', 'LA', 'NY', 'SF', 'LA', 'NY', 'SF', 'LA'],
    'Purchased': [0, 1, 1, 0, 1, 1, 0, 1, None, 1, 0]
}
df = pd.DataFrame(data)

# Run cleaner
cleaner = SmartDFCleaner(target_column="Purchased", show_difference=True)
cleaned_df = cleaner.fit_transform(df)

print(cleaned_df)
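
For comparison, here is roughly what simple imputation looks like when done by hand with plain pandas. This is only a sketch of the general technique (median for numeric columns, mode for the rest); `simple_impute` is a hypothetical helper and is not taken from NullPy, whose automatic strategy selection may differ.

```python
import pandas as pd

df = pd.DataFrame({
    "Age": [25, 30, None, 22, 35],
    "Gender": ["Male", "Female", "Male", None, "Male"],
})


def simple_impute(df: pd.DataFrame) -> pd.DataFrame:
    """Median for numeric columns, mode for everything else (illustrative only)."""
    out = df.copy()
    for col in out.columns:
        missing = out[col].isna()
        if not missing.any() or missing.all():
            continue  # nothing to fill, or no observed values to fill from
        if pd.api.types.is_numeric_dtype(out[col]):
            out[col] = out[col].fillna(out[col].median())
        else:
            # mode() can return several values; take the first.
            out[col] = out[col].fillna(out[col].mode().iloc[0])
    return out


imputed = simple_impute(df)
print(imputed)
```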

2️⃣ One-Call Full Demo (with auto + predictive cleaning reports)

from nullpy import SmartDFCleaner

newdf = SmartDFCleaner().clean_it(df, target_column="Purchased")

This will:

  • Show original data.
  • Show auto-cleaned DataFrame.
  • Show predictive-cleaned DataFrame.
  • Print summary reports + null counts.

⚙️ Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| target_column | str | None | Target variable for correlation/predictive imputation |
| impute_strategy | str | auto | auto, mean, median, mode, predictive |
| outlier_strategy | str | auto | auto, clip, drop, predictive |
| verbose | bool | True | Print logs and progress |
| show_difference | bool | False | Show before/after difference report |
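
To make the role of target_column more concrete, the sketch below shows one way a correlation or chi-squared check against a target could be done. It is an assumption-laden illustration, not NullPy's decision logic: the helper name `is_related_to_target`, the 0.3 correlation threshold, and the 0.05 significance level are all hypothetical, and it uses `scipy.stats.chi2_contingency` (SciPy is installed alongside scikit-learn).

```python
import pandas as pd
from scipy.stats import chi2_contingency


def is_related_to_target(df, column, target, corr_threshold=0.3, alpha=0.05):
    """Return True if `column` looks associated with `target`.

    Hypothetical helper; thresholds are illustrative assumptions.
    """
    pair = df[[column, target]].dropna()
    if pd.api.types.is_numeric_dtype(pair[column]):
        # Numeric feature: absolute Pearson correlation with the target.
        return bool(abs(pair[column].corr(pair[target])) >= corr_threshold)
    # Categorical feature: chi-squared test of independence on the contingency table.
    table = pd.crosstab(pair[column], pair[target])
    _, p_value, _, _ = chi2_contingency(table)
    return bool(p_value < alpha)


df = pd.DataFrame({
    "score": list(range(20)),            # rises with the target
    "group": ["A"] * 10 + ["B"] * 10,    # perfectly aligned with the target
    "noise": ["x", "y"] * 10,            # evenly split across target values
    "target": [0] * 10 + [1] * 10,
})
for col in ["score", "group", "noise"]:
    print(col, is_related_to_target(df, col, "target"))
```

A check like this only flags a statistical association; whether predictive cleaning actually helps still depends on the data.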

📊 Example Console Output

> Identified 2 numerical and 2 categorical features.
> 'Age' has high missing data (18.0%). Using simple imputation.
> Applied Median Imputation to column 'Age'.
> Clipped 1 outliers in column 'Age'.
> Cleaning process completed successfully!
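
The "Clipped 1 outliers" line can be reproduced by hand with the IQR method. The following is a minimal sketch using the Age values from the example data; `clip_outliers_iqr` is a hypothetical helper, and the conventional 1.5 × IQR fence is an assumption, since the multiplier NullPy uses is not stated here.

```python
import pandas as pd


def clip_outliers_iqr(series: pd.Series, k: float = 1.5):
    """Clip values outside [Q1 - k*IQR, Q3 + k*IQR]; return (clipped, outlier count).

    Hypothetical helper for illustration; missing values are left untouched.
    """
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    n_outliers = int(((series < lower) | (series > upper)).sum())
    return series.clip(lower=lower, upper=upper), n_outliers


age = pd.Series([25, 30, None, 22, 35, 45, 28, 33, None, 50, 150], name="Age")
clipped, n_outliers = clip_outliers_iqr(age)
print(f"Clipped {n_outliers} outlier(s); max is now {clipped.max()}")
```

Clipping keeps the row and pulls the value back to the fence, whereas dropping removes the row entirely.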

🛠️ Methods

  • fit_transform(df) → Returns cleaned DataFrame.
  • demo_report(df, target_column, ...) → Runs auto + predictive cleaning demo.
  • clean_it(df, target_column, ...) → One-call shortcut for full demo + final cleaned DF.

📌 Roadmap

  • 🔜 Add support for time-series cleaning.
  • 🔜 Add advanced outlier detection (Isolation Forest, Z-score).
  • 🔜 Export cleaning logs to JSON/CSV.

👨‍💻 Author

Made with ❤️ and ☕ by Foresty (India 🇮🇳)

