Simple DataFrame cleaning toolkit

dfcleanerpro

A lightweight pandas DataFrame preprocessing library that runs common cleaning steps through a single call.


Installation

pip install dfcleanerpro

Key Capabilities

  • Automated data cleaning pipeline
  • Intelligent handling of missing values
  • Standardized column name formatting
  • Duplicate record elimination
  • Removal of low-information columns
  • String normalization for text fields
  • Simple and intuitive API design

Getting Started

Basic Usage

import pandas as pd
from dfcleanerpro import DataCleaner

# Load the raw data into a DataFrame
df = pd.read_csv("data.csv")

# Run every cleaning step in one call and keep the result
cleaned_df = DataCleaner(df).run_all()

Example Transformation

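import pandas as pd
from dfcleanerpro import DataCleaner

# Sample data with the usual problems: a trailing space in a column name,
# missing values, padded strings, and a column that never changes.
data = {
    "Name ": ["Nishanth", "Arun", "Arun", "Azar", "None"],  # "None" here is a string, not a null
    "Age": [27, None, 26, 27, 30],
    "City": [" Bangalore", "Chennai", "Chennai", None, "Mumbai"],
    "Constant": [1, 1, 1, 1, 1],
}

df = pd.DataFrame(data)

cleaned = DataCleaner(df).run_all()
print(cleaned)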
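To check what a cleaning pass actually changed, it can help to compare the frame before and after. The helper below is a minimal sketch in plain pandas; `summarize_changes` is a hypothetical name and is not part of dfcleanerpro. The `raw` and `tidy` frames are stand-ins so the snippet runs on its own; in practice you would pass your original `df` and the result of `run_all()`.

```python
import pandas as pd


def summarize_changes(before: pd.DataFrame, after: pd.DataFrame) -> dict:
    """Report coarse differences between a raw frame and its cleaned version."""
    return {
        "rows_removed": len(before) - len(after),
        "columns_before": list(before.columns),
        "columns_after": list(after.columns),
        "nulls_before": int(before.isna().sum().sum()),
        "nulls_after": int(after.isna().sum().sum()),
    }


# Stand-in frames (assumed data, not output produced by dfcleanerpro).
raw = pd.DataFrame({"Name ": ["Arun", "Arun", None], "Constant": [1, 1, 1]})
tidy = pd.DataFrame({"name": ["Arun", "unknown"]})

report = summarize_changes(raw, tidy)
print(report)
```

A summary like this makes it easier to spot a step that removed more rows or columns than expected.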

What It Handles

Task                Description
Missing Values      Replaces nulls using smart strategies
Column Formatting   Converts names to clean snake_case
Duplicate Rows      Identifies and removes duplicates
Constant Columns    Drops columns with no variance
String Cleanup      Removes unwanted whitespace
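For readers who want a feel for what these tasks involve, here is a rough equivalent written in plain pandas. It is a sketch under stated assumptions, not dfcleanerpro's implementation: the function names `to_snake_case` and `clean_frame_sketch`, the median fill for numeric columns, and the "unknown" placeholder for text columns are all illustrative choices, and the library's own strategies may differ.

```python
import re

import pandas as pd


def to_snake_case(name: str) -> str:
    """Trim a column label, replace non-alphanumeric runs with '_', lowercase it."""
    cleaned = re.sub(r"[^0-9a-zA-Z]+", "_", str(name).strip())
    return cleaned.strip("_").lower()


def clean_frame_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the five tasks from the table using assumed strategies."""
    out = df.copy()

    # Column formatting: normalize labels to snake_case.
    out.columns = [to_snake_case(c) for c in out.columns]

    for col in out.columns:
        if pd.api.types.is_numeric_dtype(out[col]):
            # Missing values (assumed strategy): fill numeric gaps with the median.
            out[col] = out[col].fillna(out[col].median())
        else:
            # String cleanup: strip surrounding whitespace from text values.
            out[col] = out[col].map(lambda v: v.strip() if isinstance(v, str) else v)
            # Missing values (assumed strategy): fill text gaps with a placeholder.
            out[col] = out[col].fillna("unknown")

    # Duplicate rows: keep the first occurrence of each identical row.
    out = out.drop_duplicates().reset_index(drop=True)

    # Constant columns: keep only columns with more than one distinct value.
    keep = [c for c in out.columns if out[c].nunique(dropna=False) > 1]
    return out[keep]


sample = pd.DataFrame(
    {
        "Name ": ["Nishanth", "Arun", "Arun", "Azar", "None"],
        "Age": [27, None, 26, 27, 30],
        "City": [" Bangalore", "Chennai", "Chennai", None, "Mumbai"],
        "Constant": [1, 1, 1, 1, 1],
    }
)
result = clean_frame_sketch(sample)
print(result)
```

In this sketch the two "Arun" rows both survive because they differ in `Age`; duplicates are removed only when every column matches.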

Design Philosophy

Data preprocessing is a repetitive but critical step in any data workflow. This library focuses on:

  • Simplicity over complexity
  • Clean and readable transformations
  • Reusability across projects

Built With

  • Python
  • Pandas
  • NumPy

Use Cases

  • Data Engineering pipelines
  • Data Science preprocessing
  • Exploratory Data Analysis (EDA)
  • Machine Learning data preparation

Future Enhancements

  • CLI support for CSV processing
  • Data validation rules
  • Outlier detection utilities
  • Data profiling reports
  • Integration with big data tools

Contributions

Contributions, issues, and feature requests are welcome!

License

MIT License
