
Simple DataFrame cleaning toolkit

Project description

dfcleanerpro


A lightweight library for preprocessing pandas DataFrames: it fills missing values, standardizes column names, removes duplicate rows and constant columns, and trims whitespace from text fields.


Installation

pip install dfcleanerpro

Key Capabilities

  • Automated data cleaning pipeline
  • Intelligent handling of missing values
  • Standardized column name formatting
  • Duplicate record elimination
  • Removal of low-information columns
  • String normalization for text fields
  • Simple and intuitive API design

Getting Started

Basic Usage

import pandas as pd
from dfcleanerpro import DataCleaner

df = pd.read_csv("data.csv")

cleaned_df = DataCleaner(df).run_all()

Example Transformation

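import pandas as pd
from dfcleanerpro import DataCleaner

# Sample data with the kinds of problems the cleaner targets
data = {
    "Name ": ["Alice", "Bob", "Bob", "None"],         # trailing space in the column name
    "Age": [25, None, 25, 30],                        # one missing value
    "City": [" Chennai", "Delhi ", "Delhi ", None],   # stray whitespace, one missing value
    "Constant": [1, 1, 1, 1],                         # same value in every row
}

df = pd.DataFrame(data)

cleaned = DataCleaner(df).run_all()
print(cleaned)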
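
To see what the example frame contains before any cleaning, it can help to inspect it with plain pandas. The sketch below only measures the raw data and does not call dfcleanerpro; the variable names are illustrative.

```python
import pandas as pd

data = {
    "Name ": ["Alice", "Bob", "Bob", "None"],
    "Age": [25, None, 25, 30],
    "City": [" Chennai", "Delhi ", "Delhi ", None],
    "Constant": [1, 1, 1, 1],
}
df = pd.DataFrame(data)

# Nulls per column; note that the string "None" in "Name " is not a null
null_counts = df.isna().sum()

# Exact duplicate rows in the raw frame
duplicate_rows = int(df.duplicated().sum())

# Columns holding a single distinct value
constant_columns = [c for c in df.columns if df[c].nunique(dropna=False) <= 1]

# Column names with leading or trailing whitespace
padded_columns = [c for c in df.columns if c != c.strip()]

print(null_counts)
print("duplicate rows:", duplicate_rows)
print("constant columns:", constant_columns)
print("padded column names:", padded_columns)
```

In the raw frame the two "Bob" rows are not exact duplicates, because one has a missing Age. They only become duplicates if the missing value is filled with 25 before deduplication, so the order of cleaning steps matters.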

What It Handles

Task                Description
Missing Values      Replaces nulls using smart strategies
Column Formatting   Converts names to clean snake_case
Duplicate Rows      Identifies and removes duplicates
Constant Columns    Drops columns with no variance
String Cleanup      Removes unwanted whitespace
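
The table does not say how each task is carried out. As a rough illustration only, the following plain-pandas sketch performs comparable steps. The names `to_snake_case` and `manual_clean`, the fill strategies (median for numeric columns, most frequent value for others), and the order of steps are all assumptions made for this example, not dfcleanerpro's actual implementation.

```python
import re

import pandas as pd


def to_snake_case(name: str) -> str:
    """Trim a column name and convert it to snake_case (hypothetical helper)."""
    name = re.sub(r"[^0-9a-zA-Z]+", "_", str(name).strip())
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name)
    return name.strip("_").lower()


def manual_clean(df: pd.DataFrame) -> pd.DataFrame:
    """Plain-pandas approximation of the five tasks (assumed behavior)."""
    out = df.copy()

    # Column formatting: snake_case names
    out.columns = [to_snake_case(c) for c in out.columns]

    # String cleanup: strip whitespace from string values in non-numeric columns
    for col in out.columns:
        if not pd.api.types.is_numeric_dtype(out[col]):
            out[col] = out[col].map(
                lambda v: v.strip() if isinstance(v, str) else v
            )

    # Missing values: median for numeric columns, most frequent value otherwise
    for col in out.columns:
        if out[col].isna().any():
            if pd.api.types.is_numeric_dtype(out[col]):
                out[col] = out[col].fillna(out[col].median())
            else:
                mode = out[col].mode(dropna=True)
                if not mode.empty:
                    out[col] = out[col].fillna(mode.iloc[0])

    # Duplicate rows: keep the first occurrence
    out = out.drop_duplicates().reset_index(drop=True)

    # Constant columns: drop columns with a single distinct value
    keep = [c for c in out.columns if out[c].nunique(dropna=False) > 1]
    return out[keep]


sample = pd.DataFrame(
    {
        "Name ": ["Alice", "Bob", "Bob", "None"],
        "Age": [25, None, 25, 30],
        "City": [" Chennai", "Delhi ", "Delhi ", None],
        "Constant": [1, 1, 1, 1],
    }
)
result = manual_clean(sample)
print(result)
```

Filling missing values before dropping duplicates is a deliberate choice in this sketch: rows that differ only by a null can then collapse into one. A library may order these steps differently, so compare against the output of `run_all()` rather than treating this as equivalent.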

Design Philosophy

Data preprocessing is a repetitive but critical step in any data workflow. This library focuses on:

  • Simplicity over complexity
  • Clean and readable transformations
  • Reusability across projects

Built With

  • Python
  • Pandas
  • NumPy

Use Cases

  • Data Engineering pipelines
  • Data Science preprocessing
  • Exploratory Data Analysis (EDA)
  • Machine Learning data preparation

Future Enhancements

  • CLI support for CSV processing
  • Data validation rules
  • Outlier detection utilities
  • Data profiling reports
  • Integration with big data tools

Contributions

Contributions, issues, and feature requests are welcome!

License

MIT License
