Simple DataFrame cleaning toolkit

dfcleanerpro

A lightweight preprocessing library for pandas DataFrames that bundles common cleaning steps into a single call.


Installation

pip install dfcleanerpro

Key Capabilities

  • Automated data cleaning pipeline
  • Intelligent handling of missing values
  • Standardized column name formatting
  • Duplicate record elimination
  • Removal of low-information columns
  • String normalization for text fields
  • Simple and intuitive API design

Getting Started

Basic Usage

import pandas as pd
from dfcleanerpro import DataCleaner

df = pd.read_csv("data.csv")

cleaned_df = DataCleaner(df).run_all()
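
To see what a cleaning pass changed, it can help to compare the frame before and after. The helper below is a minimal sketch in plain pandas; `summarize_changes` is a hypothetical name and not part of dfcleanerpro, and it assumes `run_all()` returns a pandas DataFrame, as the usage above suggests. A stand-in for the cleaned frame is built with ordinary pandas calls so the sketch runs on its own.

```python
import pandas as pd


def summarize_changes(before: pd.DataFrame, after: pd.DataFrame) -> dict:
    """Hypothetical helper (not part of dfcleanerpro): compare two frames."""
    return {
        "rows_removed": len(before) - len(after),
        "columns_before": list(before.columns),
        "columns_after": list(after.columns),
        "nulls_before": int(before.isna().sum().sum()),
        "nulls_after": int(after.isna().sum().sum()),
    }


# Stand-in for a cleaned frame so the sketch runs without dfcleanerpro.
# In practice `after` would be the result of DataCleaner(before).run_all().
before = pd.DataFrame({"A ": [1.0, None, 1.0], "B": ["x", "y", "x"]})
after = before.rename(columns=str.strip).dropna().drop_duplicates()

report = summarize_changes(before, after)
print(report)
```

A summary like this makes it easier to notice when a cleaning step removes more rows or columns than expected.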

Example Transformation

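import pandas as pd
from dfcleanerpro import DataCleaner

# Sample data with a padded column name, missing values,
# padded strings, and a constant column
data = {
    "Name ": ["Alice", "Bob", "Bob", "None"],  # "None" here is a string, not a null
    "Age": [25, None, 25, 30],
    "City": [" Chennai", "Delhi ", "Delhi ", None],
    "Constant": [1, 1, 1, 1],
}

df = pd.DataFrame(data)

# Run every cleaning step in one call
cleaned = DataCleaner(df).run_all()
print(cleaned)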
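
Before cleaning, it can be useful to confirm which problems the sample frame actually contains. The sketch below uses only standard pandas calls on the same data; the variable names are illustrative and nothing here relies on dfcleanerpro.

```python
import pandas as pd

df = pd.DataFrame({
    "Name ": ["Alice", "Bob", "Bob", "None"],
    "Age": [25, None, 25, 30],
    "City": [" Chennai", "Delhi ", "Delhi ", None],
    "Constant": [1, 1, 1, 1],
})

# Missing values per column (the string "None" is not counted as missing)
null_counts = {col: int(n) for col, n in df.isna().sum().items()}

# Column names that carry leading or trailing whitespace
padded_columns = [col for col in df.columns if col != col.strip()]

# Columns holding a single distinct value
constant_columns = [col for col in df.columns if df[col].nunique(dropna=False) == 1]

# Exact duplicate rows in the raw data
duplicate_rows = int(df.duplicated().sum())

print(null_counts, padded_columns, constant_columns, duplicate_rows)
```

Note that the two "Bob" rows are not exact duplicates in the raw data, because one has a missing Age. Whether they collapse into one row after cleaning depends on how the missing value is filled, which this README does not specify.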

What It Handles

| Task | Description |
| --- | --- |
| Missing Values | Replaces nulls using smart strategies |
| Column Formatting | Converts names to clean snake_case |
| Duplicate Rows | Identifies and removes duplicates |
| Constant Columns | Drops columns with no variance |
| String Cleanup | Removes unwanted whitespace |
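
For readers who want to see what these five tasks look like in plain pandas, here is a rough sketch. It is not dfcleanerpro's implementation: `to_snake_case` and `clean_frame_sketch` are hypothetical names, and the fill strategies (median for numeric columns, the placeholder "unknown" for text) are assumptions chosen for illustration, since the README does not say which strategies the library uses.

```python
import re

import pandas as pd


def to_snake_case(name) -> str:
    """Hypothetical helper: 'Name ' -> 'name', 'Total Sales' -> 'total_sales'."""
    text = re.sub(r"[^0-9a-zA-Z]+", "_", str(name).strip())
    text = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", text)
    return text.strip("_").lower()


def clean_frame_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Illustrative pipeline only; the strategies here are assumptions."""
    out = df.copy()

    # Column formatting: normalize names to snake_case
    out.columns = [to_snake_case(col) for col in out.columns]

    for col in out.columns:
        if pd.api.types.is_numeric_dtype(out[col]):
            # Missing values (assumed strategy): fill numeric nulls with the median
            if out[col].isna().any():
                out[col] = out[col].fillna(out[col].median())
        else:
            # String cleanup: strip surrounding whitespace from text cells
            out[col] = out[col].map(lambda v: v.strip() if isinstance(v, str) else v)
            # Missing values (assumed strategy): fill text nulls with a placeholder
            out[col] = out[col].fillna("unknown")

    # Duplicate rows: keep the first occurrence of each row
    out = out.drop_duplicates().reset_index(drop=True)

    # Constant columns: drop columns with at most one distinct value
    constant = [col for col in out.columns if out[col].nunique(dropna=False) <= 1]
    return out.drop(columns=constant)


sample = pd.DataFrame({
    "Name ": ["Alice", "Bob", "Bob", "None"],
    "Age": [25, None, 25, 30],
    "City": [" Chennai", "Delhi ", "Delhi ", None],
    "Constant": [1, 1, 1, 1],
})
result = clean_frame_sketch(sample)
print(result)
```

Under these assumed strategies the sample ends up with three rows and the columns name, age, and city. The order of steps matters: filling missing values before dropping duplicates is what lets the two "Bob" rows collapse into one.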

Design Philosophy

Data preprocessing is a repetitive but critical step in any data workflow. This library focuses on:

  • Simplicity over complexity
  • Clean and readable transformations
  • Reusability across projects

Built With

  • Python
  • Pandas
  • NumPy

Use Cases

  • Data Engineering pipelines
  • Data Science preprocessing
  • Exploratory Data Analysis (EDA)
  • Machine Learning data preparation

Future Enhancements

  • CLI support for CSV processing
  • Data validation rules
  • Outlier detection utilities
  • Data profiling reports
  • Integration with big data tools

Contributions

Contributions, issues, and feature requests are welcome!

License

MIT License
