Simple DataFrame cleaning toolkit

dfcleanerpro

Professional DataFrame cleaning toolkit for Data Engineers and Data Scientists.


Install

pip install dfcleanerpro

Features

  • One-line DataFrame cleaning
  • Fluent pipeline API with method chaining
  • Audit log — track every transformation
  • Schema validator — enforce column types
  • Numeric-safe missing value filling
  • Dataset quality analyzer

Quick Start

Simple Clean

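from dfcleanerpro import clean_dataframe

# df is an existing DataFrame that you have already loaded.
cleaned = clean_dataframe(
    df,
    drop_duplicates=True,     # remove duplicate rows
    snake_case=True,          # convert column names to snake_case
    fill_missing="mean",      # 'zero', 'mean', 'median', or None
    remove_empty_cols=True,   # drop all-null columns
)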
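
To make the four options concrete, here is a rough plain-pandas approximation of what they describe. It assumes the DataFrame is a pandas DataFrame; the sample data and the simple renaming rule are illustrative only, and this is not the library's implementation.

```python
import pandas as pd

# Hypothetical sample data; column names and values are illustrative only.
df = pd.DataFrame(
    {
        "First Name": ["Ann", "Bob", "Bob", "Cy"],
        "Age": [30.0, None, None, 50.0],
        "Empty Col": [None, None, None, None],
    }
)

approx = (
    df.drop_duplicates()                    # drop_duplicates=True
    .dropna(axis="columns", how="all")      # remove_empty_cols=True
    .rename(columns=lambda c: c.strip().lower().replace(" ", "_"))  # snake_case=True
)

# fill_missing="mean": fill NaNs in numeric columns with that column's mean.
approx = approx.fillna(approx.mean(numeric_only=True))

print(approx)
```

Non-numeric columns are left untouched by the last step, which is one reading of "numeric-safe" filling.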

Pipeline API

from dfcleanerpro import DFPipeline

result, audit = (
    DFPipeline(df)
    .snake_case()
    .remove_duplicates()
    .drop_empty_cols()
    .fill_missing("mean")
    .run(audit=True)
)

print(audit)
# {
#   'steps_applied': ['snake_case', 'remove_duplicates', 'drop_empty_cols', 'fill_missing_mean'],
#   'duplicates_removed': 3,
#   'cols_dropped': ['empty_col'],
#   'cols_filled': ['age', 'salary'],
#   'rows_before': 100,
#   'rows_after': 97
# }
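
The audit dictionary can be checked programmatically. The sketch below copies the sample output above and assumes those key names; it verifies that the drop in row count is fully explained by duplicate removal, which holds here because no other step in this pipeline removes rows.

```python
# Audit dictionary copied from the sample output above.
audit = {
    "steps_applied": ["snake_case", "remove_duplicates", "drop_empty_cols", "fill_missing_mean"],
    "duplicates_removed": 3,
    "cols_dropped": ["empty_col"],
    "cols_filled": ["age", "salary"],
    "rows_before": 100,
    "rows_after": 97,
}

# Rows lost across the whole pipeline.
rows_lost = audit["rows_before"] - audit["rows_after"]

# Rows lost that duplicate removal does not account for.
unexplained = rows_lost - audit["duplicates_removed"]

if unexplained != 0:
    raise ValueError(f"{unexplained} rows were dropped by something other than deduplication")
```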

Schema Validator

from dfcleanerpro import DFPipeline

result = (
    DFPipeline(df)
    .validate_schema({"age": "int", "salary": "float"})  # column name -> expected type
    .clean()
    .run()
)
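
The README does not say whether validate_schema raises, coerces, or only reports on a mismatch. As a sketch of what a type check of this kind could look like in plain pandas, the hypothetical helper below (find_schema_mismatches and the CHECKS mapping are not part of dfcleanerpro) reports columns that are missing or have an unexpected dtype.

```python
import pandas as pd
from pandas.api import types as ptypes

# Hypothetical mapping from schema type names to pandas dtype checks.
CHECKS = {
    "int": ptypes.is_integer_dtype,
    "float": ptypes.is_float_dtype,
}


def find_schema_mismatches(df: pd.DataFrame, schema: dict) -> dict:
    """Return {column: problem} for missing columns or unexpected dtypes."""
    problems = {}
    for column, expected in schema.items():
        if column not in df.columns:
            problems[column] = "missing column"
        elif not CHECKS[expected](df[column]):
            problems[column] = f"expected {expected}, found {df[column].dtype}"
    return problems


# Illustrative data: salary was read as text, so it fails the float check.
df = pd.DataFrame({"age": [31, 45], "salary": ["52000", "61000"]})
problems = find_schema_mismatches(df, {"age": "int", "salary": "float"})
print(problems)
```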

Dataset Analyzer

from dfcleanerpro import analyze_dataframe

report = analyze_dataframe(df)
print(report)
# {
#   'rows': 500,
#   'columns': 8,
#   'duplicate_rows': 12,
#   'missing_values': {'age': 3, 'salary': 7},
#   'dtypes': {'age': 'int64', 'name': 'object'}
# }
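
For reference, the same five fields can be computed with plain pandas. This is a sketch that assumes a pandas DataFrame and mirrors the key names in the sample report; it is not how analyze_dataframe is implemented, and the sample data is illustrative.

```python
import pandas as pd

# Illustrative data: one duplicated row and one missing age.
df = pd.DataFrame({"age": [30, 30, None], "name": ["Ann", "Ann", "Bob"]})

missing = df.isna().sum()
report = {
    "rows": len(df),
    "columns": df.shape[1],
    "duplicate_rows": int(df.duplicated().sum()),
    # Only list columns that actually have missing values.
    "missing_values": {col: int(n) for col, n in missing.items() if n > 0},
    "dtypes": {col: str(dtype) for col, dtype in df.dtypes.items()},
}
print(report)
```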

API Reference

clean_dataframe(df, ...)

Parameter          Type      Default  Description
drop_duplicates    bool      True     Remove duplicate rows
snake_case         bool      True     Convert column names to snake_case
fill_missing       str/None  None     'zero', 'mean', 'median', or None
remove_empty_cols  bool      True     Drop all-null columns
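
The exact renaming rules behind snake_case are not documented here. One common convention, shown as a sketch with a hypothetical to_snake_case helper, splits camelCase boundaries, collapses any other separators into underscores, and lowercases the result.

```python
import re


def to_snake_case(name: str) -> str:
    """Convert a column name to snake_case (one possible convention)."""
    # Split camelCase boundaries: "annualSalary" -> "annual_Salary".
    name = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", name.strip())
    # Collapse runs of non-alphanumeric characters into one underscore.
    name = re.sub(r"[^0-9a-zA-Z]+", "_", name)
    return name.strip("_").lower()


columns = ["First Name", "annualSalary", " Dept-ID "]
converted = [to_snake_case(c) for c in columns]
print(converted)  # ['first_name', 'annual_salary', 'dept_id']
```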

DFPipeline(df)

Method                  Description
.snake_case()           Convert column names to snake_case
.remove_duplicates()    Drop duplicate rows
.drop_empty_cols()      Drop all-null columns
.fill_missing(method)   Fill numeric NaNs: zero/mean/median
.validate_schema(dict)  Enforce column types
.run(audit=False)       Execute pipeline, optionally with audit
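
The three fill methods differ only in the value written into numeric gaps. The sketch below uses a hypothetical fill_numeric helper in plain pandas (not the library's code) to show the difference, and that a non-numeric column keeps its missing value.

```python
import pandas as pd


def fill_numeric(df: pd.DataFrame, method: str) -> pd.DataFrame:
    """Fill NaNs in numeric columns only; other columns are left as they are."""
    numeric = df.select_dtypes(include="number")
    if method == "zero":
        fill_values = {col: 0 for col in numeric.columns}
    elif method == "mean":
        fill_values = numeric.mean().to_dict()
    elif method == "median":
        fill_values = numeric.median().to_dict()
    else:
        raise ValueError(f"unknown method: {method!r}")
    # A dict restricts filling to the listed (numeric) columns.
    return df.fillna(value=fill_values)


# Illustrative data: one missing salary and one missing name.
df = pd.DataFrame(
    {"salary": [10.0, None, 20.0, 60.0], "name": ["a", None, "c", "d"]}
)

by_zero = fill_numeric(df, "zero")      # salary gap becomes 0.0
by_mean = fill_numeric(df, "mean")      # salary gap becomes 30.0
by_median = fill_numeric(df, "median")  # salary gap becomes 20.0
```

The skewed value 60.0 pulls the mean above the median, which is the usual reason to prefer median filling for columns with outliers.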

Roadmap

  • auto_dtype_conversion()
  • trim_string_columns()
  • detect_outliers()
  • data_quality_report() — HTML export
  • CLI support: dfcleanerpro clean file.csv

License

MIT
