Simple DataFrame cleaning toolkit
dfcleanerpro
Professional DataFrame cleaning toolkit for Data Engineers and Data Scientists.
Install
pip install dfcleanerpro
Features
- One-line DataFrame cleaning
- Fluent pipeline API with method chaining
- Audit log — track every transformation
- Schema validator — enforce column types
- Numeric-safe missing value filling
- Dataset quality analyzer
Quick Start
Simple Clean
from dfcleanerpro import clean_dataframe
cleaned = clean_dataframe(
    df,
    drop_duplicates=True,
    snake_case=True,
    fill_missing="mean",
    remove_empty_cols=True,
)
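For readers who want to see what these four options roughly correspond to, here is a minimal sketch in plain pandas. It is an approximation written for illustration, not dfcleanerpro's implementation: the helper names `to_snake_case` and `clean_sketch`, the order of the steps, and the sample data are all assumptions.

```python
import re

import pandas as pd


def to_snake_case(name: str) -> str:
    """Hypothetical helper: turn 'First Name' or 'annualSalary' into snake_case."""
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", str(name).strip())  # split camelCase
    name = re.sub(r"[^0-9a-zA-Z]+", "_", name)  # spaces and punctuation become "_"
    return name.strip("_").lower()


def clean_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Plain-pandas approximation of the four options (assumed behavior)."""
    out = df.rename(columns=to_snake_case)        # snake_case=True
    out = out.drop_duplicates()                   # drop_duplicates=True
    out = out.dropna(axis="columns", how="all")   # remove_empty_cols=True
    means = out.select_dtypes(include="number").mean()
    out = out.fillna(means)                       # fill_missing="mean", numeric columns only
    return out.reset_index(drop=True)


raw = pd.DataFrame(
    {
        "First Name": ["Ann", "Bob", "Bob", "Cy"],
        "annualSalary": [100.0, None, None, 300.0],
        "Empty Col": [None, None, None, None],
    }
)
cleaned = clean_sketch(raw)
print(cleaned)
```

In this sketch the mean is computed after duplicates are removed, so repeated rows do not skew the fill value. Whether the library uses the same order is not stated in this README.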
Pipeline API
from dfcleanerpro import DFPipeline
result, audit = (
    DFPipeline(df)
    .snake_case()
    .remove_duplicates()
    .drop_empty_cols()
    .fill_missing("mean")
    .run(audit=True)
)
print(audit)
# {
#     'steps_applied': ['snake_case', 'remove_duplicates', 'drop_empty_cols', 'fill_missing_mean'],
#     'duplicates_removed': 3,
#     'cols_dropped': ['empty_col'],
#     'cols_filled': ['age', 'salary'],
#     'rows_before': 100,
#     'rows_after': 97
# }
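The chaining style above can be illustrated with a small stand-in class. This is a sketch of the general pattern only (each method records a step and returns `self`, and `run` applies the steps while building a log); `MiniPipeline` is a hypothetical name, and nothing here is taken from DFPipeline's source.

```python
from typing import Callable

import pandas as pd


class MiniPipeline:
    """Hypothetical stand-in for a fluent cleaning pipeline."""

    def __init__(self, df: pd.DataFrame) -> None:
        self._df = df
        self._steps: list[tuple[str, Callable[[pd.DataFrame], pd.DataFrame]]] = []

    def remove_duplicates(self) -> "MiniPipeline":
        self._steps.append(("remove_duplicates", lambda d: d.drop_duplicates()))
        return self  # returning self is what makes chaining possible

    def drop_empty_cols(self) -> "MiniPipeline":
        self._steps.append(
            ("drop_empty_cols", lambda d: d.dropna(axis="columns", how="all"))
        )
        return self

    def run(self, audit: bool = False):
        out = self._df.copy()  # leave the caller's DataFrame untouched
        log = {"steps_applied": [], "rows_before": len(out)}
        for name, step in self._steps:
            out = step(out)
            log["steps_applied"].append(name)
        log["rows_after"] = len(out)
        return (out, log) if audit else out


df = pd.DataFrame({"id": [1, 1, 2], "blank": [None, None, None]})
result, audit = MiniPipeline(df).remove_duplicates().drop_empty_cols().run(audit=True)
print(audit)
```

Deferring the work until `run` keeps the step order explicit and lets the same chain return either the DataFrame alone or the DataFrame together with its log.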
Schema Validator
result = (
    DFPipeline(df)
    .validate_schema({"age": "int", "salary": "float"})
    .clean()
    .run()
)
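This README does not say whether `validate_schema` raises, casts, or only reports on a mismatch. As a hedged illustration of the checking part alone, the sketch below compares a schema dict against the actual dtypes using pandas' own type predicates. The function name `schema_problems` and the choice to return a list of messages are assumptions.

```python
import pandas as pd
from pandas.api.types import is_float_dtype, is_integer_dtype

# Map the short type names used in the schema dict to pandas dtype checks.
CHECKS = {"int": is_integer_dtype, "float": is_float_dtype}


def schema_problems(df: pd.DataFrame, schema: dict) -> list:
    """Hypothetical helper: describe every column that violates the schema."""
    problems = []
    for column, expected in schema.items():
        if column not in df.columns:
            problems.append(f"{column}: missing column")
        elif not CHECKS[expected](df[column]):
            problems.append(f"{column}: expected {expected}, found {df[column].dtype}")
    return problems


# 'salary' was read in as text, so it fails the float check.
df = pd.DataFrame({"age": [31, 45], "salary": ["50000", "62000"]})
problems = schema_problems(df, {"age": "int", "salary": "float"})
print(problems)
```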
Dataset Analyzer
from dfcleanerpro import analyze_dataframe
report = analyze_dataframe(df)
# {
#     'rows': 500,
#     'columns': 8,
#     'duplicate_rows': 12,
#     'missing_values': {'age': 3, 'salary': 7},
#     'dtypes': {'age': 'int64', 'name': 'object'}
# }
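A report with the same keys can be assembled from standard pandas calls. The sketch below is an assumption about how such numbers might be derived, not the library's code; `analyze_sketch` is a hypothetical name, and listing only columns that actually have missing values is a guess based on the sample output above.

```python
import pandas as pd


def analyze_sketch(df: pd.DataFrame) -> dict:
    """Hypothetical equivalent of a quality report, built from plain pandas."""
    missing = df.isna().sum()
    return {
        "rows": len(df),
        "columns": df.shape[1],
        "duplicate_rows": int(df.duplicated().sum()),
        # Keep only columns that have at least one missing value.
        "missing_values": {col: int(n) for col, n in missing.items() if n > 0},
        "dtypes": {col: str(dtype) for col, dtype in df.dtypes.items()},
    }


df = pd.DataFrame(
    {"age": [30, None, 30, 41], "name": ["Ann", "Bob", "Ann", None]}
)
report = analyze_sketch(df)
print(report)
```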
API Reference
clean_dataframe(df, ...)
| Parameter | Type | Default | Description |
|---|---|---|---|
| drop_duplicates | bool | True | Remove duplicate rows |
| snake_case | bool | True | Convert column names to snake_case |
| fill_missing | str/None | None | 'zero', 'mean', 'median', or None |
| remove_empty_cols | bool | True | Drop all-null columns |
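To make the `fill_missing` choices concrete, the following sketch shows one way "numeric-safe" filling could work: only numeric columns receive a fill value, and text columns keep their gaps. The function `fill_numeric` is hypothetical and the behavior is an assumption based on the descriptions in this README.

```python
import pandas as pd


def fill_numeric(df: pd.DataFrame, method):
    """Hypothetical helper: fill NaNs in numeric columns only."""
    if method is None:
        return df.copy()
    numeric = df.select_dtypes(include="number")
    if method == "zero":
        values = {col: 0 for col in numeric.columns}
    elif method == "mean":
        values = numeric.mean()
    elif method == "median":
        values = numeric.median()
    else:
        raise ValueError(f"unknown fill method: {method!r}")
    # fillna with a per-column mapping leaves unlisted columns untouched.
    return df.fillna(values)


df = pd.DataFrame(
    {"score": [1.0, None, 9.0, 2.0], "city": ["Oslo", None, "Rome", "Lima"]}
)
by_mean = fill_numeric(df, "mean")
by_median = fill_numeric(df, "median")
by_zero = fill_numeric(df, "zero")
print(by_mean)
```

With skewed data such as the `score` column here, the mean and the median give noticeably different fill values, which is the usual reason to offer both.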
DFPipeline(df)
| Method | Description |
|---|---|
| .snake_case() | Convert column names to snake_case |
| .remove_duplicates() | Drop duplicate rows |
| .drop_empty_cols() | Drop all-null columns |
| .fill_missing(method) | Fill numeric NaNs with zero, mean, or median |
| .validate_schema(dict) | Enforce column types |
| .run(audit=False) | Execute pipeline, optionally with audit |
Roadmap
- auto_dtype_conversion()
- trim_string_columns()
- detect_outliers()
- data_quality_report(): HTML export
- CLI support: dfcleanerpro clean file.csv
License
MIT