Automating DF and CSV Data Cleaning
Project description
dfcleaner
Documentation
dfcleaner is a lightweight Python utility for cleaning, parsing, and preparing time series and tabular datasets. It streamlines common DataFrame operations such as timezone normalization, date parsing, frequency inference, BOM removal, and value cleaning.
Installation
pip install dfcleaner
Or clone locally for development:
git clone https://github.com/BrandynHamilton/dfcleaner
cd dfcleaner
pip install -e .
Usage Example
from dfcleaner import DFCleaner
cleaner = DFCleaner(timezone="UTC")
df = cleaner.to_df("my_data.csv")
df, freq = cleaner.to_time(df)
df = cleaner.cleaning_values(df)
df = cleaner.clean_dates(df, time_freq=freq)
Core Methods
__init__(timezone=None)
Accepts an IANA timezone string such as 'UTC' or 'US/Eastern'. If None, timezone awareness is removed from the datetime index.
to_df(file, delimiter=',')
Load a CSV or Excel file into a clean pandas DataFrame.
- Handles BOM characters and whitespace.
- Removes rows that contain only invisible characters or whitespace.
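The loading behaviour described above can be sketched in plain pandas. This is a minimal illustration, not dfcleaner's actual implementation: the helper name load_clean_csv and the choice to read every cell as text are assumptions, and the sketch covers CSV input only.

```python
import io
import pandas as pd

def load_clean_csv(source, delimiter=","):
    """Hypothetical helper: read a CSV and strip BOMs, whitespace, and empty rows."""
    # "utf-8-sig" drops a leading byte-order mark; dtype=str keeps cells as text
    df = pd.read_csv(source, sep=delimiter, encoding="utf-8-sig", dtype=str)
    # Remove stray BOM characters and padding from the header names
    df.columns = [str(c).replace("\ufeff", "").strip() for c in df.columns]
    for col in df.columns:
        # Remove BOM / zero-width characters, then surrounding whitespace
        df[col] = df[col].str.replace("[\ufeff\u200b]", "", regex=True).str.strip()
    # Blank out empty strings and drop rows with no remaining content
    df = df.mask(df == "").dropna(how="all").reset_index(drop=True)
    return df

# In-memory CSV with a BOM, padded cells, and one whitespace-only row
raw = "\ufeffdate , value\n2024-01-01, 10 \n   ,  \n2024-01-02,20\n".encode("utf-8")
df = load_clean_csv(io.BytesIO(raw))
print(df)
```

Reading everything as text first keeps the whitespace and BOM handling simple; numeric conversion is left to a later cleaning step.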
apply_timezone(df)
Applies or removes timezone from the DataFrame index depending on the initialized setting.
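For readers unfamiliar with timezone handling in pandas, here is a minimal sketch of applying or removing a timezone on a DatetimeIndex. The function name apply_timezone_sketch is hypothetical, and the library's own logic may differ, for example in how it treats an index that is already timezone-aware.

```python
import pandas as pd

def apply_timezone_sketch(df, timezone=None):
    """Hypothetical helper: set, convert, or strip the timezone of a DatetimeIndex."""
    out = df.copy()
    idx = out.index
    if timezone is None:
        # Remove timezone awareness, keeping the wall-clock times
        if idx.tz is not None:
            out.index = idx.tz_localize(None)
    elif idx.tz is None:
        # Naive index: attach the requested timezone
        out.index = idx.tz_localize(timezone)
    else:
        # Aware index: convert to the requested timezone
        out.index = idx.tz_convert(timezone)
    return out

naive = pd.DataFrame(
    {"value": [1, 2]},
    index=pd.to_datetime(["2024-01-01", "2024-01-02"]),
)
aware = apply_timezone_sketch(naive, "UTC")
stripped = apply_timezone_sketch(aware, None)
```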
detect_time_col(df, custom_col=None)
Scans the DataFrame's columns for common time-related names. Pass custom_col to override detection.
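A name-based scan of this kind might look like the sketch below. The list of candidate names and the function name detect_time_col_sketch are assumptions made for illustration; the names dfcleaner actually checks are not documented here.

```python
import pandas as pd

# Assumed candidate names; the library's real list may differ
COMMON_TIME_NAMES = ("date", "time", "timestamp", "datetime", "dt")

def detect_time_col_sketch(df, custom_col=None):
    """Hypothetical helper: return the first column whose name looks time-related."""
    if custom_col is not None:
        # An explicit override wins, but it must exist in the frame
        if custom_col not in df.columns:
            raise KeyError(f"column {custom_col!r} not found")
        return custom_col
    for col in df.columns:
        # Compare case-insensitively and ignore surrounding whitespace
        if str(col).strip().lower() in COMMON_TIME_NAMES:
            return col
    return None

frame = pd.DataFrame({"Value": [1], "Date": ["2024-01-01"], "block_time": ["x"]})
detected = detect_time_col_sketch(frame)
overridden = detect_time_col_sketch(frame, custom_col="block_time")
```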
to_time(df, time_col=None, dayfirst=False)
- Converts a detected or specified datetime column to the index.
- Infers the frequency of the datetime index.
- Returns the DataFrame with a datetime index together with the estimated frequency (for example 'D', 'M', or 'Q').
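The steps above can be approximated with pd.to_datetime and pd.infer_freq. This is a sketch under stated assumptions, not the library's code: the name to_time_sketch is hypothetical, the time column is passed explicitly, and the frequency alias is returned exactly as pandas reports it. For monthly data pandas may return aliases such as 'MS' or 'ME', so mapping those to 'M' would need an extra step.

```python
import pandas as pd

def to_time_sketch(df, time_col, dayfirst=False):
    """Hypothetical helper: index by a parsed datetime column and infer frequency."""
    out = df.copy()
    # Parse the column; dayfirst=True reads 01/02/2024 as 1 February
    out[time_col] = pd.to_datetime(out[time_col], dayfirst=dayfirst)
    out = out.set_index(time_col).sort_index()
    # infer_freq needs at least three timestamps and returns None if irregular
    freq = pd.infer_freq(out.index) if len(out.index) >= 3 else None
    return out, freq

raw = pd.DataFrame(
    {
        "date": ["2024-01-03", "2024-01-01", "2024-01-02", "2024-01-04"],
        "value": [3, 1, 2, 4],
    }
)
indexed, freq = to_time_sketch(raw, "date")
```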
clean_dates(df, time_freq)
Drops incomplete periods based on inferred time frequency:
- W: drops current week if incomplete
- M: drops current month
- Q: drops current quarter
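One way to drop the still-open period is to compare the index against the start of the period containing the current time. The sketch below assumes a timezone-naive DatetimeIndex; the helper name drop_current_period and the now parameter (added so the example is reproducible) are not part of dfcleaner. It treats the current period as incomplete by definition, which may be stricter than the library's own check.

```python
import pandas as pd

def drop_current_period(df, time_freq, now=None):
    """Hypothetical helper: drop rows that fall in the period containing `now`."""
    now = pd.Timestamp.now() if now is None else pd.Timestamp(now)
    if time_freq not in ("W", "M", "Q"):
        # Other frequencies are left untouched in this sketch
        return df
    # Start of the current week (Monday), month, or quarter
    period_start = now.to_period(time_freq).start_time
    return df[df.index < period_start]

monthly = pd.DataFrame(
    {"value": [1, 2, 3]},
    index=pd.to_datetime(["2024-01-31", "2024-02-29", "2024-03-10"]),
)
# With "now" in mid-March, the partial March row is removed
trimmed = drop_current_period(monthly, "M", now="2024-03-15")
```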
cleaning_values(df)
Cleans numeric object columns by:
- Removing symbols like %, $, and ,
- Replacing Excel artifacts like #DIV/0! with NaN
- Converting to proper numeric dtype
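The cleaning steps listed above can be sketched as follows. This is an illustration rather than the library's implementation: the function name clean_numeric_strings is hypothetical, and the list of Excel error strings beyond #DIV/0! is an assumption. Note that in this sketch "30%" becomes 30.0, not 0.30.

```python
import pandas as pd

# Only #DIV/0! is named in the documentation; the others are assumed
EXCEL_ERRORS = ["#DIV/0!", "#N/A", "#VALUE!", "#REF!"]

def clean_numeric_strings(df):
    """Hypothetical helper: convert text columns that hold numbers to numeric dtype."""
    out = df.copy()
    for col in out.columns:
        is_text = pd.api.types.is_object_dtype(out[col]) or pd.api.types.is_string_dtype(out[col])
        if not is_text:
            continue
        # Strip percent signs, dollar signs, and thousands separators
        text = out[col].astype(str).str.replace(r"[%$,]", "", regex=True).str.strip()
        # Turn Excel error strings into missing values, then parse
        converted = pd.to_numeric(text.mask(text.isin(EXCEL_ERRORS)), errors="coerce")
        # Leave genuinely textual columns alone: convert only if something parsed
        if converted.notna().any():
            out[col] = converted
    return out

dirty = pd.DataFrame(
    {"price": ["$1,200.50", "#DIV/0!", "30%"], "name": ["a", "b", "c"]}
)
cleaned = clean_numeric_strings(dirty)
```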
open_json(file_name)
Loads a JSON file and parses into a Python dictionary.
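A loader of this kind is typically a thin wrapper around the standard json module. The sketch below uses a hypothetical name, open_json_sketch, and a temporary file so it can run on its own; the result is a dictionary only when the top-level JSON value is an object.

```python
import json
import os
import tempfile

def open_json_sketch(file_name):
    """Hypothetical helper: read a JSON file and return the parsed content."""
    with open(file_name, "r", encoding="utf-8") as fh:
        return json.load(fh)

# Write a small JSON file to a temporary directory, then read it back
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "config.json")
    with open(path, "w", encoding="utf-8") as fh:
        json.dump({"timezone": "UTC", "columns": ["date", "value"]}, fh)
    config = open_json_sketch(path)
```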
Project Structure
dfcleaner/
├── __init__.py
└── core.py
License
MIT License
Questions or Issues?
If you encounter any problems, have feature requests, or want to contribute improvements, feel free to reach out.
Email: brandynham1120@gmail.com
File details
Details for the file dfcleaner-0.1.3.tar.gz.
File metadata
- Download URL: dfcleaner-0.1.3.tar.gz
- Upload date:
- Size: 5.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | b0124c5bd44fc66e639146c9d87779e66a320af606534e71a2dce9751683a0a7
MD5 | bddc9b3a46b4a12286b6e4070e463fe4
BLAKE2b-256 | 6a8aa17d3c125949a52f3a52c6162c4590e9ffb9159847e3a264008802ad43fd
File details
Details for the file dfcleaner-0.1.3-py3-none-any.whl.
File metadata
- Download URL: dfcleaner-0.1.3-py3-none-any.whl
- Upload date:
- Size: 5.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | 7d1985b868e817d557d316f0575979f305ff4c7a2a084002c112bce01d71d45c
MD5 | 238842d5f1797cefdab3e59d487f89a0
BLAKE2b-256 | b99791c78677cb3512863b1bfa50adb85e3db47b379176c0425a55663ba4b1f8