Automating DF and CSV Data Cleaning

dfcleaner Documentation

dfcleaner is a lightweight Python utility for cleaning, parsing, and preparing time series and tabular datasets. It streamlines common DataFrame operations such as timezone normalization, date parsing, frequency inference, BOM removal, and value cleaning.

Installation

pip install dfcleaner

Or clone locally for development:

git clone https://github.com/BrandynHamilton/dfcleaner
cd dfcleaner
pip install -e .

Usage Example

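from dfcleaner import DFCleaner

cleaner = DFCleaner(timezone="UTC")           # timezone used for the datetime index
df = cleaner.to_df("my_data.csv")             # load the file, handling BOMs and blank rows
df, freq = cleaner.to_time(df)                # set the datetime index and infer its frequency
df = cleaner.cleaning_values(df)              # convert numeric-looking text columns to numbers
df = cleaner.clean_dates(df, time_freq=freq)  # drop the current, incomplete period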

Core Methods

__init__(timezone=None)

Accepts a string such as 'UTC', 'US/Eastern', or any other valid IANA timezone name. If None, timezone awareness is removed from the datetime index.

to_df(file, delimiter=',')

Loads a CSV or Excel file into a clean pandas DataFrame.

  • Handles BOM characters and whitespace.
  • Removes rows that contain only invisible characters or whitespace.
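The BOM and blank-row handling described above can be approximated with plain pandas. This is a minimal sketch, not dfcleaner's actual implementation: the helper name `load_clean_csv`, the `JUNK` character set, and the sample data are all assumptions made for illustration.

```python
import io

import pandas as pd

# Characters treated as "invisible" in this sketch: whitespace, BOM, zero-width space.
JUNK = " \t\r\n\ufeff\u200b"


def load_clean_csv(raw: bytes, delimiter: str = ",") -> pd.DataFrame:
    """Hypothetical helper, not part of dfcleaner."""
    # "utf-8-sig" drops a leading UTF-8 byte order mark if one is present.
    df = pd.read_csv(io.BytesIO(raw), sep=delimiter, encoding="utf-8-sig", dtype=str)
    # Trim junk characters from the header and from every cell.
    df.columns = [str(c).strip(JUNK) for c in df.columns]
    df = df.apply(lambda col: col.str.strip(JUNK))
    # Keep only rows that still contain at least one visible character.
    has_content = df.fillna("").ne("").any(axis=1)
    return df[has_content].reset_index(drop=True)


# A small CSV with a BOM, padded cells, and one whitespace-only row.
sample = "\ufeffdate, value\n2024-01-01, 10\n  ,  \n2024-01-02, 20\n".encode("utf-8")
clean = load_clean_csv(sample)
```

Reading everything as text first (`dtype=str`) keeps the trimming step simple; numeric conversion can then happen in a separate pass.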

apply_timezone(df)

Applies or removes timezone from the DataFrame index depending on the initialized setting.
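In pandas terms, applying or removing a timezone on a DatetimeIndex usually comes down to `tz_localize` and `tz_convert`. The sketch below shows one plausible way to do it; the helper name `set_index_timezone` is hypothetical, and the choice to treat naive timestamps as already being in the target timezone is an assumption, not documented dfcleaner behavior.

```python
from typing import Optional

import pandas as pd


def set_index_timezone(df: pd.DataFrame, timezone: Optional[str]) -> pd.DataFrame:
    """Hypothetical helper: make a DatetimeIndex tz-aware or tz-naive."""
    out = df.copy()
    idx = out.index
    if timezone is None:
        # Drop timezone info, keeping the wall-clock times unchanged.
        if idx.tz is not None:
            out.index = idx.tz_localize(None)
    elif idx.tz is None:
        # Naive timestamps: assume they are already expressed in `timezone`.
        out.index = idx.tz_localize(timezone)
    else:
        # Aware timestamps: convert the same instants to `timezone`.
        out.index = idx.tz_convert(timezone)
    return out


naive = pd.DataFrame(
    {"v": [1, 2]},
    index=pd.to_datetime(["2024-01-01 00:00", "2024-01-01 12:00"]),
)
aware = set_index_timezone(naive, "UTC")
stripped = set_index_timezone(aware, None)
```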

detect_time_col(df, custom_col=None)

Scans DataFrame for common time-related column names. You can optionally pass a custom override.
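A name-based scan like this can be sketched in a few lines. The candidate list `COMMON_TIME_NAMES` and the function name `find_time_column` are assumptions for illustration; the names dfcleaner actually checks may differ.

```python
from typing import Iterable, Optional

# Assumed candidate names; dfcleaner's real list may differ.
COMMON_TIME_NAMES = ("date", "datetime", "timestamp", "time", "dt", "day", "month", "period")


def find_time_column(columns: Iterable[str], custom_col: Optional[str] = None) -> Optional[str]:
    """Hypothetical helper: return the first column that looks time-related."""
    cols = list(columns)
    if custom_col is not None:
        # An explicit override wins, but it must exist.
        if custom_col not in cols:
            raise KeyError(f"column {custom_col!r} not found")
        return custom_col
    # Compare case-insensitively, ignoring surrounding whitespace.
    lowered = {str(c).strip().lower(): c for c in cols}
    for name in COMMON_TIME_NAMES:
        if name in lowered:
            return lowered[name]
    return None


found = find_time_column(["Price", "Date", "Volume"])
override = find_time_column(["ts", "Price"], custom_col="ts")
missing = find_time_column(["a", "b"])
```

With a DataFrame, `df.columns` can be passed as the `columns` argument.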

to_time(df, time_col=None, dayfirst=False)

  • Converts a detected or specified datetime column to the index.
  • Infers the frequency of the datetime index.
  • Returns the DataFrame with its datetime index, together with the estimated frequency (for example 'D', 'W', 'M', or 'Q').
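The same steps can be reproduced with pandas directly, which may help when debugging an unexpected frequency. This is a sketch using `pd.to_datetime` and `pd.infer_freq`, not the library's own code; the column name `date` and the sample values are assumptions.

```python
import pandas as pd

raw = pd.DataFrame(
    {
        "date": ["2024-01-01", "2024-01-02", "2024-01-03", "2024-01-04"],
        "value": [1, 2, 3, 4],
    }
)

indexed = raw.copy()
# Parse the text column into timestamps; dayfirst=True would read 01/02 as 1 February.
indexed["date"] = pd.to_datetime(indexed["date"], dayfirst=False)
# Move the parsed column into the index and keep it in chronological order.
indexed = indexed.set_index("date").sort_index()
# infer_freq needs at least three timestamps and returns None for irregular spacing.
freq = pd.infer_freq(indexed.index)
```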

clean_dates(df, time_freq)

Drops incomplete periods based on the inferred time frequency:

  • W: drops current week if incomplete
  • M: drops current month
  • Q: drops current quarter
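One way to express this rule is to compare each row's period with the period that contains the current time. The sketch below assumes a timezone-naive DatetimeIndex and simply treats the current period as incomplete; the helper name `drop_current_period` and the explicit `now` argument are hypothetical and are there to make the behavior easy to test.

```python
import pandas as pd


def drop_current_period(df: pd.DataFrame, time_freq: str, now: pd.Timestamp) -> pd.DataFrame:
    """Hypothetical helper: keep only rows from periods that ended before `now`."""
    if time_freq not in {"W", "M", "Q"}:
        # Other frequencies are left untouched in this sketch.
        return df
    current = now.to_period(time_freq)
    row_periods = df.index.to_period(time_freq)
    # Rows in the same week, month, or quarter as `now` are treated as incomplete.
    return df[row_periods < current]


monthly = pd.DataFrame(
    {"v": [1, 2, 3]},
    index=pd.to_datetime(["2024-01-31", "2024-02-29", "2024-03-10"]),
)
trimmed = drop_current_period(monthly, "M", now=pd.Timestamp("2024-03-15"))
```

Here the March row is dropped because March 2024 is still in progress on 15 March.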

cleaning_values(df)

Cleans object-dtype columns that hold numeric data by:

  • Removing symbols like %, $, ,
  • Replacing Excel artifacts like #DIV/0! with NaN
  • Converting to proper numeric dtype
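The steps above can be sketched with a regular expression and `pd.to_numeric`. This is an illustration under stated assumptions, not dfcleaner's implementation: the helper name `clean_numeric_strings`, the `EXCEL_ERRORS` list, and the rule that a column is converted only when every value is numeric, missing, or an Excel error are all choices made for this example. Percent signs are stripped without dividing by 100.

```python
import pandas as pd

# Assumed list of Excel error markers to treat as missing values.
EXCEL_ERRORS = ("#DIV/0!", "#N/A", "#VALUE!", "#REF!", "#NAME?", "#NUM!", "#NULL!")


def clean_numeric_strings(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: convert numeric-looking text columns to numbers."""
    out = df.copy()
    for col in out.columns:
        if pd.api.types.is_numeric_dtype(out[col]):
            continue  # already numeric
        text = out[col].astype(str).str.strip()
        is_error = text.isin(EXCEL_ERRORS)
        # Remove percent signs, dollar signs, and thousands separators.
        stripped = text.str.replace(r"[%$,]", "", regex=True)
        # Anything that still fails to parse (including Excel errors) becomes NaN.
        numbers = pd.to_numeric(stripped, errors="coerce")
        was_missing = out[col].isna()
        # Convert only if every value is a number, an Excel error, or already missing.
        if (numbers.notna() | is_error | was_missing).all():
            out[col] = numbers
    return out


dirty = pd.DataFrame(
    {
        "price": ["$1,200.50", "#DIV/0!", "300"],
        "growth": ["12%", "-3.5%", "0%"],
        "name": ["a", "b", "c"],
    }
)
cleaned = clean_numeric_strings(dirty)
```

The all-or-nothing check keeps genuine text columns such as `name` from being turned into a column of NaN.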

open_json(file_name)

Loads a JSON file and parses it into a Python dictionary.
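For reference, the standard-library equivalent is a thin wrapper around `json.load`. The function name `read_json` and the sample file contents below are assumptions for illustration only.

```python
import json
import tempfile
from pathlib import Path


def read_json(file_name):
    """Hypothetical helper: load a JSON file into Python objects."""
    with open(file_name, "r", encoding="utf-8") as fh:
        return json.load(fh)


# Write a small JSON file to a temporary directory, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "config.json"
    path.write_text('{"timezone": "UTC", "columns": ["date", "value"]}', encoding="utf-8")
    config = read_json(path)
```

Note that `json.load` returns a dictionary only when the top-level JSON value is an object; a top-level array comes back as a list.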

Project Structure

dfcleaner/
├── __init__.py
└── core.py

License

MIT License

Questions or Issues?

If you encounter any problems, have feature requests, or want to contribute improvements, feel free to reach out.

Email: brandynham1120@gmail.com