A Python package for cleaning and preprocessing data in pandas DataFrames
DataScrub
DataScrub is a Python package for cleaning and preprocessing pandas DataFrames. It provides utilities for common data-cleaning tasks such as cleaning text, handling missing values, and standardizing data formats. With DataScrub, you can streamline your data preparation and keep your datasets consistent.
Features
- Clean text data in pandas DataFrames
- Handle missing values with customizable actions
- Apply scaling and normalization to numerical columns
- Split and expand data in specified columns
- Convert columns to datetime and format them as 'YYYY-MM-DD'
- Translate text columns to English using Google Translate
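To make the feature list concrete, here is a minimal plain-pandas sketch of comparable transformations. It does not call DataScrub and is not its implementation: the helper name `basic_clean`, the column names, and the specific rules (whitespace collapsing plus lowercasing, median fill for missing numbers, min-max scaling, comma splitting) are assumptions chosen for illustration. Translation is left out because it needs an external service.

```python
import pandas as pd


def basic_clean(df: pd.DataFrame, text_col: str, num_col: str,
                list_col: str, date_col: str) -> pd.DataFrame:
    """Hypothetical helper: plain-pandas analogues of the features above."""
    out = df.copy()

    # Clean text: trim, collapse inner whitespace runs, lowercase.
    out[text_col] = (
        out[text_col].astype("string")
        .str.strip()
        .str.replace(r"\s+", " ", regex=True)
        .str.lower()
    )

    # Handle missing values: one possible action is a median fill.
    out[num_col] = out[num_col].fillna(out[num_col].median())

    # Scale: min-max normalization to [0, 1] (constant columns become 0.0).
    lo, hi = out[num_col].min(), out[num_col].max()
    if hi != lo:
        out[num_col] = (out[num_col] - lo) / (hi - lo)
    else:
        out[num_col] = 0.0

    # Split and expand: "a, b" becomes two rows, each value trimmed.
    out[list_col] = out[list_col].str.split(",")
    out = out.explode(list_col, ignore_index=True)
    out[list_col] = out[list_col].str.strip()

    # Parse dates and format them as 'YYYY-MM-DD' strings.
    out[date_col] = pd.to_datetime(out[date_col]).dt.strftime("%Y-%m-%d")
    return out
```

The real package may choose different defaults for each step; the point is only to show what each category of operation does to a DataFrame.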
Installation
DataScrub can be installed with pip:
```bash
pip install datascrub
```
Make sure you have Python 3.7 or above installed on your system.
Usage
To use DataScrub in your Python projects, import the package and create an instance of the `DataClean` class:
```python
import pandas as pd
from datascrub import DataClean

# Load your data into a DataFrame
data = pd.read_csv("data.csv")

# Create an instance of DataClean
cleaner = DataClean(data)

# Run the cleaning and preprocessing pipeline with prep()
cleaned_data = cleaner.prep(
    clean='all',
    missing_values={},
    perform_scaling_normalization_bool=False,
    explode={},
    parse_date={},
    translate_column_names={},
)
```
The `DataClean` class takes a pandas DataFrame or a file path as input. You can then use the methods available in the class to clean and preprocess your data.
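A class that accepts "a pandas DataFrame or a file path" typically normalizes its input first. The sketch below shows one way to do that; the helper `as_dataframe` and the supported extensions (`.csv`, `.tsv`) are hypothetical, and this README does not say how `DataClean` actually handles paths.

```python
import os
from pathlib import Path
from typing import Union

import pandas as pd

Source = Union[pd.DataFrame, str, os.PathLike]


def as_dataframe(source: Source) -> pd.DataFrame:
    """Hypothetical helper: return a DataFrame from a DataFrame or a file path."""
    if isinstance(source, pd.DataFrame):
        # Work on a copy so later cleaning never mutates the caller's object.
        return source.copy()
    if isinstance(source, (str, os.PathLike)):
        path = Path(source)
        suffix = path.suffix.lower()
        if suffix == ".csv":
            return pd.read_csv(path)
        if suffix == ".tsv":
            return pd.read_csv(path, sep="\t")
        raise ValueError(f"unsupported file type: {suffix or '(none)'}")
    raise TypeError(f"expected a DataFrame or a path, got {type(source).__name__}")
```

Returning a copy for DataFrame input is a design choice here, so that cleaning never alters the original data; whether `DataClean` does the same is not stated.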
Refer to the documentation for detailed information on available methods and usage examples.
Contributing
Contributions to DataScrub are welcome! If you encounter any bugs, have suggestions for improvements, or would like to add new features, please open an issue or submit a pull request on the GitHub repository.
License
This project is licensed under the MIT License. See the LICENSE file for more information.
Download files
File details
Details for the file datascrub-1.1.4.tar.gz
File metadata
- Download URL: datascrub-1.1.4.tar.gz
- Upload date:
- Size: 5.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | 327cfd99daaf18a787103cfb54ce9637b2d92257319606b31a2ae744aaf973c2
MD5 | 77ef2c7d3249fc4d942e2d2c5b53baf2
BLAKE2b-256 | d26715ebf60342db0e1149433aa0b79826e0cb36608a7ec3b016b0ab3455f0d9
File details
Details for the file datascrub-1.1.4-py3-none-any.whl
File metadata
- Download URL: datascrub-1.1.4-py3-none-any.whl
- Upload date:
- Size: 8.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | d0af72c916a6cb07459bb331bc3a5d994a2a50130efa960ece7b947a0f98a36b
MD5 | 841ac25ac9a5cbd3bf74c9f1bde57cb1
BLAKE2b-256 | 38df0baa340cc5e40f4662208c8000772b839e43b7b73c331aab88a166120489
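The published SHA256 digests let you check that a downloaded release file is intact. A minimal sketch using the standard library's `hashlib`; the helper names `sha256_of` and `verify` are illustrative, and the digest values are copied from the tables above.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str) -> bool:
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of(path) == expected.strip().lower()


# Published SHA256 digests for the 1.1.4 release files.
EXPECTED = {
    "datascrub-1.1.4.tar.gz":
        "327cfd99daaf18a787103cfb54ce9637b2d92257319606b31a2ae744aaf973c2",
    "datascrub-1.1.4-py3-none-any.whl":
        "d0af72c916a6cb07459bb331bc3a5d994a2a50130efa960ece7b947a0f98a36b",
}
```

For example, `verify(Path("datascrub-1.1.4.tar.gz"), EXPECTED["datascrub-1.1.4.tar.gz"])` returns `True` for an intact download. pip can also enforce these digests itself through its hash-checking mode (a `--hash=sha256:...` option on a requirements-file line).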