A Python package for cleaning and preprocessing data in pandas DataFrames

DataScrub

DataScrub is a Python package for cleaning and preprocessing data in pandas DataFrames. It provides functions and utilities for cleaning text, handling missing values, standardizing data formats, and more, so you can streamline data preparation and keep your datasets consistent.

Features

  • Clean text data in pandas DataFrames
  • Handle missing values with customizable actions
  • Scale and normalize numerical columns
  • Split and expand data in specified columns
  • Convert columns to datetime format and format them as 'YYYY-MM-DD'
  • Translate text columns to English using Google Translate
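To make the features above concrete, here is a minimal sketch of what comparable operations look like in plain pandas. It does not use DataScrub and is not its implementation: the helper names (`clean_text`, `fill_missing`, `min_max_scale`, `split_and_explode`, `format_dates`), the action keywords, and the sample data are all hypothetical, chosen only for illustration. Translation is left out because it depends on an external service.

```python
import pandas as pd


def clean_text(df, columns):
    """Strip surrounding whitespace and lowercase the given text columns."""
    out = df.copy()
    for col in columns:
        out[col] = out[col].str.strip().str.lower()
    return out


def fill_missing(df, actions):
    """Apply a per-column action: 'drop', 'mean', or a literal fill value.

    The action keywords are assumptions made for this sketch.
    """
    out = df.copy()
    for col, action in actions.items():
        if action == "drop":
            out = out.dropna(subset=[col])
        elif action == "mean":
            out[col] = out[col].fillna(out[col].mean())
        else:
            out[col] = out[col].fillna(action)
    return out


def min_max_scale(df, columns):
    """Rescale numeric columns to the range [0, 1]."""
    out = df.copy()
    for col in columns:
        lo, hi = out[col].min(), out[col].max()
        span = hi - lo
        # A constant column has no spread, so map it to 0.0.
        out[col] = (out[col] - lo) / span if span != 0 else 0.0
    return out


def split_and_explode(df, column, sep):
    """Split a delimited string column and give each item its own row."""
    out = df.copy()
    out[column] = out[column].str.split(sep)
    return out.explode(column, ignore_index=True)


def format_dates(df, column):
    """Parse a column as datetime and render it as 'YYYY-MM-DD' strings."""
    out = df.copy()
    out[column] = pd.to_datetime(out[column]).dt.strftime("%Y-%m-%d")
    return out


# Hypothetical sample data.
raw = pd.DataFrame({
    "name": ["  Alice ", "BOB", "Carol"],
    "score": [10.0, None, 30.0],
    "tags": ["a,b", "c", "d"],
    "joined": ["2023/01/05", "2023/02/10", "2023/03/15"],
})

result = clean_text(raw, ["name"])
result = fill_missing(result, {"score": "mean"})
result = min_max_scale(result, ["score"])
result = format_dates(result, "joined")
result = split_and_explode(result, "tags", ",")
print(result)
```

Each helper returns a new DataFrame rather than modifying its input, which keeps the steps easy to reorder or test one at a time.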

Installation

Install DataScrub with pip:

pip install datascrub

Make sure you have Python 3.7 or above installed on your system.

Usage

To use DataScrub in your Python projects, import the package and create an instance of the DataClean class:

from datascrub import DataClean
import pandas as pd

# Load the data into a DataFrame
data = pd.read_csv("data.csv")

# Create an instance of DataClean
cleaner = DataClean(data)

# Run the cleaning and preprocessing steps in a single call to prep()
cleaned_data = cleaner.prep(
    clean='all',
    missing_values={},
    perform_scaling_normalization_bool=False,
    explode={},
    parse_date={},
    translate_column_names={},
)

The DataClean class accepts either a pandas DataFrame or a file path as input. Once the instance is created, call its methods, such as prep, to clean and preprocess your data.
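As an illustration of the "DataFrame or file path" input pattern described above, the sketch below shows one way such an input could be normalized into a DataFrame. The helper `load_frame` is hypothetical and is not part of DataScrub, and the assumption that a path points to a CSV file is made only because the usage example reads one; DataScrub's actual handling may differ.

```python
import tempfile
from pathlib import Path
from typing import Union

import pandas as pd


def load_frame(source: Union[pd.DataFrame, str, Path]) -> pd.DataFrame:
    """Return a DataFrame from an existing DataFrame or a CSV file path.

    Hypothetical helper for illustration; only CSV paths are handled.
    """
    if isinstance(source, pd.DataFrame):
        # Copy so later cleaning steps do not mutate the caller's data.
        return source.copy()
    if isinstance(source, (str, Path)):
        return pd.read_csv(source)
    raise TypeError(
        f"expected a DataFrame or a file path, got {type(source).__name__}"
    )


# Usage: the same helper handles both kinds of input.
frame = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})

with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "data.csv"
    frame.to_csv(csv_path, index=False)
    from_path = load_frame(csv_path)

from_frame = load_frame(frame)
```

Rejecting unsupported types with a TypeError up front surfaces a wrong argument immediately instead of failing later inside a cleaning step.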

Refer to the documentation for detailed information on available methods and usage examples.

Contributing

Contributions to DataScrub are welcome! If you encounter any bugs, have suggestions for improvements, or would like to add new features, please open an issue or submit a pull request on the GitHub repository.

License

This project is licensed under the MIT License. See the LICENSE file for more information.
