A Python package for cleaning and preprocessing data in pandas DataFrames

DataScrub

DataScrub is a Python package for cleaning and preprocessing data in pandas DataFrames. It provides functions and utilities for common data preparation tasks, such as handling missing values and standardizing data formats, so that you can streamline data preparation and keep your datasets consistent.

Features

  • Clean text data in pandas DataFrames
  • Handle missing values with customizable actions
  • Scale and normalize numerical columns
  • Split and expand data in specified columns
  • Convert columns to datetime format and format them as 'YYYY-MM-DD'
  • Translate text columns to English using Google Translate
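To make the operations in the list above concrete, here is a minimal sketch of several of them written in plain pandas. It is not DataScrub's implementation and does not call DataScrub; the column names, the sample values, and the choice of mean imputation and min-max scaling are assumptions made for illustration only.

```python
import pandas as pd

# Illustrative sample data (hypothetical columns, not from DataScrub).
df = pd.DataFrame({
    "name": ["  Alice ", "BOB", None],
    "score": [10.0, None, 30.0],
    "tags": ["a,b", "c", "d,e,f"],
    "joined": ["2023/01/05", "2023/02/10", "2023/03/15"],
})

# Clean text: trim surrounding whitespace and lowercase.
df["name"] = df["name"].str.strip().str.lower()

# Handle missing values: mean for the numeric column, a placeholder for text.
df["score"] = df["score"].fillna(df["score"].mean())
df["name"] = df["name"].fillna("unknown")

# Min-max scaling of a numeric column into the range [0, 1].
lo, hi = df["score"].min(), df["score"].max()
df["score_scaled"] = (df["score"] - lo) / (hi - lo)

# Parse dates and format them as 'YYYY-MM-DD'.
df["joined"] = (
    pd.to_datetime(df["joined"], format="%Y/%m/%d").dt.strftime("%Y-%m-%d")
)

# Split a delimited column and expand it to one row per value.
exploded = df.assign(tags=df["tags"].str.split(",")).explode(
    "tags", ignore_index=True
)

print(df)
print(exploded)
```

Each step works on a single column, so the steps can be reordered or dropped independently. Translation is left out because it depends on an external service.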

Installation

Install DataScrub with pip:

pip install datascrub

DataScrub requires Python 3.7 or later.

Usage

To use DataScrub in your Python projects, import the package and create an instance of the DataClean class:

from datascrub import DataClean
import pandas as pd

# Load the data into a DataFrame
data = pd.read_csv("data.csv")

# Create an instance of DataClean
cleaner = DataClean(data)

# Call the available methods to clean and preprocess your data
cleaned_data = cleaner.prep(clean='all', missing_values={}, perform_scaling_normalization_bool=False,
                            explode={}, parse_date={}, translate_column_names={})

The DataClean class takes a pandas DataFrame or a file path as input. You can then use the various methods available in the class to clean and preprocess your data.
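One way a constructor can accept either a DataFrame or a file path is to normalize the input up front. The sketch below shows that pattern with a hypothetical helper, `load_frame`, which is not part of DataScrub. It assumes the file is a CSV, and DataScrub's own handling may differ.

```python
from pathlib import Path
from typing import Union

import pandas as pd


def load_frame(source: Union[pd.DataFrame, str, Path]) -> pd.DataFrame:
    """Hypothetical helper: return a DataFrame from a DataFrame or a CSV path."""
    if isinstance(source, pd.DataFrame):
        # Copy so that later cleaning steps do not mutate the caller's data.
        return source.copy()
    # Anything else is treated as a path to a CSV file (an assumption).
    return pd.read_csv(source)
```

With this pattern, `load_frame(df)` and `load_frame("data.csv")` both return a DataFrame that the rest of the code can handle the same way.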

Refer to the documentation for detailed information on available methods and usage examples.

Contributing

Contributions to DataScrub are welcome! If you encounter any bugs, have suggestions for improvements, or would like to add new features, please open an issue or submit a pull request on the GitHub repository.

License

This project is licensed under the MIT License. See the LICENSE file for more information.
