data-sanitizer is a Python package for cleaning and preprocessing tabular data with pandas. It provides functions for handling missing values, removing duplicates and outliers, converting data types, encoding categorical variables, and scaling numerical features, so that your data is ready for analysis and machine learning.

Project description

data-sanitizer

Features

  1. Handle Missing Values: Easily fill missing values with mean, median, mode, or drop them entirely.
  2. Remove Duplicates: Effortlessly identify and remove duplicate rows from your dataset.
  3. Remove Outliers: Detect and remove outliers using Interquartile Range (IQR) or Z-score methods.
  4. Convert Data Types: Seamlessly convert data types of specified columns to ensure consistency.
  5. Encode Categorical Variables: Perform one-hot encoding on categorical features to prepare them for machine learning models.
  6. Scale Numerical Features: Standardize or normalize numerical features to improve the performance of your algorithms.

Installation

Install data-sanitizer using pip:

pip install data-sanitizer

Usage

data-sanitizer works directly on pandas DataFrames: in the example below, each function takes a DataFrame and its result is assigned back to df, so the cleaning steps can be applied one after another.

Example

import pandas as pd

# Module name assumed to be data_sanitizer: a hyphen is not valid in a Python import.
from data_sanitizer import (
    handle_missing_values,
    remove_duplicates,
    remove_outliers,
    convert_types,
    encode_categorical,
    scale_features,
)

Sample DataFrame

df = pd.DataFrame({
    'A': [1, 2, None, 4],
    'B': [None, 2, 3, 4],
    'C': ['cat', 'dog', 'cat', 'mouse'],
    'D': [10, 20, 30, 1000],
})

Handling missing values

df = handle_missing_values(df, strategy='mean')
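For readers who want to see what a mean fill does, here is a minimal sketch in plain pandas. It is not data-sanitizer's implementation, which may treat non-numeric columns or edge cases differently. The names filled and dropped are illustrative only.

```python
import pandas as pd

# Numeric columns from the sample DataFrame; None becomes NaN.
df = pd.DataFrame({'A': [1, 2, None, 4], 'B': [None, 2, 3, 4]})

# Fill each column's gaps with that column's own mean (plain pandas).
filled = df.fillna(df.mean(numeric_only=True))

# Alternative: drop every row that contains a missing value.
dropped = df.dropna()

print(filled)
print(dropped)
```

Filling keeps all four rows, while dropping keeps only the rows that were complete to begin with.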

Removing duplicates

df = remove_duplicates(df)
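The sample DataFrame contains no duplicate rows, so this step leaves it unchanged. As a rough illustration of the operation itself, the sketch below uses pandas' own drop_duplicates on a small made-up frame. Whether remove_duplicates resets the index or keeps the first occurrence is an assumption here, not something the package description states.

```python
import pandas as pd

# Hypothetical data with one exact duplicate row (rows 0 and 1).
df = pd.DataFrame({'A': [1, 1, 2], 'C': ['cat', 'cat', 'dog']})

# Keep the first occurrence of each distinct row, then renumber the index.
deduped = df.drop_duplicates().reset_index(drop=True)

print(deduped)
```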

Removing outliers

df = remove_outliers(df, columns=['D'], method='IQR')
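The IQR rule can be sketched in plain pandas as follows. This assumes the conventional 1.5 × IQR fences and pandas' default linear quantile interpolation; the multiplier and quantile method that data-sanitizer actually uses are not documented above.

```python
import pandas as pd

# Column D from the sample DataFrame.
df = pd.DataFrame({'D': [10, 20, 30, 1000]})

q1 = df['D'].quantile(0.25)   # first quartile
q3 = df['D'].quantile(0.75)   # third quartile
iqr = q3 - q1

# Conventional fences: 1.5 * IQR beyond each quartile (an assumption).
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

# Keep only rows whose D value lies within the fences (inclusive).
kept = df[df['D'].between(lower, upper)]

print(kept)
```

With these values the quartiles are 17.5 and 272.5, so the upper fence is 655 and the row with D = 1000 is dropped.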

Converting data types

df = convert_types(df, columns=['A'], dtypes=[float])
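In plain pandas, a comparable conversion is a single astype call. The sketch below is an equivalent under that assumption, not the package's code.

```python
import pandas as pd

# Integer column that should be treated as floating point.
df = pd.DataFrame({'A': [1, 2, 3, 4]})

# Map column name to target dtype; columns not listed keep their dtype.
converted = df.astype({'A': float})

print(converted.dtypes)
```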

Encoding categorical variables

df = encode_categorical(df, columns=['C'])
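One-hot encoding replaces a categorical column with one indicator column per category. A minimal sketch with pandas' get_dummies is shown below. The resulting column names (C_cat and so on) follow pandas' convention and are an assumption about what encode_categorical produces.

```python
import pandas as pd

# Columns C and D from the sample DataFrame.
df = pd.DataFrame({
    'C': ['cat', 'dog', 'cat', 'mouse'],
    'D': [10, 20, 30, 1000],
})

# Replace C with one indicator column per distinct category.
encoded = pd.get_dummies(df, columns=['C'])

print(encoded)
```

Each row has exactly one indicator set, which marks the category that row originally held.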

Scaling numerical features

df = scale_features(df, columns=['D'], strategy='standard')
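The two scaling approaches named in the feature list can be sketched in plain pandas as below. Standardization here uses the population standard deviation (ddof=0), which is an assumption; the package may use the sample standard deviation instead. The data and variable names are illustrative.

```python
import pandas as pd

# Hypothetical numeric column.
df = pd.DataFrame({'D': [10.0, 20.0, 30.0, 40.0]})
col = df['D']

# Standardization: zero mean, unit variance (population std, ddof=0).
standardized = (col - col.mean()) / col.std(ddof=0)

# Min-max normalization: rescale values to the range [0, 1].
normalized = (col - col.min()) / (col.max() - col.min())

print(standardized)
print(normalized)
```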

print(df)

Contributing

We welcome contributions to improve data-sanitizer.

Contact

For any questions or issues, please contact the package maintainer at goradbj@gmail.com.

Download files

Source Distribution

data_sanitizer-0.3.0.tar.gz (4.1 kB)

Uploaded Source

Built Distribution

data_sanitizer-0.3.0-py3-none-any.whl (4.5 kB)

Uploaded Python 3

File details

Details for the file data_sanitizer-0.3.0.tar.gz.

File metadata

  • Download URL: data_sanitizer-0.3.0.tar.gz
  • Size: 4.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.0 CPython/3.10.14

File hashes

Hashes for data_sanitizer-0.3.0.tar.gz
Algorithm Hash digest
SHA256 e4062ddacb762b9f16892b910d75f7cc4c3cdd87691f2c7868b9fd8d77347e06
MD5 5c3c5f26a5795f9c8ef32438dce738e0
BLAKE2b-256 30b4f5cca393dc124cbc5ac2638e3d5dcccd2e4d6a8ac403eb41244e91b05c74

File details

Details for the file data_sanitizer-0.3.0-py3-none-any.whl.

File hashes

Hashes for data_sanitizer-0.3.0-py3-none-any.whl
Algorithm Hash digest
SHA256 f8d13c999fecd15fbcecfdbdc4fdb099d3cc132d25ff613126d3950ed925f1d1
MD5 91e75869fca1e252d544565a0c7cd4a5
BLAKE2b-256 97e6f57344995f393288127287b692d39890d166c7e6be338a911d6b895b2966
