
A package for handling various data preprocessing tasks

Project description

Cleaner Panda

Programming For Data Engineering course final project

https://github.com/EmirhanSyl/cleaner-panda/blob/main/logo.jpg

https://pypi.org/project/cleaner-panda/

Installation

pip install cleaner-panda

Modules

Missing Value Handler

  • strategy enum {MEAN, MEDIAN, CONSTANT, REMOVE_ROW, REMOVE_COLUMN, FORWARD_BACKWARD}
  • cont_int = 0, const_str = "none", const_date = 01.01.2024…
  • replace_missing_values(dataframe, strategy=strategy.MEAN, column=0)
  • replace_mean(dataframe, column)
  • replace_median(dataframe, column)
  • replace_constant(dataframe, column, constant)
  • replace_remove_row(dataframe, column)
  • replace_remove_column(dataframe, column)
  • replace_forward_backward(dataframe, column)
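The strategies above map naturally onto plain pandas operations. The following is a minimal sketch of that mapping, assuming a pandas DataFrame input; the `Strategy` enum and the `fill_missing` helper are hypothetical stand-ins written for illustration, not Cleaner Panda's actual source or API.

```python
from enum import Enum

import pandas as pd


class Strategy(Enum):
    # Hypothetical stand-in mirroring the strategy names listed above.
    MEAN = "mean"
    MEDIAN = "median"
    CONSTANT = "constant"
    REMOVE_ROW = "remove_row"
    REMOVE_COLUMN = "remove_column"
    FORWARD_BACKWARD = "forward_backward"


def fill_missing(df: pd.DataFrame, column: str, strategy: Strategy, constant=0) -> pd.DataFrame:
    """Hypothetical helper: return a copy of df with missing values in `column` handled per `strategy`."""
    out = df.copy()
    if strategy is Strategy.MEAN:
        out[column] = out[column].fillna(out[column].mean())
    elif strategy is Strategy.MEDIAN:
        out[column] = out[column].fillna(out[column].median())
    elif strategy is Strategy.CONSTANT:
        out[column] = out[column].fillna(constant)
    elif strategy is Strategy.REMOVE_ROW:
        out = out.dropna(subset=[column])
    elif strategy is Strategy.REMOVE_COLUMN:
        out = out.drop(columns=[column])
    elif strategy is Strategy.FORWARD_BACKWARD:
        # Forward-fill first, then back-fill any leading gaps the forward pass cannot reach.
        out[column] = out[column].ffill().bfill()
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return out


df = pd.DataFrame({"age": [20.0, None, 40.0, None]})
print(fill_missing(df, "age", Strategy.MEAN)["age"].tolist())  # [20.0, 30.0, 40.0, 30.0]
```

Each branch returns a new DataFrame rather than modifying the input in place, which keeps the original data available for comparison.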

Outlier Handler

  • identify_outliers_iqr(data, threshold=1.5)
  • handle_outliers_iqr(data, threshold=1.5, replacement=None) // replacement: value to replace outliers with (e.g., the median or mean), or None to remove the outliers
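The IQR rule flags a value as an outlier when it falls below Q1 - threshold * IQR or above Q3 + threshold * IQR, where IQR = Q3 - Q1. A minimal pandas sketch of that rule follows; `iqr_outlier_mask` and `handle_iqr_outliers` are hypothetical names that only mirror the parameters listed above and are not the package's implementation.

```python
import pandas as pd


def iqr_outlier_mask(data: pd.Series, threshold: float = 1.5) -> pd.Series:
    """Hypothetical helper: True where a value lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = data.quantile(0.25), data.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - threshold * iqr, q3 + threshold * iqr
    return (data < lower) | (data > upper)


def handle_iqr_outliers(data: pd.Series, threshold: float = 1.5, replacement=None) -> pd.Series:
    """Hypothetical helper: drop outliers if replacement is None, else overwrite them with it."""
    mask = iqr_outlier_mask(data, threshold)
    if replacement is None:
        return data[~mask]
    return data.mask(mask, replacement)


s = pd.Series([10, 12, 11, 13, 12, 100])
print(handle_iqr_outliers(s).tolist())  # [10, 12, 11, 13, 12]
print(handle_iqr_outliers(s, replacement=s.median()).tolist())
```

Here Q1 = 11.25 and Q3 = 12.75, so the fences are 9.0 and 15.0 and only 100 is flagged. Raising `threshold` widens the fences and flags fewer values.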

Scaler

  • standardize_data(dataframe)
  • normalize_data(dataframe)
  • robust_scale_data(dataframe)
  • normalize_vectors(dataframe)
  • log_transform_data(dataframe)
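The five scalers correspond to common transformations. The sketch below shows one plausible pandas version of each, assuming numeric columns. Treating `normalize_data` as min-max scaling and `normalize_vectors` as per-row L2 normalization is an assumption, and all helper names are hypothetical.

```python
import numpy as np
import pandas as pd


def standardize(df: pd.DataFrame) -> pd.DataFrame:
    # z-score per column: zero mean, unit sample standard deviation (pandas .std() uses ddof=1)
    return (df - df.mean()) / df.std()


def min_max_normalize(df: pd.DataFrame) -> pd.DataFrame:
    # Assumed meaning of "normalize": rescale each column to the range [0, 1]
    return (df - df.min()) / (df.max() - df.min())


def robust_scale(df: pd.DataFrame) -> pd.DataFrame:
    # Centre on the median and divide by the IQR, which is less sensitive to outliers
    iqr = df.quantile(0.75) - df.quantile(0.25)
    return (df - df.median()) / iqr


def l2_normalize_rows(df: pd.DataFrame) -> pd.DataFrame:
    # Divide each row by its Euclidean norm so every row vector has length 1
    norms = np.sqrt((df ** 2).sum(axis=1))
    return df.div(norms, axis=0)


def log_transform(df: pd.DataFrame) -> pd.DataFrame:
    # log(1 + x): defined at zero and compresses large positive values
    return np.log1p(df)


df = pd.DataFrame({"x": [1.0, 2.0, 3.0], "y": [3.0, 0.0, 4.0]})
print(min_max_normalize(df)["x"].tolist())  # [0.0, 0.5, 1.0]
```

pandas' `.std()` computes the sample standard deviation (ddof=1), whereas scikit-learn's `StandardScaler` uses ddof=0, so the two can give slightly different results on small datasets.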

Text Cleaner

  • remove_common_words(dataframe, column) // removes stopwords, i.e. words like "the", "is", "and", "in", etc. that occur frequently in a language
  • convert_to_lowercase(dataframe, column)
  • remove_punctuation(dataframe, column)
  • lemmatization(dataframe, column)
  • expand_contractions(dataframe, column) // (e.g., "can't" to "cannot", "won't" to "will not")
  • remove_special_characters(dataframe, column, remove=['.'])
  • remove_numerical(dataframe, column)
  • filter_words(dataframe, column, remove=["fuck"])
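Several of the text-cleaning steps can be chained into one function applied to each string in a column. This is a minimal sketch using only the standard library and pandas. The `STOPWORDS` and `CONTRACTIONS` tables are tiny, hypothetical examples rather than the lists the package uses, and lemmatization is omitted because it normally relies on an NLP library such as NLTK or spaCy.

```python
import re
import string

import pandas as pd

# Hypothetical, deliberately tiny word lists for illustration only.
STOPWORDS = {"the", "is", "and", "in", "a", "of"}
CONTRACTIONS = {"can't": "cannot", "won't": "will not", "it's": "it is"}


def clean_text(text: str) -> str:
    text = text.lower()
    # Expand contractions before punctuation removal strips the apostrophes.
    for short, full in CONTRACTIONS.items():
        text = text.replace(short, full)
    text = text.translate(str.maketrans("", "", string.punctuation))
    text = re.sub(r"\d+", "", text)  # drop digits
    words = [w for w in text.split() if w not in STOPWORDS]
    return " ".join(words)


df = pd.DataFrame({"review": ["The food is great and I can't wait to go back in 2 weeks!"]})
df["review"] = df["review"].map(clean_text)
print(df["review"][0])  # food great i cannot wait to go back weeks
```

The order of the steps matters: if punctuation were stripped first, "can't" would become "cant" and no longer match the contraction table.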

Data Type Converter

Categorical Encoder

  • label_encoding(dataframe, column)
  • one_hot_encoding(dataframe, column)
  • ordinal_encoding(dataframe, column)
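The three encoders differ in how categories become numbers. A minimal pandas sketch, assuming string-valued columns; the column names and the size ordering are made up for illustration and are not taken from the package.

```python
import pandas as pd

df = pd.DataFrame({"size": ["medium", "small", "large", "small"],
                   "color": ["red", "blue", "red", "green"]})

# Label encoding: each distinct value gets an integer code (categories sorted alphabetically).
df["color_label"] = df["color"].astype("category").cat.codes

# Ordinal encoding: integer codes that follow an explicit, meaningful order.
size_order = ["small", "medium", "large"]  # assumed order, for illustration
df["size_ordinal"] = pd.Categorical(df["size"], categories=size_order, ordered=True).codes

# One-hot encoding: one 0/1 indicator column per distinct value.
one_hot = pd.get_dummies(df["color"], prefix="color", dtype=int)
df = pd.concat([df, one_hot], axis=1)
```

Label encoding implies an order that may not exist (here blue < green < red, purely alphabetically), which is why one-hot encoding is usually preferred for nominal features and ordinal encoding for features with a genuine order.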

Date Time Handler

  • convert_date_to_strings(dataframe, column)
  • extract_components(dataframe, column)
  • reformat_date(dataframe, column)
  • calculate_datetime_differences()
  • convert_datetime_to_different_timezones()
  • shift_time()
  • handle_irregular_time_intervals()
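Most of the date-time operations listed above have direct counterparts in pandas' `.dt` accessor. The sketch below assumes a column of timestamps. The column names, the `Europe/Istanbul` target timezone, and the choice to handle irregular intervals by resampling to a daily grid and interpolating are illustrative assumptions, not the package's documented behavior.

```python
import pandas as pd

df = pd.DataFrame({
    "ts": pd.to_datetime(["2024-01-01 09:30", "2024-01-03 18:00"]),
    "value": [5, 7],
})

# Extract components
df["year"] = df["ts"].dt.year
df["month"] = df["ts"].dt.month
df["weekday"] = df["ts"].dt.day_name()

# Convert to strings in a chosen format (day.month.year)
df["ts_str"] = df["ts"].dt.strftime("%d.%m.%Y %H:%M")

# Differences between consecutive timestamps (the first row is NaT)
df["delta"] = df["ts"].diff()

# Timezone conversion: naive timestamps are assumed to be UTC, then converted
df["ts_istanbul"] = df["ts"].dt.tz_localize("UTC").dt.tz_convert("Europe/Istanbul")

# Shift by a fixed offset
df["ts_plus_2h"] = df["ts"] + pd.Timedelta(hours=2)

# Irregular intervals (assumed approach): resample onto a daily grid, interpolate the gaps
daily = df.set_index("ts")["value"].resample("D").mean().interpolate()
```

Because naive timestamps carry no timezone, they have to be localized before `tz_convert` can be applied; calling `tz_convert` directly on a naive series raises an error.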

Download files


Source Distribution

cleaner_panda-0.1.9.tar.gz (23.3 kB)

File details

Details for the file cleaner_panda-0.1.9.tar.gz.

File metadata

  • Download URL: cleaner_panda-0.1.9.tar.gz
  • Upload date:
  • Size: 23.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.0 CPython/3.9.19

File hashes

Hashes for cleaner_panda-0.1.9.tar.gz
Algorithm Hash digest
SHA256 e91c983d28911ce8848a55b83d6da29d0d562112c74de0be84f6e92f351fe831
MD5 9b0653182a824de04f4e55746d4ca184
BLAKE2b-256 e304f8d5055c822c4bc906e7a9128ba74babd471870b14b3bae34730d38d6b7f

