A package for preprocessing text

Usage of PreProcessingService

This package is a collection of functions for text preprocessing and cleaning. The available functions are listed below, grouped by use case, with a brief explanation where one is provided:

FOR SUPPLIER:

supplier_name_cleaning

supplier_norm_name_cleaning

supplier_name_normalization_bkp

FOR KEYWORD:

removestopwords

lemmatization

removespecialchars

RemoveNonEngWords

SeparateNumStrings

stemming
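
The keyword functions above are listed without descriptions. As a rough illustration of what steps with names like these commonly do, here is a standard-library sketch. The function names, the tiny stopword set, and the order of the steps are all assumptions made for illustration; this is not the package's implementation, and its actual signatures and behavior may differ.

```python
import re

# Tiny illustrative stopword set (an assumption); real pipelines use a fuller list.
STOPWORDS = {"a", "an", "the", "of", "for", "and", "with"}


def remove_special_chars(text):
    """Replace every character that is not a letter, digit, or whitespace with a space."""
    return re.sub(r"[^A-Za-z0-9\s]", " ", text)


def separate_num_strings(text):
    """Insert a space at each letter/digit boundary, e.g. 'abc123' becomes 'abc 123'."""
    text = re.sub(r"(?<=[A-Za-z])(?=\d)", " ", text)
    return re.sub(r"(?<=\d)(?=[A-Za-z])", " ", text)


def remove_stop_words(text):
    """Drop tokens that appear in STOPWORDS (case-insensitive) and rejoin with single spaces."""
    return " ".join(w for w in text.split() if w.lower() not in STOPWORDS)


def clean_keyword(text):
    """Hypothetical helper chaining the three steps above in a fixed order."""
    text = remove_special_chars(text)
    text = separate_num_strings(text)
    return remove_stop_words(text)
```

Calling clean_keyword("the pump-2kw") with this sketch yields "pump 2 kw". Lemmatization and stemming are left out here because they normally rely on an external library such as NLTK.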

FOR ITEM:

_en_stopwords: Returns a list of English stopwords.

strip_numeric: Removes numbers from a string.

strip_punctuation: Removes all punctuation marks from a string.

remove_stopwords: Removes stopwords from a string.

strip_multiple_whitespaces: Removes excess whitespace from a string.

strip_short: Removes words shorter than a specified length from a string.

remove_alpha_num: Removes alphanumeric characters from a string.

lemmatize: Lemmatizes a list of tokens.

nltk_pos: Performs Part-of-Speech (POS) tagging using NLTK.

stanford_pos: Performs POS tagging using Stanford POS Tagger.

filter_pos: Filters words based on specific POS tags.

clean_text: Cleans the text by applying various cleaning functions.

preprocess_string: Preprocesses a string by applying a list of cleaning functions.

preprocess_postags: Applies a list of filtering functions to a list of strings.

guided_buy_preprocess: Preprocesses a list of strings for guided buying purposes.

dump_interim_steps: Creates a DataFrame with the intermediate steps of preprocessing.

lemmatization: Performs lemmatization on a string.

cold_start_preprocess: Preprocesses a list of words for cold start purposes.

unique_value_col: Splits words based on '|'.

convert_synonyms: A collection of synonym mappings.

clean_description: Applies all of the above cleaning steps.
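
To make the idea of "applying a list of cleaning functions" concrete, the following is a minimal standard-library sketch. The functions are local stand-ins that borrow names from the list above; their bodies, the default minimum word length, the filter order, and the de-duplication in unique_value_col are assumptions, not the package's actual code.

```python
import re
import string

# Translation table mapping every ASCII punctuation character to a space.
_PUNCT_TABLE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def strip_numeric(s):
    """Remove all digit runs from the string."""
    return re.sub(r"\d+", "", s)


def strip_punctuation(s):
    """Replace each ASCII punctuation mark with a space."""
    return s.translate(_PUNCT_TABLE)


def strip_multiple_whitespaces(s):
    """Collapse whitespace runs into single spaces and trim both ends."""
    return re.sub(r"\s+", " ", s).strip()


def strip_short(s, minsize=3):
    """Keep only words with at least `minsize` characters (default is an assumption)."""
    return " ".join(w for w in s.split() if len(w) >= minsize)


def unique_value_col(s):
    """Split on '|', trim each piece, and drop empty pieces and repeats (order preserved)."""
    seen = []
    for part in s.split("|"):
        part = part.strip()
        if part and part not in seen:
            seen.append(part)
    return seen


def preprocess_string(s, filters):
    """Apply each function in `filters` to the string, in order."""
    for f in filters:
        s = f(s)
    return s


# Hypothetical default order: punctuation, digits, whitespace, then short words.
DEFAULT_FILTERS = [strip_punctuation, strip_numeric, strip_multiple_whitespaces, strip_short]
```

With these stand-ins, preprocess_string("Hex bolt, M8 x 40mm (zinc)!!", DEFAULT_FILTERS) returns "Hex bolt zinc". The order of the filters matters: stripping digits before collapsing whitespace avoids leaving double spaces behind.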

Example usage:

pip install preProcessingGEP==1.0.4

import preProcessingGEP

print(preProcessingGEP.PreProcessingService.clean_description(inputText))

General syntax for calling any function:

preProcessingGEP.itemPreprocess.function_name()

preProcessingGEP.keywordPreprocess.function_name()

preProcessingGEP.function_name()
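
Because function_name above is a placeholder, it can help to check which functions a given module actually exposes after installation. The sketch below uses only the standard importlib and inspect modules; the helper name list_public_functions is hypothetical, and it returns an empty list when the module cannot be imported. Note that it only finds module-level functions, so anything defined as a method on a class would not appear.

```python
import importlib
import inspect


def list_public_functions(module_name):
    """Return the sorted names of public functions found in the named module.

    Returns an empty list if the module cannot be imported.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return []
    return sorted(
        name
        for name, _obj in inspect.getmembers(module, inspect.isfunction)
        if not name.startswith("_")
    )


# Module paths taken from the syntax lines above; prints [] if the package is not installed.
for path in ("preProcessingGEP.itemPreprocess", "preProcessingGEP.keywordPreprocess"):
    print(path, list_public_functions(path))
```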

