A package for preprocessing text
Usage of PreProcessingService
This package is a collection of functions for text preprocessing and cleaning, grouped by what they operate on: suppliers, keywords, and items. The functions in each group are listed below, with a brief explanation where one is available:
FOR SUPPLIER:
supplier_name_cleaning
supplier_norm_name_cleaning
supplier_name_normalization_bkp
FOR KEYWORD:
removestopwords
lemmatization
removespecialchars
RemoveNonEngWords
SeparateNumStrings
stemming
FOR ITEM:
_en_stopwords: Returns a list of English stopwords.
strip_numeric: Removes numbers from a string.
strip_punctuation: Removes all punctuation marks from a string.
remove_stopwords: Removes stopwords from a string.
strip_multiple_whitespaces: Removes excess whitespace from a string.
strip_short: Removes words shorter than a specified minimum length from a string.
remove_alpha_num: Removes alphanumeric characters from a string.
lemmatize: Lemmatizes a list of tokens.
nltk_pos: Performs Part-of-Speech (POS) tagging using NLTK.
stanford_pos: Performs POS tagging using Stanford POS Tagger.
filter_pos: Filters words based on specific POS tags.
clean_text: Cleans the text by applying various cleaning functions.
preprocess_string: Preprocesses a string by applying a list of cleaning functions.
preprocess_postags: Applies a list of filtering functions to a list of strings.
guided_buy_preprocess: Preprocesses a list of strings for guided buying purposes.
dump_interim_steps: Creates a DataFrame with the intermediate steps of preprocessing.
lemmatization: Performs lemmatization on a string.
cold_start_preprocess: Preprocesses a list of words for cold start purposes.
unique_value_col: Splits words on the '|' delimiter.
convert_synonyms: A collection of synonym mappings.
clean_description: Applies all of the cleaning steps above.
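To illustrate how a function such as preprocess_string can chain a list of cleaning functions, here is a minimal sketch built only on the Python standard library. The function bodies, the `minsize` parameter, the small `STOPWORDS` set, and the filter order are all assumptions made for illustration; the package's own implementations and signatures may differ.

```python
import re
import string

# Hypothetical stand-ins that mirror the names listed above.
# They are NOT the package's implementations.
STOPWORDS = frozenset({"a", "an", "and", "for", "in", "of", "the", "to", "with"})


def strip_numeric(text):
    """Remove runs of digits."""
    return re.sub(r"\d+", "", text)


def strip_punctuation(text):
    """Replace every ASCII punctuation character with a space."""
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    return text.translate(table)


def remove_stopwords(text):
    """Drop words found in the (illustrative) stopword set."""
    return " ".join(w for w in text.split() if w.lower() not in STOPWORDS)


def strip_short(text, minsize=3):
    """Drop words shorter than minsize characters."""
    return " ".join(w for w in text.split() if len(w) >= minsize)


def strip_multiple_whitespaces(text):
    """Collapse whitespace runs into single spaces and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


def preprocess_string(text, filters):
    """Apply each cleaning function in order, feeding the result forward."""
    for f in filters:
        text = f(text)
    return text


FILTERS = [
    str.lower,
    strip_punctuation,
    strip_numeric,
    remove_stopwords,
    strip_short,
    strip_multiple_whitespaces,
]

cleaned = preprocess_string(
    "Box of 12 Steel Bolts, M8 (zinc-plated) for the workshop", FILTERS
)
print(cleaned)  # box steel bolts zinc plated workshop
```

The order of the filters matters: punctuation is stripped before short words are dropped, so fragments left behind by earlier steps (such as the "m" remaining from "M8") are removed as well.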
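Example usage:
Install the package from the command line:
pip install preProcessingGEP==1.0.4
Then call clean_description from Python:
import preProcessingGEP
inputText = "Box of 12 Steel Bolts, M8 (zinc-plated)"  # sample input, for illustration only
print(preProcessingGEP.PreProcessingService.clean_description(inputText))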
General syntax for calling any function:
preProcessingGEP.itemPreprocess.function_name()
preProcessingGEP.keywordPreprocess.function_name()
preProcessingGEP.function_name()
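Function signatures are not documented here, so it can help to check what a submodule exposes before calling it. The sketch below uses the standard `inspect` module for that. The helper name `list_public_functions` is hypothetical, and `textwrap` is used as a stand-in so the example runs without the package installed.

```python
import inspect
import textwrap  # stand-in module; any imported module or class works


def list_public_functions(namespace):
    """Return the sorted names of public functions found on a module or class."""
    return sorted(
        name
        for name, obj in inspect.getmembers(namespace, inspect.isfunction)
        if not name.startswith("_")
    )


print(list_public_functions(textwrap))

# With the package installed, the same helper could be pointed at one of its
# namespaces (assumption: the attribute exists as shown in the syntax above):
# import preProcessingGEP
# print(list_public_functions(preProcessingGEP.itemPreprocess))
```

Once a name is known, `help()` on that function shows whatever arguments and docstring the package provides.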
Hashes for newpreProcessingGEP-1.0.6.tar.gz
Algorithm | Hash digest
---|---
SHA256 | 347a8630da1ba113f8210c58d75e11e0c878624e273ff65b3f05343f3d62469c
MD5 | c88049906eb994e57bfcaac1364fd919
BLAKE2b-256 | ab148f5eff3b517301e095a2bfd79c80293f921ca099f45a9524f220175f6990
Hashes for newpreProcessingGEP-1.0.6-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 61fbe9d71afdb868b8e117e869efeda4832d24a744072e1853525467e282ed5b
MD5 | 7b53d232552763302bc198fc7aff469e
BLAKE2b-256 | 7c8ccb9c4952156203629392cdda5f7fd305f432d2297d7b1cb56b22aa2818a5