A package for preprocessing text
Usage of PreProcessingService
This package is a collection of functions for text preprocessing and cleaning. The available functions are listed below, grouped by what they operate on (supplier names, keywords, items). Brief explanations are given for the item functions:
FOR SUPPLIER:
supplier_name_cleaning
supplier_norm_name_cleaning
supplier_name_normalization_bkp
FOR KEYWORD:
removestopwords
lemmatization
removespecialchars
RemoveNonEngWords
SeparateNumStrings
stemming
FOR ITEM:
_en_stopwords: Returns a list of English stopwords.
strip_numeric: Removes numbers from a string.
strip_punctuation: Removes all punctuation marks from a string.
remove_stopwords: Removes stopwords from a string.
strip_multiple_whitespaces: Removes excess whitespace from a string.
strip_short: Removes words with a length less than a specified size from a string.
remove_alpha_num: Removes alphanumeric characters from a string.
lemmatize: Lemmatizes a list of tokens.
nltk_pos: Performs Part-of-Speech (POS) tagging using NLTK.
stanford_pos: Performs POS tagging using Stanford POS Tagger.
filter_pos: Filters words based on specific POS tags.
clean_text: Cleans the text by applying various cleaning functions.
preprocess_string: Preprocesses a string by applying a list of cleaning functions.
preprocess_postags: Applies a list of filtering functions to a list of strings.
guided_buy_preprocess: Preprocesses a list of strings for guided buying purposes.
dump_interim_steps: Creates a DataFrame with the intermediate steps of preprocessing.
lemmatization: Performs lemmatization on a string.
cold_start_preprocess: Preprocesses a list of words for cold start purposes.
unique_value_col: Splits words on the '|' delimiter.
convert_synonyms: Converts words using a collection of synonym mappings.
clean_description: Applies all of the above cleaning steps.
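The way several of the item functions chain together can be sketched in plain Python. This is a minimal illustration written from the one-line descriptions above, not the package's actual implementation: the function bodies, the `minsize` parameter, the small stopword set, and the choice to return a string from `preprocess_string` are all assumptions made for the example.

```python
import re
import string

# Hypothetical stand-ins named after the item functions listed above.
# They follow the one-line descriptions only; the real package may differ.

def strip_numeric(text: str) -> str:
    """Remove digits from a string."""
    return re.sub(r"\d+", "", text)

def strip_punctuation(text: str) -> str:
    """Replace every ASCII punctuation mark with a space."""
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    return text.translate(table)

def remove_stopwords(text: str,
                     stopwords=frozenset({"a", "an", "and", "of", "the", "for", "in", "to"})) -> str:
    """Drop words found in a (deliberately tiny, assumed) stopword set."""
    return " ".join(w for w in text.split() if w not in stopwords)

def strip_multiple_whitespaces(text: str) -> str:
    """Collapse runs of whitespace into single spaces."""
    return re.sub(r"\s+", " ", text).strip()

def strip_short(text: str, minsize: int = 3) -> str:
    """Remove words shorter than minsize characters."""
    return " ".join(w for w in text.split() if len(w) >= minsize)

def preprocess_string(text: str, filters) -> str:
    """Apply a list of cleaning functions to a string, in order."""
    for f in filters:
        text = f(text)
    return text

filters = [str.lower, strip_numeric, strip_punctuation,
           remove_stopwords, strip_multiple_whitespaces, strip_short]
print(preprocess_string("Steel  bolts, 10 mm (pack of 50)!", filters))
# -> steel bolts pack
```

The order of the filters matters: punctuation is replaced before stopwords are removed so that tokens such as "(pack" are matched as plain words.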
Example usage:
pip install preProcessingGEP==1.0.4
import preProcessingGEP
inputText = "Steel bolts, 10 mm (pack of 50)"  # illustrative sample; use your own text
print(preProcessingGEP.PreProcessingService.clean_description(inputText))
General syntax for calling any function:
preProcessingGEP.itemPreprocess.function_name()
preProcessingGEP.keywordPreprocess.function_name()
preProcessingGEP.function_name()
Hashes for newpreProcessingGEP-1.0.7.tar.gz
Algorithm | Hash digest
---|---
SHA256 | 7c1c635d1fa9682ee49321c9c20d4cc4d5d32f492b562ffea6212be2a1d141be
MD5 | 452ee5b36f32ce6e752f173de0a94932
BLAKE2b-256 | df0279a227bb1e0d1f320c68065f69e557206bb10035fc0cd2ddc9c9d5bc5dbc
Hashes for newpreProcessingGEP-1.0.7-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 6547423f4f6abfbefb72a3f23822d2a6b5bd6cc272a65a298d25884273917769
MD5 | 82d43738973d960461e3fa197c0fc14b
BLAKE2b-256 | 6f5c567124cc7045dcf12937f76d08efbcd64e7bdfd1a82dab58676c5eb2a3ee
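A downloaded file can be checked against the SHA256 digests listed above. The following is a minimal sketch using only the Python standard library; the helper names `sha256_of_file` and `matches_published_hash` are hypothetical and are not part of the package.

```python
import hashlib

def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_published_hash(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, calling `matches_published_hash` with the local path of the downloaded wheel and the SHA256 value from the table should return `True` for an intact download.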