A package for preprocessing text
Project description
Usage of PreProcessingService
The package is a collection of functions for text preprocessing and cleaning. You can access the following functions, grouped by use case, with a brief explanation where one is available:
FOR SUPPLIER:
supplier_name_cleaning
supplier_norm_name_cleaning
supplier_name_normalization_bkp
FOR KEYWORD:
removestopwords
lemmatization
removespecialchars
RemoveNonEngWords
SeparateNumStrings
stemming
FOR ITEM:
_en_stopwords: Returns a list of English stopwords.
strip_numeric: Removes numbers from a string.
strip_punctuation: Removes all punctuation marks from a string.
remove_stopwords: Removes stopwords from a string.
strip_multiple_whitespaces: Removes excess whitespace from a string.
strip_short: Removes words with a length less than a specified size from a string.
remove_alpha_num: Removes alphanumeric characters from a string.
lemmatize: Lemmatizes a list of tokens.
nltk_pos: Performs Part-of-Speech (POS) tagging using NLTK.
stanford_pos: Performs POS tagging using Stanford POS Tagger.
filter_pos: Filters words based on specific POS tags.
clean_text: Cleans the text by applying various cleaning functions.
preprocess_string: Preprocesses a string by applying a list of cleaning functions.
preprocess_postags: Applies a list of filtering functions to a list of strings.
guided_buy_preprocess: Preprocesses a list of strings for guided buying purposes.
dump_interim_steps: Creates a DataFrame with the intermediate steps of preprocessing.
lemmatization: Performs lemmatization on a string.
cold_start_preprocess: Preprocesses a list of words for cold start purposes.
unique_value_col: Splits words based on '|'.
convert_synonyms: A collection of synonym mappings.
clean_description: Includes all of the above cleaning steps.
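To make the item-level descriptions concrete, here is a minimal sketch of how a chain of cleaning functions can be applied by something like preprocess_string. It uses only the Python standard library and does not import the package. The function bodies, the minsize parameter, the DEFAULT_FILTERS list, and the sample text are all assumptions written for illustration; the package's own implementations and signatures may differ.

```python
import re
import string

# Illustrative stand-ins only: these are NOT the package's implementations.

def strip_numeric(text: str) -> str:
    """Remove digits from a string."""
    return re.sub(r"\d+", "", text)

def strip_punctuation(text: str) -> str:
    """Replace every ASCII punctuation character with a space."""
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    return text.translate(table)

def strip_multiple_whitespaces(text: str) -> str:
    """Collapse runs of whitespace into one space and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()

def strip_short(text: str, minsize: int = 3) -> str:
    """Drop words shorter than minsize characters (minsize is an assumed name)."""
    return " ".join(word for word in text.split() if len(word) >= minsize)

def preprocess_string(text: str, filters) -> str:
    """Apply each cleaning function to the text, in order."""
    for apply_filter in filters:
        text = apply_filter(text)
    return text

# Hypothetical filter order; the package may use a different one.
DEFAULT_FILTERS = [
    str.lower,
    strip_numeric,
    strip_punctuation,
    strip_multiple_whitespaces,
    strip_short,
]

cleaned = preprocess_string("Steel Bolt, M8 x 40mm -- pack of 100!", DEFAULT_FILTERS)
print(cleaned)  # steel bolt pack
```

The order of the filters matters in this sketch: short tokens are dropped last, so fragments left behind by digit and punctuation removal (such as "m" and "mm") are also filtered out.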
Example usage:
pip install preProcessingGEP==1.0.4
import preProcessingGEP
inputText = "Sample item description to clean"  # placeholder: replace with your own text
print(preProcessingGEP.PreProcessingService.clean_description(inputText))
General syntax for calling any function:
preProcessingGEP.itemPreprocess.function_name()
preProcessingGEP.keywordPreprocess.function_name()
preProcessingGEP.function_name()
Source Distribution
Built Distribution
File details
Details for the file newpreProcessingGEP-1.0.7.tar.gz.
File metadata
- Download URL: newpreProcessingGEP-1.0.7.tar.gz
- Upload date:
- Size: 2.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.11
File hashes
Algorithm | Hash digest
---|---
SHA256 | 7c1c635d1fa9682ee49321c9c20d4cc4d5d32f492b562ffea6212be2a1d141be
MD5 | 452ee5b36f32ce6e752f173de0a94932
BLAKE2b-256 | df0279a227bb1e0d1f320c68065f69e557206bb10035fc0cd2ddc9c9d5bc5dbc
File details
Details for the file newpreProcessingGEP-1.0.7-py3-none-any.whl.
File metadata
- Download URL: newpreProcessingGEP-1.0.7-py3-none-any.whl
- Upload date:
- Size: 2.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.11
File hashes
Algorithm | Hash digest
---|---
SHA256 | 6547423f4f6abfbefb72a3f23822d2a6b5bd6cc272a65a298d25884273917769
MD5 | 82d43738973d960461e3fa197c0fc14b
BLAKE2b-256 | 6f5c567124cc7045dcf12937f76d08efbcd64e7bdfd1a82dab58676c5eb2a3ee