A package for preprocessing text

Usage of PreProcessingService

The package is a collection of functions for text preprocessing and cleaning, grouped by what they operate on: supplier names, keywords, and items. The functions are listed below by group; the item functions each come with a brief explanation.

FOR SUPPLIER:

supplier_name_cleaning

supplier_norm_name_cleaning

supplier_name_normalization_bkp

FOR KEYWORD:

removestopwords

lemmatization

removespecialchars

RemoveNonEngWords

SeparateNumStrings

stemming

FOR ITEM:

_en_stopwords: Returns a list of English stopwords.

strip_numeric: Removes numbers from a string.

strip_punctuation: Removes all punctuation marks from a string.

remove_stopwords: Removes stopwords from a string.

strip_multiple_whitespaces: Removes excess whitespace from a string.

strip_short: Removes words with a length less than a specified size from a string.

remove_alpha_num: Removes alphanumeric characters from a string.

lemmatize: Lemmatizes a list of tokens.

nltk_pos: Performs Part-of-Speech (POS) tagging using NLTK.

stanford_pos: Performs POS tagging using Stanford POS Tagger.

filter_pos: Filters words based on specific POS tags.

clean_text: Cleans the text by applying various cleaning functions.

preprocess_string: Preprocesses a string by applying a list of cleaning functions.

preprocess_postags: Applies a list of filtering functions to a list of strings.

guided_buy_preprocess: Preprocesses a list of strings for guided buying purposes.

dump_interim_steps: Creates a DataFrame with the intermediate steps of preprocessing.

lemmatization: Performs lemmatization on a string.

cold_start_preprocess: Preprocesses a list of words for cold start purposes.

unique_value_col: Splits words on the '|' delimiter.

convert_synonyms: A collection of synonym mappings.

clean_description: Applies all of the cleaning steps above.
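
To make the descriptions above concrete, here is a minimal standard-library sketch of how a few of the item helpers could behave and how preprocess_string might chain them. The function names mirror the list above, but the bodies, the minsize parameter, and its default are assumptions for illustration, not the package's actual code.

```python
import re
import string

# Hypothetical stand-ins that follow the one-line descriptions above.
# The real package functions may differ in signature and behaviour.

def strip_numeric(text):
    """Remove digit characters from a string."""
    return re.sub(r"\d+", "", text)

def strip_punctuation(text):
    """Replace ASCII punctuation with spaces so adjacent words stay separated."""
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    return text.translate(table)

def strip_multiple_whitespaces(text):
    """Collapse runs of whitespace into single spaces and trim both ends."""
    return re.sub(r"\s+", " ", text).strip()

def strip_short(text, minsize=3):
    """Drop words shorter than minsize characters (minsize is an assumed name)."""
    return " ".join(word for word in text.split() if len(word) >= minsize)

def preprocess_string(text, filters):
    """Apply each cleaning function to the text, in the order given."""
    for cleaning_function in filters:
        text = cleaning_function(text)
    return text

filters = [
    str.lower,
    strip_numeric,
    strip_punctuation,
    strip_multiple_whitespaces,
    strip_short,
]
cleaned = preprocess_string("Bolt, M8 x 40mm   (pack of 100)!", filters)
print(cleaned)  # bolt pack
```

The order of the filters matters in this sketch: punctuation is replaced with spaces before whitespace is collapsed, and short words are dropped last so that fragments left behind by digit removal (such as "m" and "mm") are filtered out.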

Example usage:

pip install preProcessingGEP==1.0.4

import preProcessingGEP

inputText = "Sample text to clean"

print(preProcessingGEP.PreProcessingService.clean_description(inputText))

General syntax for calling any function:

preProcessingGEP.itemPreprocess.function_name()

preProcessingGEP.keywordPreprocess.function_name()

preProcessingGEP.function_name()
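
Because the exact module layout is only outlined above, the following sketch resolves a function by its dotted path and calls it only if it exists. The helper call_if_available is hypothetical and not part of the package; it uses only the standard library, and the path "itemPreprocess.strip_numeric" is an assumption based on the syntax shown above.

```python
import importlib

def call_if_available(module_name, attr_path, *args, **kwargs):
    """Import module_name, follow attr_path, and call what it finds.

    Returns None if the module cannot be imported, if any attribute
    along the dotted path is missing, or if the target is not callable.
    """
    try:
        target = importlib.import_module(module_name)
    except ImportError:
        return None
    for name in attr_path.split("."):
        target = getattr(target, name, None)
        if target is None:
            return None
    if not callable(target):
        return None
    return target(*args, **kwargs)

# Assumed path, following the general syntax above. The result is None
# when the package is not installed or the path does not resolve.
result = call_if_available(
    "preProcessingGEP", "itemPreprocess.strip_numeric", "item 123"
)
```

A None result in this sketch only means the path could not be resolved; it says nothing about what the function itself would return.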
