Skip to main content

A python package for text preprocessing task in natural language processing

Project description

A python package for text preprocessing task in natural language processing.

Usage

To use this text preprocessing package, first install it using pip:

pip install text-preprocessing

Then, import the package in your python script and call appropriate functions:

from text_preprocessing import preprocess_text
from text_preprocessing import to_lower, remove_email, remove_url, remove_punctuation, lemmatize_word

# Preprocess text using default preprocess functions in the pipeline
text_to_process = 'Helllo, I am John Doe!!! My email is john.doe@email.com. Visit our website www.johndoe.com'
preprocessed_text = preprocess_text(text_to_process)
print(preprocessed_text)
# output: hello email visit website

# Preprocess text using custom preprocess functions in the pipeline
preprocess_functions = [to_lower, remove_email, remove_url, remove_punctuation, lemmatize_word]
preprocessed_text = preprocess_text(text_to_process, preprocess_functions)
print(preprocessed_text)
# output: helllo i am john doe my email is visit our website

Features

Feature

Function

convert to lower case

to_lower

convert to upper case

to_upper

keep only alphabetic and numerical characters

keep_alpha_numeric

check and correct spellings

check_spelling

expand contractions

expand_contraction

remove URLs

remove_url

remove names

remove_name

remove emails

remove_email

remove phone numbers

remove_phone_number

remove SSNs

remove_ssn

remove credit card numbers

remove_credit_card_number

remove numbers

remove_number

remove bullets and numbering

remove_itemized_bullet_and_numbering

remove special characters

remove_special_character

remove punctuations

remove_punctuation

remove extra whitespace

remove_whitespace

normalize unicode (e.g., café -> cafe)

normalize_unicode

remove stop words

remove_stopword

tokenize words

tokenize_word

tokenize sentences

tokenize_sentence

substitute custom words (e.g., vs -> versus)

substitute_token

stem words

stem_word

lemmatize words

lemmatize_word

preprocess text through a sequence of preprocessing functions

preprocess_text

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

text_preprocessing-0.1.1.tar.gz (13.9 kB view hashes)

Uploaded Source

Built Distribution

text_preprocessing-0.1.1-py2.py3-none-any.whl (9.6 kB view hashes)

Uploaded Python 2 Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page