A Python package for text preprocessing tasks in natural language processing
Project description
A Python package for text preprocessing tasks in natural language processing.
Usage
To use this text preprocessing package, import preprocess_text and call it on the text you want to process:
from text_preprocessing import preprocess_text
# Preprocess text using default preprocess functions in the pipeline
text_to_process = 'Helllo, I am John Doe!!! My email is john.doe@email.com. Visit our website www.johndoe.com'
preprocessed_text = preprocess_text(text_to_process)
print(preprocessed_text)
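# Preprocess text using custom preprocess functions in the pipeline
# (assumes these functions are exported by the package alongside preprocess_text)
from text_preprocessing import to_lower, remove_email, remove_url, remove_punctuations, lemmatize_word
preprocess_functions = [to_lower, remove_email, remove_url, remove_punctuations, lemmatize_word]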
preprocessed_text = preprocess_text(text_to_process, preprocess_functions)
print(preprocessed_text)
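The custom pipeline above passes a list of functions that are applied in order. As a rough illustration of that idea, and not the package's actual implementation, the sketch below chains plain string-to-string functions with functools.reduce. The names run_pipeline, lower, and collapse_whitespace are hypothetical stand-ins, and the sketch assumes every step takes and returns a string, which may not hold for every function in the real package.

```python
from functools import reduce
from typing import Callable, Iterable


# Hypothetical stand-ins for illustration; not the package's own functions.
def lower(text: str) -> str:
    return text.lower()


def collapse_whitespace(text: str) -> str:
    # split() with no arguments drops leading, trailing, and repeated whitespace
    return " ".join(text.split())


def run_pipeline(text: str, functions: Iterable[Callable[[str], str]]) -> str:
    """Apply each function to the output of the previous one, in list order."""
    return reduce(lambda acc, fn: fn(acc), functions, text)


result = run_pipeline("  Hello   World  ", [lower, collapse_whitespace])
print(result)  # hello world
```

Order matters in such a chain: a step that removes punctuation before emails are stripped, for example, would break the "@" and "." that an email pattern relies on.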
Features
convert to lower case
convert to upper case
keep only alphabetic and numerical characters
check and correct spellings
expand contractions
remove URL
remove name
remove email
remove phone number
remove SSN
remove credit card number
remove numbers
remove special characters
remove punctuations
remove extra whitespace
normalize unicode (e.g., Café -> Cafe)
remove stop words
substitute custom word (e.g., msft -> Microsoft)
stem words
lemmatize words
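To make a few of the listed features concrete, here is a minimal standard-library sketch of unicode normalization, email and URL removal, and custom word substitution. It is an assumption-laden illustration, not how the package implements them: the function names and the regular expressions are hypothetical, and the patterns are deliberately simple.

```python
import re
import unicodedata

# Simplified, hypothetical patterns; real-world emails and URLs are messier.
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
URL_RE = re.compile(r"\b(?:https?://|www\.)\S+", re.IGNORECASE)


def strip_accents(text: str) -> str:
    # NFKD splits "é" into "e" plus a combining accent; encoding to ASCII
    # with errors="ignore" then drops the accent (and any other non-ASCII).
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


def strip_emails(text: str) -> str:
    return EMAIL_RE.sub("", text)


def strip_urls(text: str) -> str:
    # \S+ also swallows punctuation glued to the end of the URL.
    return URL_RE.sub("", text)


def substitute_words(text: str, mapping: dict) -> str:
    # Whole-token replacement on whitespace-separated words.
    return " ".join(mapping.get(word, word) for word in text.split())


print(strip_accents("Café"))  # Cafe
print(substitute_words("msft rose today", {"msft": "Microsoft"}))  # Microsoft rose today

sample = "My email is john.doe@email.com. Visit our website www.johndoe.com"
cleaned = strip_urls(strip_emails(sample))
print(cleaned)
```

The cleaned string still contains leftover spacing and punctuation where the email and URL were, which is why a whitespace or punctuation step usually follows in a pipeline.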
Download files
Source Distribution
text_preprocessing-0.0.3.tar.gz (11.4 kB)
Built Distribution
Hashes for text_preprocessing-0.0.3-py2.py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 6bfc416e1f47928d4a794b0be60778f37ee27f80832462fa9a201d46c7f867d6
MD5 | bb8063bf51f33087ac868866c720225c
BLAKE2b-256 | 6fcf76a94d2f4a862036d351b5835d0535b37479e62fd096f6e7ca5918f46b91