A Python package for text preprocessing tasks in natural language processing

Project description

A Python package for text preprocessing tasks in natural language processing.

Usage

To use this text preprocessing package, import preprocess_text and call it on the text you want to clean:

from text_preprocessing import preprocess_text

# Preprocess text using default preprocess functions in the pipeline
text_to_process = 'Helllo, I am John Doe!!! My email is john.doe@email.com. Visit our website www.johndoe.com'
preprocessed_text = preprocess_text(text_to_process)
print(preprocessed_text)

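# Preprocess text using custom preprocess functions in the pipeline.
# The individual functions must be imported before they can be listed;
# the names below are the ones this example uses, so check them against your installed version.
from text_preprocessing import to_lower, remove_email, remove_url, remove_punctuations, lemmatize_word
preprocess_functions = [to_lower, remove_email, remove_url, remove_punctuations, lemmatize_word]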
preprocessed_text = preprocess_text(text_to_process, preprocess_functions)
print(preprocessed_text)
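
The custom pipeline above is a list of functions applied to the text one after another. The following is a minimal standard-library sketch of that idea, not the package's actual implementation: run_pipeline and the four step functions are hypothetical names invented for illustration, and their regular expressions are simplified assumptions.

```python
import re
import string
from typing import Callable, Sequence

# Hypothetical step functions (not part of text_preprocessing).
def lower(text: str) -> str:
    return text.lower()

def strip_emails(text: str) -> str:
    # Simplified email pattern: local part, "@", then dotted domain labels.
    return re.sub(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+", "", text)

def strip_urls(text: str) -> str:
    # Simplified URL pattern: anything starting with http(s):// or www.
    return re.sub(r"(?:https?://|www\.)\S+", "", text)

def strip_punctuation(text: str) -> str:
    return text.translate(str.maketrans("", "", string.punctuation))

def squeeze_whitespace(text: str) -> str:
    return " ".join(text.split())

def run_pipeline(text: str, steps: Sequence[Callable[[str], str]]) -> str:
    """Apply each step to the output of the previous one, in order."""
    for step in steps:
        text = step(text)
    return text

sample = ("Helllo, I am John Doe!!! My email is john.doe@email.com. "
          "Visit our website www.johndoe.com")
steps = [lower, strip_emails, strip_urls, strip_punctuation, squeeze_whitespace]
cleaned = run_pipeline(sample, steps)
print(cleaned)
```

Order matters in a pipeline like this: if punctuation were stripped before emails and URLs, the "@" and "." characters would already be gone and those patterns would no longer match.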

Features

  • convert to lower case

  • convert to upper case

  • keep only alphabetic and numerical characters

  • check and correct spellings

  • expand contractions

  • remove URL

  • remove name

  • remove email

  • remove phone number

  • remove SSN

  • remove credit card number

  • remove numbers

  • remove special characters

  • remove punctuations

  • remove extra whitespace

  • normalize unicode (e.g., Café -> Cafe)

  • remove stop words

  • substitute custom word (e.g., msft -> Microsoft)

  • stem words

  • lemmatize words
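
To make a few of the features above concrete, here is a small standard-library sketch of unicode normalization, custom word substitution, SSN removal, and whitespace cleanup. The function names (ascii_fold, substitute_words, strip_ssn, squeeze_whitespace) and the SSN pattern are assumptions made for illustration; the package's own functions may be named differently and behave differently.

```python
import re
import unicodedata
from typing import Mapping

def ascii_fold(text: str) -> str:
    # Decompose accented characters, then drop the non-ASCII combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")

def substitute_words(text: str, mapping: Mapping[str, str]) -> str:
    # Replace whole words found in the mapping (keys are lower case).
    def swap(match: re.Match) -> str:
        word = match.group(0)
        return mapping.get(word.lower(), word)
    return re.sub(r"\b\w+\b", swap, text)

def strip_ssn(text: str) -> str:
    # Assumed format: US-style NNN-NN-NNNN with hyphens.
    return re.sub(r"\b\d{3}-\d{2}-\d{4}\b", "", text)

def squeeze_whitespace(text: str) -> str:
    return " ".join(text.split())

print(ascii_fold("Café"))
print(substitute_words("I bought msft shares", {"msft": "Microsoft"}))
print(squeeze_whitespace(strip_ssn("SSN 123-45-6789 on file")))
```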


Download files

Source Distribution

text_preprocessing-0.0.3.tar.gz (11.4 kB)

Built Distribution

text_preprocessing-0.0.3-py2.py3-none-any.whl (9.2 kB, Python 2 and Python 3)
