A Python package for text preprocessing tasks in natural language processing
Project description
A Python package for text preprocessing tasks in natural language processing.
Usage
To use this text preprocessing package, first install it using pip:
pip install text-preprocessing
Then, import the package in your Python script and call the appropriate functions:
from text_preprocessing import preprocess_text
from text_preprocessing import to_lower, remove_email, remove_url, remove_punctuation, lemmatize_word
# Preprocess text using default preprocess functions in the pipeline
text_to_process = 'Helllo, I am John Doe!!! My email is john.doe@email.com. Visit our website www.johndoe.com'
preprocessed_text = preprocess_text(text_to_process)
print(preprocessed_text)
# output: hello email visit website
# Preprocess text using custom preprocess functions in the pipeline
preprocess_functions = [to_lower, remove_email, remove_url, remove_punctuation, lemmatize_word]
preprocessed_text = preprocess_text(text_to_process, preprocess_functions)
print(preprocessed_text)
# output: helllo i am john doe my email is visit our website
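The individual preprocessing functions can also be called directly on a string, outside of the pipeline. The snippet below is a minimal sketch that assumes each function takes a string and returns the processed string, consistent with how they are chained in the pipeline above:
from text_preprocessing import to_lower, remove_punctuation
# Minimal sketch: applying individual functions directly
# (assumes each function accepts a string and returns a string).
sample_text = 'Hello, World!'
print(remove_punctuation(to_lower(sample_text)))
# expected output (assumption): hello world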
Features
Feature | Function
---|---
convert to lower case | to_lower
convert to upper case | to_upper
keep only alphabetic and numeric characters | keep_alpha_numeric
check and correct spellings | check_spelling
expand contractions | expand_contraction
remove URLs | remove_url
remove names | remove_name
remove emails | remove_email
remove phone numbers | remove_phone_number
remove SSNs | remove_ssn
remove credit card numbers | remove_credit_card_number
remove numbers | remove_number
remove bullets and numbering | remove_itemized_bullet_and_numbering
remove special characters | remove_special_character
remove punctuation | remove_punctuation
remove extra whitespace | remove_whitespace
normalize unicode (e.g., Café -> Cafe) | normalize_unicode
remove stop words | remove_stopword
tokenize words | tokenize_word
tokenize sentences | tokenize_sentence
substitute custom words (e.g., msft -> Microsoft) | substitute_token
stem words | stem_word
lemmatize words | lemmatize_word
preprocess text through a sequence of preprocessing functions | preprocess_text
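Any of the functions above can be chained into a custom pipeline in the same way as the usage example. The following is a minimal sketch, assuming these functions take and return strings like the ones used earlier; the output shown is illustrative, not guaranteed:
from text_preprocessing import preprocess_text
from text_preprocessing import to_lower, remove_number, remove_stopword, stem_word
# Minimal sketch of a custom pipeline built from functions in the table above
# (assumes each function accepts a string and returns a string, as in the usage example).
text = 'The 2 quick brown foxes jumped over 3 lazy dogs'
pipeline = [to_lower, remove_number, remove_stopword, stem_word]
print(preprocess_text(text, pipeline))
# illustrative output (assumption): quick brown fox jump lazi dog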