A Python package for text preprocessing tasks in natural language processing

Project description

A Python package for text preprocessing tasks in natural language processing.

Usage

To use this text preprocessing package, first install it using pip:

pip install text-preprocessing

Then import the package in your Python script and call the appropriate functions:

from text_preprocessing import preprocess_text
from text_preprocessing import to_lower, remove_email, remove_url, remove_punctuation, lemmatize_word

# Preprocess text using default preprocess functions in the pipeline
text_to_process = 'Helllo, I am John Doe!!! My email is john.doe@email.com. Visit our website www.johndoe.com'
preprocessed_text = preprocess_text(text_to_process)
print(preprocessed_text)
# output: hello email visit website

# Preprocess text using custom preprocess functions in the pipeline
preprocess_functions = [to_lower, remove_email, remove_url, remove_punctuation, lemmatize_word]
preprocessed_text = preprocess_text(text_to_process, preprocess_functions)
print(preprocessed_text)
# output: helllo i am john doe my email is visit our website
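The pipeline idea used above, where a list of functions is applied to the text in order, can be sketched with the standard library alone. This is a minimal illustration and not the package's implementation: `run_pipeline`, `lower`, `strip_email`, and `strip_punctuation` are hypothetical stand-ins, and the email pattern is a deliberately simple assumption.

```python
import re
import string
from functools import reduce
from typing import Callable, Iterable

# Hypothetical stand-ins for illustration; these are NOT the package's functions.
def lower(text: str) -> str:
    return text.lower()

def strip_email(text: str) -> str:
    # Simplistic pattern (an assumption): non-space run, '@', non-space run.
    return re.sub(r"\S+@\S+", "", text)

def strip_punctuation(text: str) -> str:
    return text.translate(str.maketrans("", "", string.punctuation))

def run_pipeline(text: str, steps: Iterable[Callable[[str], str]]) -> str:
    """Apply each step to the output of the previous one, in order."""
    result = reduce(lambda acc, step: step(acc), steps, text)
    # Collapse the whitespace left behind by removals.
    return " ".join(result.split())

sample = "Hello, I am John Doe!!! My email is john.doe@email.com."
cleaned = run_pipeline(sample, [lower, strip_email, strip_punctuation])
print(cleaned)
# output: hello i am john doe my email is
```

Order matters in this sketch: if `strip_punctuation` ran before `strip_email`, the `@` would already be gone and the address would survive as `johndoeemailcom`. The same consideration likely applies when choosing the order of a custom function list.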

Features

Feature                                                        Function
convert to lower case                                          to_lower
convert to upper case                                          to_upper
keep only alphabetic and numerical characters                  keep_alpha_numeric
check and correct spellings                                    check_spelling
expand contractions                                            expand_contraction
remove URLs                                                    remove_url
remove names                                                   remove_name
remove emails                                                  remove_email
remove phone numbers                                           remove_phone_number
remove SSNs                                                    remove_ssn
remove credit card numbers                                     remove_credit_card_number
remove numbers                                                 remove_number
remove bullets and numbering                                   remove_itemized_bullet_and_numbering
remove special characters                                      remove_special_character
remove punctuations                                            remove_punctuation
remove extra whitespace                                        remove_whitespace
normalize unicode (e.g., café -> cafe)                         normalize_unicode
remove stop words                                              remove_stopword
tokenize words                                                 tokenize_word
tokenize sentences                                             tokenize_sentence
substitute custom words (e.g., vs -> versus)                   substitute_token
stem words                                                     stem_word
lemmatize words                                                lemmatize_word
preprocess text through a sequence of preprocessing functions  preprocess_text
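Two of the features listed above come with examples (café -> cafe, vs -> versus). The following sketch shows one common way to get those results using only the standard library. It is an assumption about the general technique, not the package's code: `strip_accents` and `substitute_words` are hypothetical names, and the package's own `normalize_unicode` and `substitute_token` may take different arguments or behave differently.

```python
import unicodedata

def strip_accents(text: str) -> str:
    # Decompose characters (NFKD), then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def substitute_words(tokens, mapping):
    # Replace each token that has an entry in `mapping`; keep the rest unchanged.
    return [mapping.get(tok, tok) for tok in tokens]

print(strip_accents("café"))
# output: cafe
print(substitute_words(["cats", "vs", "dogs"], {"vs": "versus"}))
# output: ['cats', 'versus', 'dogs']
```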

Download files

Source Distribution

text_preprocessing-0.1.1.tar.gz (13.9 kB)

Uploaded: Source

Built Distribution

text_preprocessing-0.1.1-py2.py3-none-any.whl (9.6 kB)

Uploaded: Python 2, Python 3

File details

Details for the file text_preprocessing-0.1.1.tar.gz.

File metadata

  • Download URL: text_preprocessing-0.1.1.tar.gz
  • Size: 13.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.10.7

File hashes

Hashes for text_preprocessing-0.1.1.tar.gz
Algorithm Hash digest
SHA256 81953b85c3ac6343a2013a9296191cadef8f14275867c7982a175f0d6d941b28
MD5 39fb88c4c0f245a9da5d60f93fd22624
BLAKE2b-256 71db92c1ce26b943e220819b094d8eade7122cefdc283e6fdf6699d798e9bf95

File details

Details for the file text_preprocessing-0.1.1-py2.py3-none-any.whl.

File hashes

Hashes for text_preprocessing-0.1.1-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 848729c6f0dc8fd8d822e21744bc6eaf21ad01709bf438de62d37db4dbe2e61e
MD5 67461f96ba9b1e588c13c55e1da318df
BLAKE2b-256 8e15e2afbb516be264e15b2c82f2bb1dfbf902fd78e38e8b271fc7f3efe6df90
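The SHA256 digests listed above can be used to check a downloaded file before installing it. The sketch below uses only `hashlib` from the standard library; `sha256_of` and `matches_listed_hash` are hypothetical helper names, and the local file path is an assumption about where the download was saved.

```python
import hashlib
from pathlib import Path

# SHA256 digests as listed above for the 0.1.1 release files.
EXPECTED_SHA256 = {
    "text_preprocessing-0.1.1.tar.gz":
        "81953b85c3ac6343a2013a9296191cadef8f14275867c7982a175f0d6d941b28",
    "text_preprocessing-0.1.1-py2.py3-none-any.whl":
        "848729c6f0dc8fd8d822e21744bc6eaf21ad01709bf438de62d37db4dbe2e61e",
}

def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Hash the file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_listed_hash(path: Path) -> bool:
    """True only if the file name is known and its digest matches the listing."""
    expected = EXPECTED_SHA256.get(path.name)
    return expected is not None and sha256_of(path) == expected
```

For example, `matches_listed_hash(Path("text_preprocessing-0.1.1.tar.gz"))` returns `True` only when the file in the current directory has the digest shown in the table above.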
