Neattext - a simple NLP package for cleaning text

neattext

NeatText: a simple NLP package for cleaning textual data and preprocessing text.

Problem

  • Cleaning unstructured text data
  • Reducing noise (special characters, stopwords)
  • Avoiding rewriting the same text-preprocessing code in every project

Solution

  • Package well-known text-cleaning techniques into a reusable library

Installation

pip install neattext

Usage

Clean Text

  • Clean text by removing emails, numbers, stopwords, etc.
>>> from neattext import TextCleaner
>>> docx = TextCleaner()
>>> docx.text = "your text goes here"
>>> docx.clean_text()

Remove Emails, Numbers, Phone Numbers and Stopwords

>>> docx.remove_emails()
>>> docx.remove_numbers()
>>> docx.remove_phone_numbers()
>>> docx.remove_stopwords()
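Removal steps like these are commonly implemented as regular-expression substitutions. The sketch below uses only Python's standard `re` module and is not neattext's own code: the patterns, the tiny `STOPWORDS` set and the function names are hypothetical simplifications, and real email and phone formats are far more varied.

```python
import re

# Hypothetical, deliberately simple patterns; not taken from neattext.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
# Optional leading +, then digits separated by spaces, dashes, dots or parentheses.
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")
NUMBER_RE = re.compile(r"\d+")
# Tiny illustrative stopword list; real libraries ship hundreds of entries.
STOPWORDS = {"a", "an", "and", "the", "is", "at", "of", "to", "me"}


def _squeeze(text: str) -> str:
    """Collapse runs of whitespace left behind by a removal."""
    return " ".join(text.split())


def remove_emails(text: str) -> str:
    return _squeeze(EMAIL_RE.sub("", text))


def remove_phone_numbers(text: str) -> str:
    return _squeeze(PHONE_RE.sub("", text))


def remove_numbers(text: str) -> str:
    return _squeeze(NUMBER_RE.sub("", text))


def remove_stopwords(text: str) -> str:
    # Case-insensitive membership test; word order is preserved.
    return " ".join(w for w in text.split() if w.lower() not in STOPWORDS)


sample = "Email me at jane.doe@example.com or call +1 555-123-4567 before 5 pm"
step = remove_phone_numbers(remove_emails(sample))
step = remove_numbers(step)
print(remove_stopwords(step))  # → Email or call before pm
```

Phone numbers are stripped before generic numbers here; in the opposite order the digit pattern would consume the digits first and leave stray `+` and `-` characters behind.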

Remove Special Characters

>>> docx.remove_special_characters()
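What counts as a "special character" is a design decision. A minimal sketch, assuming the goal is to keep only letters, digits and whitespace (an assumption, not neattext's documented rule), uses a single negated character class from the standard `re` module:

```python
import re

# Assumption: anything that is not a word character or whitespace is "special".
# \w also matches "_", so underscores are listed explicitly as well.
SPECIAL_RE = re.compile(r"[^\w\s]|_")


def remove_special_characters(text: str) -> str:
    """Replace punctuation and symbols with spaces, then collapse whitespace."""
    return " ".join(SPECIAL_RE.sub(" ", text).split())


print(remove_special_characters("Hello, world!!! #NLP_rocks :-)"))  # → Hello world NLP rocks
```

Matches are replaced with a space rather than deleted so that text such as `end.Start` does not fuse into one token, and because `\w` is Unicode-aware in Python 3, accented letters such as `é` survive.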

Replace Emails, Numbers, Phone Numbers

>>> docx.replace_emails()
>>> docx.replace_numbers()
>>> docx.replace_phone_numbers()
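Replacing, unlike removing, leaves a marker that something was there, which can be useful for downstream models. A minimal sketch with `re.sub`, assuming placeholder tokens such as `<EMAIL>`, `<PHONE>` and `<NUMBER>` (hypothetical names and patterns, not neattext's defaults):

```python
import re

# Order matters: emails and phone numbers are replaced before bare numbers,
# otherwise the number pattern would grab the digits inside them first.
REPLACEMENTS = [
    (re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"), "<EMAIL>"),
    (re.compile(r"\+?\d[\d\s().-]{7,}\d"), "<PHONE>"),
    (re.compile(r"\d+(?:\.\d+)?"), "<NUMBER>"),  # integers and simple decimals
]


def replace_entities(text: str) -> str:
    """Apply each (pattern, token) substitution in order."""
    for pattern, token in REPLACEMENTS:
        text = pattern.sub(token, text)
    return text


print(replace_entities("Mail bob@site.org or ring 020 7946 0958, ask for 2 copies"))
# → Mail <EMAIL> or ring <PHONE>, ask for <NUMBER> copies
```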

Using TextExtractor

  • Extract emails, phone numbers and numbers from text
>>> from neattext import TextExtractor
>>> docx = TextExtractor()
>>> docx.text = "your text with example@gmail.com goes here"
>>> docx.extract_emails()
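Extraction is the mirror image of removal: collect the matches instead of substituting them. A minimal sketch with `re.findall`, using hypothetical, simplified patterns rather than neattext's own:

```python
from __future__ import annotations

import re

# Hypothetical patterns; real-world email and phone validation is much harder.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def extract_emails(text: str) -> list[str]:
    """Return every email-like substring, in order of appearance."""
    return EMAIL_RE.findall(text)


def extract_phone_numbers(text: str) -> list[str]:
    """Return every phone-like substring, in order of appearance."""
    return PHONE_RE.findall(text)


text = "your text with example@gmail.com goes here, cc admin@example.org"
print(extract_emails(text))  # → ['example@gmail.com', 'admin@example.org']
```

The `(?:\.[\w-]+)+` group requires a word character after every dot, so a sentence-final period is not captured as part of the address.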

More Features To Add

  • unicode explainer
  • currency normalizer

By

  • Jesse E. Agbe (JCharis)
  • Jesus Saves @JCharisTech

NB

  • Contributions are welcome
  • If you notice a bug, please let us know.
  • Thanks a lot

Download files

Source distribution: neattext-0.0.1.tar.gz (4.4 kB)

Built distribution: neattext-0.0.1-py3-none-any.whl (4.6 kB, Python 3)
