
Neattext - a simple NLP package for cleaning text



NeatText: a simple NLP package for cleaning textual data and text preprocessing


  • Cleaning unstructured text data
  • Reducing noise (special characters, stopwords)
  • Avoiding rewriting the same text-preprocessing code in every project


  • Convert the already known solutions for cleaning text into a reusable package


pip install neattext


  • The OOP Way (Object-Oriented Way)

Clean Text

  • Clean text by removing emails, numbers, stopwords, emojis, etc.
>>> from neattext import TextCleaner
>>> docx = TextCleaner()
>>> docx.text = "This is the mail ,our WEBSITE is 😊."
>>> docx.clean_text()

Remove Emails, Numbers, Phone Numbers

>>> docx.remove_emails()
'This is the mail  ,our WEBSITE is 😊.'
>>> docx.remove_stopwords()
'This mail ,our WEBSITE 😊.'
>>> docx.remove_numbers()
>>> docx.remove_phone_numbers()
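
To give a feel for what this kind of removal involves, here is a minimal standard-library sketch. It is not neattext's implementation: the patterns and the helper names strip_emails and strip_numbers are assumptions made for illustration, and the package's own rules may differ.

```python
import re

# Deliberately simple patterns, chosen for illustration only (assumptions,
# not the patterns neattext uses).
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
NUMBER_RE = re.compile(r"\d+")


def strip_emails(text: str) -> str:
    """Delete anything that looks like an email address."""
    return EMAIL_RE.sub("", text)


def strip_numbers(text: str) -> str:
    """Delete runs of digits."""
    return NUMBER_RE.sub("", text)


sample = "Mail jane@example.com or call 5551234 today."
print(strip_emails(sample))   # Mail  or call 5551234 today.
print(strip_numbers(sample))  # Mail jane@example.com or call  today.
```

Note that plain deletion leaves a double space behind, which matches the shape of the remove_emails() output shown above.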

Remove Special Characters

>>> docx.remove_special_characters()

Remove Emojis

>>> docx.remove_emojis()
'This is the mail ,our WEBSITE is .'
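
Emoji removal is usually done by matching Unicode code point ranges. The sketch below assumes a handful of common emoji blocks and a hypothetical helper named strip_emojis; it is not exhaustive and is not taken from neattext.

```python
import re

# A few common emoji blocks; this list is an assumption and is not exhaustive.
EMOJI_RE = re.compile(
    "["
    "\U0001F300-\U0001F5FF"  # symbols and pictographs
    "\U0001F600-\U0001F64F"  # emoticons
    "\U0001F680-\U0001F6FF"  # transport and map symbols
    "\U0001F900-\U0001F9FF"  # supplemental symbols and pictographs
    "\u2600-\u27BF"          # miscellaneous symbols and dingbats
    "]+"
)


def strip_emojis(text: str) -> str:
    """Delete characters that fall inside the emoji ranges above."""
    return EMOJI_RE.sub("", text)


print(strip_emojis("our WEBSITE is 😊."))  # our WEBSITE is .
```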

Replace Emails, Numbers, Phone Numbers

>>> docx.replace_emails()
>>> docx.replace_numbers()
>>> docx.replace_phone_numbers()
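
Replacing differs from removing in that a placeholder token keeps the position of the original value. A rough sketch, assuming the hypothetical tokens <EMAIL> and <NUMBER> and a hypothetical helper named mask (the tokens neattext actually inserts are not stated here):

```python
import re

# Illustrative patterns and tokens; all of them are assumptions.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
NUMBER_RE = re.compile(r"\d+")


def mask(text: str, email_token: str = "<EMAIL>", number_token: str = "<NUMBER>") -> str:
    """Replace emails and digit runs with placeholder tokens."""
    # Mask emails first so digits inside an address are not masked separately.
    text = EMAIL_RE.sub(email_token, text)
    return NUMBER_RE.sub(number_token, text)


print(mask("Write to joe99@example.com before 2020."))
# Write to <EMAIL> before <NUMBER>.
```

The order of the two substitutions matters: masking numbers first would turn joe99@example.com into joe<NUMBER>@example.com before the email pattern ever runs.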

Using TextExtractor

  • To extract emails, phone numbers, numbers, URLs, and emojis from text
>>> from neattext import TextExtractor
>>> docx = TextExtractor()
>>> docx.text = "This is the mail ,our WEBSITE is 😊."
>>> docx.extract_emails()
['']
>>> docx.extract_emojis()
['😊']
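
Extraction returns the matches instead of deleting them. The following standard-library sketch shows the idea with re.findall; the patterns and the helper names find_emails and find_urls are assumptions for illustration, not neattext's own code.

```python
import re
from typing import List

# Illustrative patterns only (assumptions).
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
URL_RE = re.compile(r"https?://[^\s,]+")


def find_emails(text: str) -> List[str]:
    """Return every substring that looks like an email address."""
    return EMAIL_RE.findall(text)


def find_urls(text: str) -> List[str]:
    """Return every http(s) URL, stopping at whitespace or a comma."""
    return URL_RE.findall(text)


sample = "Docs: https://example.com/docs, mail jane@example.com."
print(find_emails(sample))  # ['jane@example.com']
print(find_urls(sample))    # ['https://example.com/docs']
```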

Using TextMetrics

  • To find word statistics such as counts of vowels, consonants, and stopwords
>>> from neattext import TextMetrics
>>> docx = TextMetrics()
>>> docx.text = "This is the mail ,our WEBSITE is 😊."
>>> docx.count_vowels()
>>> docx.count_consonants()
>>> docx.count_stopwords()
>>> docx.word_stats()
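
The metrics above are not shown with their output, so the sketch below only illustrates the kind of counting involved. The stopword list is a tiny hypothetical one, the functions return plain integers, and neattext's real return values may be shaped differently.

```python
VOWELS = set("aeiou")
# Tiny hypothetical stopword list, for illustration only.
STOPWORDS = {"this", "is", "the", "our"}


def count_vowels(text: str) -> int:
    """Count ASCII vowels, ignoring case."""
    return sum(1 for ch in text.lower() if ch in VOWELS)


def count_consonants(text: str) -> int:
    """Count ASCII letters that are not vowels, ignoring case."""
    return sum(
        1 for ch in text.lower()
        if ch.isascii() and ch.isalpha() and ch not in VOWELS
    )


def count_stopwords(text: str) -> int:
    """Count whitespace-separated tokens found in STOPWORDS."""
    tokens = (word.strip(".,!?") for word in text.lower().split())
    return sum(1 for token in tokens if token in STOPWORDS)


sample = "This is the mail, our WEBSITE is up."
print(count_vowels(sample))      # 12
print(count_consonants(sample))  # 15
print(count_stopwords(sample))   # 5
```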


  • The MOP Way (method/function-oriented way)
>>> from neattext.neattext import clean_text, extract_emails
>>> t1 = "This is the mail ,our WEBSITE is ."
>>> clean_text(t1, True)
'this is the mail <email> ,our website is <url> .'
>>> extract_emails(t1)
['']
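
The clean_text output above is lowercased and carries <email> and <url> tokens. A function with a similar visible effect can be sketched as follows; the name normalize and the patterns are assumptions, and the meaning of the second argument to clean_text is not described here, so it is not modeled.

```python
import re

# Illustrative patterns only (assumptions, not neattext's own).
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
URL_RE = re.compile(r"https?://\S+")


def normalize(text: str) -> str:
    """Replace URLs and emails with tokens, then lowercase the result."""
    text = URL_RE.sub("<url>", text)
    text = EMAIL_RE.sub("<email>", text)
    return text.lower()


print(normalize("Mail Jane@Example.com ,our WEBSITE is https://example.com ."))
# mail <email> ,our website is <url> .
```

A plain function like this composes easily with others, which is the appeal of the function-oriented style over keeping state on an object.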


Please read the documentation for more information on what neattext does and how to use it for your needs.

More Features To Add

  • unicode explainer
  • currency normalizer


  • Inspired by packages like clean-text by Johannes Filter and textify by JCharisTech


  • Contributions are welcome
  • If you notice a bug, please let us know
  • Thanks a lot


  • Jesse E. Agbe (JCharis)
  • Jesus Saves @JCharisTech

Files for neattext, version 0.0.2

Filename                          Size    File type  Python version
neattext-0.0.2-py3-none-any.whl   6.2 kB  Wheel      py3
neattext-0.0.2.tar.gz             6.3 kB  Source     None
