Neattext - a simple NLP package for cleaning text

Project description

neattext

NeatText: a simple NLP package for cleaning textual data and text preprocessing

Problem

Cleaning unstructured text data
Reducing noise (special characters, stopwords)
Avoiding repeated rewriting of the same text preprocessing code

Solution

Convert the already known solutions for cleaning text into a reusable package

Installation

pip install neattext

Usage

The OOP Way (Object Oriented Way)

Clean Text

Clean text by removing emails, numbers, stopwords, emojis, etc.

>>> from neattext import TextCleaner
>>> docx = TextCleaner()
>>> docx.text = "This is the mail example@gmail.com ,our WEBSITE is https://example.com 😊."
>>> docx.clean_text()

Remove Emails, Stopwords, Numbers, Phone Numbers

>>> docx.remove_emails()
'This is the mail  ,our WEBSITE is https://example.com 😊.'
>>> docx.remove_stopwords()
'This mail example@gmail.com ,our WEBSITE https://example.com 😊.'
>>> docx.remove_numbers()
>>> docx.remove_phone_numbers()

Remove Special Characters

>>> docx.remove_special_characters()

Remove Emojis

>>> docx.remove_emojis()
'This is the mail example@gmail.com ,our WEBSITE is https://example.com .'

Replace Emails, Numbers, Phone Numbers

>>> docx.replace_emails()
>>> docx.replace_numbers()
>>> docx.replace_phone_numbers()
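
The difference between removing and replacing can be illustrated with a minimal, standard-library sketch. This is not neattext's implementation: the regular expression, the `<email>` placeholder default, and the function names `sketch_remove_emails` and `sketch_replace_emails` are all assumptions made for illustration.

```python
import re

# Hypothetical email pattern for illustration only; neattext's own
# pattern may differ.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def sketch_remove_emails(text):
    """Delete every email address, leaving the surrounding text untouched."""
    return EMAIL_RE.sub("", text)


def sketch_replace_emails(text, placeholder="<email>"):
    """Swap every email address for a placeholder token."""
    return EMAIL_RE.sub(placeholder, text)


text = "This is the mail example@gmail.com ,our WEBSITE is https://example.com ."
print(sketch_remove_emails(text))
print(sketch_replace_emails(text))
```

Removal leaves a gap (a doubled space) where the email used to be, while replacement keeps a token that downstream steps can still count or filter on.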

Using TextExtractor

Extract emails, phone numbers, numbers, URLs, and emojis from text

>>> from neattext import TextExtractor
>>> docx = TextExtractor()
>>> docx.text = "This is the mail example@gmail.com ,our WEBSITE is https://example.com 😊."
>>> docx.extract_emails()
['example@gmail.com']
>>> docx.extract_emojis()
['😊']
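
For readers curious how this kind of extraction can work, here is a minimal sketch using only the standard library. It is an assumption-laden illustration, not the package's code: the two patterns and the names `sketch_extract_emails` and `sketch_extract_urls` are hypothetical.

```python
import re

# Hypothetical patterns for illustration; not taken from neattext.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
URL_RE = re.compile(r"https?://[^\s,]+")


def sketch_extract_emails(text):
    """Return all email-like substrings in order of appearance."""
    return EMAIL_RE.findall(text)


def sketch_extract_urls(text):
    """Return all http/https URLs in order of appearance."""
    return URL_RE.findall(text)


text = "This is the mail example@gmail.com ,our WEBSITE is https://example.com ."
print(sketch_extract_emails(text))
print(sketch_extract_urls(text))
```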

Using TextMetrics

Find word statistics such as counts of vowels, consonants, and stopwords, plus overall word stats

>>> from neattext import TextMetrics
>>> docx = TextMetrics()
>>> docx.text = "This is the mail example@gmail.com ,our WEBSITE is https://example.com 😊."
>>> docx.count_vowels()
>>> docx.count_consonants()
>>> docx.count_stopwords()
>>> docx.word_stats()
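
The return values of these methods are not shown above. As a rough idea of what such metrics involve, the sketch below counts vowels and consonants with the standard library. The function names and the choice of returning a `Counter` are assumptions, and neattext's actual output format may differ.

```python
from collections import Counter

VOWELS = set("aeiou")


def sketch_count_vowels(text):
    """Count each ASCII vowel, ignoring case."""
    return Counter(ch for ch in text.lower() if ch in VOWELS)


def sketch_count_consonants(text):
    """Count each ASCII letter that is not a vowel, ignoring case."""
    return Counter(
        ch for ch in text.lower()
        if ch.isascii() and ch.isalpha() and ch not in VOWELS
    )


sample = "Neat text"
print(sketch_count_vowels(sample))
print(sketch_count_consonants(sample))
```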

Usage

The MOP Way (Method/Function Oriented Way)

>>> from neattext.neattext import clean_text,extract_emails
>>> t1 = "This is the mail example@gmail.com ,our WEBSITE is https://example.com ."
>>> clean_text(t1,True)
'this is the mail <email> ,our website is <url> .'
>>> extract_emails(t1)
['example@gmail.com']
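
The output above suggests that `clean_text` lowercases the text and swaps emails and URLs for placeholder tokens. The meaning of its second argument is not documented here, so the following is only a sketch of that observed behaviour under assumed patterns. `sketch_clean_text` is a hypothetical name, not part of neattext.

```python
import re

# Hypothetical patterns for illustration; not taken from neattext.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
URL_RE = re.compile(r"https?://[^\s,]+")


def sketch_clean_text(text):
    """Lowercase the text, then replace URLs and emails with placeholders."""
    text = text.lower()
    text = URL_RE.sub("<url>", text)
    text = EMAIL_RE.sub("<email>", text)
    return text


t1 = "This is the mail example@gmail.com ,our WEBSITE is https://example.com ."
print(sketch_clean_text(t1))
```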

Documentation

Please read the documentation for more information on what neattext does and how to use it for your needs.

More Features To Add

unicode explainer
currency normalizer

Acknowledgements

Inspired by packages like clean-text by Johannes Filter and textify by JCharisTech

NB

Contributions are welcome.
If you notice a bug, please let us know.
Thanks a lot.

By

Jesse E. Agbe (JCharis)
Jesus Saves @JCharisTech

Release history

0.1.3 - Apr 9, 2022
0.1.2 - Sep 29, 2021
0.1.1 - Sep 11, 2021
0.1.0 - Jan 6, 2021
0.0.9 - Sep 17, 2020
0.0.8 - Sep 6, 2020
0.0.7 - Sep 3, 2020
0.0.6 - Aug 8, 2020
0.0.5 - Jul 28, 2020
0.0.4 - Jul 25, 2020
0.0.3 - Jul 13, 2020
0.0.2 - Mar 26, 2020 (this version)
0.0.1 - Mar 18, 2020

Download files

Source Distribution

neattext-0.0.2.tar.gz (6.3 kB)

Uploaded Mar 26, 2020 Source

Built Distribution

neattext-0.0.2-py3-none-any.whl (6.2 kB)

Uploaded Mar 26, 2020 Python 3

Hashes for neattext-0.0.2.tar.gz
Algorithm	Hash digest
SHA256	`cd4550f614ab5bfdc505462a55e679d419b67eb2be8689a2f4ef2fda1fadc573`
MD5	`47fa98eec9e73295418164d3963c80a9`
BLAKE2b-256	`e754fcc5d954f1eb38c49bcc5f737182887befd23e45f82390001c0b8463692e`

Hashes for neattext-0.0.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`c750df4a76388b94d1d023a53793f7b1197ffdaf2eb3aaf492753610c8c39887`
MD5	`8b9c61517377e6aa868ab025275a96b5`
BLAKE2b-256	`c1b45fcbd2ecf837445807b45f592f2dfb7e9586d158c2f8d0afe6866d2052af`