Clean the text for NLP project
Project description
nlp_text_cleaner
About
This is a project developed to create a utility module for text cleaning/pre processing required in NLP projects
Installation
pip install nlp-text-cleaner
Usage
from nlp_text_cleaner import nlp_text_cleaner as cleaner
cleaned_text = cleaner.apply_stemming("I played Cricket")
There are following methods present for text cleaning.
-
split_into_sentences : A method to split text into sentences
-
split_into_words : A method to split text into words
-
lower_case_text : A method to convert text to lower case
-
remove_punctuation : A method to remove punctuations in a text
-
remove_unicode : A method to remove unicode characters in a text
-
remove_leading_trailing_whitespaces : A method to remove white spaces at the begining or end of text
-
remove_duplicate_whitespaces : A method to remove consecutive white spaces
-
detect_language : A method to detect language of text
-
correct_grammar : A method to correct spelling mistakes in a text
-
remove_stopwords : A method to remove stopwords from text with optional argument to pass our own custom stopwords.
-
apply_stemming : A method to apply stemming on text
-
apply_lammatization : A method to apply lemmatization on text
-
remove_hashtags : A method to remove hashtags in a text
-
remove_hyperlinks : A method to remove hyperlinks in a text
-
clean_html_code : A method to remove html entities like ' ,& ,< etc/
-
replace_contraction : A method to sreplace contractions like n't,'ll etc
-
get_pos_tags : A method to get POS tags of text
You can use above methods as per requirement of a use case. However,there are some default methods that you can use:
-
clean_single_sentence : A default method to clean single sentence
-
clean_paragraph_to_sentences : A default method to get cleaned sentences from a paragraph
-
clean_paragraph : A default method to clean complete paragraph
Contributing
Please create a Pull request on 'develop' branch.
Developer Instructions
If you are using conda then go to location of environment.yml file and run:
conda env create -f environment.yml
For pip:
pip install -r requirements.txt
Unit Testing
- Go inside 'tests' folder on command line.
- Run:
pytest -vv
Contributors
Made with contributors-img.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Hashes for nlp_text_cleaner-1.0.11-py3-none-any.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | add1b0e5bb33840cb25eb4c6cd4ecb6c2b3d3a61bcc691051853e7863f946edf |
|
MD5 | e38cc5ddbfb84719a5e8c9e5aa3b4ec9 |
|
BLAKE2b-256 | 06d947a35e6dbdb4f2c2d4514ef992413cc52987641bed729335a90f92cfbdab |