Clean text for NLP projects
nlp_text_cleaner
About
This project provides a utility module for the text cleaning and preprocessing required in NLP projects.
Installation
pip install nlp-text-cleaner
Usage
from nlp_text_cleaner import nlp_text_cleaner as cleaner
cleaned_text = cleaner.apply_stemming("I played Cricket")
The following methods are available for text cleaning:
- split_into_sentences : A method to split text into sentences
- split_into_words : A method to split text into words
- lower_case_text : A method to convert text to lower case
- remove_punctuation : A method to remove punctuation from a text
- remove_unicode : A method to remove unicode characters from a text
- remove_leading_trailing_whitespaces : A method to remove white space at the beginning or end of a text
- remove_duplicate_whitespaces : A method to remove consecutive white spaces
- detect_language : A method to detect the language of a text
- correct_grammar : A method to correct spelling mistakes in a text
- remove_stopwords : A method to remove stopwords from a text, with an optional argument for passing your own custom stopwords
- apply_stemming : A method to apply stemming to a text
- apply_lammatization : A method to apply lemmatization to a text
- remove_hashtags : A method to remove hashtags from a text
- remove_hyperlinks : A method to remove hyperlinks from a text
- clean_html_code : A method to remove HTML entities like ', &, < etc.
- replace_contraction : A method to replace contractions like n't, 'll etc.
- get_pos_tags : A method to get the POS tags of a text
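To make a few of the descriptions above concrete, here is a minimal standard-library sketch of what such cleaning steps typically do. It is not the package's implementation: the `sketch_*` function names, the regular expressions, and the tiny stopword set are all assumptions made for illustration, and the real methods may behave differently (for example, in how custom stopwords are combined with the defaults).

```python
import re
import string

# Hypothetical stand-ins for illustration only; not nlp_text_cleaner's code.
URL_PATTERN = re.compile(r"(?:https?://|www\.)\S+")
HASHTAG_PATTERN = re.compile(r"#\w+")
# A deliberately tiny stopword set; real stopword lists are much larger.
DEFAULT_STOPWORDS = {"a", "an", "the", "is", "are", "and", "or", "of", "to", "in"}


def sketch_remove_hyperlinks(text):
    """Drop anything that looks like an http(s):// or www. link."""
    return URL_PATTERN.sub("", text)


def sketch_remove_hashtags(text):
    """Drop hashtags such as #cricket (the tag word is removed too)."""
    return HASHTAG_PATTERN.sub("", text)


def sketch_remove_punctuation(text):
    """Delete every ASCII punctuation character."""
    return text.translate(str.maketrans("", "", string.punctuation))


def sketch_remove_duplicate_whitespaces(text):
    """Collapse any run of whitespace into a single space."""
    return re.sub(r"\s+", " ", text)


def sketch_remove_stopwords(text, custom_stopwords=None):
    """Remove stopwords; custom stopwords extend the defaults (an assumption)."""
    stopwords = DEFAULT_STOPWORDS | set(custom_stopwords or ())
    return " ".join(w for w in text.split() if w.lower() not in stopwords)
```

Order matters when chaining steps like these: in this sketch, removing punctuation first would strip the `#` and leave the hashtag word behind, so hashtags and hyperlinks are best removed before punctuation.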
You can use the above methods individually, as your use case requires. However, there are also some default methods that you can use:
- clean_single_sentence : A default method to clean single sentence
- clean_paragraph_to_sentences : A default method to get cleaned sentences from a paragraph
- clean_paragraph : A default method to clean complete paragraph
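The README does not say which steps the default methods apply, so the following is only a sketch of how default cleaners of this kind could be composed from individual steps. Every name here (`make_pipeline` and the `sketch_*` functions), the chosen steps, and the naive sentence splitter are assumptions for illustration, not the package's actual behavior.

```python
import re
import string

# Hypothetical helpers for illustration only; not nlp_text_cleaner's code.


def make_pipeline(*steps):
    """Return a function that applies each str -> str step in order."""
    def run(text):
        for step in steps:
            text = step(text)
        return text
    return run


def _strip_punctuation(text):
    return text.translate(str.maketrans("", "", string.punctuation))


def _collapse_whitespace(text):
    return re.sub(r"\s+", " ", text)


# Assumed default steps: lower-case, drop punctuation, tidy whitespace.
sketch_clean_single_sentence = make_pipeline(
    str.lower,
    _strip_punctuation,
    _collapse_whitespace,
    str.strip,
)


def sketch_clean_paragraph_to_sentences(paragraph):
    """Split on whitespace that follows ., ! or ?, then clean each sentence."""
    sentences = re.split(r"(?<=[.!?])\s+", paragraph.strip())
    cleaned = [sketch_clean_single_sentence(s) for s in sentences]
    return [s for s in cleaned if s]  # drop sentences that became empty


def sketch_clean_paragraph(paragraph):
    """Clean a whole paragraph and return it as one string."""
    return " ".join(sketch_clean_paragraph_to_sentences(paragraph))


print(sketch_clean_paragraph_to_sentences("I played Cricket. It was FUN!  Really?"))
```

Building the defaults from small single-purpose steps keeps each step testable on its own and makes it easy to swap a step in or out for a particular use case.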
Contributing
Please create a pull request against the 'develop' branch.
Developer Instructions
If you are using conda, go to the location of the environment.yml file and run:
conda env create -f environment.yml
For pip:
pip install -r requirements.txt
Unit Testing
- Go to the 'tests' folder on the command line.
- Run:
pytest -vv