Python package to clean strings and make them suitable for NLP
Project description
cleantxty
Python package to clean strings and make them suitable for NLP.
cleantxty is an open-source Python package that cleans raw text. Source code for the library can be found here.
Features
cleantxty has two main methods:
- clean: to clean raw text and return the cleaned text
- clean_words: to clean raw text and return a list of clean words
Other methods that can be used alongside them are:
- remove_link: to remove links from the text
- remove_extra_white_space: to remove extra white space from the text
- lower_text: to convert the text to lower case
- upper_text: to convert the text to upper case
- remove_stopwords: to remove stopwords from the text
- remove_digits: to remove digits from the text
- remove_punctuations: to remove punctuation from the text
- custom_regex: to apply a custom regex to the text
- stem_text: to stem the provided text
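To give a feel for what these operations do, here is a minimal sketch written with the standard library only. The `sketch_*` functions are hypothetical stand-ins, not cleantxty's own code, and the package's actual behavior (for example, which links or punctuation marks it recognizes) may differ.

```python
import re
import string

# Hypothetical stand-ins built on the standard library only.
# They are NOT cleantxty's functions and may behave differently.

def sketch_remove_link(text: str) -> str:
    """Drop http(s):// and www. style links."""
    return re.sub(r"(?:https?://|www\.)\S+", "", text)

def sketch_remove_digits(text: str) -> str:
    """Drop every run of decimal digits."""
    return re.sub(r"\d+", "", text)

def sketch_remove_punctuations(text: str) -> str:
    """Drop ASCII punctuation characters."""
    return text.translate(str.maketrans("", "", string.punctuation))

def sketch_remove_extra_white_space(text: str) -> str:
    """Collapse whitespace runs to single spaces and trim the ends."""
    return " ".join(text.split())

raw = "Visit   https://example.com NOW!!  Only 3 days left."
step = sketch_remove_link(raw)
step = sketch_remove_digits(step)
step = sketch_remove_punctuations(step)
step = sketch_remove_extra_white_space(step)
cleaned = step.lower()
print(cleaned)
```

The link is removed first so that stripping punctuation does not break the URL into fragments that no longer match the link pattern.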
Installation
cleantxty requires Python 3 and NLTK to run.
To install using pip, run:
pip install cleantxty
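After installing, a quick check like the one below can confirm that the requirements are in place. It is a sketch using only the standard library; `missing_modules` is a hypothetical helper, and the module names `nltk` and `cleantxty` are assumed to be the import names.

```python
import importlib.util
import sys

def missing_modules(names):
    """Return the names from `names` that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]

if sys.version_info < (3,):
    raise RuntimeError("Python 3 is required")

# Assumed import names for the dependency and the package itself.
missing = missing_modules(["nltk", "cleantxty"])
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All required modules are importable")
```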
Usage
- Import the library:
import cleantxty
- Choose a method:
To return the text in a string format,
cleantxty.clean("raw_text_here")
To return a list of words from the text,
cleantxty.clean_words("raw_text_here")
To choose a specific set of cleaning operations,
cleantxty.clean("raw_text_here",
    default_case="lower",  # "lower" by default; set to "upper" for an upper-case result
    regex=None  # optional custom regex to apply
)
cleantxty.clean_words("raw_text_here",
    default_case="lower",  # "lower" by default; set to "upper" for an upper-case result
    regex=None  # optional custom regex to apply
)
Examples
import cleantxty
cleantxty.clean('This is A s$ple ? tExt3% to cleaN566556+wow8 ')
returns,
'this is a sample text to clean'
import cleantxty
cleantxty.clean_words('This is A s$ample !!!! tExt3% to cleaN566556+2+59*/133')
returns,
['sampl', 'text', 'clean']
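The word-list result above can be approximated without the package. The sketch below is an assumption-laden illustration: `sketch_clean_words` and `SKETCH_STOPWORDS` are hypothetical names, the stopword set is a tiny hand-picked sample, and stemming is left out, so it yields 'sample' where the package output shows the stem 'sampl'.

```python
import re

# A tiny illustrative stopword set (an assumption for this sketch);
# a real stopword list would be much longer.
SKETCH_STOPWORDS = {"this", "is", "a", "to", "the", "and", "of"}

def sketch_clean_words(text: str) -> list:
    """Lowercase, keep only letters and whitespace, split, drop stopwords."""
    letters_only = re.sub(r"[^a-z\s]", "", text.lower())
    return [word for word in letters_only.split() if word not in SKETCH_STOPWORDS]

words = sketch_clean_words("This is A s$ample !!!! tExt3% to cleaN566556+2+59*/133")
print(words)
```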
from cleantxty import clean
text = "my id, name1@dom1.com and your, name2@dom2.in"
clean(text, regex=r"[a-z0-9\.\-+_]+@[a-z0-9\.\-+_]+\.[a-z]+")
returns,
"my id, email and your, email"
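How cleantxty turns each match into the word "email" is not described here. The same result can be reproduced with the standard library, assuming every match is replaced by a fixed placeholder; `sketch_replace` and its `placeholder` parameter are hypothetical and not part of the package.

```python
import re

# Same pattern as in the example above.
EMAIL_PATTERN = r"[a-z0-9\.\-+_]+@[a-z0-9\.\-+_]+\.[a-z]+"

def sketch_replace(text: str, pattern: str, placeholder: str = "email") -> str:
    """Replace every match of `pattern` with `placeholder` (assumed behavior)."""
    return re.sub(pattern, placeholder, text)

result = sketch_replace("my id, name1@dom1.com and your, name2@dom2.in", EMAIL_PATTERN)
print(result)
```

Note that the pattern only covers lower-case letters, so upper-case addresses would need `re.IGNORECASE` or a lowercased input.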
License
MIT
For any questions, issues, bugs, and suggestions please visit here
Download files
File details
Details for the file cleantxty-0.0.5.tar.gz.
File metadata
- Download URL: cleantxty-0.0.5.tar.gz
- Upload date:
- Size: 3.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.9.13
File hashes
Algorithm | Hash digest
---|---
SHA256 | 56b5fc4841e3433c17e41c58380837c505c0580d0ce40a9b32b1449b97c65488
MD5 | d6662272875a11f4e8e13b00a84adf6f
BLAKE2b-256 | ded91f485bd9647c04a2ff175192c774f81856f37faed4b39a3908aa7dc61a9e
File details
Details for the file cleantxty-0.0.5-py3-none-any.whl.
File metadata
- Download URL: cleantxty-0.0.5-py3-none-any.whl
- Upload date:
- Size: 3.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.9.13
File hashes
Algorithm | Hash digest
---|---
SHA256 | cd6035ef4199d78bc35a399c16c48a97ed1900034ae75f57e0ffb43e7bab8093
MD5 | 250b0f4bff093a45fc6ba526853e96bb
BLAKE2b-256 | 152946bdadea82adc5994864ac9d4f6d45ca4e97acdd1a7829f695e265dd2574