See the [website](https://yugantm.github.io/textcleaner/) for more.
# textcleaner V0.4.6
Text-Cleaner is a utility library for text-data pre-processing. Use it before passing the text data to a model.
# Features!
- `main_cleaner` does all of the below in one call

or use the individual steps:
- remove unnecessary blank lines
- strip out a particular character, or a default one
- convert all characters to lowercase if needed
- remove numbers, symbols and stop words from the whole text
- tokenize the text data in one call
- stemming and lemmatization powered by NLTK
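
To make the steps above concrete, here is a minimal sketch of the same kind of pipeline in plain Python. It is not textcleaner's implementation: the function name `clean_lines`, the tiny `STOP_WORDS` set and the exact regular expression are assumptions made for illustration, and stemming and lemmatization are left out.

```python
import re

# Hypothetical, deliberately tiny stop-word set (NLTK provides a fuller one).
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "and", "to", "in"}


def clean_lines(text, lowercase=True):
    """Return one list of tokens per non-blank line of `text`."""
    cleaned = []
    for line in text.splitlines():
        line = line.strip()          # strip surrounding whitespace
        if not line:                 # drop blank lines
            continue
        if lowercase:
            line = line.lower()
        # Replace digits and symbols with spaces, keeping letters only.
        line = re.sub(r"[^A-Za-z\s]", " ", line)
        # Tokenize on whitespace and drop stop words.
        tokens = [t for t in line.split() if t.lower() not in STOP_WORDS]
        if tokens:
            cleaned.append(tokens)
    return cleaned


sample = "The price is 42 dollars!\n\n\nCleaning TEXT, in 3 steps."
print(clean_lines(sample))
# [['price', 'dollars'], ['cleaning', 'text', 'steps']]
```

Each step is a single line, so individual steps can be switched off or reordered, which is the same idea as choosing between one combined call and the separate features.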
> The goal is to make basic data cleaning hassle-free.
> Most developers who work with text data have
> faced the situation where the data is not consumable,
> and they end up spending their time on these issues
> rather than fine-tuning the model to get better accuracy.
> In that scenario this library can be useful and save you a ton
> of time.
### Tech
textcleaner uses a number of open source projects to work properly:
- [NLTK](https://www.nltk.org/) - for advanced cleaning
- [REGEX](https://pypi.org/project/regex/) - for regular expressions
And of course textcleaner itself is open source, with a [public repository](https://github.com/YugantM/textcleaner)
on GitHub.
### Installation
textcleaner requires [Python 3.x](https://www.python.org/downloads/) to run.
Install the dependencies if you have not already installed them:
- NLTK: see the [installation documentation](https://www.nltk.org/install.html)
- REGEX:
```sh
pip install regex
```
- textcleaner:
```sh
pip install textcleaner
```
or, to pin this version:
```sh
pip install textcleaner==0.4.6
```
### Usage
```python
import textcleaner as tc
tc.main_cleaner('<FILE_NAME>')
# or
tc.document('<FILE_NAME>')
```
The calls above convert the text file into a cleaned list of words. By default the function returns a list of lists; set the *op* argument to 'words' to get a flat list of words instead.
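
The difference between the two output shapes can be sketched without the library. The snippet below assumes each inner list holds the words of one line of the file (the README does not say how the inner lists are split), and `flatten` is a hypothetical helper, not part of textcleaner.

```python
from itertools import chain

# Assumed shape of the default output: one inner list of words per line.
nested = [["price", "dollars"], ["cleaning", "text", "steps"]]


def flatten(list_of_lists):
    """Join the inner lists into one flat list, preserving order."""
    return list(chain.from_iterable(list_of_lists))


print(flatten(nested))
# ['price', 'dollars', 'cleaning', 'text', 'steps']
```

The nested form keeps line boundaries, which helps when each line is a separate sample; the flat form is more convenient for counting words or building a vocabulary.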
### Todos
- more advanced features
- ability to read formats other than .txt
License
----
MIT
**Free Software, Hell Yeah!**