Clean and prepare text for modeling with machine learning
Nlpcleaner
Nlpcleaner cleans and prepares text for machine learning models. It supports the following steps:
- lowercase all text
- strip whitespace
- remove numbers
- remove symbols
- remove URLs
- strip HTML tags
- remove stopwords for the detected language or for a language passed explicitly
- lemmatization or stemming
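To make the steps above concrete, here is a minimal sketch of a comparable pipeline that uses only the Python standard library. It is not nlpcleaner's implementation: the function name `clean_text`, the `STOPWORDS` table, and the regular expressions are all assumptions made for illustration. Language detection and lemmatization or stemming are left out because they need third-party libraries.

```python
import html
import re

# Hypothetical, tiny stopword lists for illustration only; a real pipeline
# would load complete lists for each supported language.
STOPWORDS = {
    "en": {"a", "an", "and", "at", "is", "of", "the", "to"},
    "it": {"di", "e", "il", "la", "un", "una"},
}

TAG_RE = re.compile(r"<[^>]+>")                 # naive HTML tag matcher
URL_RE = re.compile(r"(?:https?://|www\.)\S+")  # http(s) and www. links
NUMBER_RE = re.compile(r"\d+")
SYMBOL_RE = re.compile(r"[^\w\s]|_")            # punctuation and symbols


def clean_text(text: str, language: str = "en") -> str:
    """Apply the listed cleaning steps in a fixed order (illustrative only)."""
    text = TAG_RE.sub(" ", text)       # strip HTML tags first so URLs in attributes go too
    text = html.unescape(text)         # decode entities such as &amp;
    text = URL_RE.sub(" ", text)       # remove URLs
    text = text.lower()                # lowercase everything
    text = NUMBER_RE.sub(" ", text)    # remove numbers
    text = SYMBOL_RE.sub(" ", text)    # remove symbols
    stopwords = STOPWORDS.get(language, set())
    tokens = [tok for tok in text.split() if tok not in stopwords]
    return " ".join(tokens)            # split/join also strips extra whitespace


print(clean_text("<p>Visit https://example.com at 10:30, the BEST site!</p>"))
# prints: visit best site
```

The order matters in this sketch: tags and URLs are removed before symbols, because stripping punctuation first would break up the patterns that identify them.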
Usage
from nlpcleaner import TextCleaner

txt = "<p>Raw text to clean</p>"  # illustrative input
cleaned = TextCleaner(txt).clean()
Tests
pipenv install .
python setup.py test
Publish to PyPI
python setup.py sdist
pip install twine
twine upload dist/*
TODO
- Add tests to cover different cases and languages
- Check performance
Source Distribution
nlpcleaner-0.3.1.tar.gz (17.5 MB)