Clean and prepare text for modeling with machine learning
Nlpcleaner
Nlpcleaner applies the following cleaning steps:
- lowercase all text
- strip surrounding whitespace
- remove numbers
- remove symbols
- remove URLs
- strip HTML tags
- remove stopwords for the detected language or an explicitly passed language
- lemmatize or stem
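The steps listed above can be illustrated with a minimal standard-library sketch. This is not nlpcleaner's implementation: `basic_clean`, the regular expressions, and the tiny `STOPWORDS` set are all hypothetical names chosen for illustration, and language detection, lemmatization, and stemming are left out because they need external resources.

```python
import html
import re

# Tiny illustrative stopword list (an assumption); a real cleaner would
# load a full list for the detected or requested language.
STOPWORDS = {"a", "an", "and", "the", "is", "of", "to", "in", "for", "at"}

TAG_RE = re.compile(r"<[^>]+>")              # HTML tags
URL_RE = re.compile(r"https?://\S+|www\.\S+")  # bare URLs
NUMBER_RE = re.compile(r"\d+")               # runs of digits
SYMBOL_RE = re.compile(r"[^\w\s]|_")         # punctuation and symbols


def basic_clean(text, stopwords=STOPWORDS):
    """Apply the listed cleaning steps using only the standard library."""
    text = html.unescape(text)       # decode entities such as &amp;
    text = TAG_RE.sub(" ", text)     # strip HTML tags
    text = URL_RE.sub(" ", text)     # remove URLs
    text = text.lower()              # lowercase everything
    text = NUMBER_RE.sub(" ", text)  # remove numbers
    text = SYMBOL_RE.sub(" ", text)  # remove symbols
    # Splitting and re-joining also strips and collapses whitespace.
    tokens = [tok for tok in text.split() if tok not in stopwords]
    return " ".join(tokens)


sample = "  Visit <b>our</b> site at https://example.com for 3 FREE tips & tricks!  "
print(basic_clean(sample))  # visit our site free tips tricks
```

Tags are stripped before URLs in this sketch so that a URL pattern ending at whitespace cannot swallow an adjacent closing tag.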
Usage
from nlpcleaner import TextCleaner
txt = "Some <b>raw</b> text, see https://example.com"  # any input string
cleaned = TextCleaner(txt).clean()
Tests
pipenv install .
python setup.py test
Publish to PyPI
python setup.py sdist
pip install twine
twine upload dist/*
TODO
- Add tests covering different cases and languages
- Check performance
Download files
Source Distribution
nlpcleaner-0.3.1.tar.gz (17.5 MB)
File details
Details for the file nlpcleaner-0.3.1.tar.gz.
File metadata
- Download URL: nlpcleaner-0.3.1.tar.gz
- Size: 17.5 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.40.2 CPython/3.7.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | 268aa5ef03b1a4e09b22523e2bd25ac26cdbd142252a76034e77a05398845fc2
MD5 | b2cae118ae675e5b87c228b0c2ec8809
BLAKE2b-256 | 979856e2e5e62b4f7e2f6e1199487646f091696c4f8d13d63d86d0aa1543b910
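A downloaded archive can be checked against the SHA256 digest listed above. The following is a minimal sketch using Python's `hashlib`; the helper names `sha256_of` and `matches_published_hash` are hypothetical, and the file path is whatever location the archive was saved to.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read fixed-size chunks so large archives are not loaded at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `matches_published_hash("nlpcleaner-0.3.1.tar.gz", "268aa5ef03b1a4e09b22523e2bd25ac26cdbd142252a76034e77a05398845fc2")` should return `True` for an intact copy of the source distribution.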