Clean and prepare text for modeling with machine learning

Project description

Nlpcleaner

Clean and prepare text for modeling with machine learning.

  • lowercase all text
  • strip all
  • remove numbers
  • remove symbols
  • remove URLs
  • strip HTML tags
  • remove stopwords for the detected language or for a language passed explicitly
  • lemmatization or stemming
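
The steps listed above can be illustrated with a minimal, standard-library sketch. This is not nlpcleaner's implementation: the function name `clean_text`, the `language` parameter, and the tiny `STOPWORDS` table are hypothetical, the language is passed explicitly rather than detected, and lemmatization/stemming is left out.

```python
import html
import re

# Hypothetical, deliberately tiny stopword lists for illustration only;
# a real cleaner would rely on a complete list per language.
STOPWORDS = {
    "en": {"a", "an", "and", "the", "is", "at", "of", "to", "in"},
    "it": {"il", "la", "e", "di", "un", "una", "che", "in"},
}


def clean_text(text, language="en"):
    """Apply the listed cleaning steps and return a single cleaned string."""
    text = re.sub(r"<[^>]+>", " ", text)               # strip HTML tags
    text = html.unescape(text)                         # decode entities such as &amp;
    text = re.sub(r"(https?://|www\.)\S+", " ", text)  # remove URLs
    text = text.lower()                                # lowercase everything
    text = re.sub(r"\d+", " ", text)                   # remove numbers
    text = re.sub(r"[^\w\s]|_", " ", text)             # remove symbols
    tokens = text.split()                              # also collapses whitespace
    stop = STOPWORDS.get(language, set())              # unknown language: keep all
    return " ".join(t for t in tokens if t not in stop)


print(clean_text("<p>Visit https://example.com at 10:30 for THE Best deals!</p>"))
```

URLs are removed before symbols so that the `://` and dots in a link do not leave stray fragments behind.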

Usage

from nlpcleaner import TextCleaner

txt = "Some <b>raw</b> text to clean."  # any input string
TextCleaner(txt).clean()

Tests

pipenv install .
python setup.py test

Publish to PyPI

python setup.py sdist
pip install twine
twine upload dist/*

TODO

  • Add tests to cover different cases and languages.
  • Check performance.

Download files

Source Distribution

nlpcleaner-0.3.1.tar.gz (17.5 MB)

File details

Details for the file nlpcleaner-0.3.1.tar.gz.

File metadata

  • Download URL: nlpcleaner-0.3.1.tar.gz
  • Size: 17.5 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.40.2 CPython/3.7.5

File hashes

Hashes for nlpcleaner-0.3.1.tar.gz
Algorithm Hash digest
SHA256 268aa5ef03b1a4e09b22523e2bd25ac26cdbd142252a76034e77a05398845fc2
MD5 b2cae118ae675e5b87c228b0c2ec8809
BLAKE2b-256 979856e2e5e62b4f7e2f6e1199487646f091696c4f8d13d63d86d0aa1543b910
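
As a hedged sketch of how the SHA256 digest listed above could be used, the snippet below hashes a downloaded archive with Python's `hashlib` and compares the result. The helper names `sha256_of` and `matches_expected` are hypothetical, and the local file path is an assumption about where the archive was saved.

```python
import hashlib

# SHA256 digest as listed for nlpcleaner-0.3.1.tar.gz.
EXPECTED_SHA256 = "268aa5ef03b1a4e09b22523e2bd25ac26cdbd142252a76034e77a05398845fc2"


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read fixed-size chunks so a large archive is never held in memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest equals the expected hex string."""
    return sha256_of(path) == expected.lower()
```

For example, `matches_expected("nlpcleaner-0.3.1.tar.gz")` would return `True` only if the file in the current directory has the digest listed above.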
