
Clean the text for NLP project


nlp_text_cleaner

About

This project provides a utility module for the text cleaning and preprocessing steps required in NLP projects.

Installation

    pip install nlp-text-cleaner

Usage

    from nlp_text_cleaner import nlp_text_cleaner as cleaner
    cleaned_text = cleaner.apply_stemming("I played Cricket")

The following methods are available for text cleaning:

  • split_into_sentences : A method to split text into sentences

  • split_into_words : A method to split text into words

  • lower_case_text : A method to convert text to lower case

  • remove_punctuation : A method to remove punctuation from a text

  • remove_unicode : A method to remove Unicode characters from a text

  • remove_leading_trailing_whitespaces : A method to remove white space at the beginning or end of a text

  • remove_duplicate_whitespaces : A method to remove consecutive white spaces

  • detect_language : A method to detect language of text

  • correct_grammar : A method to correct spelling mistakes in a text

  • remove_stopwords : A method to remove stopwords from a text, with an optional argument for passing your own custom stopwords

  • apply_stemming : A method to apply stemming on text

  • apply_lammatization : A method to apply lemmatization on text

  • remove_hashtags : A method to remove hashtags in a text

  • remove_hyperlinks : A method to remove hyperlinks in a text

  • clean_html_code : A method to remove HTML entities such as &apos;, &amp;, &lt;, etc.

  • replace_contraction : A method to replace contractions such as n't, 'll, etc.

  • get_pos_tags : A method to get POS tags of text
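To make the simpler cleaning steps above concrete, here is a minimal standard-library sketch of what such steps typically do. It is not the package's implementation: every `sketch_*` function is a hypothetical stand-in, and the regular expressions are assumptions chosen for illustration.

```python
import re
import string


# All functions below are hypothetical stand-ins written for illustration;
# they are NOT the nlp_text_cleaner implementations.

def sketch_remove_hyperlinks(text: str) -> str:
    # Drop tokens that start with http://, https://, or www.
    return re.sub(r"(?:https?://|www\.)\S+", "", text)


def sketch_remove_hashtags(text: str) -> str:
    # Drop a '#' together with the word characters that follow it.
    return re.sub(r"#\w+", "", text)


def sketch_remove_punctuation(text: str) -> str:
    # Delete every ASCII punctuation character.
    return text.translate(str.maketrans("", "", string.punctuation))


def sketch_lower_case_text(text: str) -> str:
    return text.lower()


def sketch_remove_duplicate_whitespaces(text: str) -> str:
    # Collapse any run of whitespace into a single space.
    return re.sub(r"\s+", " ", text)


def sketch_remove_leading_trailing_whitespaces(text: str) -> str:
    return text.strip()


raw = "  Check   this out: https://example.com #NLP is FUN!  "
text = sketch_remove_hyperlinks(raw)
text = sketch_remove_hashtags(text)
text = sketch_remove_punctuation(text)
text = sketch_lower_case_text(text)
text = sketch_remove_duplicate_whitespaces(text)
text = sketch_remove_leading_trailing_whitespaces(text)
print(text)  # check this out is fun
```

Order matters in a chain like this: hyperlinks and hashtags are removed before punctuation, because stripping `:`, `/`, and `#` first would leave fragments that the later patterns no longer match.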

You can use the methods above individually, as your use case requires. There are also some default methods that you can use:

  • clean_single_sentence : A default method to clean single sentence

  • clean_paragraph_to_sentences : A default method to get cleaned sentences from a paragraph

  • clean_paragraph : A default method to clean complete paragraph
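The README does not say which steps the default methods apply, so the following is only a sketch of how sentence-level and paragraph-level cleaning could be composed from individual steps. The step list, the naive sentence splitter, and all `sketch_*` names are assumptions, not the package's behavior.

```python
import re
from functools import reduce
from typing import Callable, Iterable, List

Step = Callable[[str], str]


def apply_steps(text: str, steps: Iterable[Step]) -> str:
    # Feed the text through each step in order.
    return reduce(lambda acc, step: step(acc), steps, text)


# Hypothetical default steps; the package's real defaults are not documented here.
DEFAULT_STEPS: List[Step] = [
    str.lower,                            # lower-case the text
    lambda t: re.sub(r"[^\w\s]", "", t),  # drop punctuation
    lambda t: re.sub(r"\s+", " ", t),     # collapse repeated whitespace
    str.strip,                            # trim both ends
]


def sketch_clean_single_sentence(sentence: str) -> str:
    return apply_steps(sentence, DEFAULT_STEPS)


def sketch_clean_paragraph_to_sentences(paragraph: str) -> List[str]:
    # Naive split: break after '.', '!' or '?' when followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", paragraph.strip())
    return [sketch_clean_single_sentence(p) for p in parts if p]


def sketch_clean_paragraph(paragraph: str) -> str:
    # Clean each sentence, then join the results back into one string.
    return " ".join(sketch_clean_paragraph_to_sentences(paragraph))


sentences = sketch_clean_paragraph_to_sentences("Hello,   World! This is  a TEST. ")
print(sentences)  # ['hello world', 'this is a test']
```

Keeping the steps in a plain list makes it easy to swap one out or reorder them for a particular use case without touching the three wrapper functions.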

Contributing

Please create a pull request against the 'develop' branch.

Developer Instructions

If you are using conda, go to the directory that contains the environment.yml file and run:

    conda env create -f environment.yml

For pip:

    pip install -r requirements.txt

Unit Testing

  1. Change into the 'tests' folder on the command line.
  2. Run:
    pytest -vv
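By default, pytest collects functions whose names start with `test_` from files named `test_*.py` or `*_test.py`. As a rough sketch of what a test in that folder could look like, the file below (hypothetically named `test_whitespace_sketch.py`) tests a stand-in function defined in the same file. It does not reproduce the project's actual tests.

```python
import re


def sketch_remove_duplicate_whitespaces(text: str) -> str:
    # Hypothetical stand-in for the function under test; a real test
    # would import the function from the package instead.
    return re.sub(r"\s+", " ", text)


def test_collapses_repeated_spaces():
    assert sketch_remove_duplicate_whitespaces("a   b") == "a b"


def test_leaves_single_spaces_alone():
    assert sketch_remove_duplicate_whitespaces("a b c") == "a b c"
```

With `-vv`, pytest prints each collected test on its own line along with its pass or fail status.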

