
NLP Text preprocessor

Project description

TextPreprocessing

This is the beta release of the TextPreprocessing library. The library can currently cleanse text data for model training.

The TextPreprocessing library can perform the following actions:

* Expand general abbreviations

* Clear email IDs from the text data

* Clear web URLs

* Clear HTML tags present in the text dataset

* Clear named entities such as ordinal and cardinal numbers, dates, person names, etc.

* Clear gibberish character sequences

* Remove English stop words

* Lemmatize the text

* Correct spelling errors.
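
To make a few of the actions above concrete, here is a minimal standard-library sketch of removing HTML tags, web URLs and email IDs with regular expressions. It is not the library's implementation; the function name `strip_noise` and the patterns are assumptions chosen for illustration, and they are deliberately simple rather than exhaustive.

```python
import re

# Hypothetical, simplified patterns (assumptions, not taken from TextPreprocessing).
TAG_RE = re.compile(r"<[^>]+>")                                 # HTML-like tags
URL_RE = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)   # web URLs
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")      # email IDs


def strip_noise(text: str) -> str:
    """Remove tags, URLs and email IDs, then collapse whitespace."""
    text = TAG_RE.sub(" ", text)    # tags first, so a URL never swallows a closing tag
    text = URL_RE.sub(" ", text)
    text = EMAIL_RE.sub(" ", text)
    return " ".join(text.split())


if __name__ == "__main__":
    sample = "<p>Mail me at a.b@example.com or visit https://example.com/x now</p>"
    print(strip_noise(sample))
```

Tags are stripped before URLs so that markup directly following a link is not consumed by the URL pattern.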

We enhance this package regularly and will add more flexible components in upcoming releases. Please update the package frequently.

How to install this package?


pip install TextPreprocessing



  • After installing the package, install the spaCy en_core_web_md model.

python -m spacy download en_core_web_md
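
spaCy models are installed as regular Python packages, so one way to confirm the download worked is to check that the package can be found. The sketch below uses only the standard library; the helper name `model_installed` is hypothetical and is not part of TextPreprocessing or spaCy.

```python
import importlib.util


def model_installed(name: str = "en_core_web_md") -> bool:
    """Return True if a top-level package with this name is importable."""
    # find_spec returns None when no top-level package of that name exists.
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    if model_installed():
        print("en_core_web_md is available")
    else:
        print("Model missing; run: python -m spacy download en_core_web_md")
```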

How to use the package?


>>> from TextPreprocessing.preprocess import preprocess

>>> obj = preprocess()

>>> obj.__version__

'2.5.0'

>>> entities_ignore_list = ["PERSON", "ORDINAL", "CARDINAL", "DATE", "TIME", "PRODUCT"]

>>> obj.load_language_model("en_core_web_md")  # Any spaCy language model can be used here

>>> obj.set_entity_ignore_list(entities_ignore_list)  # Entities to exclude from your dataset

>>> texts = ["""Good morning Natalia A, here are the detailed items listed below:  Loan ID: PNIO-UIZD Store Invoice: NOT SENT (ONLY Affirm Transaction ID) Transaction ID: 0011473451  Please let me know of any other info needed to assist you in regards to this unfulfilled purchase.  Sincerely,  William Langford  Again, here is the credentials required to process my request, reference #: 210916-000538""", """Adam,  Sorry, I was out of town with limited network access but getting back to you now.  I don’t think I made myself clear enough before, I have had MyC loud Version 5 installed for over a year on the PR4100 but, there was a new version 5 update automatically installed on October 30, 2021 with version 5.18.117 and since that update, we have not been able to access the PR4100 locally through the network.  We can however access the PR4100 using the mycloud.com web access.  The PR4100 is connected to the internet with the Google Nest router and we are able to access the dashboard pages with no issues.  Please find the attached error we get after accessing the PR4100 a second time after reboot and any time thereafter.  Thank you in advance.  Brent D. Beck"""]

>>> obj.reset_start()

>>> results = list(map(lambda x: obj.cleanup(x, len(texts)), texts))

1/2

2/2



>>> print(results)

Output


['good morning detailed item list loan i d no did store voice send affirm transaction i d transaction i d 0011473451 let know into need assist regard unfulfilled purchase sincerely credentials require process request reference 210916 0538', 'sorry town limit network access get donaTMt think clear my loud version instal year new version update automatically instal version 18 117 update able access locally network access cloud com web access connect internet goose nest outer able access dashboard page issue find attach error access time report time thank advance']
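
The reset-then-map pattern used in the session can be wrapped in a small helper. This is a sketch under assumptions: `clean_all` and the `Cleaner` protocol are hypothetical names, and the helper relies only on the two methods shown in the session, `reset_start()` and `cleanup(text, total)`. That `reset_start()` resets the progress counter is inferred from the `1/2`, `2/2` output, not from the library's documentation.

```python
from typing import Iterable, List, Protocol


class Cleaner(Protocol):
    """Any object exposing the two methods used in the session (assumed shape)."""

    def reset_start(self) -> None: ...

    def cleanup(self, text: str, total: int) -> str: ...


def clean_all(cleaner: Cleaner, texts: Iterable[str]) -> List[str]:
    """Reset the cleaner once, then clean every text in order."""
    items = list(texts)       # materialize so the total count is known up front
    cleaner.reset_start()     # assumption: resets the progress counter
    return [cleaner.cleanup(text, len(items)) for text in items]
```

With the object from the session, the call would be `results = clean_all(obj, texts)`.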





Download files


Source Distribution

TextPreprocessing-2.6.0.tar.gz (7.8 kB)

Uploaded Source

Built Distribution

TextPreprocessing-2.6.0-py3-none-any.whl (7.8 kB)

Uploaded Python 3

File details

Details for the file TextPreprocessing-2.6.0.tar.gz.

File metadata

  • Download URL: TextPreprocessing-2.6.0.tar.gz
  • Upload date:
  • Size: 7.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.6.0 importlib_metadata/4.5.0 pkginfo/1.8.1 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.61.1 CPython/3.9.7

File hashes

Hashes for TextPreprocessing-2.6.0.tar.gz
Algorithm Hash digest
SHA256 50ff18a8f34fde7227116cf3c94ac9b82fdd0cfee1e0d77cfdc7fe9a4dc1e6ab
MD5 9e33d9392865d4c7b973324eb320f9b8
BLAKE2b-256 0fe5315f60d6d4ee4a4e5169f8e23bb06b40ab2e9c922296217750f5d188267f


File details

Details for the file TextPreprocessing-2.6.0-py3-none-any.whl.

File metadata

  • Download URL: TextPreprocessing-2.6.0-py3-none-any.whl
  • Upload date:
  • Size: 7.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.6.0 importlib_metadata/4.5.0 pkginfo/1.8.1 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.61.1 CPython/3.9.7

File hashes

Hashes for TextPreprocessing-2.6.0-py3-none-any.whl
Algorithm Hash digest
SHA256 50db50a2c8c857f7f7cde8a8646f0ceabcf399b30ff5dbbe41ea429c47ff5e51
MD5 2eda360df8689ec8261e14c4d5ce4bf3
BLAKE2b-256 0b0c82e61e7f9eb974113cbf1b38ef6f7b9e23e05e72997adaab7e07c1ae5f58

