Python package to clean strings and make them suitable for NLP

cleantxty

cleantxty is an open-source Python package for cleaning raw text. Source code for the library can be found here.

Features

cleantxty has two main methods:

  • clean: to clean raw text and return the cleaned text
  • clean_words: to clean raw text and return a list of clean words

Other methods that can be used alongside them are:

  • remove_link: to remove links from the text
  • remove_extra_white_space: to remove extra white space from the text
  • lower_text: to convert the text to lower case
  • upper_text: to convert the text to upper case
  • remove_stopwords: to remove stopwords from the text
  • remove_digits: to remove digits from the text
  • remove_punctuations: to remove punctuation from the text
  • custom_regex: to apply a custom regex to the text
  • stem_text: to stem the provided text
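To make the intent of a few of these helpers concrete, here is a minimal standard-library sketch of what link, digit, punctuation, and white-space removal typically look like. The `*_sketch` functions are hypothetical stand-ins written for illustration; they are not cleantxty's implementation, and the library's actual behaviour may differ.

```python
import re
import string


def remove_link_sketch(text: str) -> str:
    # Assumption: a "link" is anything starting with http://, https:// or www.
    return re.sub(r"(https?://|www\.)\S+", "", text)


def remove_digits_sketch(text: str) -> str:
    # Delete every decimal digit.
    return re.sub(r"\d", "", text)


def remove_punctuations_sketch(text: str) -> str:
    # Delete every ASCII punctuation character.
    return text.translate(str.maketrans("", "", string.punctuation))


def remove_extra_white_space_sketch(text: str) -> str:
    # Collapse runs of white space and trim both ends.
    return " ".join(text.split())


sample = "Visit https://example.com NOW!!  It has 42   pages."
step = remove_link_sketch(sample)
step = remove_digits_sketch(step)
step = remove_punctuations_sketch(step)
step = remove_extra_white_space_sketch(step).lower()
print(step)  # visit now it has pages
```

Collapsing white space last matters here, because the earlier removals leave gaps behind.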

Installation

cleantxty requires Python 3 and NLTK.

To install using pip, use

pip install cleantxty

Usage

  • Import the library:
import cleantxty
  • Choose a method:

To return the text in a string format,

cleantxty.clean("raw_text_here") 

To return a list of words from the text,

cleantxty.clean_words("raw_text_here") 

To choose a specific set of cleaning operations,

cleantxty.clean("raw_text_here",
    default_case="lower",  # "lower" by default; set to "upper" for an upper-case result
    regex=None  # custom regex to apply, if any
)

cleantxty.clean_words("raw_text_here",
    default_case="lower",  # "lower" by default; set to "upper" for an upper-case result
    regex=None  # custom regex to apply, if any
)
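As a rough illustration of how a `default_case` option like the one above could be handled, the sketch below dispatches on the two documented values. `apply_case_sketch` is a hypothetical helper, and rejecting other values with an error is an assumption; the source does not say how cleantxty treats them.

```python
def apply_case_sketch(text: str, default_case: str = "lower") -> str:
    # Only the two values mentioned in the usage notes are handled.
    if default_case == "lower":
        return text.lower()
    if default_case == "upper":
        return text.upper()
    # Assumption: anything else is treated as a caller error.
    raise ValueError(f"unsupported default_case: {default_case!r}")


print(apply_case_sketch("Raw Text Here"))           # raw text here
print(apply_case_sketch("Raw Text Here", "upper"))  # RAW TEXT HERE
```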

Examples

import cleantxty
cleantxty.clean('This is A s$ple ? tExt3% to   cleaN566556+wow8 ')

returns,

'this is a sample text to clean'

import cleantxty
cleantxty.clean_words('This is A s$ample !!!! tExt3% to   cleaN566556+2+59*/133')

returns,

['sampl', 'text', 'clean']
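The word-list result can be approximated with the standard library alone. The sketch below keeps only letters, lower-cases, splits on white space, and drops stopwords. It is not how cleantxty works internally: `clean_words_sketch` and the tiny `STOPWORDS` set are hypothetical, and stemming is left out, so it yields 'sample' where the library output above shows the stem 'sampl'.

```python
import re
from typing import List

# Tiny illustrative stopword set (assumption); a real pipeline would
# normally use a full list such as the one shipped with NLTK.
STOPWORDS = {"this", "is", "a", "to"}


def clean_words_sketch(text: str) -> List[str]:
    # Keep only ASCII letters and white space.
    letters_only = re.sub(r"[^A-Za-z\s]", "", text)
    # Lower-case and split on any run of white space.
    tokens = letters_only.lower().split()
    # Drop stopwords; no stemming is applied in this sketch.
    return [t for t in tokens if t not in STOPWORDS]


words = clean_words_sketch("This is A s$ample !!!! tExt3% to   cleaN566556+2+59*/133")
print(words)  # ['sample', 'text', 'clean']
```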

from cleantxty import clean
text = "my id, name1@dom1.com and your, name2@dom2.in"
clean(text, regex=r"[a-z0-9\.\-+_]+@[a-z0-9\.\-+_]+\.[a-z]+")

returns,

"my id, email and your, email"
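The same pattern can be tried on its own with Python's `re` module to see what it matches. In this sketch the replacement string "email" is an assumption taken from the output shown above; the source does not state how cleantxty picks the replacement, and `mask_emails_sketch` is a hypothetical helper rather than part of the library.

```python
import re

# Same pattern as in the example above.
EMAIL_PATTERN = r"[a-z0-9\.\-+_]+@[a-z0-9\.\-+_]+\.[a-z]+"


def mask_emails_sketch(text: str, replacement: str = "email") -> str:
    # Replace every match of the pattern with the placeholder word.
    return re.sub(EMAIL_PATTERN, replacement, text)


text = "my id, name1@dom1.com and your, name2@dom2.in"
print(re.findall(EMAIL_PATTERN, text))  # ['name1@dom1.com', 'name2@dom2.in']
print(mask_emails_sketch(text))         # my id, email and your, email
```

Note that the character classes only cover lower-case letters, so upper-case addresses would need `re.IGNORECASE` or a lower-casing step first.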

License

MIT

For any questions, issues, bugs, or suggestions, please visit here.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

cleantxty-0.0.5.tar.gz (3.9 kB)

Uploaded Source

Built Distribution

cleantxty-0.0.5-py3-none-any.whl (3.8 kB)

Uploaded Python 3

File details

Details for the file cleantxty-0.0.5.tar.gz.

File metadata

  • Download URL: cleantxty-0.0.5.tar.gz
  • Upload date:
  • Size: 3.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.9.13

File hashes

Hashes for cleantxty-0.0.5.tar.gz
Algorithm Hash digest
SHA256 56b5fc4841e3433c17e41c58380837c505c0580d0ce40a9b32b1449b97c65488
MD5 d6662272875a11f4e8e13b00a84adf6f
BLAKE2b-256 ded91f485bd9647c04a2ff175192c774f81856f37faed4b39a3908aa7dc61a9e

File details

Details for the file cleantxty-0.0.5-py3-none-any.whl.

File metadata

  • Download URL: cleantxty-0.0.5-py3-none-any.whl
  • Upload date:
  • Size: 3.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.9.13

File hashes

Hashes for cleantxty-0.0.5-py3-none-any.whl
Algorithm Hash digest
SHA256 cd6035ef4199d78bc35a399c16c48a97ed1900034ae75f57e0ffb43e7bab8093
MD5 250b0f4bff093a45fc6ba526853e96bb
BLAKE2b-256 152946bdadea82adc5994864ac9d4f6d45ca4e97acdd1a7829f695e265dd2574
