
Quicksample Test Package for SQLShack Demo

Project description

cleantxty

A Python package to clean strings and make them suitable for NLP.

cleantxty is an open-source Python package for cleaning raw text. Source code for the library can be found here.

Features

cleantxty has two main methods:

  • clean: to clean raw text and return the cleaned text
  • clean_words: to clean raw text and return a list of clean words

Other methods that can be used in combination with them are:

  • remove_link: to remove links from the text
  • remove_extra_white_space: to remove extra white space from the text
  • lower_text: to convert the text to lower case
  • upper_text: to convert the text to upper case
  • remove_stopwords: to remove stopwords from the text
  • remove_digits: to remove digits from the text
  • remove_punctuations: to remove punctuation from the text
  • custom_regex: to apply a custom regex to the text
  • stem_text: to stem the provided text
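Most of the operations listed above can be approximated with Python's standard library. The block below is a minimal sketch assuming regex-based behaviour; the function names mirror the list, but these are hypothetical stand-ins rather than cleantxty's implementation. Stop-word removal and stemming are left out because they depend on NLTK.

```python
import re
import string

# Hypothetical stand-ins for the operations listed above; not cleantxty's code.

def remove_link(text: str) -> str:
    # Assumption: a "link" is any token starting with http://, https:// or www.
    return re.sub(r"(?:https?://|www\.)\S+", "", text)

def remove_extra_white_space(text: str) -> str:
    # Collapse runs of whitespace into a single space and trim both ends.
    return " ".join(text.split())

def lower_text(text: str) -> str:
    return text.lower()

def upper_text(text: str) -> str:
    return text.upper()

def remove_digits(text: str) -> str:
    return re.sub(r"\d+", "", text)

def remove_punctuations(text: str) -> str:
    # Delete every ASCII punctuation character in string.punctuation.
    return text.translate(str.maketrans("", "", string.punctuation))

def custom_regex(text: str, pattern: str, replacement: str = "") -> str:
    # Assumption: matches of the user's pattern are replaced (deleted by default).
    return re.sub(pattern, replacement, text)

raw = "Visit https://example.com NOW!!  Call 555-0100"
steps = remove_punctuations(remove_link(raw))
print(lower_text(remove_extra_white_space(remove_digits(steps))))  # visit now call
```

The order matters: links are removed before punctuation, otherwise stripping `:` and `/` would break the URL pattern.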

Installation

cleantxty requires Python 3 and NLTK.

To install using pip, run:

pip install cleantxty
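Because cleantxty depends on NLTK, a quick way to confirm that both packages are importable after installation is to look them up with `importlib`. This is a minimal standard-library sketch; the helper name `missing_packages` is hypothetical.

```python
import importlib.util
from typing import Iterable, List

def missing_packages(names: Iterable[str] = ("cleantxty", "nltk")) -> List[str]:
    """Return the names from `names` that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]

if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing: " + ", ".join(missing))
    else:
        print("cleantxty and nltk are installed")
```

NLTK's stopword list and similar corpora are downloaded separately with `nltk.download()`; this page does not say whether cleantxty fetches them itself.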

Usage

  • Import the library:
import cleantxty
  • Choose a method:

To return the cleaned text as a string:

cleantxty.clean("raw_text_here") 

To return a list of cleaned words:

cleantxty.clean_words("raw_text_here") 

To choose a specific set of cleaning operations:

cleantxty.clean("raw_text_here",
    default_case="lower",  # "lower" by default; use "upper" for an upper-case result
    regex=None  # optional custom regex to apply
)

cleantxty.clean_words("raw_text_here",
    default_case="lower",  # "lower" by default; use "upper" for an upper-case result
    regex=None  # optional custom regex to apply
)
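One way the two keyword arguments could interact is sketched below: the optional pattern is applied first and the case conversion last. This is a hypothetical illustration under stated assumptions (regex matches are deleted, and any `default_case` other than `"lower"` or `"upper"` is rejected); cleantxty's actual handling may differ.

```python
import re
from typing import Optional

def apply_options(text: str, default_case: str = "lower", regex: Optional[str] = None) -> str:
    # Hypothetical helper mirroring the default_case/regex arguments above.
    if default_case not in ("lower", "upper"):
        raise ValueError('default_case must be "lower" or "upper"')
    if regex is not None:
        # Assumption: matches of the custom pattern are removed.
        text = re.sub(regex, "", text)
    return text.lower() if default_case == "lower" else text.upper()

print(apply_options("Order #42 Shipped", default_case="upper", regex=r"\s*#\d+"))  # ORDER SHIPPED
```

Applying the pattern before changing case keeps user-supplied patterns matching against the original text, so a case-sensitive regex behaves as written.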

Examples

import cleantxty
cleantxty.clean('This is A s$ample !!!! tExt3% to   cleaN566556+2+59*/133')

returns,

'this is a sample text to clean'

import cleantxty
cleantxty.clean_words('This is A s$ample !!!! tExt3% to   cleaN566556+2+59*/133')

returns,

['sampl', 'text', 'clean']
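The two example outputs can be approximated with the standard library to show which steps are involved. In this minimal sketch, digits and punctuation are deleted, whitespace is collapsed and the text is lower-cased. For the word list, a tiny hand-written stop-word set stands in for NLTK's (an assumption), and stemming is omitted, which is why the sketch yields `'sample'` where cleantxty returns the stem `'sampl'`. Both function names are hypothetical.

```python
import re
import string
from typing import List

# Tiny illustrative stop-word set; cleantxty presumably uses NLTK's much larger list.
STOPWORDS = {"this", "is", "a", "to"}

def clean_sketch(text: str) -> str:
    text = re.sub(r"\d+", "", text)                                   # drop digits
    text = text.translate(str.maketrans("", "", string.punctuation))  # drop ASCII punctuation
    return " ".join(text.split()).lower()                             # collapse spaces, lower-case

def clean_words_sketch(text: str) -> List[str]:
    # Tokenise the cleaned text and filter out stop words (no stemming here).
    return [word for word in clean_sketch(text).split() if word not in STOPWORDS]

raw = "This is A s$ample !!!! tExt3% to   cleaN566556+2+59*/133"
print(clean_sketch(raw))        # this is a sample text to clean
print(clean_words_sketch(raw))  # ['sample', 'text', 'clean']
```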

from cleantxty import clean
text = "my id, name1@dom1.com and your, name2@dom2.in"
clean(text, regex=r"[a-z0-9\.\-+_]+@[a-z0-9\.\-+_]+\.[a-z]+")

returns,

"my id, email and your, email"
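The output above suggests that every match of the custom pattern is replaced with the placeholder word "email". How cleantxty chooses that placeholder is not documented here, so the sketch below simply hard-codes it with `re.sub`; the helper `mask_matches` and its `placeholder` parameter are hypothetical.

```python
import re

# Pattern taken from the example above.
EMAIL_PATTERN = r"[a-z0-9\.\-+_]+@[a-z0-9\.\-+_]+\.[a-z]+"

def mask_matches(text: str, pattern: str, placeholder: str = "email") -> str:
    # Replace each non-overlapping match of `pattern` with `placeholder`.
    return re.sub(pattern, placeholder, text)

text = "my id, name1@dom1.com and your, name2@dom2.in"
print(mask_matches(text, EMAIL_PATTERN))  # my id, email and your, email
```

Because the character classes are lower-case only, an address such as `Name@Dom.com` would not match unless the pattern is used with `re.IGNORECASE`.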

License

MIT

For questions, issues, bugs, and suggestions, please visit here.



Download files


Source Distribution

cleantxty-0.0.1.tar.gz (3.2 kB)

Built Distribution

cleantxty-0.0.1-py3-none-any.whl (3.0 kB, Python 3)
