Husky_Simplex

Text processing package

Data preprocessing is the first and most essential stage in developing a machine learning model, because it affects the overall accuracy and efficiency of the outcome. Ordinary text data contains non-contextual words, noise, misspelled words, symbols, punctuation, and unnecessary syntactic variation. To work around these problems, raw text must be cleaned into a form that is suitable for statistical and computational analysis.

The purpose of the package is to provide most of the necessary text preprocessing techniques in one place. These steps make text data more useful computationally for Natural Language Processing tasks.

Package Functions

  1. Tokenization - Converting a string input to a list of words.
  2. Word counter - Counting the total number of words in the input.
  3. Stopword removal - Removing non-contextual words that serve only the grammatical structure.
  4. Punctuation removal - Removing punctuation marks.
  5. Symbol removal - Removing symbols.
  6. Stemming - Reducing words to their stems by removing inflectional endings such as tense markers.
  7. Bag of words - Representing text by how often each word occurs.
  8. Count vectorization - Vectorization of text based on term frequency.
  9. TF-IDF vectorization - Vectorization of text based on term frequency in relation to document frequency.
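
To make the steps above concrete, here is a minimal sketch of several of them using only the Python standard library. It does not use or reproduce the husky_simplex API: every function name below, the tiny `STOPWORDS` set, and the particular TF-IDF weighting (relative term frequency times the natural log of N / document frequency) are assumptions chosen for illustration. Stemming is left out because a realistic stemmer needs a dedicated algorithm.

```python
import math
import string
from collections import Counter

# Tiny illustrative stopword list (an assumption; real lists are much longer).
STOPWORDS = {"a", "an", "the", "is", "are", "on", "in", "of", "and", "to"}


def tokenize(text):
    """Lowercase the input and split it on whitespace."""
    return text.lower().split()


def remove_punctuation(tokens):
    """Strip ASCII punctuation and symbols; drop tokens that become empty."""
    table = str.maketrans("", "", string.punctuation)
    stripped = (tok.translate(table) for tok in tokens)
    return [tok for tok in stripped if tok]


def remove_stopwords(tokens):
    """Keep only tokens that are not in the stopword list."""
    return [tok for tok in tokens if tok not in STOPWORDS]


def preprocess(text):
    """Tokenize, then remove punctuation, then remove stopwords."""
    return remove_stopwords(remove_punctuation(tokenize(text)))


def count_vectorize(docs):
    """Return (vocabulary, vectors) holding raw term counts per document."""
    tokenized = [preprocess(doc) for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)  # bag of words for this document
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors


def tfidf_vectorize(docs):
    """Weight each relative term frequency by log(N / document frequency)."""
    vocab, counts = count_vectorize(docs)
    n_docs = len(docs)
    # Number of documents containing each vocabulary term (always >= 1).
    doc_freq = [sum(1 for vec in counts if vec[i] > 0) for i in range(len(vocab))]
    vectors = []
    for vec in counts:
        total = sum(vec) or 1  # avoid dividing by zero for empty documents
        vectors.append(
            [(c / total) * math.log(n_docs / doc_freq[i]) for i, c in enumerate(vec)]
        )
    return vocab, vectors


docs = ["The cat sat on the mat.", "The dog sat!"]
vocab, count_vectors = count_vectorize(docs)
_, tfidf_vectors = tfidf_vectorize(docs)
word_count = len(tokenize(docs[0]))  # word counter: tokens before any cleaning
```

In this sketch a term that occurs in every document (here "sat") gets a TF-IDF weight of zero, which is the intended effect of relating term frequency to document frequency. Library implementations usually add smoothing and normalization, so their numbers will differ.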

Installation

pip install husky_simplex

or

```
git clone https://github.com/Sudhendra/Husky_Simplex.git
cd Husky_Simplex
pip install -r requirements.txt
```
