A text pre-processing package

Husky_Simplex

Text processing package

Data preprocessing is the first and one of the most essential stages in developing a machine learning model because it affects the accuracy and efficiency of the result. Raw text typically contains non-contextual words, noise, misspellings, symbols, punctuation, and syntactic markers that carry little meaning. To work around these problems, raw text has to be cleaned into a form suitable for statistical and computational analysis.

The purpose of the package is to provide most of the commonly needed text preprocessing techniques in one place. These steps make text data more useful for computation in Natural Language Processing (NLP) tasks.

Package Functions

  1. Tokenization - Converting a string input into a list of words.
  2. Word counter - Counting the total number of words in the input.
  3. Stopword removal - Removing non-contextual words that serve only a grammatical purpose.
  4. Punctuation removal - Removing punctuation marks.
  5. Symbol removal - Removing symbols.
  6. Stemming - Reducing words to their stem by stripping inflectional suffixes such as tense endings.
  7. Bag of words - Representing text as word counts, ignoring word order.
  8. Count vectorization - Vectorizing text based on term frequency.
  9. TF-IDF vectorization - Vectorizing text based on term frequency weighted against document frequency.
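
The techniques listed above can be illustrated with a minimal standard-library sketch. This is not the package's API: every function name here (`tokenize`, `preprocess`, `naive_stem`, `tf_idf`, and so on) is hypothetical, the stopword list is a tiny stand-in, the stemmer is a crude suffix stripper rather than a real stemming algorithm, and the TF-IDF weighting is the plain unsmoothed `tf * log(N / df)` form.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real lists are much longer).
STOPWORDS = {"a", "an", "the", "is", "are", "on", "and", "of", "to", "in"}


def tokenize(text):
    """Tokenization: lowercase the text and split it on whitespace."""
    return text.lower().split()


def word_count(text):
    """Word counter: number of whitespace-separated tokens."""
    return len(tokenize(text))


def strip_punctuation_and_symbols(tokens):
    """Remove every non-word character (anything but letters, digits, '_')."""
    cleaned = (re.sub(r"\W", "", tok) for tok in tokens)
    return [tok for tok in cleaned if tok]  # drop tokens that became empty


def remove_stopwords(tokens):
    """Stopword removal against the small illustrative list above."""
    return [tok for tok in tokens if tok not in STOPWORDS]


def naive_stem(token):
    """Strip a few common English suffixes, keeping at least 3 characters."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize, clean, drop stopwords, then stem."""
    tokens = strip_punctuation_and_symbols(tokenize(text))
    return [naive_stem(tok) for tok in remove_stopwords(tokens)]


def bag_of_words(tokens):
    """Bag of words / count vectorization: term -> raw count."""
    return Counter(tokens)


def tf_idf(docs):
    """docs is a list of token lists; returns one {term: weight} dict per doc."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


corpus = ["The cats are sitting on the mat!", "A dog chased the cats."]
docs = [preprocess(text) for text in corpus]
print(docs[0])                 # ['cat', 'sitt', 'mat']
print(bag_of_words(docs[0]))
print(tf_idf(docs))
```

A term that occurs in every document, such as `cat` in this corpus, gets a TF-IDF weight of zero under this unsmoothed formula, which is why library implementations often add smoothing.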

Installation

pip install husky_simplex

or
git clone https://github.com/Sudhendra/Husky_Simplex.git
cd Husky_Simplex
pip install -r requirements.txt
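
To confirm that the pip installation worked, one option is to ask Python for the installed distribution's version. This is a small sketch using the standard library's `importlib.metadata` (Python 3.8+); the helper name `installed_version` is hypothetical. It only reflects distributions installed through pip, so it is expected to report nothing after the clone route, which installs the requirements rather than the package itself.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Uses the distribution name from the pip command above.
print(installed_version("husky_simplex"))
```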
