A text processor that applies 10 NLP preprocessing techniques, written for the ML Specialization.

TextPreprocessor

Overview

TextPreprocessor is a Python class for text preprocessing. It can remove links, hashtags, special characters, emojis, numbers, and stopwords, and it can convert text to lowercase. Each step is switched on or off with a flag passed when the class is initialized.

Installation

Ensure you have NLTK installed. You can install NLTK via pip:

pip install nltk

Usage

# Import the TextPreprocessor class
from text_preprocessor import TextPreprocessor

# Initialize the preprocessor with default settings
preprocessor = TextPreprocessor()

# Or customize the preprocessor by setting flags
preprocessor = TextPreprocessor(
    remove_links=True,
    remove_hashtags=True,
    remove_characters=True,
    convert_to_lowercase=True,
    remove_emojis=True,
    remove_numbers=True,
    remove_stopwords_flag=True
)

# Preprocess a string according to the flags set above
text = "Your text goes here..."
processed_text = preprocessor.preprocess_text(text)

Available Methods

  1. preprocess_text(text): Preprocesses the input text based on the initialized flags.
  2. Other methods in the class can be used individually for specific preprocessing steps (e.g., remove_links, remove_stopwords, etc.).
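
To make the flag-driven design concrete, here is a minimal sketch of how a class like this could dispatch on its flags. It is not the package's source: `SketchPreprocessor`, its `strip_*` method names, the tiny `SKETCH_STOPWORDS` set, and the emoji ranges are all assumptions made for illustration (the real class presumably draws its stopwords from NLTK).

```python
import re
from dataclasses import dataclass

# Tiny illustrative stopword set (assumption); the real class presumably uses NLTK's list.
SKETCH_STOPWORDS = {"a", "an", "the", "is", "this", "and", "with"}

# Rough emoji ranges (assumption); an approximation, not a complete emoji definition.
EMOJI_PATTERN = re.compile("[\U0001F300-\U0001FAFF\U00002600-\U000027BF]")


@dataclass
class SketchPreprocessor:
    """Hypothetical stand-in for TextPreprocessor; not the package's code."""

    convert_to_lowercase: bool = True
    remove_emojis: bool = True
    remove_numbers: bool = True
    remove_stopwords_flag: bool = True

    def to_lowercase(self, text: str) -> str:
        return text.lower()

    def strip_emojis(self, text: str) -> str:
        return EMOJI_PATTERN.sub("", text)

    def strip_numbers(self, text: str) -> str:
        return re.sub(r"\d+", "", text)

    def strip_stopwords(self, text: str) -> str:
        kept = [w for w in text.split() if w.lower() not in SKETCH_STOPWORDS]
        return " ".join(kept)

    def preprocess_text(self, text: str) -> str:
        # Pair each flag with its step and run only the enabled steps, in order.
        steps = [
            (self.convert_to_lowercase, self.to_lowercase),
            (self.remove_emojis, self.strip_emojis),
            (self.remove_numbers, self.strip_numbers),
            (self.remove_stopwords_flag, self.strip_stopwords),
        ]
        for enabled, step in steps:
            if enabled:
                text = step(text)
        # Collapse the gaps left behind by removed tokens.
        return " ".join(text.split())


sketch = SketchPreprocessor(remove_stopwords_flag=False)
print(sketch.preprocess_text("Version 2 is OUT"))
```

Because each step is also an ordinary method, it can be called on its own (for example `sketch.strip_numbers("a1b2")`), which mirrors the individual-method usage described in item 2.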

Examples

from text_preprocessor import TextPreprocessor

text = "Hello! This is an example text with #hashtags and links: https://example.com"

# Initialize preprocessor
preprocessor = TextPreprocessor(remove_links=True, remove_hashtags=True)

# Preprocess text
processed_text = preprocessor.preprocess_text(text)
print(processed_text)
# Output: Hello This is an example text with and links
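
The output shown for this example can be reproduced with the standard `re` module alone, which helps clarify what each step does. The sketch below is an approximation under stated assumptions, not the package's implementation: `sketch_clean` and its three patterns are hypothetical, and it assumes punctuation removal and whitespace collapsing are also applied, since the documented output contains no "!" or ":" and no double spaces.

```python
import re


def sketch_clean(text: str) -> str:
    """Hypothetical helper; approximates link, hashtag, and punctuation removal."""
    # Remove links first: URLs contain characters the later steps would mangle.
    text = re.sub(r"https?://\S+|www\.\S+", "", text)
    # Remove hashtags, including the word attached to the '#'.
    text = re.sub(r"#\w+", "", text)
    # Remove anything that is not an ASCII letter, digit, or whitespace.
    text = re.sub(r"[^A-Za-z0-9\s]", "", text)
    # Collapse the gaps left behind by the removals above.
    return " ".join(text.split())


example = "Hello! This is an example text with #hashtags and links: https://example.com"
print(sketch_clean(example))
# Hello This is an example text with and links
```

The order matters in this sketch: stripping special characters before links would turn `https://example.com` into `httpsexamplecom`, which the link pattern could no longer match.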

License

This project is licensed under the MIT License - see the LICENSE file for details.

Download files

Source Distribution: tweetprocessor-1.0.0.tar.gz (3.5 kB)

Built Distribution: tweetprocessor-1.0.0-py3-none-any.whl (3.9 kB, Python 3)
