A text processor that applies 10 NLP preprocessing techniques, built for the ML Specialization.
Project description
TextPreprocessor
Overview
TextPreprocessor is a Python class for text preprocessing. It can remove links, hashtags, special characters, emojis, numbers, and stopwords, and it can convert text to lowercase. Each step is switched on or off with a flag passed to the constructor.
Installation
Ensure you have NLTK installed. You can install NLTK via pip:
pip install nltk
Usage
### Import the TextPreprocessor class
from text_preprocessor import TextPreprocessor
# Initialize the preprocessor with default settings
preprocessor = TextPreprocessor()
# Customize the preprocessor by setting flags
preprocessor = TextPreprocessor(
    remove_links=True,
    remove_hashtags=True,
    remove_characters=True,
    convert_to_lowercase=True,
    remove_emojis=True,
    remove_numbers=True,
    remove_stopwords_flag=True,
)
text = "Your text goes here..."
processed_text = preprocessor.preprocess_text(text)
Available Methods
- preprocess_text(text): Preprocesses the input text based on the initialized flags.
- Other methods in the class can be used individually for specific preprocessing steps (e.g., remove_links, remove_stopwords, etc.).
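To make the idea of flag-driven preprocessing concrete, here is a minimal sketch of how boolean flags can gate individual steps. The class `SketchPreprocessor` and its methods `lowercase` and `strip_numbers` are hypothetical names chosen for illustration; this is not the package's actual implementation, and only two of the flags are modeled.

```python
import re
from dataclasses import dataclass


@dataclass
class SketchPreprocessor:
    """Hypothetical stand-in for illustration, not the package's TextPreprocessor."""

    convert_to_lowercase: bool = True
    remove_numbers: bool = True

    def lowercase(self, text: str) -> str:
        # Individual step: can be called on its own.
        return text.lower()

    def strip_numbers(self, text: str) -> str:
        # Individual step: drop every run of digits.
        return re.sub(r"\d+", "", text)

    def preprocess_text(self, text: str) -> str:
        # Each step runs only when its flag is set.
        if self.convert_to_lowercase:
            text = self.lowercase(text)
        if self.remove_numbers:
            text = self.strip_numbers(text)
        return text


both = SketchPreprocessor().preprocess_text("Room 101 Is OPEN")
only_lower = SketchPreprocessor(remove_numbers=False).preprocess_text("Room 101 Is OPEN")
print(repr(both))
print(repr(only_lower))
```

In this sketch, removing the digits leaves the surrounding spaces in place, so a real pipeline would probably want a final whitespace cleanup step.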
Examples
text = "Hello! This is an example text with #hashtags and links: https://example.com"
# Initialize preprocessor
preprocessor = TextPreprocessor(remove_links=True, remove_hashtags=True)
# Preprocess text
processed_text = preprocessor.preprocess_text(text)
print(processed_text)
Output: "Hello This is an example text with and links"
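The output above can be reproduced with a few regular expressions from the standard library. The following is a sketch under stated assumptions, not the package's code: the function names are hypothetical, and the patterns are one plausible choice. The documented output also lacks the "!" and ":" characters, which suggests that special-character removal was active in that run; that is an inference, not something the README states.

```python
import re


def strip_links(text: str) -> str:
    # Assumed pattern: anything starting with http://, https://, or www.
    return re.sub(r"https?://\S+|www\.\S+", "", text)


def strip_hashtags(text: str) -> str:
    # Removes the '#' together with the word that follows it.
    return re.sub(r"#\w+", "", text)


def strip_special_characters(text: str) -> str:
    # Keeps word characters (letters, digits, underscore) and whitespace.
    return re.sub(r"[^\w\s]", "", text)


def collapse_whitespace(text: str) -> str:
    # Joins tokens with single spaces and trims both ends.
    return " ".join(text.split())


text = "Hello! This is an example text with #hashtags and links: https://example.com"

# Links are removed first; stripping punctuation first would turn the URL
# into "httpsexamplecom", which the link pattern no longer matches.
result = collapse_whitespace(
    strip_special_characters(strip_hashtags(strip_links(text)))
)
print(result)
```

The order of the steps matters here: link removal has to run before special-character removal, and whitespace is collapsed last to clean up the gaps left by deleted tokens.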
License
This project is licensed under the MIT License - see the LICENSE file for details.
File details
Details for the file tweetprocessor-1.0.0.tar.gz.
File metadata
- Download URL: tweetprocessor-1.0.0.tar.gz
- Upload date:
- Size: 3.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.12.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d03e6ddff83240a5abf8abc0832c193531a8cae8b0fb102bcdf945379efdcc58 |
| MD5 | 1a2fa9e18e070a5cc9b44a76f19ffbfe |
| BLAKE2b-256 | e86a13ca0be1560f5d98497ccd9c0ff2a85e2ca17a4420fb8b7f6bb96b5e2111 |
File details
Details for the file tweetprocessor-1.0.0-py3-none-any.whl.
File metadata
- Download URL: tweetprocessor-1.0.0-py3-none-any.whl
- Upload date:
- Size: 3.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.12.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fe00ef8b0092155a99e0a1fdde4175a7f720619abc7a2b85c1f61038ec77d713 |
| MD5 | 53c0052deeb1ad98044d5290b731c660 |
| BLAKE2b-256 | 18fc7720fa546ed7c0dc4d6cf0662513ff2118a719d9960f5773412ae3a1fcf8 |