A robust NLP pipeline for stemming, lemmatization, and vectorization
NLPProcessor
Overview
NLPProcessor is an automated, adaptive NLP pipeline that dynamically handles:
- Tokenization (Word & Sentence)
- Stopword Removal
- POS Tagging
- Named Entity Recognition (NER)
- Text Normalization (Lowercasing, Punctuation Removal, etc.)
- Stemming & Lemmatization (via NLTK or spaCy)
- Vectorization (TF-IDF or Count Vectorizer)
- Dependency Management (Auto-installs missing libraries)
- Support for 2D Text Arrays (Processes lists of lists of text)
- Exception-Free Execution (Handles API changes without breaking)
Features
- Automated dependency installation
- Works with both NLTK and spaCy
- Vectorization support using scikit-learn
- Handles single strings and 2D arrays
- No human intervention required
Installation
Missing dependencies are installed automatically the first time you run a script that uses NLPProcessor:
python your_script.py
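The automatic dependency handling could be approximated along the following lines. This is a minimal sketch, not the package's actual code: `ensure_package` is a hypothetical helper name, and it assumes `pip` is available in the running interpreter.

```python
import importlib.util
import subprocess
import sys


def ensure_package(import_name, pip_name=None):
    """Install a package with pip if it cannot be imported (hypothetical helper).

    Returns True if an install was attempted, False if the module was
    already available. Pass top-level module names only.
    """
    if importlib.util.find_spec(import_name) is not None:
        return False  # already importable, nothing to do
    # Use the current interpreter's pip so the package lands in the
    # same environment the script is running in.
    subprocess.check_call(
        [sys.executable, "-m", "pip", "install", pip_name or import_name]
    )
    return True
```

The separate `pip_name` argument covers packages whose import name differs from their distribution name, for example `ensure_package("sklearn", pip_name="scikit-learn")`.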
Usage
Import and Initialize
from your_script import NLPProcessor
processor = NLPProcessor(stem=True, lemmatize=True, vectorize="tfidf", backend="spacy")
Process a Single Text
output = processor.process("running jumped swimming")
print(output)
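To give a feel for what stemming does to the input above, here is a toy suffix stripper in plain Python. It is only an illustration under stated assumptions: `toy_stem` is a hypothetical function, it is far cruder than the stemmers NLTK or spaCy provide, and it does not show the exact output format of `processor.process`.

```python
def toy_stem(word):
    """Strip a few common English suffixes (toy stand-in for a real stemmer)."""
    for suffix in ("ing", "ed", "s"):
        # Keep at least three characters so short words survive intact.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            stem = word[: -len(suffix)]
            # Collapse a doubled final consonant left behind: "runn" -> "run".
            if (
                suffix != "s"
                and len(stem) >= 2
                and stem[-1] == stem[-2]
                and stem[-1] not in "aeiouls"
            ):
                stem = stem[:-1]
            return stem
    return word


print([toy_stem(w) for w in "running jumped swimming".split()])
# ['run', 'jump', 'swim']
```

A lemmatizer differs in that it maps each word to a dictionary form using vocabulary and part-of-speech information rather than by trimming suffixes.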
Process a 2D Array of Text
input_texts = [
["I am running", "He is jumping"],
["They are swimming", "Dogs are barking"]
]
output = processor.process(input_texts)
print(output)
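One way to support both single strings and lists of lists is to recurse over the input and apply the same per-text function at the leaves. The sketch below assumes the output mirrors the shape of the input, which may not hold for the real package (for instance when vectorization is enabled); `map_nested` is a hypothetical name.

```python
def map_nested(func, data):
    """Apply func to every string in a str, a list, or a list of lists."""
    if isinstance(data, str):
        return func(data)
    # Recurse into each element so the result keeps the input's nesting.
    return [map_nested(func, item) for item in data]


input_texts = [
    ["I am running", "He is jumping"],
    ["They are swimming", "Dogs are barking"],
]
result = map_nested(str.lower, input_texts)
print(result)
# [['i am running', 'he is jumping'], ['they are swimming', 'dogs are barking']]
```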
Customization Options
| Parameter | Description |
|---|---|
| stem | Enable stemming (default: False) |
| lemmatize | Enable lemmatization (default: False) |
| vectorize | Choose "tfidf", "count", or None (default: None) |
| tokenize | Enable word/sentence tokenization (default: False) |
| remove_stopwords | Remove stopwords (default: False) |
| pos_tagging | Enable Part-of-Speech tagging (default: False) |
| ner | Enable Named Entity Recognition (default: False) |
| normalize | Lowercase and remove punctuation (default: False) |
| backend | Choose "nltk" or "spacy" (default: "nltk") |
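The effect of the normalize, tokenize, and remove_stopwords options can be pictured with a few lines of plain Python. This is a rough sketch under assumptions: the function names and the tiny stopword set are made up for illustration, and the real backends use proper tokenizers and much larger stopword lists.

```python
import string

# Tiny illustrative stopword set; real backends ship far larger ones.
STOPWORDS = {"i", "am", "is", "are", "he", "they", "the", "a", "an", "and"}


def normalize(text):
    """Lowercase the text and strip ASCII punctuation."""
    return text.lower().translate(str.maketrans("", "", string.punctuation))


def tokenize(text):
    """Split on whitespace (a stand-in for a real word tokenizer)."""
    return text.split()


def remove_stopwords(tokens):
    """Drop tokens that appear in the stopword set."""
    return [tok for tok in tokens if tok not in STOPWORDS]


tokens = remove_stopwords(tokenize(normalize("He is jumping, and they are swimming!")))
print(tokens)
# ['jumping', 'swimming']
```

Normalizing before stopword removal matters here: "He" only matches the lowercase entry "he" after lowercasing.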
Check Supported Vectorizers
print(NLPProcessor.supported_vectorizers()) # ['tfidf', 'count']
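The difference between the two vectorizers can be shown without scikit-learn. The sketch below is a simplified pure-Python illustration, not the package's implementation: the function names are hypothetical, tokenization is a plain whitespace split, and the TF-IDF weights use the smoothed formula idf = ln((1 + n) / (1 + df)) + 1 without any final row normalization.

```python
import math
from collections import Counter


def count_vectors(docs):
    """Return (vocabulary, rows) where each row holds raw term counts."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        rows.append([counts[term] for term in vocab])
    return vocab, rows


def tfidf_vectors(docs):
    """Weight term counts by smoothed inverse document frequency."""
    vocab, counts = count_vectors(docs)
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log((1 + n_docs) / (1 + d)) + 1 for d in df]
    return vocab, [[c * w for c, w in zip(row, idf)] for row in counts]


docs = ["dogs are barking", "dogs are running"]
vocab, counts = count_vectors(docs)
_, weights = tfidf_vectors(docs)
print(vocab)   # ['are', 'barking', 'dogs', 'running']
print(counts)  # [[1, 1, 1, 0], [1, 0, 1, 1]]
```

Terms that occur in every document ("are", "dogs") keep a weight of 1.0, while terms unique to one document ("barking", "running") are weighted higher.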
License
This project is licensed under the MIT License - see the LICENSE file for details.
Download files
Source Distribution
pun_nlp-0.0.4.tar.gz (4.3 kB)
Built Distribution
File details
Details for the file pun_nlp-0.0.4.tar.gz.
File metadata
- Download URL: pun_nlp-0.0.4.tar.gz
- Upload date:
- Size: 4.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f8ebac21a8762e798910fa77e7d05fe69ea62df0cfe170bde986388bf140fc6a |
| MD5 | ad5e378545da08c8ab2dda72d36ebe81 |
| BLAKE2b-256 | f4b4df990ef4e107b88ee3807602ff4f93e10f19ea7f1eb346989dc1ce1e1d5a |
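A downloaded file can be checked against the published SHA256 digest with the standard library. The helper names below are hypothetical, and the sketch assumes the file has already been downloaded to a local path.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's digest with a published value, ignoring case and whitespace."""
    return sha256_of(path) == expected_hex.strip().lower()
```

Calling `matches_published_hash` with the local path of the archive and the SHA256 value listed for it should return True for an intact download.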
File details
Details for the file pun_nlp-0.0.4-py3-none-any.whl.
File metadata
- Download URL: pun_nlp-0.0.4-py3-none-any.whl
- Upload date:
- Size: 4.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 85079e41ef443c71dbdeecef41a0688aaa3d9b3a0a08afd3379eb3d763c77849 |
| MD5 | 486c36c9ab7108913227ad05ecdc3bd3 |
| BLAKE2b-256 | c5282d570387cce04e52780319cabc88531bf13c0d223cbe1e0af34fd5dd0775 |