A robust NLP pipeline for stemming, lemmatization, and vectorization
NLPProcessor
Overview
NLPProcessor is an automated, adaptive NLP pipeline that dynamically handles:
- Tokenization (Word & Sentence)
- Stopword Removal
- POS Tagging
- Named Entity Recognition (NER)
- Text Normalization (Lowercasing, Punctuation Removal, etc.)
- Stemming & Lemmatization (via NLTK or spaCy)
- Vectorization (TF-IDF or Count Vectorizer)
- Dependency Management (Auto-installs missing libraries)
- Support for 2D Text Arrays (Processes lists of lists of text)
- Exception-Free Execution (Handles API changes without breaking)
Features
- Automated dependency installation
- Works with both NLTK and spaCy
- Vectorization support using scikit-learn
- Handles single strings and 2D arrays
- No human intervention required
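The automated dependency handling mentioned above could start with a check like the following. This is a minimal, stdlib-only sketch and not the package's actual implementation: the helper name `missing_modules` and the list of import names are assumptions made for illustration, and the sketch only reports what is missing rather than installing anything.

```python
import importlib.util

def missing_modules(module_names):
    """Return the top-level module names that cannot be imported here."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]

# Assumed import names (not PyPI names) for the NLTK, spaCy, and
# scikit-learn features listed above.
required = ["nltk", "spacy", "sklearn"]
missing = missing_modules(required)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All optional dependencies are importable.")
```

A real installer would then have to map each import name to its PyPI package name (for example `sklearn` to `scikit-learn`) before calling pip.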
Installation
Install the package from PyPI:
pip install pun_nlp
Usage
Import and Initialize
from pun_nlp import NLPProcessor
processor = NLPProcessor(stem=True, lemmatize=True, vectorize="tfidf", backend="spacy")
Process a Single Text
output = processor.process("running jumped swimming")
print(output)
Process a 2D Array of Text
input_texts = [
["I am running", "He is jumping"],
["They are swimming", "Dogs are barking"]
]
output = processor.process(input_texts)
print(output)
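One plausible way to accept both a single string and a 2D array is to recurse over the nesting and apply the same per-text step to every string. The sketch below illustrates that idea only; `map_nested` is a hypothetical helper, not part of `pun_nlp`, and `str.split` stands in for a real processing step.

```python
def map_nested(func, data):
    """Apply func to every string in data, preserving the list nesting."""
    if isinstance(data, str):
        return func(data)
    if isinstance(data, (list, tuple)):
        return [map_nested(func, item) for item in data]
    raise TypeError(
        f"expected str or nested list of str, got {type(data).__name__}"
    )

input_texts = [
    ["I am running", "He is jumping"],
    ["They are swimming", "Dogs are barking"],
]
# str.split is a placeholder for a real per-text pipeline step.
tokens = map_nested(str.split, input_texts)
print(tokens[0][1])  # ['He', 'is', 'jumping']
```

Because the result mirrors the shape of the input, the caller can index the output with the same row and column positions it used for the input.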
Customization Options
| Parameter | Description |
|---|---|
| `stem` | Enable stemming (default: False) |
| `lemmatize` | Enable lemmatization (default: False) |
| `vectorize` | Choose "tfidf", "count", or None (default: None) |
| `tokenize` | Enable word/sentence tokenization (default: False) |
| `remove_stopwords` | Remove stopwords (default: False) |
| `pos_tagging` | Enable Part-of-Speech tagging (default: False) |
| `ner` | Enable Named Entity Recognition (default: False) |
| `normalize` | Lowercase and remove punctuation (default: False) |
| `backend` | Choose "nltk" or "spacy" (default: "nltk") |
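To make the `normalize` and `remove_stopwords` options concrete, here is a stdlib-only sketch of what such steps typically do. It is an illustration under assumptions, not the package's code: the function names and the tiny stopword set are invented for this example, and the NLTK and spaCy backends use their own, much larger stopword lists.

```python
import string

# A tiny illustrative stopword set; real backends ship far larger lists.
STOPWORDS = {"a", "an", "the", "is", "are", "am", "i", "he", "they"}

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)

def normalize(text):
    """Lowercase the text and strip ASCII punctuation."""
    return text.lower().translate(_PUNCT_TABLE)

def remove_stopwords(tokens, stopwords=STOPWORDS):
    """Drop tokens that appear in the stopword set."""
    return [tok for tok in tokens if tok not in stopwords]

tokens = normalize("He is jumping, and the dogs are barking!").split()
print(remove_stopwords(tokens))  # ['jumping', 'and', 'dogs', 'barking']
```

Normalizing before stopword removal matters here: the stopword set is lowercase, so "He" would survive filtering if the text were not lowercased first.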
Check Supported Vectorizers
print(NLPProcessor.supported_vectorizers()) # ['tfidf', 'count']
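For readers unfamiliar with the two vectorizer names, the following stdlib-only sketch shows what count and TF-IDF weighting compute. It is a textbook illustration, not what `pun_nlp` or scikit-learn runs internally: `count_vectors` and `tfidf_vectors` are hypothetical helpers, and scikit-learn's `TfidfVectorizer` by default uses a smoothed idf and L2-normalizes each row, so its numbers will differ.

```python
import math
from collections import Counter

def count_vectors(docs):
    """Return (vocabulary, rows); each row holds per-term counts."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        rows.append([counts[term] for term in vocab])
    return vocab, rows

def tfidf_vectors(docs):
    """Weight counts by idf = ln(N / df), the unsmoothed textbook form."""
    vocab, rows = count_vectors(docs)
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = [sum(1 for row in rows if row[j] > 0) for j in range(len(vocab))]
    weighted = [
        [row[j] * math.log(n_docs / df[j]) for j in range(len(vocab))]
        for row in rows
    ]
    return vocab, weighted

docs = ["dogs are barking", "dogs are running"]
vocab, counts = count_vectors(docs)
print(vocab)   # ['are', 'barking', 'dogs', 'running']
print(counts)  # [[1, 1, 1, 0], [1, 0, 1, 1]]
```

In the TF-IDF variant, terms that occur in every document (here "are" and "dogs") receive weight zero, while terms unique to one document keep a positive weight.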
License
This project is licensed under the MIT License - see the LICENSE file for details.