A robust NLP pipeline for stemming, lemmatization, and vectorization
NLPProcessor
Overview
NLPProcessor is an automated, configurable NLP pipeline that handles:
- Tokenization (Word & Sentence)
- Stopword Removal
- POS Tagging
- Named Entity Recognition (NER)
- Text Normalization (Lowercasing, Punctuation Removal, etc.)
- Stemming & Lemmatization (via NLTK or spaCy)
- Vectorization (TF-IDF or Count Vectorizer)
- Dependency Management (Auto-installs missing libraries)
- Support for 2D Text Arrays (Processes lists of lists of text)
- Exception-Free Execution (Handles API changes without breaking)
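The first few stages above are easy to picture in plain Python. The sketch below is not pun_nlp's implementation; it assumes regex-based tokenization and a tiny hand-picked stopword list, and `STOPWORDS`, `normalize`, `sentence_tokenize`, `word_tokenize` and `remove_stopwords` are all hypothetical names. Real backends such as NLTK's `word_tokenize`/`sent_tokenize` or spaCy's tokenizer handle contractions, abbreviations and Unicode far better.

```python
import re
import string

# Hypothetical, deliberately tiny stopword list; NLTK and spaCy ship far larger ones.
STOPWORDS = {"a", "an", "the", "is", "are", "am", "i", "he", "they"}

def normalize(text: str) -> str:
    """Lowercase the text and strip ASCII punctuation."""
    return text.lower().translate(str.maketrans("", "", string.punctuation))

def sentence_tokenize(text: str) -> list[str]:
    """Naive sentence split: break after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def word_tokenize(text: str) -> list[str]:
    """Split into runs of word characters."""
    return re.findall(r"\w+", text)

def remove_stopwords(tokens: list[str]) -> list[str]:
    """Drop tokens found in the (hypothetical) stopword list."""
    return [t for t in tokens if t.lower() not in STOPWORDS]

text = "The dogs are barking! He is running."
print(sentence_tokenize(text))
print(remove_stopwords(word_tokenize(normalize(text))))
```

Note the ordering: sentence splitting has to happen before punctuation is stripped, since it relies on the terminal `.`, `!` and `?`.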
Features
- Automated dependency installation
- Works with both NLTK and spaCy
- Vectorization support using scikit-learn
- Handles single strings and 2D arrays
- No human intervention required
Installation
Install the package from PyPI with pip:
pip install pun_nlp
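The README says missing libraries are installed automatically but does not say how. One plausible approach, sketched here as an assumption, is to probe each import name with `importlib.util.find_spec` and shell out to `python -m pip install` for whatever is missing. `missing_modules`, `pip_install_command` and `PACKAGES` are hypothetical names, and the sketch only builds the command rather than running it.

```python
import importlib.util
import sys

# Hypothetical mapping: import name -> PyPI distribution name.
PACKAGES = {"nltk": "nltk", "spacy": "spacy", "sklearn": "scikit-learn"}

def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]

def pip_install_command(packages):
    """Build (but do not run) a pip command for the current interpreter."""
    return [sys.executable, "-m", "pip", "install", *packages]

missing = missing_modules(PACKAGES)  # iterating a dict yields its keys
if missing:
    cmd = pip_install_command([PACKAGES[m] for m in missing])
    print("Would run:", " ".join(cmd))
    # To actually install: subprocess.check_call(cmd)
```

The mapping matters because import names and PyPI names can differ: scikit-learn is imported as `sklearn`. spaCy also needs a trained pipeline downloaded separately (for example `python -m spacy download en_core_web_sm`), which a `find_spec` check on `spacy` alone would not detect.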
Usage
Import and Initialize
from pun_nlp import NLPProcessor
processor = NLPProcessor(stem=True, lemmatize=True, vectorize="tfidf", backend="spacy")
Process a Single Text
output = processor.process("running jumped swimming")
print(output)
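The README does not show what `process` returns for this input. To illustrate what stemming does to these words, here is a deliberately crude suffix-stripping stemmer; `toy_stem` is a hypothetical helper and is far simpler than the Porter or Snowball stemmers that NLTK provides.

```python
def toy_stem(word: str) -> str:
    """Crude suffix stripping; a stand-in for a real stemmer such as Porter."""
    word = word.lower()
    for suffix in ("ing", "ed", "s"):
        # Only strip if at least three characters remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            stem = word[: -len(suffix)]
            # Undo consonant doubling: "runn" -> "run", "swimm" -> "swim".
            if stem[-1] == stem[-2] and stem[-1] not in "aeioulsz":
                stem = stem[:-1]
            return stem
    return word

print([toy_stem(w) for w in "running jumped swimming".split()])
```

Lemmatization is different in kind: it maps words to dictionary forms using a vocabulary and the part of speech (for example NLTK's `WordNetLemmatizer` or spaCy's `token.lemma_`), so it can handle irregular forms such as "ran" that suffix rules cannot.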
Process a 2D Array of Text
input_texts = [
    ["I am running", "He is jumping"],
    ["They are swimming", "Dogs are barking"]
]
output = processor.process(input_texts)
print(output)
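A natural way to support lists of lists is to apply the per-string pipeline recursively while keeping the nesting intact. The sketch below assumes that behavior; `map_texts` is a hypothetical helper, and the README does not document the exact output shape `process` returns for 2D input.

```python
from typing import Callable, List, Union

# A string, or an arbitrarily nested list of strings.
Nested = Union[str, List["Nested"]]

def map_texts(fn: Callable[[str], object], data: Nested):
    """Apply fn to every string, preserving the list-of-lists shape."""
    if isinstance(data, str):
        return fn(data)
    if isinstance(data, list):
        return [map_texts(fn, item) for item in data]
    raise TypeError(f"expected str or list, got {type(data).__name__}")

input_texts = [
    ["I am running", "He is jumping"],
    ["They are swimming", "Dogs are barking"],
]
print(map_texts(str.split, input_texts))
```

Vectorization is the awkward case: count and TF-IDF vectorizers are fit over a flat corpus, so a 2D input would presumably have to be flattened first. How pun_nlp handles this is not specified.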
Customization Options
| Parameter | Description |
|---|---|
| stem | Enable stemming (default: False) |
| lemmatize | Enable lemmatization (default: False) |
| vectorize | Choose "tfidf", "count", or None (default: None) |
| tokenize | Enable word/sentence tokenization (default: False) |
| remove_stopwords | Remove stopwords (default: False) |
| pos_tagging | Enable Part-of-Speech tagging (default: False) |
| ner | Enable Named Entity Recognition (default: False) |
| normalize | Lowercase and remove punctuation (default: False) |
| backend | Choose "nltk" or "spacy" (default: "nltk") |
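The options table can be captured as a small dataclass, which makes the defaults and the allowed string values explicit. `ProcessorOptions` is a hypothetical class; whether `NLPProcessor` itself validates its arguments this way is not documented.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ProcessorOptions:
    """Hypothetical mirror of the documented NLPProcessor keyword arguments."""
    stem: bool = False
    lemmatize: bool = False
    vectorize: Optional[str] = None  # "tfidf", "count", or None
    tokenize: bool = False
    remove_stopwords: bool = False
    pos_tagging: bool = False
    ner: bool = False
    normalize: bool = False
    backend: str = "nltk"            # "nltk" or "spacy"

    def __post_init__(self):
        # Reject values outside the documented choices.
        if self.vectorize not in (None, "tfidf", "count"):
            raise ValueError(f"unsupported vectorize option: {self.vectorize!r}")
        if self.backend not in ("nltk", "spacy"):
            raise ValueError(f"unsupported backend: {self.backend!r}")

opts = ProcessorOptions(stem=True, lemmatize=True, vectorize="tfidf", backend="spacy")
print(opts)
```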
Check Supported Vectorizers
print(NLPProcessor.supported_vectorizers()) # ['tfidf', 'count']
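The README states that vectorization uses scikit-learn. To show what the two supported modes compute, here is a from-scratch sketch using only the standard library. It assumes scikit-learn's `TfidfVectorizer` defaults (smoothed idf, `ln((1 + n) / (1 + df)) + 1`, then L2 normalization of each row) but tokenizes on whitespace, so scores can differ from scikit-learn's, whose default `token_pattern` ignores single-character tokens. `count_vectors` and `tfidf_vectors` are hypothetical names; in real code you would use `CountVectorizer` or `TfidfVectorizer` from `sklearn.feature_extraction.text`.

```python
import math
from collections import Counter

def count_vectors(docs):
    """Bag-of-words counts over a sorted vocabulary (lowercased whitespace tokens)."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        rows.append([counts[term] for term in vocab])
    return vocab, rows

def tfidf_vectors(docs):
    """Raw counts times smoothed idf, each row scaled to unit L2 norm."""
    vocab, counts = count_vectors(docs)
    n = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log((1 + n) / (1 + d)) + 1 for d in df]
    rows = []
    for row in counts:
        weighted = [c * w for c, w in zip(row, idf)]
        norm = math.sqrt(sum(x * x for x in weighted)) or 1.0  # avoid dividing by 0
        rows.append([x / norm for x in weighted])
    return vocab, rows

docs = ["dogs are barking", "they are swimming"]
vocab, counts = count_vectors(docs)
print(vocab)
print(counts)
print(tfidf_vectors(docs)[1])
```

A term that appears in every document, such as "are" here, gets the minimum idf of 1, so after normalization it carries less weight than the rarer terms in the same row.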
License
This project is licensed under the MIT License - see the LICENSE file for details.
Download files
Source Distribution
pun_nlp-0.0.6.tar.gz (4.1 kB)
Built Distribution
pun_nlp-0.0.6-py3-none-any.whl (4.5 kB)
File details
Details for the file pun_nlp-0.0.6.tar.gz.
File metadata
- Download URL: pun_nlp-0.0.6.tar.gz
- Upload date:
- Size: 4.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4d31aac6b3dc20943b24711e1f60159f8634c740419ee5984edc83f0378e5d6b |
| MD5 | 68a786bc8a5ed0c7ce1758dc6e42757f |
| BLAKE2b-256 | e60bafef877c6383be32f2d51e194afe8cfb988b219bb4ddd87fc23fa3b1f4a3 |
File details
Details for the file pun_nlp-0.0.6-py3-none-any.whl.
File metadata
- Download URL: pun_nlp-0.0.6-py3-none-any.whl
- Upload date:
- Size: 4.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0af0f1e5dd9d2e1a2d824dd463b34d2ecbb3103161c81eedb7f46e8e83b3fdf1 |
| MD5 | 2da9aa68f7231eb0e52c5e1ac9c9d953 |
| BLAKE2b-256 | 45878a47e7dbd54f48e6fb497c2f2587b4dc922f9215b5f4222d31fa6f0356b6 |
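The published SHA256 digests let you check a downloaded file before installing it. Below is a minimal sketch using Python's `hashlib`; `sha256_of`, `verify` and the local file path are assumptions for illustration.

```python
import hashlib

# SHA256 of pun_nlp-0.0.6.tar.gz, copied from the source distribution's hash table.
EXPECTED_SDIST_SHA256 = "4d31aac6b3dc20943b24711e1f60159f8634c740419ee5984edc83f0378e5d6b"

def sha256_of(path, chunk_size=1 << 16):
    """Stream a file from disk in chunks and return its hex SHA256 digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify(path, expected=EXPECTED_SDIST_SHA256):
    """True only if the file's digest matches the expected hex string."""
    return sha256_of(path) == expected.lower()

# Assumes the sdist was downloaded into the current directory:
# print(verify("pun_nlp-0.0.6.tar.gz"))
```

pip can also enforce digests itself through its hash-checking mode, using `--hash=sha256:...` entries in a requirements file.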