A robust NLP pipeline for stemming, lemmatization, and vectorization
NLPProcessor
Overview
NLPProcessor is an automated, adaptive NLP pipeline that dynamically handles:
- Tokenization (Word & Sentence)
- Stopword Removal
- POS Tagging
- Named Entity Recognition (NER)
- Text Normalization (Lowercasing, Punctuation Removal, etc.)
- Stemming & Lemmatization (via NLTK or spaCy)
- Vectorization (TF-IDF or Count Vectorizer)
- Dependency Management (auto-installs missing libraries)
- Support for 2D Text Arrays (processes lists of lists of strings)
- Exception-Free Execution (handles API changes without breaking)
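The normalization, tokenization, and stopword-removal steps listed above can be illustrated with the standard library alone. This is a sketch, not NLPProcessor's implementation (the README says it delegates to NLTK or spaCy). The helper names and the tiny stopword set are hypothetical, and the regex sentence splitter is much cruder than NLTK's or spaCy's sentence segmentation.

```python
import re
import string

# Hypothetical, deliberately tiny stopword set; NLTK and spaCy ship much larger lists.
STOPWORDS = {"a", "an", "the", "is", "are", "am", "i", "he", "they"}


def normalize(text: str) -> str:
    """Lowercase the text and strip ASCII punctuation."""
    return text.lower().translate(str.maketrans("", "", string.punctuation))


def sentence_tokenize(text: str) -> list[str]:
    """Naive split after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def word_tokenize(text: str) -> list[str]:
    """Normalize first, then split on whitespace."""
    return normalize(text).split()


def remove_stopwords(tokens: list[str]) -> list[str]:
    """Drop tokens that appear in the stopword set."""
    return [t for t in tokens if t not in STOPWORDS]


text = "The dogs are barking. He is running!"
print(sentence_tokenize(text))
print(remove_stopwords(word_tokenize(text)))
```

Sentence splitting runs on the raw text because normalization removes the punctuation the splitter depends on.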
Features
- Automated dependency installation
- Works with both NLTK and spaCy
- Vectorization support using scikit-learn
- Handles single strings and 2D arrays
- No human intervention required
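The README does not say how missing libraries are detected or installed. One common pattern, sketched here as an assumption rather than the package's actual code, is to probe each top-level import name with importlib.util.find_spec and install whatever is absent with pip running under the current interpreter (sys.executable). The function names and the import-name-to-distribution mapping are hypothetical.

```python
import importlib
import importlib.util
import subprocess
import sys


def missing_modules(names):
    """Return the top-level module names that cannot be found on this interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def ensure_installed(packages):
    """Install any missing packages with pip and return the import names that were missing.

    `packages` maps an import name to its PyPI distribution name,
    e.g. {"sklearn": "scikit-learn", "spacy": "spacy"} (hypothetical mapping).
    """
    missing = missing_modules(packages)
    for module in missing:
        # Use the running interpreter's pip so the package lands in the same environment.
        subprocess.check_call([sys.executable, "-m", "pip", "install", packages[module]])
    # Refresh import-system caches so newly installed modules can be imported.
    importlib.invalidate_caches()
    return missing


print(missing_modules(["json", "surely_not_a_real_module_xyz"]))
```

Installing packages at run time changes the user's environment, so many projects instead declare dependencies in their packaging metadata and let `pip install` resolve them.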
Installation
Install the package from PyPI with pip:
pip install pun_nlp
Usage
Import and Initialize
from pun_nlp import NLPProcessor
processor = NLPProcessor(stem=True, lemmatize=True, vectorize="tfidf", backend="spacy")
Process a Single Text
output = processor.process("running jumped swimming")
print(output)
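The README does not show the output of this call. To illustrate what stemming does to these three words, here is a deliberately simple suffix stripper. It is not the Porter or Snowball stemmer that NLTK provides, and spaCy itself offers lemmatization rather than stemming; the function name and its rules are hypothetical.

```python
VOWELS = set("aeiou")


def toy_stem(word: str) -> str:
    """Toy suffix stripper (NOT Porter/Snowball): removes -ing, -ed, or -s."""
    word = word.lower()
    for suffix in ("ing", "ed", "s"):
        if not word.endswith(suffix):
            continue
        if suffix == "s" and word.endswith("ss"):
            continue  # keep words like "press" intact
        stem = word[: -len(suffix)]
        if not any(c in VOWELS for c in stem):
            continue  # keep "sing" rather than producing "s"
        # After -ing/-ed, undouble a final consonant ("runn" -> "run"),
        # except l, s, z, as in "falling" -> "fall".
        if suffix != "s" and len(stem) >= 2 and stem[-1] == stem[-2] and stem[-1] not in "aeiouslz":
            stem = stem[:-1]
        return stem
    return word


print([toy_stem(w) for w in "running jumped swimming".split()])
```

Lemmatization differs in that it maps a word to its dictionary form using a vocabulary and, usually, the part of speech, so it can also handle irregular forms such as "ran" to "run", which no suffix rule can recover.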
Process a 2D Array of Text
input_texts = [
    ["I am running", "He is jumping"],
    ["They are swimming", "Dogs are barking"]
]
output = processor.process(input_texts)
print(output)
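The README says 2D input is a list of lists of strings but not how results are shaped. A reasonable assumption, sketched below with hypothetical helper names, is that each string is processed independently and the output keeps the same nesting.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def map_2d(fn: Callable[[str], T], rows: list[list[str]]) -> list[list[T]]:
    """Apply `fn` to every string in a list of lists, preserving the nesting."""
    return [[fn(text) for text in row] for row in rows]


def process_any(fn, data):
    """Dispatch on input shape: a single string or a list of lists of strings."""
    if isinstance(data, str):
        return fn(data)
    if isinstance(data, list) and all(isinstance(row, list) for row in data):
        return map_2d(fn, data)
    raise TypeError("expected a string or a list of lists of strings")


input_texts = [
    ["I am running", "He is jumping"],
    ["They are swimming", "Dogs are barking"],
]
# str.split stands in for whatever per-string processing the pipeline performs.
print(process_any(str.split, input_texts))
```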
Customization Options
| Parameter | Description |
|---|---|
| stem | Enable stemming (default: False) |
| lemmatize | Enable lemmatization (default: False) |
| vectorize | Choose "tfidf", "count", or None (default: None) |
| tokenize | Enable word/sentence tokenization (default: False) |
| remove_stopwords | Remove stopwords (default: False) |
| pos_tagging | Enable Part-of-Speech tagging (default: False) |
| ner | Enable Named Entity Recognition (default: False) |
| normalize | Lowercase and remove punctuation (default: False) |
| backend | Choose "nltk" or "spacy" (default: "nltk") |
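The options above can be mirrored in a small dataclass that applies the documented defaults and rejects unsupported vectorize or backend values up front. PipelineConfig and the SUPPORTED_* constants are hypothetical; whether NLPProcessor validates its arguments this way is not documented.

```python
from dataclasses import dataclass
from typing import Optional

SUPPORTED_VECTORIZERS = ("tfidf", "count")
SUPPORTED_BACKENDS = ("nltk", "spacy")


@dataclass(frozen=True)
class PipelineConfig:
    """Hypothetical mirror of the documented NLPProcessor options and their defaults."""
    stem: bool = False
    lemmatize: bool = False
    vectorize: Optional[str] = None
    tokenize: bool = False
    remove_stopwords: bool = False
    pos_tagging: bool = False
    ner: bool = False
    normalize: bool = False
    backend: str = "nltk"

    def __post_init__(self):
        # Fail at construction time instead of partway through processing.
        if self.vectorize is not None and self.vectorize not in SUPPORTED_VECTORIZERS:
            raise ValueError(
                f"vectorize must be one of {SUPPORTED_VECTORIZERS} or None, got {self.vectorize!r}"
            )
        if self.backend not in SUPPORTED_BACKENDS:
            raise ValueError(f"backend must be one of {SUPPORTED_BACKENDS}, got {self.backend!r}")


cfg = PipelineConfig(stem=True, lemmatize=True, vectorize="tfidf", backend="spacy")
print(cfg)
```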
Check Supported Vectorizers
print(NLPProcessor.supported_vectorizers()) # ['tfidf', 'count']
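The two supported vectorizers differ in how they weight terms. The README says vectorization uses scikit-learn; the standard-library sketch below reproduces scikit-learn's documented TfidfVectorizer defaults (smoothed IDF, raw term frequency, L2 row normalization) so the arithmetic is visible. It takes already-tokenized documents, whereas scikit-learn tokenizes raw strings itself (its default token pattern ignores single-character tokens), so results on raw text can differ. The function names are hypothetical.

```python
import math
from collections import Counter


def build_vocabulary(docs: list[list[str]]) -> list[str]:
    """Sorted vocabulary, matching scikit-learn's alphabetical column order."""
    return sorted({tok for doc in docs for tok in doc})


def count_vectorize(docs: list[list[str]], vocab: list[str]) -> list[list[int]]:
    """Raw term counts, one row per document ("count")."""
    rows = []
    for doc in docs:
        counts = Counter(doc)
        rows.append([counts[term] for term in vocab])
    return rows


def tfidf_vectorize(docs: list[list[str]], vocab: list[str]) -> list[list[float]]:
    """TF-IDF ("tfidf") with smoothed IDF and L2-normalized rows:

    idf(t) = ln((1 + n) / (1 + df(t))) + 1
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))  # document frequency
    idf = [math.log((1 + n) / (1 + df[term])) + 1 for term in vocab]
    rows = []
    for counts in count_vectorize(docs, vocab):
        weights = [c * w for c, w in zip(counts, idf)]
        norm = math.sqrt(sum(x * x for x in weights))
        rows.append([x / norm for x in weights] if norm else weights)
    return rows


docs = [["dog", "bark"], ["dog", "run"]]
vocab = build_vocabulary(docs)
print(vocab)
print(count_vectorize(docs, vocab))
print(tfidf_vectorize(docs, vocab))
```

Here "dog" appears in every document, so its IDF is ln(3/3) + 1 = 1, while "bark" and "run" each get ln(3/2) + 1 ≈ 1.405; after normalization the first row is roughly [0.815, 0.580, 0.0], so the rarer term carries more weight.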
License
This project is licensed under the MIT License - see the LICENSE file for details.
Download files
Source Distribution
pun_nlp-0.0.8.tar.gz (4.2 kB)
Built Distribution
pun_nlp-0.0.8-py3-none-any.whl (4.5 kB)
File details
Details for the file pun_nlp-0.0.8.tar.gz.
File metadata
- Download URL: pun_nlp-0.0.8.tar.gz
- Size: 4.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9ada19b976729631a1dfb9407cc26356eb95fb43d25440cd8062e31875b4f6b4 |
| MD5 | 0c46cfb1bcdc74306c059eac70dde77c |
| BLAKE2b-256 | 94ada1add7a33fa338aa2aceb4465c2835775832328ebf74055be30252eb5827 |
File details
Details for the file pun_nlp-0.0.8-py3-none-any.whl.
File metadata
- Download URL: pun_nlp-0.0.8-py3-none-any.whl
- Size: 4.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8ae1adcaf038b7897c2abd172932f8cd7185198c86145265b499d7208c8520b0 |
| MD5 | 4534c181de165e4428f70e364e6f72dd |
| BLAKE2b-256 | ed35c2d0431ae2bd9d4494a56b13c7468fdcd3ae4a9014d2310317c14a17b6d7 |