A robust NLP pipeline for stemming, lemmatization, and vectorization
NLPProcessor
Overview
NLPProcessor is an automated, adaptive NLP pipeline that dynamically handles:
- Stemming & Lemmatization (via NLTK or spaCy)
- Vectorization (TF-IDF or Count Vectorizer)
- Dependency Management (Auto-installs missing libraries)
- Support for 2D Text Arrays (Processes lists of lists of text)
- Exception-Free Execution (Handles API changes without breaking)
Features
- Automated dependency installation
- Works with both NLTK and spaCy
- Vectorization support using scikit-learn
- Handles single strings and 2D arrays
- No human intervention required
Installation
No separate setup step is needed. Run your script, and NLPProcessor installs any missing dependencies automatically:
python your_script.py
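The package's own installation logic is not shown in this README. As a rough illustration of how automatic dependency installation is commonly done, here is a minimal sketch using only the standard library. The function names `missing_packages` and `ensure_installed`, and the mapping from import name to pip package name, are hypothetical and are not part of NLPProcessor's documented API.

```python
import importlib.util
import subprocess
import sys


def missing_packages(modules):
    """Return the top-level import names from `modules` that cannot be found."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


def ensure_installed(requirements, run=subprocess.check_call):
    """Install the pip package for every import name that is missing.

    `requirements` maps import name -> pip package name (an assumed mapping,
    e.g. {"sklearn": "scikit-learn"}). `run` is injectable so the logic can be
    exercised without actually invoking pip.
    """
    installed = []
    for module in missing_packages(requirements):
        # Use the current interpreter so the package lands in the right environment.
        run([sys.executable, "-m", "pip", "install", requirements[module]])
        installed.append(requirements[module])
    return installed
```

Calling `ensure_installed({"nltk": "nltk", "sklearn": "scikit-learn"})` would invoke pip only for the modules that are not importable. Installing packages at runtime changes the active environment, so it may be worth pinning versions or running inside a virtual environment.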
Usage
Import and Initialize
from your_script import NLPProcessor
processor = NLPProcessor(stem=True, lemmatize=True, vectorize="tfidf", backend="spacy")
Process a Single Text
output = processor.process("running jumped swimming")
print(output)
Process a 2D Array of Text
input_texts = [
    ["I am running", "He is jumping"],
    ["They are swimming", "Dogs are barking"],
]
output = processor.process(input_texts)
print(output)
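How `process` treats nested input internally is not documented here. One plausible way to accept both a single string and a 2D array is to recurse over the input and apply the same per-text transformation at the leaves. The sketch below assumes that approach; `map_nested` is a hypothetical helper, and `str.lower` stands in for the real stemming or lemmatization step.

```python
def map_nested(func, data):
    """Apply `func` to every string in `data`, preserving the list nesting."""
    if isinstance(data, str):
        # Leaf case: a single text.
        return func(data)
    # Recursive case: a list (of texts or of further lists).
    return [map_nested(func, item) for item in data]


# Stand-in transformation; a real pipeline would stem or lemmatize here.
nested = [["I am running", "He is jumping"], ["They are swimming"]]
print(map_nested(str.lower, nested))
```

The same function handles a bare string, a flat list, or a list of lists, which is why the caller does not need separate entry points for each shape.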
Customization Options
| Parameter | Description |
|---|---|
| stem | Enable stemming (default: False) |
| lemmatize | Enable lemmatization (default: False) |
| vectorize | Choose "tfidf", "count", or None (default: None) |
| backend | Choose "nltk" or "spacy" (default: "nltk") |
Check Supported Vectorizers
print(NLPProcessor.supported_vectorizers()) # ['tfidf', 'count']
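To show what the "count" option produces conceptually, here is a small standard-library sketch of count vectorization. It is a stand-in for illustration only: the package relies on scikit-learn for vectorization, and this version simplifies tokenization to lowercasing and whitespace splitting, which differs from scikit-learn's default tokenizer. The name `count_vectorize` is hypothetical.

```python
from collections import Counter


def count_vectorize(texts):
    """Return (vocabulary, matrix); matrix[i][j] counts vocabulary[j] in texts[i]."""
    tokenized = [text.lower().split() for text in texts]
    # Sorted vocabulary gives every term a stable column index.
    vocabulary = sorted({token for tokens in tokenized for token in tokens})
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        matrix.append([counts[term] for term in vocabulary])
    return vocabulary, matrix


vocabulary, matrix = count_vectorize(["run jump run", "jump swim"])
print(vocabulary)  # ['jump', 'run', 'swim']
print(matrix)      # [[1, 2, 0], [1, 0, 1]]
```

The "tfidf" option starts from the same counts but reweights each column so that terms appearing in many documents contribute less.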
License
MIT License
File details
Details for the file pun_nlp-0.0.3.tar.gz.
File metadata
- Download URL: pun_nlp-0.0.3.tar.gz
- Upload date:
- Size: 3.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 719c1e8b40fa6dd161f31e883e2070b2eb03ab290679638c9b6f96c2a79a0e1e |
| MD5 | 90f8177c0dd3c001246ae27975448481 |
| BLAKE2b-256 | a2dc3eac5f54cdee4bee658cf920aa77203f407b5dce4eb77575e7c6018ddb77 |
File details
Details for the file pun_nlp-0.0.3-py3-none-any.whl.
File metadata
- Download URL: pun_nlp-0.0.3-py3-none-any.whl
- Upload date:
- Size: 4.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 39f4ef762515565b983db2a8428e5647bd5bed798c8d58a3ca43c90d05123706 |
| MD5 | 97afc1819bd4bcbd6c503215e6cbe74f |
| BLAKE2b-256 | b78bcca0607f2451cdd53e09b7bd317ccf61eac455eff0c43b8c9776ab6cc2b8 |