A robust NLP pipeline for stemming, lemmatization, and vectorization

pun_nlp

Overview

pun_nlp is an NLP abstraction layer that simplifies text processing and vectorization. It handles dependency management, resource downloads, and text preprocessing automatically, so you don't have to write boilerplate code.

It works around common NLTK download and path errors with a lazy-loading resource manager that also runs in restricted environments such as Kaggle and corporate servers.

Features

Robust Resource Management: Automatically handles NLTK/spaCy downloads and SSL errors.
Lazy Loading: Resources are only loaded into memory when needed.
Type Safety: Prevents invalid combinations of operations (like vectorizing POS tuples).
Unified API: Process single strings, lists, or 2D arrays of text with one method.
Seamless Vectorization: Integrates directly with scikit-learn's TF-IDF and Count vectorizers.
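
The lazy-loading idea in the feature list can be sketched with the standard library alone. The names below (`LazyResource`, `load_stopwords`) are hypothetical and are not pun_nlp's internals; the loader is a stand-in for an expensive step such as an NLTK corpus download.

```python
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class LazyResource(Generic[T]):
    """Defers an expensive load until the value is first requested (hypothetical helper)."""

    def __init__(self, loader: Callable[[], T]) -> None:
        self._loader = loader
        self._value: Optional[T] = None
        self._loaded = False

    @property
    def loaded(self) -> bool:
        return self._loaded

    def get(self) -> T:
        # Run the loader once, on first access, then reuse the cached value.
        if not self._loaded:
            self._value = self._loader()
            self._loaded = True
        return self._value  # type: ignore[return-value]


load_calls = []


def load_stopwords() -> frozenset:
    # Stand-in for a real download or disk read; records each invocation.
    load_calls.append("stopwords")
    return frozenset({"the", "are", "is", "a"})


stopwords = LazyResource(load_stopwords)
print(stopwords.loaded)          # False: nothing is loaded at construction time
print("the" in stopwords.get())  # True: the first access triggers the load
stopwords.get()                  # the second access reuses the cached value
print(len(load_calls))           # 1
```

Constructing the wrapper is cheap, so a processor could hold one per resource and pay the loading cost only for the flags that are actually enabled.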

Installation

pip install pun_nlp

Usage

Basic Pipeline

from pun_nlp import NLPProcessor

# Initialize with desired flags
p = NLPProcessor(
    tokenize=True, 
    stem=True, 
    remove_stopwords=True,
    normalize=True
)

text = "The QUICK brown foxes are running fast!"

# Automatically handles downloads and processing
print(p.process(text))
# Output: ['quick', 'brown', 'fox', 'run', 'fast']
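
For orientation, the steps those flags enable can be approximated with the standard library. This is an illustrative sketch, not pun_nlp's implementation: `TOY_STOPWORDS` is a tiny hand-picked set rather than NLTK's English list, and stemming is omitted, so `foxes` and `running` stay unstemmed.

```python
import string

# Assumption: a tiny illustrative stopword set, not NLTK's English list.
TOY_STOPWORDS = frozenset({"the", "are", "is", "a", "an", "of"})


def normalize(text: str) -> str:
    """Lowercase the text and strip ASCII punctuation."""
    return text.lower().translate(str.maketrans("", "", string.punctuation))


def toy_process(text: str) -> list[str]:
    """Normalize, split on whitespace, then drop stopwords."""
    tokens = normalize(text).split()
    return [tok for tok in tokens if tok not in TOY_STOPWORDS]


print(toy_process("The QUICK brown foxes are running fast!"))
# ['quick', 'brown', 'foxes', 'running', 'fast']
```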

NER & POS Tagging

# NER (runs on the original casing, before any normalization)
p_ner = NLPProcessor(ner=True)
print(p_ner.process("Apple Inc. is hiring in California."))
# Output: [('Apple Inc.', 'ORG'), ('California', 'GPE')]

# POS tagging (tokens are tagged before stemming, so tags reflect the original word forms)
p_pos = NLPProcessor(pos_tagging=True, stem=True)
print(p_pos.process("The boys are likely running."))
# Output: [('the', 'DT'), ('boy', 'NNS'), ('are', 'VBP'), ('like', 'RB'), ('run', 'VBG')]

Vectorization

# Uses remove_stopwords, the parameter name listed under Configuration
p_vec = NLPProcessor(vectorize="tfidf", remove_stopwords=True)
corpus = [
    "Machine learning is fascinating.",
    "Natural language processing is a subset of AI."
]

p_vec.fit_vectorizer(corpus)
vectors = p_vec.transform_texts(corpus)
print(vectors.shape)
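
The two-step call pattern above (fit, then transform) mirrors how count-based vectorizers generally work: fitting learns a vocabulary, and transforming maps each text onto that fixed vocabulary. The stdlib-only sketch below illustrates the idea with raw term counts. `ToyCountVectorizer` is a hypothetical name, and this is neither pun_nlp's nor scikit-learn's implementation.

```python
import re
from collections import Counter

TOKEN_RE = re.compile(r"[a-z0-9]+")


class ToyCountVectorizer:
    """Bag-of-words counts over a vocabulary learned at fit time (illustrative only)."""

    def __init__(self) -> None:
        self.vocabulary: dict[str, int] = {}

    @staticmethod
    def _tokens(text: str) -> list[str]:
        return TOKEN_RE.findall(text.lower())

    def fit(self, corpus: list[str]) -> "ToyCountVectorizer":
        # Assign each distinct token a column index, in sorted order.
        terms = sorted({tok for doc in corpus for tok in self._tokens(doc)})
        self.vocabulary = {term: idx for idx, term in enumerate(terms)}
        return self

    def transform(self, texts: list[str]) -> list[list[int]]:
        rows = []
        for doc in texts:
            counts = Counter(self._tokens(doc))
            # Tokens unseen during fit have no column and are ignored.
            rows.append([counts.get(term, 0) for term in self.vocabulary])
        return rows


corpus = [
    "Machine learning is fascinating.",
    "Natural language processing is a subset of AI.",
]
vec = ToyCountVectorizer().fit(corpus)
matrix = vec.transform(corpus)
print(len(matrix), len(matrix[0]))  # 2 11
```

Because the vocabulary is fixed at fit time, texts transformed later always produce rows of the same width, which is what downstream models expect.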

Configuration

Parameter	Description
`stem`	Enable stemming (PorterStemmer).
`lemmatize`	Enable lemmatization (WordNet).
`vectorize`	"tfidf", "count", or None.
`tokenize`	Force return of token list.
`remove_stopwords`	Remove English stopwords (case-insensitive).
`pos_tagging`	Return (word, tag) tuples.
`ner`	Return entity tuples (uses spaCy).
`normalize`	Lowercase & remove punctuation.
`backend`	"nltk" (default) or "spacy".
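
The constraints in the table, together with the type-safety feature (no vectorizing of POS tuples), can be expressed as up-front validation. The dataclass below is a hypothetical sketch: `PipelineConfig` and its error messages are assumptions, and the exact set of combinations pun_nlp rejects is not documented here.

```python
from dataclasses import dataclass
from typing import Optional

VALID_VECTORIZERS = ("tfidf", "count", None)
VALID_BACKENDS = ("nltk", "spacy")


@dataclass(frozen=True)
class PipelineConfig:
    """Hypothetical config object mirroring some of the documented parameters."""

    stem: bool = False
    lemmatize: bool = False
    vectorize: Optional[str] = None
    pos_tagging: bool = False
    ner: bool = False
    backend: str = "nltk"

    def __post_init__(self) -> None:
        if self.vectorize not in VALID_VECTORIZERS:
            raise ValueError(
                f"vectorize must be 'tfidf', 'count', or None, got {self.vectorize!r}"
            )
        if self.backend not in VALID_BACKENDS:
            raise ValueError(
                f"backend must be 'nltk' or 'spacy', got {self.backend!r}"
            )
        # Tuples such as ('boy', 'NNS') are not plain text, so they cannot be vectorized.
        if self.vectorize is not None and (self.pos_tagging or self.ner):
            raise ValueError("vectorize cannot be combined with pos_tagging or ner")


ok = PipelineConfig(vectorize="tfidf")

try:
    PipelineConfig(vectorize="tfidf", pos_tagging=True)
except ValueError as exc:
    print(exc)  # vectorize cannot be combined with pos_tagging or ner
```

Validating in the constructor surfaces a bad combination immediately, rather than partway through processing a corpus.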

License

MIT License. See LICENSE for details.

Project details

Release history

Version	Release date
0.0.9 (this version)	Dec 23, 2025
0.0.8	May 12, 2025
0.0.7	May 12, 2025
0.0.6	Mar 17, 2025
0.0.5	Mar 17, 2025
0.0.4	Mar 17, 2025
0.0.3	Mar 17, 2025
0.0.2	Mar 17, 2025
0.0.1	Mar 17, 2025

Download files

Source Distribution

pun_nlp-0.0.9.tar.gz (7.1 kB)

Uploaded Dec 23, 2025 Source

Built Distribution

pun_nlp-0.0.9-py3-none-any.whl (7.8 kB)

Uploaded Dec 23, 2025 Python 3

File details

Details for the file pun_nlp-0.0.9.tar.gz.

File metadata

Download URL: pun_nlp-0.0.9.tar.gz
Upload date: Dec 23, 2025
Size: 7.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for pun_nlp-0.0.9.tar.gz
Algorithm	Hash digest
SHA256	`2fe5ecf091e7021cef828debf8aaa7597f531208512a78d39bea04d2ecee0d69`
MD5	`a6ea01c1e5f8ea0773d2185bc7f5a5c2`
BLAKE2b-256	`46f148e4a9b2b9e8b66de08dd4436bdd21182dac5d8ca3c3554816a09e3d8af4`

File details

Details for the file pun_nlp-0.0.9-py3-none-any.whl.

File metadata

Download URL: pun_nlp-0.0.9-py3-none-any.whl
Upload date: Dec 23, 2025
Size: 7.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for pun_nlp-0.0.9-py3-none-any.whl
Algorithm	Hash digest
SHA256	`1f601e80887f026e28dd7f606ffd1ab80088e31dbba1bb358cb6a2164d60c21a`
MD5	`f5349312f2ce4bfa63355c780b339eb4`
BLAKE2b-256	`1e090c58848666e1e1abf08d340a60461a140e7eec8924c6086dd59b08a57e20`
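
A downloaded file can be checked against the SHA256 digests listed above using only the standard library. In this sketch the function names `sha256_of` and `matches` are assumptions, and the demonstration hashes a small temporary file so the block runs without the real archive.

```python
import hashlib
import hmac
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: Path, expected_hex: str) -> bool:
    """Compare the file's digest with a published hex digest."""
    return hmac.compare_digest(sha256_of(path), expected_hex.lower())


# Demonstration on a throwaway file; substitute the downloaded archive in practice.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.bin"
    sample.write_bytes(b"hello")
    demo_digest = sha256_of(sample)
    demo_ok = matches(sample, hashlib.sha256(b"hello").hexdigest())
    demo_bad = matches(sample, "0" * 64)

print(demo_ok)   # True
print(demo_bad)  # False
```

For the wheel, the call would be `matches(Path("pun_nlp-0.0.9-py3-none-any.whl"), "1f601e80887f026e28dd7f606ffd1ab80088e31dbba1bb358cb6a2164d60c21a")`, assuming the file sits in the current directory.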
