A robust NLP pipeline for stemming, lemmatization, and vectorization

NLPProcessor

Overview

NLPProcessor is an automated, adaptive NLP pipeline that dynamically handles:

  • Stemming & Lemmatization (via NLTK or spaCy)
  • Vectorization (TF-IDF or Count Vectorizer)
  • Dependency Management (Auto-installs missing libraries)
  • Support for 2D Text Arrays (Processes lists of lists of text)
  • Defensive Execution (Designed to keep running when the underlying library APIs change)

Features

  • Automated dependency installation
  • Works with both NLTK and spaCy
  • Vectorization support using scikit-learn
  • Handles single strings and 2D arrays
  • No manual setup required beyond running the script

Installation

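The package is distributed on PyPI as pun_nlp (version 0.0.3). Missing dependencies are installed automatically the first time you run a script that uses NLPProcessor: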

python your_script.py
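
As an illustration of what installing dependencies at run time can look like, here is a minimal sketch using only the standard library. The helper name ensure_package and its return values are hypothetical and are not taken from NLPProcessor; the package's actual mechanism may differ.

```python
import importlib.util
import subprocess
import sys


def ensure_package(module_name, pip_name=None, install=True):
    """Report whether module_name is importable, installing it if allowed.

    Hypothetical helper for illustration; not part of NLPProcessor's API.
    Returns one of "present", "missing", "installed", or "failed".
    """
    if importlib.util.find_spec(module_name) is not None:
        return "present"
    if not install:
        return "missing"
    # Call pip through the running interpreter so the package lands in
    # the same environment the script is using.
    result = subprocess.run(
        [sys.executable, "-m", "pip", "install", pip_name or module_name],
        check=False,
    )
    # Make a freshly installed package visible to later imports.
    importlib.invalidate_caches()
    return "installed" if result.returncode == 0 else "failed"


if __name__ == "__main__":
    # The import name and the pip name can differ (sklearn vs. scikit-learn).
    # install=False only reports status and never touches the environment.
    for module, package in [("nltk", "nltk"), ("spacy", "spacy"), ("sklearn", "scikit-learn")]:
        print(module, ensure_package(module, package, install=False))
```

Passing install=False gives a dry run, which is useful for seeing what would be installed before letting a script modify the environment.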

Usage

Import and Initialize

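# Replace your_script with the module that defines NLPProcessor.
from your_script import NLPProcessor

# Stem and lemmatize with spaCy, then vectorize the result with TF-IDF.
processor = NLPProcessor(stem=True, lemmatize=True, vectorize="tfidf", backend="spacy")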

Process a Single Text

output = processor.process("running jumped swimming")
print(output)

Process a 2D Array of Text

input_texts = [
    ["I am running", "He is jumping"],
    ["They are swimming", "Dogs are barking"]
]
output = processor.process(input_texts)
print(output)
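
One way to accept both a single string and a list of lists is to recurse over the input and apply the same transformation to every string found. The sketch below shows that idea with hypothetical names (apply_nested, toy_normalize); it is an assumption about how such dispatch could work, not NLPProcessor's actual implementation.

```python
def apply_nested(item, transform):
    """Apply transform to a string, or to every string inside nested lists."""
    if isinstance(item, str):
        return transform(item)
    if isinstance(item, (list, tuple)):
        # Recurse so that the output keeps the same nesting as the input.
        return [apply_nested(element, transform) for element in item]
    raise TypeError(f"expected str or list of str, got {type(item).__name__}")


def toy_normalize(text):
    # Stand-in for a real stemming or lemmatization step:
    # lowercase the text and split it on whitespace.
    return text.lower().split()


if __name__ == "__main__":
    print(apply_nested("I am running", toy_normalize))
    print(apply_nested([["I am running", "He is jumping"], ["They are swimming"]], toy_normalize))
```

Because the output mirrors the shape of the input, a caller can index into the result the same way it indexed into the original array.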

Customization Options

Parameter   Description
stem        Enable stemming (default: False)
lemmatize   Enable lemmatization (default: False)
vectorize   Choose "tfidf", "count", or None (default: None)
backend     Choose "nltk" or "spacy" (default: "nltk")

Check Supported Vectorizers

print(NLPProcessor.supported_vectorizers())  # ['tfidf', 'count']
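
To make the difference between the "count" and "tfidf" options concrete, here is a small pure-Python sketch. The function names and the VECTORIZERS registry are hypothetical. NLPProcessor itself relies on scikit-learn, whose tokenization differs and whose TF-IDF additionally L2-normalizes each row by default; this sketch uses whitespace tokenization and skips normalization to stay short.

```python
import math
from collections import Counter


def count_vectorize(docs):
    """Return (vocabulary, rows), where each row holds raw term counts."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({token for tokens in tokenized for token in tokens})
    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        rows.append([counts[term] for term in vocab])
    return vocab, rows


def tfidf_vectorize(docs):
    """Return (vocabulary, rows) with counts weighted by smoothed IDF.

    Uses idf = ln((1 + n) / (1 + df)) + 1 and no row normalization,
    a simplification chosen for this sketch.
    """
    vocab, rows = count_vectorize(docs)
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    doc_freq = [sum(1 for row in rows if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log((1 + n_docs) / (1 + df)) + 1 for df in doc_freq]
    weighted = [[count * weight for count, weight in zip(row, idf)] for row in rows]
    return vocab, weighted


# Hypothetical registry mapping option names to implementations.
VECTORIZERS = {"tfidf": tfidf_vectorize, "count": count_vectorize}


def supported_vectorizers():
    return list(VECTORIZERS)


if __name__ == "__main__":
    docs = ["dogs are barking", "dogs are running"]
    for name in supported_vectorizers():
        vocab, rows = VECTORIZERS[name](docs)
        print(name, vocab, rows)
```

Terms that appear in every document ("dogs", "are") keep a weight of 1.0 under this formula, while terms unique to one document ("barking", "running") are weighted higher, which is the effect TF-IDF is meant to have.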

License

MIT License
