A comprehensive NLP and Machine Learning package with example implementations
PKMB Package
A comprehensive Python package containing various NLP and Machine Learning implementations.
Installation
pip install pkmb
Usage
from pkmb import print_program
# Print any program (1-5, 7, 9)
print_program(1) # Basic NLP operations
print_program(2) # Named Entity Recognition
print_program(3) # TF-IDF implementation
print_program(4) # N-grams analysis
print_program(5) # Word Embeddings analysis
print_program(7) # Text Generation with LSTM
print_program(9) # Variational Autoencoder for MNIST
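Since only programs 1-5, 7, and 9 are listed, it can help to check the number before calling `print_program`. The sketch below is illustrative only: `print_if_available` and `AVAILABLE_PROGRAMS` are hypothetical names, not part of pkmb, and how `print_program` itself reacts to an unlisted number is not documented here.

```python
# Program numbers taken from the usage comment above (1-5, 7, 9).
AVAILABLE_PROGRAMS = (1, 2, 3, 4, 5, 7, 9)


def print_if_available(number, printer):
    """Call printer(number) only when number is a listed program.

    `printer` is expected to be pkmb's print_program; it is passed in as an
    argument so this helper makes no assumption about pkmb's internals.
    """
    if number not in AVAILABLE_PROGRAMS:
        raise ValueError(
            f"Program {number} is not available; choose from {AVAILABLE_PROGRAMS}"
        )
    printer(number)
```

With pkmb installed, this would be called as `print_if_available(3, print_program)` after `from pkmb import print_program`.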
Available Programs
- Program 1: Natural Language Processing (NLP) Text Analysis
  - Basic NLP operations using NLTK
  - Includes: tokenization, stopword removal, stemming, and lemmatization
  - Demonstrates both sentence and word-level processing
- Program 2: Named Entity Recognition (NER)
  - Uses NLTK for entity extraction
  - Identifies persons, organizations, locations
  - Includes BIO tagging and tree representation
- Program 3: TF-IDF Implementation
  - Manual implementation of TF-IDF calculation
  - Comparison with scikit-learn's TfidfVectorizer
  - Document similarity analysis
- Program 4: N-grams Analysis
  - Uses Pride and Prejudice as corpus
  - Generates unigrams, bigrams, and trigrams
  - Includes frequency analysis and visualization
- Program 5: Word Embeddings Analysis
  - Uses GloVe embeddings (50d)
  - Word similarity computation
  - Semantic relationship analysis
- Program 7: Text Generation with LSTM
  - Neural network-based text generation
  - Uses TensorFlow/Keras LSTM architecture
  - Includes training and text generation capabilities
- Program 9: Variational Autoencoder (VAE)
  - Deep learning model for MNIST dataset
  - Implements both encoder and decoder networks
  - Generates new digit images from latent space
Note: Programs 6 and 8 are intentionally omitted from this collection.
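To give a feel for two of the techniques listed above, here is a minimal pure-Python sketch of n-gram extraction (Program 4) and a manual TF-IDF calculation (Program 3). It is not the code shipped in pkmb: the function names, the toy sentences, and the choice of `tf = count / length` with `idf = log(N / df)` are assumptions made for illustration.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the n-grams of a token sequence as a list of tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tf_idf(documents):
    """Compute TF-IDF weights for a list of tokenized documents.

    Uses tf = count / document length and idf = log(N / df), one common
    textbook variant. scikit-learn's TfidfVectorizer defaults to a smoothed
    idf and L2 normalization, so its numbers will differ from these.
    """
    n_docs = len(documents)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Toy corpus, invented for this example.
docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
]
bigram_counts = Counter(ngrams(docs[0], 2))
scores = tf_idf(docs)
```

A term that occurs in every document, such as "the" here, gets a weight of zero under this variant, while a term unique to one document, such as "cat", gets a positive weight.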
Dependencies
The package requires the following Python packages:
pip install nltk pandas scikit-learn requests gensim scipy==1.11.4 tensorflow matplotlib numpy
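To see which of these dependencies are already present before running a program, a small check along the following lines may help. It is a sketch, not part of pkmb: `missing_modules` is a hypothetical helper, and the list uses import names, which differ from the pip name in one case (scikit-learn is imported as `sklearn`).

```python
from importlib.util import find_spec

# Import names for the packages in the pip command above.
REQUIRED_MODULES = [
    "nltk", "pandas", "sklearn", "requests", "gensim",
    "scipy", "tensorflow", "matplotlib", "numpy",
]


def missing_modules(modules=REQUIRED_MODULES):
    """Return the names of top-level modules that cannot be found."""
    return [name for name in modules if find_spec(name) is None]


missing = missing_modules()
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are installed.")
```

This only checks that each module can be located; it does not verify versions such as the `scipy==1.11.4` pin.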
Additional Setup
- NLTK Data: Required for Programs 1-4
  - Downloads automatically when running the programs
  - Includes: punkt, stopwords, wordnet, averaged_perceptron_tagger, maxent_ne_chunker
- GloVe Embeddings: Required for Program 5
  - Downloads automatically on first use (~66MB)
  - Uses the glove-wiki-gigaword-50 model
- MNIST Dataset: Required for Program 9
  - Downloads automatically through TensorFlow
  - Used for training and testing the VAE
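Although the programs fetch these resources themselves, it is sometimes useful to fetch them ahead of time, for example before working offline. The sketch below uses `nltk.download` and `gensim.downloader.load` with the resource names listed above. The helper names are hypothetical, and whether pkmb triggers its downloads in the same way is an assumption.

```python
# Resource names as listed in the setup notes above.
NLTK_RESOURCES = [
    "punkt",
    "stopwords",
    "wordnet",
    "averaged_perceptron_tagger",
    "maxent_ne_chunker",
]
GLOVE_MODEL = "glove-wiki-gigaword-50"


def prefetch_nltk_data(resources=NLTK_RESOURCES):
    """Download each NLTK resource; return a {name: succeeded} mapping."""
    import nltk  # imported lazily so this file loads without NLTK installed

    return {name: nltk.download(name, quiet=True) for name in resources}


def load_glove(model_name=GLOVE_MODEL):
    """Download (on first call) and load the GloVe vectors via gensim."""
    import gensim.downloader as api  # lazy import, same reason as above

    return api.load(model_name)
```

Calling `prefetch_nltk_data()` and `load_glove()` once with a network connection should leave the data in the local NLTK and gensim caches for later runs.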
Note on GPU Support
Programs 7 (LSTM) and 9 (VAE) can benefit from GPU acceleration if TensorFlow is installed with CUDA support.
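To check whether TensorFlow can actually see a GPU, something like the following can be run first. This is a sketch rather than part of pkmb: `visible_gpus` is a hypothetical helper built on `tf.config.list_physical_devices`, and it returns an empty list when TensorFlow is not installed.

```python
def visible_gpus():
    """Return the names of GPUs visible to TensorFlow (empty list if none)."""
    try:
        import tensorflow as tf
    except ImportError:
        # TensorFlow is not installed, so no GPU can be used through it.
        return []
    return [device.name for device in tf.config.list_physical_devices("GPU")]


gpus = visible_gpus()
print("GPUs visible to TensorFlow:", gpus if gpus else "none")
```

An empty result means Programs 7 and 9 would run on the CPU, which still works but is likely to be slower.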
Error Handling
All programs handle errors and display an informative message when:
- Required data is not available
- A word is not found in the vocabulary
- A model fails to load or run
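As an illustration of the out-of-vocabulary case, the sketch below wraps an embedding lookup and reports a missing word instead of raising. It is not pkmb's code: `lookup_vector` is a hypothetical helper, the message wording is invented, and a plain dictionary stands in for the GloVe model (gensim's `KeyedVectors` also raises `KeyError` for unknown words).

```python
def lookup_vector(embeddings, word):
    """Return the vector for word, or None with a message if it is unknown."""
    try:
        return embeddings[word]
    except KeyError:
        print(f"'{word}' is not in the vocabulary")
        return None


# Tiny stand-in vocabulary, invented for this example.
toy_embeddings = {"king": [0.5, 0.1], "queen": [0.4, 0.2]}
found = lookup_vector(toy_embeddings, "king")
not_found = lookup_vector(toy_embeddings, "zebra")
```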
File details
Details for the file pkmb-0.1.2.tar.gz.
File metadata
- Download URL: pkmb-0.1.2.tar.gz
- Upload date:
- Size: 11.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.11.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8780b0319d9fb091194bafe2d9ddefdd177c9618dbf63315c6c8b5674ea3966b |
| MD5 | 56f874d752dd14311ab14332f1d9035d |
| BLAKE2b-256 | efe37f3bad0017690e92d30946621c0ee757c14229a400fc9d1aa57a32b7aebe |
File details
Details for the file pkmb-0.1.2-py3-none-any.whl.
File metadata
- Download URL: pkmb-0.1.2-py3-none-any.whl
- Upload date:
- Size: 10.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.11.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 17b6958567f667d86aa9e7ac4843702d2a58324305522efd23a7b3e9f5cd247b |
| MD5 | 646b5ef44ff5811146f934fae5ad527f |
| BLAKE2b-256 | 47d9c6801a084b2dfa0710045a0694487a0fde2cb8a7aa89c59dab01e8526bdb |