# HubNLP Documentation

## Introduction

HubNLP is a simple, user-friendly NLP utility library designed to make NLP tasks easier and more accessible. It includes functions for Named Entity Recognition (NER), word feature extraction, attention-based models, and more.
## Installation

You can install HubNLP via pip:

```bash
pip install HubNLP
```
## Functions and Features
### 1. print_developer_info()

Prints information about the developer of the library.

Usage:

```python
from HubNLP import print_developer_info

print_developer_info()
```

Output:

```
Developer: Self-nasu
Library: HubNLP
Email: nexiotech.2024@gmail.com
```
### 2. print_library_tagline()

Prints the tagline or description of the library.

Usage:

```python
from HubNLP import print_library_tagline

print_library_tagline()
```

Output:

```
HubNLP - A simple NLP utility library
```
### 3. load_ner_data(file_path)

Loads NER data in CoNLL format (words, part-of-speech tags, and NER tags).

Arguments:

- file_path (str): The path to the CoNLL-format text file.

Returns:

- A list of sentences, where each sentence is a list of (word, pos, ner) tuples.

Usage:

```python
from HubNLP import load_ner_data

file_path = "path/to/conll_file.txt"
sentences = load_ner_data(file_path)
print(sentences)
```

Example output:

```python
[[("John", "NNP", "B-PER"), ("Smith", "NNP", "I-PER")],
 [("London", "NNP", "B-LOC")]]
```
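To make the expected input concrete, here is an illustrative sketch of the CoNLL-style structure being loaded. This is not HubNLP's implementation; it assumes a simple whitespace-separated "word POS NER" column layout with blank lines separating sentences, which is the layout the example output above implies.

```python
# Illustrative parser for a minimal CoNLL-style layout:
# one "word POS NER" line per token, blank line between sentences.
# NOT HubNLP's actual code; shown only to clarify the data shape.

def parse_conll(text):
    """Parse CoNLL-style text into a list of sentences of (word, pos, ner) tuples."""
    sentences, current = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:                   # a blank line ends the current sentence
            if current:
                sentences.append(current)
                current = []
            continue
        word, pos, ner = line.split()  # three whitespace-separated columns
        current.append((word, pos, ner))
    if current:                        # flush the final sentence
        sentences.append(current)
    return sentences

sample = """John NNP B-PER
Smith NNP I-PER

London NNP B-LOC"""

print(parse_conll(sample))
# [[('John', 'NNP', 'B-PER'), ('Smith', 'NNP', 'I-PER')], [('London', 'NNP', 'B-LOC')]]
```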
### 4. extract_word_features(sentence, index)

Extracts features for a specific word in a sentence, for NLP tasks such as NER or POS tagging.

Arguments:

- sentence (list): A list of tuples representing a sentence, where each tuple contains (word, POS, [optional] NER tag).
- index (int): The index of the word for which features are to be extracted.

Returns:

- A dictionary of features for the word.

Usage:

```python
from HubNLP import extract_word_features

sentence = [("John", "NNP"), ("is", "VBZ"), ("running", "VBG")]
features = extract_word_features(sentence, 2)
print(features)
```

Example output:

```python
{
    'word': 'running',
    'postag': 'VBG',
    'is_upper': False,
    'is_title': False,
    'is_digit': False,
    '-1:word': 'is',
    '-1:postag': 'VBZ',
    'BOS': False,
    '+1:word': '',
    '+1:postag': '',
    'EOS': True
}
```
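The feature layout shown above can be reproduced with a short, self-contained function. The following is an illustrative reimplementation, not HubNLP's actual code, but it produces the same keys and is useful for understanding what each feature encodes (surface-form flags for the word itself, plus context from the previous and next tokens, plus begin/end-of-sentence markers).

```python
# Illustrative reimplementation of the feature dictionary shown above.
# NOT HubNLP's actual code; shown to explain what each feature means.

def word_features(sentence, index):
    word, postag = sentence[index][0], sentence[index][1]
    features = {
        "word": word,
        "postag": postag,
        "is_upper": word.isupper(),   # all-caps, e.g. "NASA"
        "is_title": word.istitle(),   # capitalized, e.g. "John"
        "is_digit": word.isdigit(),
        "BOS": index == 0,                    # beginning of sentence
        "EOS": index == len(sentence) - 1,    # end of sentence
    }
    if index > 0:  # previous-token context
        features["-1:word"] = sentence[index - 1][0]
        features["-1:postag"] = sentence[index - 1][1]
    else:
        features["-1:word"], features["-1:postag"] = "", ""
    if index < len(sentence) - 1:  # next-token context
        features["+1:word"] = sentence[index + 1][0]
        features["+1:postag"] = sentence[index + 1][1]
    else:
        features["+1:word"], features["+1:postag"] = "", ""
    return features

sentence = [("John", "NNP"), ("is", "VBZ"), ("running", "VBG")]
print(word_features(sentence, 2))
```

Feature dictionaries like this are the typical input for sequence taggers such as CRFs, built once per token index over the whole sentence.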
### 5. AttentionLayer Class

Defines a custom attention layer that applies a Bi-directional LSTM followed by an attention mechanism.

Usage:

```python
import tensorflow as tf
from tensorflow.keras.layers import Input, Embedding, Dense, Bidirectional, LSTM

from HubNLP import AttentionLayer

# Example model using AttentionLayer
class CustomModel(tf.keras.Model):
    def __init__(self, vocab_size, embedding_dim, max_seq_len, lstm_units):
        super(CustomModel, self).__init__()
        self.embedding = Embedding(input_dim=vocab_size, output_dim=embedding_dim,
                                   input_length=max_seq_len)
        self.attention_layer = AttentionLayer(lstm_units)
        self.dense = Dense(64, activation='relu')
        self.output_layer = Dense(4, activation='softmax')  # For 4-class classification

    def call(self, inputs):
        review_input, aspect_input = inputs
        review_embedded = self.embedding(review_input)
        aspect_embedded = self.embedding(aspect_input)
        context_vector = self.attention_layer([review_embedded, aspect_embedded])
        x = self.dense(context_vector)
        return self.output_layer(x)
```
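The internals of `AttentionLayer` are not documented here, but the core idea of an attention mechanism over LSTM states can be sketched in plain NumPy. The sketch below shows standard additive-style attention; the weight shapes and scoring function are illustrative assumptions, not HubNLP's exact implementation.

```python
import numpy as np

# A standard additive-attention computation in plain NumPy, to illustrate
# the kind of mechanism an attention layer applies over LSTM hidden states.
# Shapes and scoring function are illustrative; HubNLP's internals may differ.

rng = np.random.default_rng(0)
seq_len, hidden = 5, 8

h = rng.standard_normal((seq_len, hidden))  # LSTM hidden states, one per timestep
W = rng.standard_normal((hidden, hidden))   # score projection matrix
v = rng.standard_normal(hidden)             # scoring vector

scores = np.tanh(h @ W) @ v                 # one unnormalized score per timestep
alphas = np.exp(scores - scores.max())
alphas /= alphas.sum()                      # softmax -> attention weights, sum to 1
context = alphas @ h                        # context vector: weighted sum of states

print(context.shape)  # (8,)
```

The resulting context vector is what a model like `CustomModel` above feeds into its dense classification head.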
### 6. build_attention_model(vocab_size, embedding_dim, max_seq_len, lstm_units)

Builds a model architecture with an attention mechanism for tasks such as aspect-based sentiment analysis.

Arguments:

- vocab_size (int): The size of the vocabulary.
- embedding_dim (int): The dimensionality of the embedding layer.
- max_seq_len (int): The maximum input sequence length.
- lstm_units (int): The number of units in the Bi-directional LSTM layer.

Returns:

- A compiled Keras model.

Usage:

```python
from HubNLP import build_attention_model

vocab_size = 5000    # Size of the vocabulary
embedding_dim = 128  # Dimensionality of embeddings
max_seq_len = 100    # Max sequence length
lstm_units = 64      # Number of LSTM units

model = build_attention_model(vocab_size, embedding_dim, max_seq_len, lstm_units)

# Compile the model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Print the model summary
model.summary()
```
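Inputs to a model with a fixed `max_seq_len` must be padded or truncated to that length before training. In practice Keras's `pad_sequences` utility handles this; the minimal pure-Python sketch below shows the idea for a single sequence of token ids.

```python
# Minimal sketch of right-padding/truncating a token-id sequence to a
# fixed length, as required by a model built with max_seq_len.
# (In practice, use tensorflow.keras.preprocessing.sequence.pad_sequences.)

def pad_to(seq, max_len, pad_value=0):
    """Right-pad (or truncate) a list of token ids to exactly max_len."""
    return (seq + [pad_value] * max_len)[:max_len]

print(pad_to([4, 8, 15], 5))          # [4, 8, 15, 0, 0]
print(pad_to([1, 2, 3, 4, 5, 6], 5))  # [1, 2, 3, 4, 5]
```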
## Additional Notes

- The library is designed to work with TensorFlow and Keras for neural-network-based models, and the functions can be used independently or together for various NLP tasks.
- If you're using the attention-based models, make sure TensorFlow is installed in your environment:

```bash
pip install tensorflow
```
## Conclusion

HubNLP provides a range of utilities that make it easy to handle tasks such as NER, feature extraction, and building attention-based models for sentiment analysis. With its simple API, you can integrate these functionalities into your own projects with minimal effort.
## File details

Details for the file HubNLP-0.3.tar.gz.

File metadata:

- Download URL: HubNLP-0.3.tar.gz
- Upload date:
- Size: 19.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.12.2

File hashes:

| Algorithm | Hash digest |
|---|---|
| SHA256 | 16b13741f1a3885a4c7ed528cd08753c564856eaf4fdb63f2356397a864dacaf |
| MD5 | d6265ccc2830756bf33d18a1e9b46210 |
| BLAKE2b-256 | cc0fa001696d41944b9e53e2ec944938b69f2db555ce31c8363a6e98af357472 |
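A downloaded distribution can be checked against its published SHA256 digest with Python's standard-library `hashlib`. The file path in the commented usage line is illustrative.

```python
import hashlib

# Compute a file's SHA-256 digest incrementally, to compare against the
# published hash without loading the whole file into memory.
def sha256_of(path, chunk_size=8192):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Usage (path is illustrative):
# sha256_of("HubNLP-0.3.tar.gz")
# == "16b13741f1a3885a4c7ed528cd08753c564856eaf4fdb63f2356397a864dacaf"
```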
## File details

Details for the file HubNLP-0.3-py3-none-any.whl.

File metadata:

- Download URL: HubNLP-0.3-py3-none-any.whl
- Upload date:
- Size: 18.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.12.2

File hashes:

| Algorithm | Hash digest |
|---|---|
| SHA256 | d9066f3ac9c8ada9873fef02bc2c2b14a9e2a472c097ff08f409a7c7f81f3edf |
| MD5 | 3597642a90c550facaebadc8507a77d2 |
| BLAKE2b-256 | 53b3437c7458762b75d4403c149d7093ca9ca18feb7c60515d2015c48c2da3cf |