A simple NLP utility library

License
- OSI Approved :: GNU General Public License v3 (GPLv3)
Operating System
- OS Independent
Programming Language
- Python :: 3


HubNLP Documentation

Introduction

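HubNLP is a simple, user-friendly NLP utility library designed to make common NLP tasks easier and more accessible. It provides a loader for Named Entity Recognition (NER) data in CoNLL format, word-level feature extraction, and an attention layer and model builder for aspect-based sentiment analysis, along with two small informational helpers.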

Installation

You can install HubNLP via pip:

```
pip install HubNLP
```

Functions and Features

1. `print_developer_info()`

Prints information about the library's developer.

Usage:

```python
from HubNLP import print_developer_info

print_developer_info()
```

Output:

```
Developer: Self-nasu
Library: HubNLP
Email: nexiotech.2024@gmail.com
```

2. `print_library_tagline()`

Prints the library's tagline.

Usage:

```python
from HubNLP import print_library_tagline

print_library_tagline()
```

Output:

```
HubNLP - A simple NLP utility library
```

3. `load_ner_data(file_path)`

Loads NER data in CoNLL format (which includes words, part-of-speech tags, and NER tags).

Arguments:

file_path (str): The path to the CoNLL format text file.

Returns:

A list of sentences, where each sentence is represented as a list of tuples (word, pos, ner).

Usage:

```python
from HubNLP import load_ner_data

file_path = "path/to/conll_file.txt"
sentences = load_ner_data(file_path)

print(sentences)
```

Example Output:

```
[[("John", "NNP", "B-PER"), ("Smith", "NNP", "I-PER")],
 [("London", "NNP", "B-LOC")]]
```
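The exact parsing rules `load_ner_data` applies are not documented here. As a rough illustration of how a CoNLL reader turns a file into the structure shown above, the sketch below assumes the common CoNLL-2003 layout: one token per line with whitespace-separated columns (word, POS tag, chunk tag, NER tag), blank lines between sentences, and `-DOCSTART-` lines marking document boundaries. The function name `parse_conll_lines` is hypothetical and is not part of HubNLP.

```python
from typing import Iterable, List, Tuple

Token = Tuple[str, str, str]  # (word, pos, ner)


def parse_conll_lines(lines: Iterable[str]) -> List[List[Token]]:
    """Hypothetical CoNLL reader (not HubNLP's code): groups tokens into sentences."""
    sentences: List[List[Token]] = []
    current: List[Token] = []
    for raw in lines:
        line = raw.strip()
        # A blank line or a document marker ends the current sentence.
        if not line or line.startswith("-DOCSTART-"):
            if current:
                sentences.append(current)
                current = []
            continue
        cols = line.split()
        # Assumed layout: word first, POS tag second, NER tag in the last column.
        current.append((cols[0], cols[1], cols[-1]))
    if current:  # flush the final sentence when the file lacks a trailing blank line
        sentences.append(current)
    return sentences


sample = """-DOCSTART- -X- -X- O

John NNP B-NP B-PER
Smith NNP I-NP I-PER

London NNP B-NP B-LOC
"""
print(parse_conll_lines(sample.splitlines()))
# [[('John', 'NNP', 'B-PER'), ('Smith', 'NNP', 'I-PER')], [('London', 'NNP', 'B-LOC')]]
```

Taking the NER tag from the last column rather than a fixed index keeps the sketch working for both three-column (word, POS, NER) and four-column CoNLL files.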

4. `extract_word_features(sentence, index)`

Extracts features for a specific word in a sentence for NLP tasks (e.g., NER or POS tagging).

Arguments:

sentence (list): A list of tuples representing a sentence, where each tuple contains (word, POS, [optional] NER tag).
index (int): The index of the word in the sentence for which features are to be extracted.

Returns:

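A dictionary of features for the word: its form and POS tag, casing and digit flags (`is_upper`, `is_title`, `is_digit`), the word and POS tag of its immediate neighbors (`-1:` and `+1:` keys, set to empty strings when there is no neighbor), and `BOS`/`EOS` flags marking the start and end of the sentence.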

Usage:

```python
from HubNLP import extract_word_features

sentence = [("John", "NNP"), ("is", "VBZ"), ("running", "VBG")]
features = extract_word_features(sentence, 2)

print(features)
```

Example Output:

```
{
    'word': 'running',
    'postag': 'VBG',
    'is_upper': False,
    'is_title': False,
    'is_digit': False,
    '-1:word': 'is',
    '-1:postag': 'VBZ',
    'BOS': False,
    '+1:word': '',
    '+1:postag': '',
    'EOS': True
}
```
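The example output suggests a CRF-style feature template built from the word itself and a one-token window around it. The sketch below reproduces that dictionary under the assumption that missing neighbors are filled with empty strings and that `BOS`/`EOS` simply flag the first and last positions. `word_features_sketch` is a hypothetical name; HubNLP's actual implementation may compute additional or different features.

```python
from typing import Dict, Sequence, Tuple, Union

Feature = Union[str, bool]


def word_features_sketch(sentence: Sequence[Tuple[str, ...]], index: int) -> Dict[str, Feature]:
    """Hypothetical feature extractor (not HubNLP's code) mirroring the example output."""
    # Each tuple starts with (word, pos); an optional NER tag after them is ignored.
    word, pos = sentence[index][0], sentence[index][1]
    features: Dict[str, Feature] = {
        "word": word,
        "postag": pos,
        "is_upper": word.isupper(),
        "is_title": word.istitle(),
        "is_digit": word.isdigit(),
    }
    # Previous-token features; empty strings when the word opens the sentence.
    has_prev = index > 0
    features["-1:word"] = sentence[index - 1][0] if has_prev else ""
    features["-1:postag"] = sentence[index - 1][1] if has_prev else ""
    features["BOS"] = not has_prev
    # Next-token features; empty strings when the word closes the sentence.
    has_next = index < len(sentence) - 1
    features["+1:word"] = sentence[index + 1][0] if has_next else ""
    features["+1:postag"] = sentence[index + 1][1] if has_next else ""
    features["EOS"] = not has_next
    return features


sentence = [("John", "NNP"), ("is", "VBZ"), ("running", "VBG")]
print(word_features_sketch(sentence, 2))
```

For sequence labeling, a feature dictionary like this is typically computed for every index in the sentence, giving one dictionary per token.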

5. `AttentionLayer` Class

Defines a custom attention layer that applies a Bi-directional LSTM followed by an attention mechanism.

Usage:

```python
import tensorflow as tf
from tensorflow.keras.layers import Dense, Embedding

from HubNLP import AttentionLayer


# Example: using AttentionLayer inside a two-input (review, aspect) classifier
class CustomModel(tf.keras.Model):
    def __init__(self, vocab_size, embedding_dim, max_seq_len, lstm_units):
        super().__init__()
        # One embedding table shared by the review and aspect token ids
        self.embedding = Embedding(input_dim=vocab_size, output_dim=embedding_dim, input_length=max_seq_len)
        self.attention_layer = AttentionLayer(lstm_units)
        self.dense = Dense(64, activation='relu')
        self.output_layer = Dense(4, activation='softmax')  # For 4-class classification

    def call(self, inputs):
        review_input, aspect_input = inputs
        review_embedded = self.embedding(review_input)
        aspect_embedded = self.embedding(aspect_input)
        # Pass both embedded sequences to the attention layer; it returns a context vector
        context_vector = self.attention_layer([review_embedded, aspect_embedded])
        x = self.dense(context_vector)
        return self.output_layer(x)


model = CustomModel(vocab_size=5000, embedding_dim=128, max_seq_len=100, lstm_units=64)
```
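The documentation does not say how `AttentionLayer` scores its inputs. To illustrate what an attention step over LSTM outputs typically computes, here is a dependency-free sketch of dot-product attention: each hidden state is scored against a query formed by averaging the aspect vectors, the scores are normalized with a softmax, and the context vector is the weighted sum of the hidden states. This is an assumption, not HubNLP's confirmed design: the BiLSTM is omitted (hidden states are passed in directly), the aspect vectors are assumed to have the same dimension as the hidden states (a real model would usually need a projection, since a BiLSTM outputs `2 * lstm_units` values per step), and `attention_pool` is a hypothetical name.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max score before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states: List[Vector], aspect_vectors: List[Vector]) -> Tuple[Vector, List[float]]:
    """Hypothetical dot-product attention (not HubNLP's code)."""
    # Assumed query: the element-wise mean of the aspect vectors.
    n = len(aspect_vectors)
    query = [sum(col) / n for col in zip(*aspect_vectors)]
    # Score each time step by its dot product with the query.
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    # Context vector: attention-weighted sum of the hidden states.
    dim = len(hidden_states[0])
    context = [sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)]
    return context, weights


# Two 2-d "hidden states"; the aspect vector points along the first axis,
# so the first time step should receive most of the attention.
hidden = [[1.0, 0.0], [0.0, 1.0]]
aspect = [[1.0, 0.0]]
context, weights = attention_pool(hidden, aspect)
print([round(w, 3) for w in weights])  # [0.731, 0.269]
```

Because the weights sum to 1, the context vector always stays a convex combination of the hidden states, which is why the classifier can consume it with a plain `Dense` layer regardless of sequence length.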

6. `build_attention_model(vocab_size, embedding_dim, max_seq_len, lstm_units)`

Builds a model architecture with an attention mechanism for tasks like aspect-based sentiment analysis.

Arguments:

vocab_size (int): The size of the vocabulary.
embedding_dim (int): The dimensionality of the embedding layer.
max_seq_len (int): The maximum sequence length for input.
lstm_units (int): The number of units in the Bi-directional LSTM layer.

Returns:

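A compiled Keras model. (Calling `compile()` again, as in the usage example, recompiles it with the optimizer, loss, and metrics you pass.)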

Usage:

```python
from HubNLP import build_attention_model

vocab_size = 5000  # Size of the vocabulary
embedding_dim = 128  # Dimensionality of embeddings
max_seq_len = 100  # Max sequence length
lstm_units = 64  # Number of LSTM units

model = build_attention_model(vocab_size, embedding_dim, max_seq_len, lstm_units)

# Compile the model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Print model summary
model.summary()
```
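Because the model is compiled with `categorical_crossentropy`, training labels must be one-hot vectors (integer class ids would call for `sparse_categorical_crossentropy` instead), and token-id inputs are normally brought to a fixed length of `max_seq_len`. The input layout `build_attention_model` expects is not documented, so the pure-Python helpers below are a hypothetical sketch assuming integer token ids padded with `0` at the end and a caller-chosen number of classes. Note that Keras's own `pad_sequences` utility pads and truncates at the front by default; whichever convention you pick, apply it consistently to training and inference data.

```python
from typing import List, Sequence


def pad_sequence(token_ids: Sequence[int], max_seq_len: int, pad_id: int = 0) -> List[int]:
    # Hypothetical helper: keep at most max_seq_len ids, then right-pad with pad_id.
    ids = list(token_ids[:max_seq_len])
    return ids + [pad_id] * (max_seq_len - len(ids))


def one_hot(label: int, num_classes: int) -> List[float]:
    # Hypothetical helper: categorical_crossentropy compares predictions to a one-hot target.
    if not 0 <= label < num_classes:
        raise ValueError(f"label {label} outside [0, {num_classes})")
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec


max_seq_len = 5  # small value for illustration; the example above uses 100
print(pad_sequence([12, 7, 3], max_seq_len))          # [12, 7, 3, 0, 0]
print(pad_sequence([1, 2, 3, 4, 5, 6], max_seq_len))  # [1, 2, 3, 4, 5]
print(one_hot(2, num_classes=4))                      # [0.0, 0.0, 1.0, 0.0]
```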

Additional Notes

The library is designed to work with TensorFlow and Keras for neural network-based models, and the functions can be used independently or together for various NLP tasks.
If you're using the attention-based models, make sure that you have TensorFlow installed in your environment:
```
pip install tensorflow
```

Conclusion

HubNLP provides a range of utilities that make it easy to handle tasks like NER, feature extraction, and building attention-based models for sentiment analysis. With its simple API, you can integrate these functionalities into your own projects with minimal effort.


Release history

0.3 (this version): Nov 21, 2024
0.2: Nov 21, 2024
0.1: Nov 21, 2024

Download files

Source Distribution

HubNLP-0.3.tar.gz (19.7 kB)

Uploaded Nov 21, 2024 (Source)

Built Distribution

HubNLP-0.3-py3-none-any.whl (18.3 kB)

Uploaded Nov 21, 2024 (Python 3)

File details

Details for the file HubNLP-0.3.tar.gz.

File metadata

Download URL: HubNLP-0.3.tar.gz
Upload date: Nov 21, 2024
Size: 19.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.0.0 CPython/3.12.2

File hashes

Hashes for HubNLP-0.3.tar.gz
Algorithm	Hash digest
SHA256	`16b13741f1a3885a4c7ed528cd08753c564856eaf4fdb63f2356397a864dacaf`
MD5	`d6265ccc2830756bf33d18a1e9b46210`
BLAKE2b-256	`cc0fa001696d41944b9e53e2ec944938b69f2db555ce31c8363a6e98af357472`


File details

Details for the file HubNLP-0.3-py3-none-any.whl.

File metadata

Download URL: HubNLP-0.3-py3-none-any.whl
Upload date: Nov 21, 2024
Size: 18.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.0.0 CPython/3.12.2

File hashes

Hashes for HubNLP-0.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`d9066f3ac9c8ada9873fef02bc2c2b14a9e2a472c097ff08f409a7c7f81f3edf`
MD5	`3597642a90c550facaebadc8507a77d2`
BLAKE2b-256	`53b3437c7458762b75d4403c149d7093ca9ca18feb7c60515d2015c48c2da3cf`

