Message Structurer Package
TakeBlipMessageStructurer Package
Data & Analytics Research
Overview
Message Structurer is an AI model capable of assisting in structuring text messages.
For each message sent, a list is obtained with the main elements found in the analyzed sentence.
The elements found can be more than one word and have the following components:
- value: sequence of characters found in the sentence that corresponds to the element
- lowercase: the value above converted to lower case
- postags: grammatical class of the element
- type: type of the element found (class of the entity found or postagging)
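To make the four components concrete, the sketch below builds one element as a plain dictionary. Only the key names come from the list above; the helper make_element, the dictionary representation and the tag values 'SUBS' and 'FINANCIAL' are assumptions made for illustration and may not match what the package actually returns.

```python
from typing import Dict


def make_element(value: str, postags: str, type_: str) -> Dict[str, str]:
    """Build one element with the four documented components.

    Hypothetical helper: the package may represent elements differently.
    """
    return {
        "value": value,              # characters found in the sentence
        "lowercase": value.lower(),  # same value in lower case
        "postags": postags,          # grammatical class (illustrative tag)
        "type": type_,               # entity class or postagging (illustrative)
    }


# An element can span more than one word.
element = make_element("Credit Card", "SUBS", "FINANCIAL")
print(element)
```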
The following sections describe how to run the Message Structurer.
Run
The Message Structurer can be run in two ways: for a single sentence and for a batch of sentences.
Single Sentence
To structure a single sentence, the method structure_message should be used, as in the example further down. The initialization and usage steps are:
- Import main packages;
- Initialize model variables;
- Read PosTagging, NER model and embedding model;
- Initialize the Message Structurer and use it.
An example of the above steps can be found in the Python code below:
- Import main packages:
import json
import torch
from TakeBlipNer.predict import NerPredict
from TakeBlipPosTagger.predict import PosTaggerPredict
from TakeBlipMessageStructurer.utils import load_fasttext_embeddings
from TakeBlipMessageStructurer.predict.messagestructurer import MessageStructurer
- Initialize model variables:
In order to predict the tags of a sentence, the following variables should be created:
- postag_model_path: string with the path of the PosTagging pickle model;
- postag_label_path: string with the path of the PosTagging pickle labels;
- ner_model_path: string with the path of the NER pickle model;
- ner_label_path: string with the path of the NER pickle labels;
- embedding_path: string with the path of the FastText embedding file;
- pad_string: string which represents the pad token;
- unk_string: string which represents the unknown token;
- sentence: string with the sentence to be structured.
Example of variable creation:
postag_model_path = '*.pkl'
postag_label_path = '*.pkl'
ner_label_path = '*.pkl'
ner_model_path = '*.pkl'
embedding_path = '*.kv'
pad_string = '<pad>'
unk_string = '<unk>'
sentence = 'SENTENCE EXAMPLE TO PREDICT'
- Read Embedding, PosTagging and NER model:
embedding_model = load_fasttext_embeddings(embedding_path, pad_string)
postagging_model = torch.load(postag_model_path)
postag_predicter = PosTaggerPredict(
    model=postagging_model,
    label_path=postag_label_path,
    embedding=embedding_model)
ner_model = torch.load(ner_model_path)
ner_predicter = NerPredict(
    pad_string=pad_string,
    unk_string=unk_string,
    model=ner_model,
    postag_model=postag_predicter,
    label_path=ner_label_path)
- Initialize tags to be removed, Message Structurer and usage:
tags = ['INT', 'ART', 'PRON', 'SIMB', 'PON', 'CONJ']
message_structurer = MessageStructurer(ner_model=ner_predicter)
print(message_structurer.structure_message(sentence, tags))
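The effect of the tags list can be pictured as a filter over the elements found in the sentence. The sketch below is only an assumption about that behavior, inferred from the phrase "tags to be removed": the function drop_tagged, the sample elements and the 'SUBS' tag are hypothetical, and the package's own logic may differ.

```python
from typing import Dict, Iterable, List


def drop_tagged(elements: List[Dict[str, str]],
                tags_to_remove: Iterable[str]) -> List[Dict[str, str]]:
    """Keep only the elements whose 'postags' value is not in tags_to_remove."""
    blocked = set(tags_to_remove)
    return [element for element in elements if element["postags"] not in blocked]


# Hypothetical elements; real output of the package may look different.
elements = [
    {"value": "The", "lowercase": "the", "postags": "ART", "type": "postagging"},
    {"value": "card", "lowercase": "card", "postags": "SUBS", "type": "postagging"},
    {"value": "!", "lowercase": "!", "postags": "PON", "type": "postagging"},
]
tags = ['INT', 'ART', 'PRON', 'SIMB', 'PON', 'CONJ']

kept = drop_tagged(elements, tags)
print([element["value"] for element in kept])
```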
Batch
To structure a batch of sentences, the method structure_message_batch should be used. The initialization and usage steps are:
- Import main packages;
- Initialize model variables;
- Read PosTagging, NER model and embedding model;
- Read file to be structured;
- Initialize tags to be removed and the Message Structurer;
- Package usage.
An example of the above steps can be found in the Python code below:
- Import main packages:
import json
import torch
from TakeBlipNer.predict import NerPredict
from TakeBlipPosTagger.predict import PosTaggerPredict
from TakeBlipMessageStructurer.utils import load_fasttext_embeddings
from TakeBlipMessageStructurer.predict.messagestructurer import MessageStructurer
- Initialize model variables:
In order to predict the tags of the sentences, the following variables should be created:
- postag_model_path: string with the path of the PosTagging pickle model;
- postag_label_path: string with the path of the PosTagging pickle labels;
- ner_model_path: string with the path of the NER pickle model;
- ner_label_path: string with the path of the NER pickle labels;
- embedding_path: string with the path of the FastText embedding file;
- pad_string: string which represents the pad token;
- unk_string: string which represents the unknown token.
Example of variable creation:
postag_model_path = '*.pkl'
postag_label_path = '*.pkl'
ner_label_path = '*.pkl'
ner_model_path = '*.pkl'
embedding_path = '*.kv'
pad_string = '<pad>'
unk_string = '<unk>'
- Read Embedding, PosTagging and NER model:
embedding_model = load_fasttext_embeddings(embedding_path, pad_string)
postagging_model = torch.load(postag_model_path)
postag_predicter = PosTaggerPredict(
    model=postagging_model,
    label_path=postag_label_path,
    embedding=embedding_model)
ner_model = torch.load(ner_model_path)
ner_predicter = NerPredict(
    pad_string=pad_string,
    unk_string=unk_string,
    model=ner_model,
    postag_model=postag_predicter,
    label_path=ner_label_path)
- Read file to be structured:
- To predict a batch, a JSON file with the following structure is needed:
{
  "sentences": [
    {
      "id": 1,
      "sentence": "sentence_1"
    },
    {
      "id": 2,
      "sentence": "sentence_2"
    }
  ]
}
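- Reading the JSON file (path_sentences is the path of the file, defined under Package usage):
# The key must match the file exactly: "sentences" in lower case
with open(path_sentences) as file:
    sentence = json.load(file)['sentences']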
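An input file in the format above can be generated from a plain Python list. The sketch below writes such a file to a temporary directory and reads it back using only the standard library. The helper build_payload and the file name sentences.json are hypothetical; the "sentences", "id" and "sentence" keys follow the JSON example.

```python
import json
import os
import tempfile
from typing import Dict, List


def build_payload(texts: List[str]) -> Dict[str, list]:
    """Wrap raw sentences in the structure shown in the JSON example."""
    return {
        "sentences": [
            {"id": index, "sentence": text}
            for index, text in enumerate(texts, start=1)
        ]
    }


payload = build_payload(["sentence_1", "sentence_2"])

with tempfile.TemporaryDirectory() as tmp_dir:
    path_sentences = os.path.join(tmp_dir, "sentences.json")

    # Write the batch file.
    with open(path_sentences, "w", encoding="utf-8") as file:
        json.dump(payload, file, ensure_ascii=False, indent=2)

    # Read it back; the key is lower case, exactly as written in the file.
    with open(path_sentences, encoding="utf-8") as file:
        sentence = json.load(file)["sentences"]

print(sentence)
```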
- Initialize tags to be removed and Message Structurer:
tags = ['INT', 'ART', 'PRON', 'SIMB', 'PON', 'CONJ']
message_structurer = MessageStructurer(ner_model=ner_predicter)
- Package usage
- In order to use the package, some variables should be initialized:
- path_sentences: a string with the path of the .json file;
- batch_size: number of sentences which will be predicted at the same time;
- shuffle: a boolean indicating whether the dataset is shuffled;
- use_pre_processing: a boolean indicating whether the sentences will be preprocessed.
Example of variable creation:
path_sentences = '*.json'
batch_size = 64
shuffle = True
use_pre_processing = True
- Structuring a batch of sentences:
print(message_structurer.structure_message_batch(
    batch_size=batch_size,
    shuffle=shuffle,
    use_pre_processing=use_pre_processing,
    sentences=sentence,
    tags_to_remove=tags))
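To illustrate what batch_size and shuffle mean in general, the sketch below splits a list into consecutive batches, optionally shuffling it first. It is a generic illustration and not the package's internal implementation; the function iter_batches and its seed parameter are hypothetical.

```python
import random
from typing import Iterator, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def iter_batches(items: Sequence[T], batch_size: int,
                 shuffle: bool = False,
                 seed: Optional[int] = None) -> Iterator[List[T]]:
    """Yield lists of at most batch_size items, optionally in shuffled order."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    order = list(items)  # copy so the caller's data is left untouched
    if shuffle:
        random.Random(seed).shuffle(order)
    for start in range(0, len(order), batch_size):
        yield order[start:start + batch_size]


batches = list(iter_batches(list(range(10)), batch_size=4))
print([len(batch) for batch in batches])  # prints [4, 4, 2]
```

With shuffle=True the batches have the same sizes, but the order in which the sentences are processed changes.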