Message Structurer Package
TakeBlipMessageStructurer Package
Data & Analytics Research
Overview
Message Structurer is an AI model capable of assisting in structuring text messages.
For each message sent, a list is obtained with the main elements found in the analyzed sentence.
The elements found can be more than one word and have the following components:
- value: sequence of characters found in the sentence corresponding to the element
- lowercase: the value above in lower case
- postags: grammatical class of the element
- type: type of element found (class of the entity found, or POS tag)
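To make the four components concrete, here is a minimal sketch of what one structured element might look like as a Python dictionary. The key names come from the list above; the sample values, the tag names and the use of a plain dict are assumptions for illustration, not the package's confirmed output format.

```python
# Hypothetical structured element. The values are hand-written for
# illustration and were not produced by the model.
element = {
    "value": "Invoice",      # characters as they appear in the sentence
    "lowercase": "invoice",  # the same value in lower case
    "postags": "SUBS",       # grammatical class (assumed tag name)
    "type": "FINANCIAL",     # entity class or POS tag (assumed label)
}

# The lowercase field is expected to mirror the value field.
assert element["lowercase"] == element["value"].lower()
print(sorted(element))
```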
The following content is presented here: how to run the model on a single sentence and on a batch of sentences.
Run
The Message Structurer can be run in two ways: on a single sentence or on a batch of sentences.
Single Sentence
To structure a single sentence, the method structure_message should be used. Example of initialization and usage:
- Import main packages;
- Initialize model variables;
- Read PosTagging, NER model and embedding model;
- Initialize and use the Message Structurer.
An example of the above steps can be found in the Python code below:
- Import main packages:
import json
import torch
from TakeBlipNer.predict import NerPredict
from TakeBlipPosTagger.predict import PosTaggerPredict
from TakeBlipMessageStructurer.utils import load_fasttext_embeddings
from TakeBlipMessageStructurer.predict.messagestructurer import MessageStructurer
- Initialize model variables:
To predict the sentence tags, the following variables should be created:
- postag_model_path: string with the path of the PosTagging pickle model;
- postag_label_path: string with the path of the PosTagging pickle labels;
- ner_model_path: string with the path of the NER pickle model;
- ner_label_path: string with the path of the NER pickle labels;
- wordembed_path: string with the path of the FastText embedding file;
- padding_string: string which represents the pad token;
- unk_string: string which represents the unknown token;
- sentence: string with the sentence to be structured.
Example of variable creation:
postag_model_path = '*.pkl'
postag_label_path = '*.pkl'
ner_label_path = '*.pkl'
ner_model_path = '*.pkl'
wordembed_path = '*.kv'
padding_string = '<pad>'
unk_string = '<unk>'
sentence = 'SENTENCE EXAMPLE TO PREDICT'
- Read Embedding, PosTagging and NER model:
# Load the FastText embeddings, using the pad token defined above
embedding_model = load_fasttext_embeddings(wordembed_path, padding_string)
# Load the pickled PosTagging model and wrap it in its predictor
postagging_model = torch.load(postag_model_path)
postag_predicter = PosTaggerPredict(
    model=postagging_model,
    label_path=postag_label_path,
    embedding=embedding_model)
# Load the pickled NER model; it reuses the PosTagging predictor
ner_model = torch.load(ner_model_path)
ner_predicter = NerPredict(
    pad_string=padding_string,
    unk_string=unk_string,
    model=ner_model,
    postag_model=postag_predicter,
    label_path=ner_label_path)
- Initialize tags to be removed, Message Structurer and usage:
tags = ['INT', 'ART', 'PRON', 'SIMB', 'PON', 'CONJ']
message_structurer = MessageStructurer(ner_model=ner_predicter)
print(message_structurer.structure_message(sentence, tags))
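The tags list names the POS tags whose elements are dropped from the result. As a rough illustration of that filtering idea, the sketch below applies it to hand-written elements. The helper remove_tags, the sample elements and the tag SUBS are hypothetical; this is not the package's actual implementation, only one way such a filter could work.

```python
from typing import Dict, List


def remove_tags(elements: List[Dict[str, str]],
                tags_to_remove: List[str]) -> List[Dict[str, str]]:
    """Keep only the elements whose POS tag is not in tags_to_remove."""
    blocked = set(tags_to_remove)
    return [element for element in elements
            if element["postags"] not in blocked]


# Hand-written elements for illustration (not real model output).
elements = [
    {"value": "the", "lowercase": "the", "postags": "ART", "type": "postagging"},
    {"value": "invoice", "lowercase": "invoice", "postags": "SUBS", "type": "postagging"},
]
tags = ['INT', 'ART', 'PRON', 'SIMB', 'PON', 'CONJ']

filtered = remove_tags(elements, tags)
print([element["value"] for element in filtered])
```

Using a set for the blocked tags keeps each membership check constant-time, which matters little here but scales to longer tag lists.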
Batch
To structure a batch of sentences, the method structure_message_batch should be used. Example of initialization and usage:
- Import main packages;
- Initialize model variables;
- Read PosTagging, NER model and embedding model;
- Read file to be structured;
- Initialize the Message Structurer;
- Package usage.
An example of the above steps can be found in the Python code below:
- Import main packages:
import json
import torch
from TakeBlipNer.predict import NerPredict
from TakeBlipPosTagger.predict import PosTaggerPredict
from TakeBlipMessageStructurer.utils import load_fasttext_embeddings
from TakeBlipMessageStructurer.predict.messagestructurer import MessageStructurer
- Initialize model variables:
To predict the sentence tags, the following variables should be created:
- postag_model_path: string with the path of the PosTagging pickle model;
- postag_label_path: string with the path of the PosTagging pickle labels;
- ner_model_path: string with the path of the NER pickle model;
- ner_label_path: string with the path of the NER pickle labels;
- wordembed_path: string with the path of the FastText embedding file;
- padding_string: string which represents the pad token;
- unk_string: string which represents the unknown token.
Example of variable creation:
postag_model_path = '*.pkl'
postag_label_path = '*.pkl'
ner_label_path = '*.pkl'
ner_model_path = '*.pkl'
wordembed_path = '*.kv'
padding_string = '<pad>'
unk_string = '<unk>'
- Read Embedding, PosTagging and NER model:
# Load the FastText embeddings, using the pad token defined above
embedding_model = load_fasttext_embeddings(wordembed_path, padding_string)
# Load the pickled PosTagging model and wrap it in its predictor
postagging_model = torch.load(postag_model_path)
postag_predicter = PosTaggerPredict(
    model=postagging_model,
    label_path=postag_label_path,
    embedding=embedding_model)
# Load the pickled NER model; it reuses the PosTagging predictor
ner_model = torch.load(ner_model_path)
ner_predicter = NerPredict(
    pad_string=padding_string,
    unk_string=unk_string,
    model=ner_model,
    postag_model=postag_predicter,
    label_path=ner_label_path)
- Read file to be structured:
- To predict a batch, a JSON file with the following structure is needed:
{
    "sentences": [
        {
            "id": 1,
            "sentence": "sentence_1"
        },
        {
            "id": 2,
            "sentence": "sentence_2"
        }
    ]
}
- Reading the JSON file:
# path_sentences is the path of the JSON file (see the package usage step below)
with open(path_sentences) as file:
    sentence = json.load(file)['sentences']
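As a quick way to try the file-reading step without real data, the sketch below writes a file in the layout shown above to a temporary directory and reads the list back under the "sentences" key. The temporary path, the file name sentences.json and the variable names payload and loaded are assumptions made for this example only.

```python
import json
import os
import tempfile

# Same layout as the input file described above.
payload = {
    "sentences": [
        {"id": 1, "sentence": "sentence_1"},
        {"id": 2, "sentence": "sentence_2"},
    ]
}

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical file name, used only for this round trip.
    path_sentences = os.path.join(tmp_dir, "sentences.json")

    # Write the input file.
    with open(path_sentences, "w", encoding="utf-8") as file:
        json.dump(payload, file, ensure_ascii=False, indent=2)

    # Read it back; the key is case-sensitive.
    with open(path_sentences, encoding="utf-8") as file:
        loaded = json.load(file)["sentences"]

print(len(loaded))
```

Passing ensure_ascii=False keeps accented characters readable in the file, which is convenient for non-English sentences.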
- Initialize tags to be removed and Message Structurer:
tags = ['INT', 'ART', 'PRON', 'SIMB', 'PON', 'CONJ']
message_structurer = MessageStructurer(ner_model=ner_predicter)
- Package usage
- To use the package, some variables should be initialized:
- path_sentences: a string with the path of the .json file;
- batch_size: number of sentences which will be predicted at the same time;
- shuffle: a boolean indicating whether the dataset is shuffled;
- use_pre_processing: a boolean indicating whether the sentences will be preprocessed.
Example of variable creation:
path_sentences = '*.json'
batch_size = 64
shuffle = True
use_pre_processing = True
- Structuring a batch of sentences:
print(message_structurer.structure_message_batch(
    batch_size=batch_size,
    shuffle=shuffle,
    use_pre_processing=use_pre_processing,
    sentences=sentence,
    tags_to_remove=tags))
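The batch_size and shuffle parameters control how the sentences are grouped before prediction. How the package batches internally is not documented here, so the following is only a generic sketch of the idea; make_batches and its seed parameter are hypothetical names introduced for illustration.

```python
import random
from typing import Iterator, List, Optional


def make_batches(items: List[str], batch_size: int, shuffle: bool = False,
                 seed: Optional[int] = None) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")
    order = list(items)  # copy, so the caller's list is left untouched
    if shuffle:
        random.Random(seed).shuffle(order)
    for start in range(0, len(order), batch_size):
        yield order[start:start + batch_size]


sentences = [f"sentence_{i}" for i in range(1, 6)]
batches = list(make_batches(sentences, batch_size=2))
print(batches)
# [['sentence_1', 'sentence_2'], ['sentence_3', 'sentence_4'], ['sentence_5']]
```

With shuffle enabled the batch sizes stay the same and only the order of the sentences changes, so results may need to be matched back to their ids afterwards.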