

TakeBlipMessageStructurer Package

Data & Analytics Research

Overview

Message Structurer is an AI model capable of assisting in structuring text messages.

For each message sent, a list of the main elements found in the analyzed sentence is returned.

The elements found can contain more than one word and have the following components (an illustrative example follows the list):

  • value: the sequence of characters found in the sentence corresponding to the element
  • lowercase: the value in lower case
  • postags: the grammatical class (POS tag) of the element
  • type: the type of element found (the entity class found or the POS tag)
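
For illustration, a structured message might look like the sketch below. This is a hypothetical example: the sentence, the tag values and the entity type are invented, and the exact output depends on the trained PosTagging and NER models.

# Hypothetical output for a sentence such as 'quero cancelar meu plano'
# (illustrative only: actual tags and entity types depend on the trained models)
[
    {'value': 'cancelar', 'lowercase': 'cancelar', 'postags': 'VERB', 'type': 'postagging'},
    {'value': 'meu plano', 'lowercase': 'meu plano', 'postags': 'PRON SUBS', 'type': 'plan'}
]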

The following sections show how to run the package.

Run

The Message Structurer can be run in two ways: for a single sentence or for a batch of sentences.

Single Sentence

To structure a single sentence, the structure_message method should be used. Example of initialization and usage:

  1. Import main packages;
  2. Initialize model variables;
  3. Read the PosTagging, NER and embedding models;
  4. Initialize the Message Structurer and structure the sentence.

An example of the above steps can be found in the Python code below:

  1. Import main packages:
import json
import torch

from TakeBlipNer.predict import NerPredict
from TakeBlipPosTagger.predict import PosTaggerPredict
from TakeBlipMessageStructurer.utils import load_fasttext_embeddings
from TakeBlipMessageStructurer.predict.messagestructurer import MessageStructurer
  2. Initialize model variables:

In order to predict the sentence tags, the following variables should be created:

  • postag_model_path: a string with the path of the PosTagging pickle model;
  • postag_label_path: a string with the path of the PosTagging pickle labels;
  • ner_model_path: a string with the path of the NER pickle model;
  • ner_label_path: a string with the path of the NER pickle labels;
  • wordembed_path: a string with the path of the FastText embedding file;
  • padding_string: a string which represents the pad token;
  • unk_string: a string which represents the unknown token;
  • sentence: a string with the sentence to be structured.

Example of variables creation:

postag_model_path = '*.pkl'
postag_label_path = '*.pkl'
ner_label_path = '*.pkl'
ner_model_path = '*.pkl'
wordembed_path = '*.kv'
padding_string = '<pad>'
unk_string = '<unk>'
sentence = 'SENTENCE EXAMPLE TO PREDICT'
  3. Read the embedding, PosTagging and NER models:
embedding_model = load_fasttext_embeddings(wordembed_path, padding_string)

postagging_model = torch.load(postag_model_path)
postag_predicter = PosTaggerPredict(
    model=postagging_model,
    label_path=postag_label_path,
    embedding=embedding_model)

ner_model = torch.load(ner_model_path)
ner_predicter = NerPredict(
    pad_string=padding_string,
    unk_string=unk_string,
    model=ner_model,
    postag_model=postag_predicter,
    label_path=ner_label_path)
  4. Initialize the tags to be removed, the Message Structurer, and structure the sentence:
tags = ['INT', 'ART', 'PRON', 'SIMB', 'PON', 'CONJ']
message_structurer = MessageStructurer(ner_model=ner_predicter)

print(message_structurer.structure_message(sentence, tags))
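
The result can also be consumed in regular Python code instead of only printed. The sketch below is a minimal illustration assuming the returned value is a list of dictionaries with the fields described in the Overview (value, lowercase, postags, type); the exact return format should be checked against the installed version of the package.

structured_message = message_structurer.structure_message(sentence, tags)

# Assuming a list of dictionaries with the fields described in the Overview.
for element in structured_message:
    print(element['value'], element['type'], element['postags'])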

Batch

To structure a batch of sentences, the structure_message_batch method should be used. Example of initialization and usage:

  1. Import main packages;
  2. Initialize model variables;
  3. Read the PosTagging, NER and embedding models;
  4. Read the file to be structured;
  5. Initialize the tags to be removed and the Message Structurer;
  6. Package usage.

An example of the above steps can be found in the Python code below:

  1. Import main packages:
import json
import torch

from TakeBlipNer.predict import NerPredict
from TakeBlipPosTagger.predict import PosTaggerPredict
from TakeBlipMessageStructurer.utils import load_fasttext_embeddings
from TakeBlipMessageStructurer.predict.messagestructurer import MessageStructurer
  2. Initialize model variables:

In order to predict the tags of the sentences, the following variables should be created:

  • postag_model_path: a string with the path of the PosTagging pickle model;
  • postag_label_path: a string with the path of the PosTagging pickle labels;
  • ner_model_path: a string with the path of the NER pickle model;
  • ner_label_path: a string with the path of the NER pickle labels;
  • wordembed_path: a string with the path of the FastText embedding file;
  • padding_string: a string which represents the pad token;
  • unk_string: a string which represents the unknown token.

Example of variables creation:

postag_model_path = '*.pkl'
postag_label_path = '*.pkl'
ner_label_path = '*.pkl'
ner_model_path = '*.pkl'
wordembed_path = '*.kv'
padding_string = '<pad>'
unk_string = '<unk>'
  3. Read the embedding, PosTagging and NER models:
embedding_model = load_fasttext_embeddings(wordembed_path, padding_string)

postagging_model = torch.load(postag_model_path)
postag_predicter = PosTaggerPredict(
    model=postagging_model,
    label_path=postag_label_path,
    embedding=embedding_model)

ner_model = torch.load(ner_model_path)
ner_predicter = NerPredict(
    pad_string=padding_string,
    unk_string=unk_string,
    model=ner_model,
    postag_model=postag_predicter,
    label_path=ner_label_path)
  4. Read the file to be structured:
  • In order to predict a batch, a JSON file with the following format is needed:
{
    "sentences": [
                    {
                        "id": 1, 
                        "sentence": "sentence_1"
                    }, 
                    {
                        "id": 2, 
                        "sentence": "sentence_2"
                    }
                ]
}
  • Reading the JSON file (path_sentences is defined in the package usage step below):
with open(path_sentences) as file:
    sentence = json.load(file)['sentences']
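
If the sentences are not yet stored in a file, a JSON file in the expected format can be generated programmatically and then read with the snippet above. The sketch below is a minimal example based only on the format shown; the file name is illustrative.

# Write a JSON file in the expected format (the file name is illustrative)
sentences_to_structure = {
    'sentences': [
        {'id': 1, 'sentence': 'sentence_1'},
        {'id': 2, 'sentence': 'sentence_2'}
    ]
}

path_sentences = 'sentences_to_structure.json'
with open(path_sentences, 'w', encoding='utf-8') as output_file:
    json.dump(sentences_to_structure, output_file, ensure_ascii=False)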
  5. Initialize tags to be removed and Message Structurer:
tags = ['INT', 'ART', 'PRON', 'SIMB', 'PON', 'CONJ']
message_structurer = MessageStructurer(ner_model=ner_predicter)
  6. Package usage
  • In order to use the package, some variables should be initialized:
    • path_sentences: a string with the path of the .json file;
    • batch_size: the number of sentences which will be predicted at the same time;
    • shuffle: a boolean indicating whether the dataset will be shuffled;
    • use_pre_processing: a boolean indicating whether the sentences will be preprocessed.

Example of variables creation:

path_sentences = '*.json'
batch_size = 64
shuffle = True
use_pre_processing = True
  • Structuring a batch of sentences:
print(message_structurer.structure_message_batch(
    batch_size=batch_size,
    shuffle=shuffle,
    use_pre_processing=use_pre_processing,
    sentences=sentence,
    tags_to_remove=tags))
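
The structured batch can also be persisted instead of only printed. The sketch below assumes the value returned by structure_message_batch is JSON-serializable (for example, a list of structured sentences); if it is not, convert it before dumping. The output file name is illustrative.

structured_batch = message_structurer.structure_message_batch(
    batch_size=batch_size,
    shuffle=shuffle,
    use_pre_processing=use_pre_processing,
    sentences=sentence,
    tags_to_remove=tags)

# Assuming the result is JSON-serializable (adapt the conversion otherwise).
with open('structured_sentences.json', 'w', encoding='utf-8') as output_file:
    json.dump(structured_batch, output_file, ensure_ascii=False, indent=4)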
