Message Structurer Package

TakeBlipMessageStructurer Package

Data & Analytics Research

Overview

Message Structurer is an AI model capable of assisting in structuring text messages.

For each message sent, a list is obtained with the main elements found in the analyzed sentence.

The elements found can be more than one word and have the following components:

  • value: sequence of characters found in the sentence corresponding to the element
  • lowercase: the value above, converted to lowercase
  • postags: grammatical class (part-of-speech tag) of the element
  • type: type of element found (class of entity found or postagging)
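
As a rough illustration of the components above, the sketch below builds one such element as a plain Python dictionary. The helper make_element, the sample phrase and the tag values are assumptions made for illustration only; the container and tag names actually returned by the package may differ.

```python
def make_element(value, postags, element_type):
    """Build a dictionary with the four components listed above (hypothetical helper)."""
    return {
        'value': value,              # characters as found in the sentence
        'lowercase': value.lower(),  # the same characters in lowercase
        'postags': postags,          # grammatical class of the element
        'type': element_type,        # entity class or part-of-speech tag
    }


# Illustrative values only; this is not real model output.
element = make_element('Credit Card', 'SUBS', 'FINANCIAL')
print(element['lowercase'])
```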

The sections below describe how to run the model on a single sentence and on a batch of sentences.

Run

The Message Structurer can be run in two ways: on a single sentence or on a batch of sentences.

Single Sentence

To structure a single sentence, the method structure_message should be used. Example of initialization and usage:

  1. Import main packages;
  2. Initialize model variables;
  3. Read the embedding model, PosTagging model and NER model;
  4. Initialize the Message Structurer and use it.

An example of the above steps can be found in the Python code below:

  1. Import main packages:
import json
import torch

from TakeBlipNer.predict import NerPredict
from TakeBlipPosTagger.predict import PosTaggerPredict
from TakeBlipMessageStructurer.utils import load_fasttext_embeddings
from TakeBlipMessageStructurer.predict.messagestructurer import MessageStructurer
  2. Initialize model variables:

To predict the tags of a sentence, the following variables should be created:

  • postag_model_path: string with the path of the PosTagging pickle model;
  • postag_label_path: string with the path of the PosTagging pickle labels;
  • ner_model_path: string with the path of the NER pickle model;
  • ner_label_path: string with the path of the NER pickle labels;
  • wordembed_path: string with the path of the FastText embedding file;
  • pad_string: string which represents the pad token;
  • unk_string: string which represents the unknown token;
  • sentence: string with the sentence to be structured.

Example of variable creation:

postag_model_path = '*.pkl'
postag_label_path = '*.pkl'
ner_label_path = '*.pkl'
ner_model_path = '*.pkl'
wordembed_path = '*.kv'
pad_string = '<pad>'
unk_string = '<unk>'
sentence = 'SENTENCE EXAMPLE TO PREDICT'
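
Since the values above are placeholders for real files, it can help to confirm that every path exists before loading anything. The sketch below is one possible check using only the standard library; find_missing_files and the file names are hypothetical and not part of the package.

```python
from pathlib import Path


def find_missing_files(paths):
    """Return the variable names whose path does not point to an existing file.

    `paths` maps a variable name to its file path (hypothetical helper).
    """
    return [name for name, path in paths.items() if not Path(path).is_file()]


# Placeholder file names; replace them with the real model and embedding files.
model_files = {
    'postag_model_path': 'postag_model.pkl',
    'postag_label_path': 'postag_labels.pkl',
    'ner_model_path': 'ner_model.pkl',
    'ner_label_path': 'ner_labels.pkl',
    'wordembed_path': 'embedding.kv',
}

missing = find_missing_files(model_files)
if missing:
    print('Missing files:', ', '.join(missing))
```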
  3. Read Embedding, PosTagging and NER model:
embedding_model = load_fasttext_embeddings(wordembed_path, pad_string)

postagging_model = torch.load(postag_model_path)
postag_predicter = PosTaggerPredict(
    model=postagging_model,
    label_path=postag_label_path,
    embedding=embedding_model)

ner_model = torch.load(ner_model_path)
ner_predicter = NerPredict(
    pad_string=pad_string,
    unk_string=unk_string,
    model=ner_model,
    postag_model=postag_predicter,
    label_path=ner_label_path)
  4. Initialize the tags to be removed and the Message Structurer, then use it:
tags = ['INT', 'ART', 'PRON', 'SIMB', 'PON', 'CONJ']
message_structurer = MessageStructurer(ner_model=ner_predicter)

print(message_structurer.structure_message(sentence, tags))
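
The tags list passed above names the grammatical classes to be dropped from the result. The sketch below illustrates that idea on hand-written elements. It assumes the removal is decided by each element's postags value; remove_tags is a hypothetical helper, and the sample elements and the SUBS and VERB tags are made up, so this is not the package's internal implementation.

```python
def remove_tags(elements, tags_to_remove):
    """Keep only the elements whose 'postags' value is not in tags_to_remove."""
    blocked = set(tags_to_remove)
    return [element for element in elements if element['postags'] not in blocked]


tags = ['INT', 'ART', 'PRON', 'SIMB', 'PON', 'CONJ']

# Hand-written elements for illustration; not real model output.
elements = [
    {'value': 'I', 'lowercase': 'i', 'postags': 'PRON', 'type': 'postagging'},
    {'value': 'want', 'lowercase': 'want', 'postags': 'VERB', 'type': 'postagging'},
    {'value': 'a', 'lowercase': 'a', 'postags': 'ART', 'type': 'postagging'},
    {'value': 'card', 'lowercase': 'card', 'postags': 'SUBS', 'type': 'postagging'},
]

kept = remove_tags(elements, tags)
print([element['value'] for element in kept])
```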

Batch

To structure a batch of sentences, the method structure_message_batch should be used. Example of initialization and usage:

  1. Import main packages;
  2. Initialize model variables;
  3. Read the embedding model, PosTagging model and NER model;
  4. Read file to be structured;
  5. Initialize the tags to be removed and the Message Structurer;
  6. Package usage.

An example of the above steps can be found in the Python code below:

  1. Import main packages:
import json
import torch

from TakeBlipNer.predict import NerPredict
from TakeBlipPosTagger.predict import PosTaggerPredict
from TakeBlipMessageStructurer.utils import load_fasttext_embeddings
from TakeBlipMessageStructurer.predict.messagestructurer import MessageStructurer
  2. Initialize model variables:

To predict the tags of the sentences, the following variables should be created:

  • postag_model_path: string with the path of the PosTagging pickle model;
  • postag_label_path: string with the path of the PosTagging pickle labels;
  • ner_model_path: string with the path of the NER pickle model;
  • ner_label_path: string with the path of the NER pickle labels;
  • wordembed_path: string with the path of the FastText embedding file;
  • pad_string: string which represents the pad token;
  • unk_string: string which represents the unknown token.

Example of variable creation:

postag_model_path = '*.pkl'
postag_label_path = '*.pkl'
ner_label_path = '*.pkl'
ner_model_path = '*.pkl'
wordembed_path = '*.kv'
pad_string = '<pad>'
unk_string = '<unk>'
  3. Read Embedding, PosTagging and NER model:
embedding_model = load_fasttext_embeddings(wordembed_path, pad_string)

postagging_model = torch.load(postag_model_path)
postag_predicter = PosTaggerPredict(
    model=postagging_model,
    label_path=postag_label_path,
    embedding=embedding_model)

ner_model = torch.load(ner_model_path)
ner_predicter = NerPredict(
    pad_string=pad_string,
    unk_string=unk_string,
    model=ner_model,
    postag_model=postag_predicter,
    label_path=ner_label_path)
  4. Read file to be structured:
  • To structure a batch, a JSON file in the following format is needed:
{
    "sentences": [
                    {
                        "id": 1, 
                        "sentence": "sentence_1"
                    }, 
                    {
                        "id": 2, 
                        "sentence": "sentence_2"
                    }
                ]
}
  • Reading the JSON file (path_sentences is the path of that file):
with open(path_sentences) as file:
    sentence = json.load(file)['sentences']
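
To make the expected file layout concrete, the sketch below writes a small file in that format to a temporary directory and reads it back, checking that every entry carries both keys. load_sentences is a hypothetical helper and the temporary file name is arbitrary; only the "sentences", "id" and "sentence" keys come from the format shown above.

```python
import json
import os
import tempfile


def load_sentences(path):
    """Read the batch file and return its list of {'id', 'sentence'} entries."""
    with open(path, encoding='utf-8') as file:
        data = json.load(file)
    sentences = data['sentences']  # key is lowercase, as in the file format
    for item in sentences:
        if 'id' not in item or 'sentence' not in item:
            raise ValueError(f'Malformed entry: {item!r}')
    return sentences


payload = {
    'sentences': [
        {'id': 1, 'sentence': 'sentence_1'},
        {'id': 2, 'sentence': 'sentence_2'},
    ]
}

with tempfile.TemporaryDirectory() as tmp_dir:
    path_sentences = os.path.join(tmp_dir, 'sentences.json')
    with open(path_sentences, 'w', encoding='utf-8') as file:
        json.dump(payload, file)
    sentences = load_sentences(path_sentences)

print(len(sentences))
```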
  5. Initialize tags to be removed and Message Structurer:
tags = ['INT', 'ART', 'PRON', 'SIMB', 'PON', 'CONJ']
message_structurer = MessageStructurer(ner_model=ner_predicter)
  6. Package usage
  • In order to use the package, some variables should be initialized:
    • path_sentences: a string with the path of the .json file;
    • batch_size: number of sentences which will be predicted at the same time;
    • shuffle: a boolean indicating whether the dataset is shuffled;
    • use_pre_processing: a boolean indicating whether the sentences will be preprocessed.

Example of variable creation:

path_sentences = '*.json'
batch_size = 64
shuffle = True
use_pre_processing = True
  • Structuring a batch of sentences:
print(message_structurer.structure_message_batch(
    batch_size=batch_size,
    shuffle=shuffle,
    use_pre_processing=use_pre_processing,
    sentences=sentence,
    tags_to_remove=tags))
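
To give an intuition for batch_size and shuffle, the sketch below splits a list of sentences into batches using only the standard library. iter_batches and its seed parameter are hypothetical and show the general technique, not how the package batches internally.

```python
import random


def iter_batches(sentences, batch_size, shuffle=False, seed=None):
    """Yield consecutive slices of at most batch_size sentences."""
    items = list(sentences)  # copy, so the caller's list is left untouched
    if shuffle:
        random.Random(seed).shuffle(items)
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Five placeholder sentences in the same shape as the JSON entries.
sentences = [{'id': i, 'sentence': f'sentence_{i}'} for i in range(1, 6)]

batches = list(iter_batches(sentences, batch_size=2))
print([len(batch) for batch in batches])

shuffled = list(iter_batches(sentences, batch_size=2, shuffle=True, seed=0))
```

With shuffling enabled the order of the sentences changes, so keeping the id next to each sentence makes it possible to match results back to their inputs.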
