A Tensor Creation and Label Reconstruction for Sequence Labeling
Project description
sequence-label
sequence-label
is a Python library that streamlines the process of creating tensors for sequence labels and reconstructing sequence labels data from tensors. Whether you're working on named entity recognition, part-of-speech tagging, or any other sequence labeling task, this library offers a convenient utility to simplify your workflow.
Basic Usage
Import the necessary dependencies:
from transformers import AutoTokenizer
from sequence_label import LabelSet, SequenceLabel
from sequence_label.transformers import create_alignments
Start by creating sequence labels using the SequenceLabel.from_dict
method. Define your text and associated labels:
text1 = "Tokyo is the capital of Japan."
label1 = SequenceLabel.from_dict(
tags=[
{"start": 0, "end": 5, "label": "LOC"},
{"start": 24, "end": 29, "label": "LOC"},
],
size=len(text1),
)
text2 = "The Monster Naoya Inoue is the who's who of boxing."
label2 = SequenceLabel.from_dict(
tags=[{"start": 12, "end": 23, "label": "PER"}],
size=len(text2),
)
texts = [text1, text2]
labels = [label1, label2]
Next, tokenize your texts
and create the alignments
using the create_alignments
method. alignments
is a tuple of instances of LabelAlignment
that aligns sequence labels with the tokenized result:
tokenizer = AutoTokenizer.from_pretrained("roberta-base")
batch_encoding = tokenizer(texts)
alignments = create_alignments(
batch_encoding=batch_encoding,
lengths=list(map(len, texts)),
padding_token=tokenizer.pad_token
)
Now, create a label_set
that will allow you to create tensors from sequence labels and reconstruct sequence labels from tensors. Use the label_set.encode_to_tag_indices
method to create tag_indices
:
label_set = LabelSet(
labels={"ORG", "LOC", "PER", "MISC"},
padding_index=-1,
)
tag_indices = label_set.encode_to_tag_indices(
labels=labels,
alignments=alignments,
)
Finally, use the label_set.decode
method to reconstruct the sequence labels from tag_indices
and alignments
:
labels2 = label_set.decode(
tag_indices=tag_indices, alignments=alignments,
)
assert labels == labels2
Installation
pip install sequence-label
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Hashes for sequence_label-0.1.7-py3-none-any.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 5cbd249ebbb5c2575430487ff09cec49a66bc89b66eadc5df42c409c6b4fb956 |
|
MD5 | 8497b2ce2164294ef55b572a2f1c82a3 |
|
BLAKE2b-256 | 32944c8e74a8a5f08eae9dda13eebd2b8c9933114df3cb8b654d7d132f6b1dcd |