Tensor Creation and Label Reconstruction for Sequence Labeling
sequence-label
sequence-label is a Python library that streamlines creating tensors from sequence labeling data and reconstructing sequence labeling data from tensors. For named entity recognition, part-of-speech tagging, or any other sequence labeling task, it handles the conversion between span annotations on the raw text and tag indices aligned with a tokenizer's output.
Basic Usage
Import the necessary dependencies:
from transformers import AutoTokenizer
from sequence_label import LabelSet, SequenceLabel
from sequence_label.transformers import create_alignments
Start by creating sequence labels using the SequenceLabel.from_dict method. Define your text and associated labels:
# start and end are character offsets into the text: text1[0:5] == "Tokyo"
text1 = "Tokyo is the capital of Japan."
label1 = SequenceLabel.from_dict(
    tags=[
        {"start": 0, "end": 5, "label": "LOC"},
        {"start": 24, "end": 29, "label": "LOC"},
    ],
    size=len(text1),
)

# text2[12:23] == "Naoya Inoue"
text2 = "The Monster Naoya Inoue is the who's who of boxing."
label2 = SequenceLabel.from_dict(
    tags=[{"start": 12, "end": 23, "label": "PER"}],
    size=len(text2),
)

texts = (text1, text2)
labels = (label1, label2)
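As a quick sanity check on the annotations, the sketch below slices each text with its tag offsets. It assumes, as the example values suggest, that start and end are character offsets with an exclusive end; the helper name mentions is hypothetical and not part of sequence-label.

```python
# Assumption: "start"/"end" are character offsets and "end" is exclusive,
# so text[start:end] yields the tagged mention.
text1 = "Tokyo is the capital of Japan."
text2 = "The Monster Naoya Inoue is the who's who of boxing."

tags1 = [
    {"start": 0, "end": 5, "label": "LOC"},
    {"start": 24, "end": 29, "label": "LOC"},
]
tags2 = [{"start": 12, "end": 23, "label": "PER"}]


def mentions(text, tags):
    """Return (surface string, label) pairs for each tagged span."""
    return [(text[t["start"]:t["end"]], t["label"]) for t in tags]


print(mentions(text1, tags1))  # [('Tokyo', 'LOC'), ('Japan', 'LOC')]
print(mentions(text2, tags2))  # [('Naoya Inoue', 'PER')]
```

Running a check like this before building labels catches off-by-one offsets early.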
Next, tokenize your texts and create the alignments using the create_alignments function. The result, alignments, is a tuple of LabelAlignment instances that align the sequence labels with the tokenized result:
tokenizer = AutoTokenizer.from_pretrained("roberta-base")
batch_encoding = tokenizer(texts)

# lengths holds the character length of each text, in the same order as texts
alignments = create_alignments(
    batch_encoding=batch_encoding,
    lengths=list(map(len, texts)),
    padding_token=tokenizer.pad_token,
)
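To illustrate what an alignment has to do conceptually, here is a minimal sketch that maps a character span to the token indices it overlaps. It is not the library's implementation: the regex tokenizer and both function names are assumptions made for illustration, standing in for the offsets a subword tokenizer reports (fast tokenizers in transformers can return them with return_offsets_mapping=True).

```python
import re


def token_offsets(text):
    """Split into word and punctuation tokens, returning (start, end) character offsets.

    A stand-in for a real tokenizer's offset mapping.
    """
    return [m.span() for m in re.finditer(r"\w+|[^\w\s]", text)]


def char_span_to_token_span(offsets, start, end):
    """Return (first, last_exclusive) indices of tokens overlapping [start, end), or None."""
    hits = [i for i, (s, e) in enumerate(offsets) if s < end and e > start]
    if not hits:
        return None
    return hits[0], hits[-1] + 1


text = "Tokyo is the capital of Japan."
offsets = token_offsets(text)
print(char_span_to_token_span(offsets, 0, 5))    # (0, 1)
print(char_span_to_token_span(offsets, 24, 29))  # (5, 6)
```

With a subword tokenizer a single mention usually covers several tokens, which is why the result is a range rather than a single index.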
Now, create a label_set that will allow you to create tensors from sequence labels and reconstruct sequence labels from tensors. Use the label_set.encode_to_tag_indices method to create tag_indices:
label_set = LabelSet(
    labels={"ORG", "LOC", "PER", "MISC"},
    padding_index=-1,
)
tag_indices = label_set.encode_to_tag_indices(
    labels=labels,
    alignments=alignments,
)
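The source does not say which tagging scheme the library uses internally. Purely as an illustration of what encoding spans to tag indices can look like, the sketch below assumes a BIO scheme, a vocabulary built from the sorted label names, and word-level token spans; every function name and the token counts (7 and 13) are hypothetical.

```python
def build_tag_vocab(labels):
    """Map BIO tag strings to integer indices; "O" is always index 0."""
    vocab = {"O": 0}
    for label in sorted(labels):
        vocab[f"B-{label}"] = len(vocab)
        vocab[f"I-{label}"] = len(vocab)
    return vocab


def encode(num_tokens, token_spans, vocab):
    """Encode (first, last_exclusive, label) token spans as a list of tag indices."""
    indices = [vocab["O"]] * num_tokens
    for first, last, label in token_spans:
        indices[first] = vocab[f"B-{label}"]
        for i in range(first + 1, last):
            indices[i] = vocab[f"I-{label}"]
    return indices


def pad(batch, padding_index=-1):
    """Right-pad every row to the longest row so the batch is rectangular."""
    width = max(len(row) for row in batch)
    return [row + [padding_index] * (width - len(row)) for row in batch]


vocab = build_tag_vocab({"ORG", "LOC", "PER", "MISC"})
row1 = encode(7, [(0, 1, "LOC"), (5, 6, "LOC")], vocab)  # assumed 7 tokens
row2 = encode(13, [(2, 4, "PER")], vocab)                # assumed 13 tokens
batch = pad([row1, row2], padding_index=-1)
print(batch[0])  # [1, 0, 0, 0, 0, 1, 0, -1, -1, -1, -1, -1, -1]
```

Using a negative padding index keeps padded positions distinguishable from every real tag, so they can be masked out of a loss.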
Finally, use the label_set.decode method to reconstruct the sequence labels from tag_indices and alignments:
labels2 = label_set.decode(
    tag_indices=tag_indices,
    alignments=alignments,
)

# The round trip recovers the original sequence labels.
assert labels == labels2
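For intuition about the reverse direction, the following sketch rebuilds token spans from tag indices under an assumed BIO scheme. It is an illustration only, not the behavior of label_set.decode; the id_to_tag table and the function name decode_bio are hypothetical.

```python
def decode_bio(indices, id_to_tag, padding_index=-1):
    """Rebuild (first, last_exclusive, label) token spans from BIO tag indices.

    Padded positions are treated as "O"; an "I-" tag that does not continue
    an open span of the same label is ignored.
    """
    spans, current = [], None
    for i, idx in enumerate(indices):
        tag = "O" if idx == padding_index else id_to_tag[idx]
        if tag.startswith("B-"):
            if current:
                spans.append(tuple(current))
            current = [i, i + 1, tag[2:]]
        elif tag.startswith("I-") and current and current[2] == tag[2:]:
            current[1] = i + 1  # extend the open span by one token
        else:
            if current:
                spans.append(tuple(current))
            current = None
    if current:
        spans.append(tuple(current))
    return spans


# Hypothetical tag table for two labels.
id_to_tag = {0: "O", 1: "B-LOC", 2: "I-LOC", 3: "B-PER", 4: "I-PER"}
print(decode_bio([1, 0, 0, 0, 0, 1, 0, -1, -1], id_to_tag))  # [(0, 1, 'LOC'), (5, 6, 'LOC')]
print(decode_bio([0, 0, 3, 4, 0], id_to_tag))                # [(2, 4, 'PER')]
```

Mapping the recovered token spans back to character offsets would then reuse the same token offsets that produced the alignment.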
Installation
pip install sequence-label