A Sequence Labeling Utility
sequence-label
sequence-label is a Python library that streamlines creating tensors for sequence labeling data and reconstructing sequence labeling data from the output tensors of neural models. It offers a convenient utility for named entity recognition, part-of-speech tagging, and any other sequence labeling task.
Basic Usage
Import the necessary dependencies:
from transformers import AutoTokenizer
from sequence_label import LabelSet, SequenceLabel
from sequence_label.transformers import create_alignments
Start by creating sequence label data using the SequenceLabel.from_dict method. Define your text and associated labels:
text1 = "Tokyo is the capital of Japan."
label1 = SequenceLabel.from_dict(
    tags=[
        {"start": 0, "end": 5, "label": "LOC"},
        {"start": 24, "end": 29, "label": "LOC"},
    ],
    size=len(text1),
)
text2 = "The Monster Naoya Inoue is the who's who of boxing."
label2 = SequenceLabel.from_dict(
    tags=[{"start": 12, "end": 23, "label": "PER"}],
    size=len(text2),
)
texts = (text1, text2)
labels = (label1, label2)
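The start and end values in the tags above are character offsets into the text, with the end offset exclusive ("Tokyo" spans 0 to 5). As a minimal sketch, such offsets can be computed from a mention string instead of being counted by hand. The helper name find_span is hypothetical and not part of sequence-label:

```python
def find_span(text: str, mention: str, label: str) -> dict:
    """Return a tag dict with character offsets for the first occurrence of `mention`.

    Hypothetical helper for illustration; the end offset is exclusive.
    """
    start = text.index(mention)  # raises ValueError if the mention is absent
    return {"start": start, "end": start + len(mention), "label": label}


text1 = "Tokyo is the capital of Japan."
tags1 = [find_span(text1, "Tokyo", "LOC"), find_span(text1, "Japan", "LOC")]

text2 = "The Monster Naoya Inoue is the who's who of boxing."
tags2 = [find_span(text2, "Naoya Inoue", "PER")]
```

The resulting lists have the same shape as the tags argument passed to SequenceLabel.from_dict above. Note that str.index only finds the first occurrence, so repeated mentions would need a search start position.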
Next, tokenize your texts and calculate the alignments using the create_alignments function. alignments is a tuple of LabelAlignment instances, each of which aligns the sequence labels of one text with its tokenized result:
tokenizer = AutoTokenizer.from_pretrained("roberta-base")
batch_encoding = tokenizer(texts)
alignments = create_alignments(
    batch_encoding=batch_encoding,
    lengths=list(map(len, texts)),
    padding_token=tokenizer.pad_token,
)
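To illustrate what aligning character-level labels with tokens involves, here is a minimal sketch that maps a character span to the tokens it overlaps. It uses a simple regex tokenizer rather than a transformers tokenizer, and both function names are hypothetical. It is not how create_alignments is implemented:

```python
import re


def tokenize_with_offsets(text):
    """Split into word and punctuation tokens, keeping (start, end) character offsets."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\w+|[^\w\s]", text)]


def char_span_to_token_span(tokens, start, end):
    """Map a character span [start, end) to inclusive (first, last) token indices."""
    covered = [i for i, (_, s, e) in enumerate(tokens) if s < end and e > start]
    if not covered:
        raise ValueError("span does not overlap any token")
    return covered[0], covered[-1]


tokens1 = tokenize_with_offsets("Tokyo is the capital of Japan.")
tokens2 = tokenize_with_offsets("The Monster Naoya Inoue is the who's who of boxing.")

tokyo = char_span_to_token_span(tokens1, 0, 5)
japan = char_span_to_token_span(tokens1, 24, 29)
inoue = char_span_to_token_span(tokens2, 12, 23)
```

A subword tokenizer such as the one for roberta-base would split the text differently and add special tokens, which is the bookkeeping an alignment object has to track.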
Now, create a label_set, which lets you create tensors and reconstruct sequence labels from them. Calling label_set.encode_to_tag_indices generates tag_indices:
label_set = LabelSet(
    labels={"ORG", "LOC", "PER", "MISC"},
    padding_index=-1,
)
tag_indices = label_set.encode_to_tag_indices(labels=labels, alignments=alignments)
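As a rough picture of what per-token tag indices can look like, the sketch below encodes token spans with a plain BIO scheme and pads the batch with a padding index of -1. The BIO scheme, the index assignment, the token counts, and all function names here are assumptions for illustration only. The scheme and index layout that sequence-label actually uses may differ:

```python
def build_tag_vocab(labels):
    """Assign an integer to "O" and to the B-/I- variant of every label (assumed layout)."""
    vocab = {"O": 0}
    for label in sorted(labels):
        vocab[f"B-{label}"] = len(vocab)
        vocab[f"I-{label}"] = len(vocab)
    return vocab


def encode_bio(num_tokens, token_spans, vocab):
    """Encode inclusive (first, last, label) token spans as a list of tag indices."""
    indices = [vocab["O"]] * num_tokens
    for first, last, label in token_spans:
        indices[first] = vocab[f"B-{label}"]
        for i in range(first + 1, last + 1):
            indices[i] = vocab[f"I-{label}"]
    return indices


def pad_batch(rows, padding_index=-1):
    """Right-pad every row to the length of the longest row."""
    width = max(len(row) for row in rows)
    return [row + [padding_index] * (width - len(row)) for row in rows]


vocab = build_tag_vocab({"ORG", "LOC", "PER", "MISC"})
# Hypothetical token counts and spans for the two example sentences.
row1 = encode_bio(7, [(0, 0, "LOC"), (5, 5, "LOC")], vocab)
row2 = encode_bio(13, [(2, 3, "PER")], vocab)
batch = pad_batch([row1, row2], padding_index=-1)
```

Padding with an index outside the valid tag range, such as -1, makes it easy to mask padded positions when computing a loss.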
Finally, use the label_set.decode method to reconstruct the sequence labels from tag_indices and alignments:
labels2 = label_set.decode(tag_indices=tag_indices, alignments=alignments)
assert labels == labels2
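The round trip above relies on decoding being the inverse of encoding. As a minimal sketch of that idea, the function below turns per-token BIO tags and token character offsets back into character-level spans. The BIO scheme and the name decode_bio are assumptions for illustration, not the behavior of label_set.decode:

```python
def decode_bio(tags, offsets):
    """Turn per-token BIO tags and (start, end) token offsets into character spans."""
    spans = []
    current = None
    for tag, (start, end) in zip(tags, offsets):
        if tag.startswith("B-"):
            # A new entity begins; close any entity still open.
            if current is not None:
                spans.append(current)
            current = {"start": start, "end": end, "label": tag[2:]}
        elif tag.startswith("I-") and current is not None and current["label"] == tag[2:]:
            # The open entity continues; extend its end offset.
            current["end"] = end
        else:
            # "O" or an inconsistent I- tag closes the open entity, if any.
            if current is not None:
                spans.append(current)
            current = None
    if current is not None:
        spans.append(current)
    return spans


# First five tokens of "The Monster Naoya Inoue is ..." with their character offsets.
offsets = [(0, 3), (4, 11), (12, 17), (18, 23), (24, 26)]
tags = ["O", "O", "B-PER", "I-PER", "O"]
decoded = decode_bio(tags, offsets)
```

Here decoded recovers the same PER span (12 to 23) that was defined for text2 at the start of this section.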
Installation
pip install sequence-label
Hashes for sequence_label-0.1.3-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | b5d83062cee7ff6d944d1b5185d57a75702fd9eafa7030d7e9f2d9e56dd9533e
MD5 | 7cb0c12f5509572818b5f9b753e7cf84
BLAKE2b-256 | f68b49a0121cc3271256d4f886863bd05fe2f5af23049bb6d92735e8324cd3e3
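If you download the wheel manually, you can compare its SHA256 digest with the published value. The following is a minimal sketch using only the standard library. The function names are hypothetical, and the wheel path in the commented usage line is an assumption about where the file was saved:

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Compute the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Return True if the file's SHA256 digest equals the published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Example usage (assumes the wheel was downloaded to the current directory):
# matches_published_hash(
#     "sequence_label-0.1.3-py3-none-any.whl",
#     "b5d83062cee7ff6d944d1b5185d57a75702fd9eafa7030d7e9f2d9e56dd9533e",
# )
```

pip can also perform this check itself through its hash-checking mode (`--require-hashes` with a `--hash=sha256:...` entry in a requirements file).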