Tensor Creation and Label Reconstruction for Sequence Labeling

sequence-label

sequence-label is a Python library for creating tensors from sequence labels and reconstructing sequence labels from tensors. It is intended for sequence labeling tasks such as named entity recognition and part-of-speech tagging.

Basic Usage

Import the necessary dependencies:

from transformers import AutoTokenizer

from sequence_label import LabelSet, SequenceLabel
from sequence_label.transformers import create_alignments

Start by creating sequence labels with the SequenceLabel.from_dict method. Each tag gives the character offsets (start, end) of a labeled span in the text, and size is the length of the text:

text1 = "Tokyo is the capital of Japan."
label1 = SequenceLabel.from_dict(
    tags=[
        {"start": 0, "end": 5, "label": "LOC"},
        {"start": 24, "end": 29, "label": "LOC"},
    ],
    size=len(text1),
)

text2 = "The Monster Naoya Inoue is the who's who of boxing."
label2 = SequenceLabel.from_dict(
    tags=[{"start": 12, "end": 23, "label": "PER"}],
    size=len(text2),
)

texts = [text1, text2]
labels = [label1, label2]
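
As a quick sanity check on the offsets above, the following sketch slices each text with its start and end values. It uses only plain Python, not the library; `covered_text` is a hypothetical helper, and treating the offsets as half-open `[start, end)` slices is an assumption inferred from the example values.

```python
# Assumption: offsets are half-open [start, end) character slices,
# as the example values in this README suggest.
text1 = "Tokyo is the capital of Japan."
text2 = "The Monster Naoya Inoue is the who's who of boxing."

tags1 = [
    {"start": 0, "end": 5, "label": "LOC"},
    {"start": 24, "end": 29, "label": "LOC"},
]
tags2 = [{"start": 12, "end": 23, "label": "PER"}]


def covered_text(text, tags):
    """Return (substring, label) for every tag (hypothetical helper)."""
    return [(text[t["start"]:t["end"]], t["label"]) for t in tags]


print(covered_text(text1, tags1))  # [('Tokyo', 'LOC'), ('Japan', 'LOC')]
print(covered_text(text2, tags2))  # [('Naoya Inoue', 'PER')]
```

Running a check like this before encoding can catch off-by-one errors in hand-written annotations.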

Next, tokenize your texts and create the alignments with the create_alignments function. The result, alignments, is a tuple of LabelAlignment instances that align the sequence labels with the tokenized result:

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
batch_encoding = tokenizer(texts)

alignments = create_alignments(
    batch_encoding=batch_encoding,
    lengths=list(map(len, texts)),
    padding_token=tokenizer.pad_token,
)
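
To illustrate what aligning character spans with tokens involves, here is a minimal sketch in plain Python. It is not how LabelAlignment is implemented; `char_span_to_token_span` is a hypothetical helper, and the offsets are hand-written word-level values for illustration, not the actual output of the roberta-base tokenizer.

```python
from typing import List, Optional, Tuple


def char_span_to_token_span(
    offsets: List[Tuple[int, int]], start: int, end: int
) -> Optional[Tuple[int, int]]:
    """Map a half-open character span to an inclusive token index span.

    Returns None when the span does not line up with token boundaries.
    """
    first = last = None
    for i, (tok_start, tok_end) in enumerate(offsets):
        if tok_start == tok_end:
            continue  # skip empty spans, e.g. special tokens given as (0, 0)
        if tok_start == start:
            first = i
        if tok_end == end:
            last = i
    if first is None or last is None or first > last:
        return None
    return first, last


# Hypothetical per-token character offsets for
# "Tokyo is the capital of Japan." with a special token at each end.
offsets = [(0, 0), (0, 5), (6, 8), (9, 12), (13, 20),
           (21, 23), (24, 29), (29, 30), (0, 0)]

print(char_span_to_token_span(offsets, 0, 5))    # (1, 1)
print(char_span_to_token_span(offsets, 24, 29))  # (6, 6)
print(char_span_to_token_span(offsets, 1, 5))    # None
```

A span that starts or ends inside a token cannot be mapped exactly, which is one reason an explicit alignment step is useful.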

Now create a LabelSet, which lets you create tensors from sequence labels and reconstruct sequence labels from tensors. Use the label_set.encode_to_tag_indices method to create tag_indices:

label_set = LabelSet(
    labels={"ORG", "LOC", "PER", "MISC"},
    padding_index=-1,
)

tag_indices = label_set.encode_to_tag_indices(
    labels=labels,
    alignments=alignments,
)
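
For intuition about what padded tag indices can look like, the sketch below encodes token-level spans with a simple BIO scheme and pads the batch with -1. This is an assumption for illustration only: the tagging scheme and index order that LabelSet uses internally may differ, and `build_tag_vocab`, `encode_bio`, and `pad_batch` are hypothetical helpers.

```python
from typing import Dict, List, Tuple


def build_tag_vocab(labels: List[str]) -> Dict[str, int]:
    """Assign an index to "O" and to B-/I- tags for each label."""
    vocab = {"O": 0}
    for label in sorted(labels):
        vocab[f"B-{label}"] = len(vocab)
        vocab[f"I-{label}"] = len(vocab)
    return vocab


def encode_bio(
    token_spans: List[Tuple[int, int, str]], num_tokens: int, vocab: Dict[str, int]
) -> List[int]:
    """Encode inclusive (first, last, label) token spans as BIO tag indices."""
    indices = [vocab["O"]] * num_tokens
    for first, last, label in token_spans:
        indices[first] = vocab[f"B-{label}"]
        for i in range(first + 1, last + 1):
            indices[i] = vocab[f"I-{label}"]
    return indices


def pad_batch(rows: List[List[int]], padding_index: int = -1) -> List[List[int]]:
    """Right-pad every row to the length of the longest row."""
    width = max(len(row) for row in rows)
    return [row + [padding_index] * (width - len(row)) for row in rows]


vocab = build_tag_vocab(["ORG", "LOC", "PER", "MISC"])
row1 = encode_bio([(0, 0, "LOC"), (5, 5, "LOC")], num_tokens=7, vocab=vocab)
row2 = encode_bio([(2, 3, "PER")], num_tokens=9, vocab=vocab)
batch = pad_batch([row1, row2], padding_index=-1)
print(batch)
```

Using a negative padding index keeps padded positions distinct from every real tag index, so a loss function or decoder can skip them.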

Finally, use the label_set.decode method to reconstruct the sequence labels from tag_indices and alignments:

labels2 = label_set.decode(
    tag_indices=tag_indices,
    alignments=alignments,
)

assert labels == labels2
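
The reverse direction can be sketched in the same spirit. The example below turns padded tag indices back into token-level spans, assuming a BIO scheme and a padding index of -1. It is an illustration of the round trip, not the library's decoding logic; `decode_bio` and the `id_to_tag` list are hypothetical.

```python
from typing import List, Tuple


def decode_bio(
    indices: List[int], id_to_tag: List[str], padding_index: int = -1
) -> List[Tuple[int, int, str]]:
    """Recover inclusive (first, last, label) token spans from BIO indices."""
    spans = []
    current = None  # [first, last, label] of the span being built
    for i, idx in enumerate(indices):
        # Padded positions carry no label, so treat them like "O".
        tag = "O" if idx == padding_index else id_to_tag[idx]
        if tag.startswith("B-"):
            if current:
                spans.append(tuple(current))
            current = [i, i, tag[2:]]
        elif tag.startswith("I-") and current and current[2] == tag[2:]:
            current[1] = i  # extend the open span
        else:
            if current:
                spans.append(tuple(current))
            current = None
    if current:
        spans.append(tuple(current))
    return spans


id_to_tag = ["O", "B-LOC", "I-LOC", "B-PER", "I-PER"]
print(decode_bio([1, 0, 0, 0, 0, 1, 0, -1, -1], id_to_tag))
print(decode_bio([0, 0, 3, 4, 0, 0, 0, 0, 0], id_to_tag))
```

Mapping the recovered token spans back to character offsets is the part that needs the alignments, which is why decode takes both arguments.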

Installation

pip install sequence-label
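
To confirm the installation, one option is to query the installed version with the standard library. This sketch assumes Python 3.8 or later; `installed_version` is a hypothetical helper, not part of sequence-label.

```python
from importlib import metadata
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


# Prints a version string if the package is installed, otherwise None.
print(installed_version("sequence-label"))
```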

Download files

Source Distribution

sequence_label-0.1.7.tar.gz (12.6 kB view details)

Uploaded Source

Built Distribution

sequence_label-0.1.7-py3-none-any.whl (9.4 kB view details)

Uploaded Python 3

File details

Details for the file sequence_label-0.1.7.tar.gz.

File metadata

  • Download URL: sequence_label-0.1.7.tar.gz
  • Size: 12.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.8

File hashes

Hashes for sequence_label-0.1.7.tar.gz
Algorithm Hash digest
SHA256 e931dcc0952b1accc471345426f54a415b0cb2ad832cc523fe6cc1839f31411a
MD5 bb16203cc1eb82e072d5492a50a67db8
BLAKE2b-256 451759fda8e02db73dd8dfc85129a463fb6c23cd30f1c530c73bb65ec760a6a5

File details

Details for the file sequence_label-0.1.7-py3-none-any.whl.

File hashes

Hashes for sequence_label-0.1.7-py3-none-any.whl
Algorithm Hash digest
SHA256 5cbd249ebbb5c2575430487ff09cec49a66bc89b66eadc5df42c409c6b4fb956
MD5 8497b2ce2164294ef55b572a2f1c82a3
BLAKE2b-256 32944c8e74a8a5f08eae9dda13eebd2b8c9933114df3cb8b654d7d132f6b1dcd
