Tensor Creation and Label Reconstruction for Sequence Labeling

sequence-label

sequence-label is a Python library for creating tensors from sequence labels and reconstructing sequence labels from tensors. It is intended for named entity recognition, part-of-speech tagging, and other sequence labeling tasks.

Basic Usage

Import the necessary dependencies:

from transformers import AutoTokenizer

from sequence_label import LabelSet, SequenceLabel
from sequence_label.transformers import create_alignments

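Start by creating sequence labels with the SequenceLabel.from_dict method. Define your text and its associated labels. In the examples below, start and end are character offsets into the text, with end exclusive, and size is the length of the text in characters: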

text1 = "Tokyo is the capital of Japan."
label1 = SequenceLabel.from_dict(
    tags=[
        {"start": 0, "end": 5, "label": "LOC"},
        {"start": 24, "end": 29, "label": "LOC"},
    ],
    size=len(text1),
)

text2 = "The Monster Naoya Inoue is the who's who of boxing."
label2 = SequenceLabel.from_dict(
    tags=[{"start": 12, "end": 23, "label": "PER"}],
    size=len(text2),
)

texts = [text1, text2]
labels = [label1, label2]
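
The offsets above can be checked with ordinary Python slicing before they are passed to the library. The snippet below is a minimal sanity check that uses no library calls; the names `spans` and `mentions` are introduced here for illustration only.

```python
# Plain-Python sanity check: each (start, end) pair should slice out the
# intended mention when `end` is treated as exclusive.
text1 = "Tokyo is the capital of Japan."
text2 = "The Monster Naoya Inoue is the who's who of boxing."

# (text, start, end, label) tuples copied from the example above.
spans = [
    (text1, 0, 5, "LOC"),
    (text1, 24, 29, "LOC"),
    (text2, 12, 23, "PER"),
]

mentions = [(text[start:end], label) for text, start, end, label in spans]
for mention, label in mentions:
    print(f"{label}: {mention!r}")
```

If a slice comes back with a stray space or a truncated word, the offsets are off by one and are worth fixing before building labels from them.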

Next, tokenize your texts and create the alignments with the create_alignments function. The result, alignments, is a tuple of LabelAlignment instances, one per text, each of which aligns the sequence labels with the tokenized result:

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
batch_encoding = tokenizer(texts)

alignments = create_alignments(
    batch_encoding=batch_encoding,
    lengths=list(map(len, texts)),
    padding_token=tokenizer.pad_token
)
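
To make the idea of alignment concrete, here is a rough sketch of mapping a character span onto token positions. It is not the library's implementation: it uses a toy regex tokenizer instead of a subword tokenizer, and `tokenize_with_offsets` and `char_span_to_token_span` are hypothetical helpers written for this example. With a fast Hugging Face tokenizer, comparable character offsets are available by passing `return_offsets_mapping=True`.

```python
import re


def tokenize_with_offsets(text):
    """Toy tokenizer (assumption): words and punctuation with char offsets."""
    return [
        (m.group(), m.start(), m.end())
        for m in re.finditer(r"\w+|[^\w\s]", text)
    ]


def char_span_to_token_span(tokens, start, end):
    """Return (first, last_exclusive) indices of tokens overlapping [start, end)."""
    covered = [
        i for i, (_, tok_start, tok_end) in enumerate(tokens)
        if tok_start < end and tok_end > start
    ]
    if not covered:
        raise ValueError(f"no token overlaps characters {start}..{end}")
    return covered[0], covered[-1] + 1


text = "The Monster Naoya Inoue is the who's who of boxing."
tokens = tokenize_with_offsets(text)

# Characters 12..23 cover "Naoya Inoue".
token_span = char_span_to_token_span(tokens, 12, 23)
print(token_span, [tok for tok, _, _ in tokens[token_span[0]:token_span[1]]])
```

An overlap test is used here so that a label boundary falling inside a token still maps to that token; a real alignment also has to account for special tokens and padding, which this sketch leaves out.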

Now create a label_set, which lets you create tensors from sequence labels and reconstruct sequence labels from tensors. Call label_set.encode_to_tag_indices with the labels and alignments to produce tag_indices:

label_set = LabelSet(
    labels={"ORG", "LOC", "PER", "MISC"},
    padding_index=-1,
)

tag_indices = label_set.encode_to_tag_indices(
    labels=labels,
    alignments=alignments,
)
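
As an illustration of what per-token tag indices with a padding index can look like, the sketch below encodes token-level spans with a plain BIO scheme. This is an assumption made for the example: the tagging scheme and index order that LabelSet actually uses are not stated here, and `encode`, `TAGS`, and `TAG_TO_INDEX` are hypothetical names. The token positions assume a simple word-and-punctuation tokenization of the two example sentences.

```python
PADDING_INDEX = -1  # same value as padding_index in the LabelSet example

# Sort the label set so the tag-to-index mapping is deterministic.
LABELS = sorted({"ORG", "LOC", "PER", "MISC"})
TAGS = ["O"] + [f"{prefix}-{label}" for label in LABELS for prefix in ("B", "I")]
TAG_TO_INDEX = {tag: index for index, tag in enumerate(TAGS)}


def encode(token_spans, num_tokens, max_len):
    """Turn (first, last_exclusive, label) token spans into padded BIO indices."""
    tags = ["O"] * num_tokens
    for first, last, label in token_spans:
        tags[first] = f"B-{label}"
        for i in range(first + 1, last):
            tags[i] = f"I-{label}"
    indices = [TAG_TO_INDEX[tag] for tag in tags]
    # Pad shorter sequences so every row in the batch has the same length.
    return indices + [PADDING_INDEX] * (max_len - num_tokens)


batch = [
    # "Tokyo is the capital of Japan ." -> 7 tokens, LOC at 0 and 5
    encode([(0, 1, "LOC"), (5, 6, "LOC")], num_tokens=7, max_len=13),
    # "The Monster Naoya Inoue is the who ' s who of boxing ." -> 13 tokens
    encode([(2, 4, "PER")], num_tokens=13, max_len=13),
]
for row in batch:
    print(row)
```

Keeping the padding index outside the range of valid tag indices makes it easy to mask padded positions when computing a loss.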

Finally, use the label_set.decode method to reconstruct the sequence labels from tag_indices and alignments. The reconstructed labels should equal the originals, which the assertion checks:

labels2 = label_set.decode(
    tag_indices=tag_indices, alignments=alignments,
)

assert labels == labels2
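
The reverse direction can be sketched in the same spirit. The function below reads padded BIO tag indices back into token-level spans. It is only an illustration of the round-trip idea, not the library's decoder: the BIO scheme, the small `TAGS` list, and the `decode` helper are assumptions made for this example.

```python
PADDING_INDEX = -1
TAGS = ["O", "B-LOC", "I-LOC", "B-PER", "I-PER"]  # hypothetical tag vocabulary


def decode(indices):
    """Recover (first, last_exclusive, label) spans from padded BIO indices."""
    spans = []
    current = None  # open span as [first, last_exclusive, label]
    for i, index in enumerate(indices):
        if index == PADDING_INDEX:
            break  # everything after the first padding index is ignored
        tag = TAGS[index]
        if tag.startswith("B-"):
            if current:
                spans.append(tuple(current))
            current = [i, i + 1, tag[2:]]
        elif tag.startswith("I-") and current and current[2] == tag[2:]:
            current[1] = i + 1  # extend the open span
        else:
            # "O", or an "I-" tag that does not continue the open span
            if current:
                spans.append(tuple(current))
            current = None
    if current:
        spans.append(tuple(current))
    return spans


spans1 = decode([1, 0, 0, 0, 0, 1, 0, -1, -1])
spans2 = decode([0, 0, 3, 4, 0])
print(spans1)
print(spans2)
```

Token-level spans recovered this way still need to be mapped back to character offsets, which is the role the alignments play in the library call above.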

Installation

pip install sequence-label
