Tensor Creation and Label Reconstruction for Sequence Labeling
sequence-label
sequence-label is a Python library that streamlines creating tensors from sequence labels and reconstructing sequence labels from tensors. Whether you're working on named entity recognition, part-of-speech tagging, or any other sequence labeling task, it provides utilities that simplify both directions of that conversion.
Basic Usage
Import the necessary dependencies:
from transformers import AutoTokenizer
from sequence_label import LabelSet, SequenceLabel
from sequence_label.transformers import create_alignments
Start by creating sequence labels using the SequenceLabel.from_dict method. Define your text and associated labels:
text1 = "Tokyo is the capital of Japan."
label1 = SequenceLabel.from_dict(
    tags=[
        {"start": 0, "end": 5, "label": "LOC"},
        {"start": 24, "end": 29, "label": "LOC"},
    ],
    size=len(text1),
)
text2 = "The Monster Naoya Inoue is the who's who of boxing."
label2 = SequenceLabel.from_dict(
    tags=[{"start": 12, "end": 23, "label": "PER"}],
    size=len(text2),
)
texts = [text1, text2]
labels = [label1, label2]
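The start and end values above read as character offsets into the text with an exclusive end, which matches size=len(text1). The following is a minimal sketch for computing such offsets instead of counting by hand. It uses plain Python only; find_span is a hypothetical helper and not part of sequence-label.

```python
text1 = "Tokyo is the capital of Japan."
text2 = "The Monster Naoya Inoue is the who's who of boxing."


def find_span(text: str, mention: str) -> dict:
    """Return the character span of the first occurrence of `mention`.

    Hypothetical helper for illustration; `end` is exclusive, so
    text[span["start"]:span["end"]] == mention.
    """
    start = text.find(mention)
    if start < 0:
        raise ValueError(f"{mention!r} not found in text")
    return {"start": start, "end": start + len(mention)}


tokyo = find_span(text1, "Tokyo")
japan = find_span(text1, "Japan")
inoue = find_span(text2, "Naoya Inoue")
```

Adding a "label" key to each returned dictionary would give entries shaped like the tags passed to SequenceLabel.from_dict above. Note that str.find only locates the first occurrence, so repeated mentions need a different lookup.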
Next, tokenize your texts and create the alignments using the create_alignments function. The result, alignments, is a tuple of LabelAlignment instances that align sequence labels with the tokenized result:
tokenizer = AutoTokenizer.from_pretrained("roberta-base")
batch_encoding = tokenizer(texts)
alignments = create_alignments(
    batch_encoding=batch_encoding,
    lengths=list(map(len, texts)),
    padding_token=tokenizer.pad_token,
)
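To make the idea of an alignment concrete, here is a simplified sketch of mapping a character span onto token positions. It is an illustration under stated assumptions, not the library's implementation: it uses a regex-based tokenizer instead of a subword tokenizer, and tokenize_with_offsets and char_span_to_token_span are hypothetical names.

```python
import re


def tokenize_with_offsets(text):
    """Split into words and punctuation, keeping (token, start, end) offsets."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\w+|[^\w\s]", text)]


def char_span_to_token_span(tokens, start, end):
    """Return (first, last_exclusive) indices of tokens overlapping [start, end)."""
    covered = [i for i, (_, s, e) in enumerate(tokens) if s < end and e > start]
    if not covered:
        raise ValueError(f"no token overlaps characters [{start}, {end})")
    return covered[0], covered[-1] + 1


text = "Tokyo is the capital of Japan."
tokens = tokenize_with_offsets(text)

# Character span 24..29 ("Japan") falls on a single token.
japan_tokens = char_span_to_token_span(tokens, 24, 29)
tokyo_tokens = char_span_to_token_span(tokens, 0, 5)
```

A real subword tokenizer may split one word into several pieces and add special tokens, so the token indices produced by an actual alignment will generally differ from the ones in this sketch.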
Now, create a label_set that will allow you to create tensors from sequence labels and reconstruct sequence labels from tensors. Use the label_set.encode_to_tag_indices method to create tag_indices:
label_set = LabelSet(
    labels={"ORG", "LOC", "PER", "MISC"},
    padding_index=-1,
)
tag_indices = label_set.encode_to_tag_indices(
    labels=labels,
    alignments=alignments,
)
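The tagging scheme and index layout that encode_to_tag_indices uses are not described here. As a rough illustration of what turning labels into padded tag indices can look like, the sketch below assumes a simple BIO scheme and token-level spans. Every name in it (TAGS, encode, pad) is hypothetical, and the indices it produces are not claimed to match the library's output.

```python
PADDING_INDEX = -1
LABELS = sorted({"ORG", "LOC", "PER", "MISC"})  # sorted for a stable index order

# Index 0 is "O"; each label gets a B- and an I- tag (BIO is an assumption here).
TAGS = ["O"] + [f"{prefix}-{label}" for label in LABELS for prefix in ("B", "I")]
TAG_TO_INDEX = {tag: i for i, tag in enumerate(TAGS)}


def encode(token_spans, num_tokens):
    """Turn (first, last_exclusive, label) token spans into one index per token."""
    indices = [TAG_TO_INDEX["O"]] * num_tokens
    for first, last, label in token_spans:
        indices[first] = TAG_TO_INDEX[f"B-{label}"]
        for i in range(first + 1, last):
            indices[i] = TAG_TO_INDEX[f"I-{label}"]
    return indices


def pad(batch, padding_index=PADDING_INDEX):
    """Right-pad every sequence to the length of the longest one in the batch."""
    width = max(len(seq) for seq in batch)
    return [seq + [padding_index] * (width - len(seq)) for seq in batch]


batch = pad([
    encode([(0, 1, "LOC"), (5, 6, "LOC")], num_tokens=7),
    encode([(2, 4, "PER")], num_tokens=4),
])
```

Using a padding index outside the valid tag range, such as -1, lets downstream code mask padded positions without confusing them with a real tag.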
Finally, use the label_set.decode method to reconstruct the sequence labels from tag_indices and alignments:
labels2 = label_set.decode(
    tag_indices=tag_indices,
    alignments=alignments,
)
assert labels == labels2
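The round trip asserted above depends on decoding being the inverse of encoding. The sketch below shows one way such a decoder can work for BIO-style tag indices. It assumes a small hypothetical tag vocabulary and token-level spans, and it does not reproduce how label_set.decode is implemented.

```python
PADDING_INDEX = -1
TAGS = ["O", "B-LOC", "I-LOC", "B-PER", "I-PER"]  # hypothetical tag vocabulary


def decode(indices):
    """Recover (first, last_exclusive, label) token spans from BIO tag indices."""
    spans = []
    current = None  # [first, last_exclusive, label] of the span being built
    for i, index in enumerate(indices):
        # Padded positions carry no label, so treat them like "O".
        tag = "O" if index == PADDING_INDEX else TAGS[index]
        if tag.startswith("I-") and current and current[2] == tag[2:]:
            current[1] = i + 1  # extend the open span
            continue
        if current:
            spans.append(tuple(current))  # close the open span
            current = None
        if tag.startswith("B-"):
            current = [i, i + 1, tag[2:]]
    if current:
        spans.append(tuple(current))
    return spans


spans = decode([0, 0, 3, 4, -1, -1, -1])
```

In this sketch an I- tag with no matching open span is ignored, which is one of several reasonable policies for malformed predictions.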
Installation
pip install sequence-label