Sequence Tagger for Partially Annotated Dataset in PyTorch

Project description

pytorch-partial-tagger

pytorch-partial-tagger is a Python library for building a sequence tagger in PyTorch, specifically for named entity recognition (NER), from a partially annotated dataset. You can build your own tagger using a distantly supervised dataset obtained from unlabeled text and a dictionary that maps surface names to their entity types. The algorithm implemented in this library is based on Effland and Collins (2021).

Usage

Import all dependencies first:

import torch
from sequence_label import SequenceLabel

from partial_tagger.metric import Metric
from partial_tagger.utils import create_trainer

Prepare your own datasets. Each item of a dataset must be a pair of a string and a sequence label. The string represents the text you want to label, defined as text below. The sequence label represents a set of character-based tags, each of which has a start, an end, and a label, defined as label below. The start is the position in the text where a tag begins, the end is the position where it stops (exclusive), and the label is what you want to assign to the span of text delimited by the start and the end.

text = "Tokyo is the capital of Japan."
label = SequenceLabel.from_dict(
    tags=[
        {"start": 0, "end": 5, "label": "LOC"},  # Tag for "Tokyo"
        {"start": 24, "end": 29, "label": "LOC"},  # Tag for "Japan"
    ],
    size=len(text),
)

train_dataset = [(text, label), ...]
validation_dataset = [...]
test_dataset = [...]
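To illustrate the distant-supervision setup mentioned above, here is a minimal sketch of generating tag dicts from a dictionary of surface names. It assumes exact string matching only (real pipelines also handle tokenization, overlapping matches, and ambiguous surface names); `dictionary_tags` and `gazetteer` are hypothetical names, not part of the library. The resulting list can be passed as the `tags` argument of `SequenceLabel.from_dict`.

```python
def dictionary_tags(text, gazetteer):
    """Emit a tag dict for every exact occurrence of each surface name.

    Hypothetical helper for illustration: naive exact matching only.
    """
    tags = []
    for surface, entity_type in gazetteer.items():
        start = text.find(surface)
        while start != -1:
            tags.append(
                {"start": start, "end": start + len(surface), "label": entity_type}
            )
            start = text.find(surface, start + 1)
    # Sort by position so tags appear in reading order.
    return sorted(tags, key=lambda tag: tag["start"])


text = "Tokyo is the capital of Japan."
gazetteer = {"Tokyo": "LOC", "Japan": "LOC"}
tags = dictionary_tags(text, gazetteer)
# tags == [{"start": 0, "end": 5, "label": "LOC"},
#          {"start": 24, "end": 29, "label": "LOC"}]
```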

Next, train your tagger and evaluate its performance. You train it through an instance of Trainer, which you obtain by calling create_trainer. After training, you get an instance of Recognizer, which predicts character-based tags for given texts. You then evaluate the performance of your tagger using an instance of Metric, as follows.

device = torch.device("cuda")

trainer = create_trainer()
recognizer = trainer(train_dataset, validation_dataset, device)

texts, ground_truths = zip(*test_dataset)

batch_size = 15
predictions = recognizer(texts, batch_size, device)

metric = Metric()
metric(predictions, ground_truths)

print(metric.get_scores())  # Display F1-score, Precision, Recall
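The internals of Metric are not shown here, but span-level scores of this kind are conventionally computed by comparing predicted and gold tag sets. Below is a minimal sketch under that assumption, with tags given as plain dicts rather than SequenceLabel objects; `span_f1` is a hypothetical helper for illustration, not part of the library's API.

```python
def span_f1(predictions, ground_truths):
    """Micro-averaged span-level precision, recall, and F1.

    Hypothetical helper: a tag counts as correct only if its start,
    end, and label all match a gold tag exactly.
    """
    tp = fp = fn = 0
    for pred, gold in zip(predictions, ground_truths):
        pred_set = {(t["start"], t["end"], t["label"]) for t in pred}
        gold_set = {(t["start"], t["end"], t["label"]) for t in gold}
        tp += len(pred_set & gold_set)  # exact matches
        fp += len(pred_set - gold_set)  # spurious predictions
        fn += len(gold_set - pred_set)  # missed gold tags
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# One predicted tag, two gold tags: perfect precision, half recall.
predictions = [[{"start": 0, "end": 5, "label": "LOC"}]]
ground_truths = [
    [{"start": 0, "end": 5, "label": "LOC"}, {"start": 24, "end": 29, "label": "LOC"}]
]
scores = span_f1(predictions, ground_truths)
```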

Installation

pip install pytorch-partial-tagger

Documentation

For details about the pytorch-partial-tagger API, see the documentation.

References

Thomas Effland and Michael Collins. 2021. Partially Supervised Named Entity Recognition via the Expected Entity Ratio Loss. Transactions of the Association for Computational Linguistics, 9.

Project details


Download files

Download the file for your platform.

Source Distribution

pytorch_partial_tagger-0.1.16.tar.gz (24.3 kB)

Uploaded Source

Built Distribution


pytorch_partial_tagger-0.1.16-py3-none-any.whl (19.6 kB)

Uploaded Python 3

File details

Details for the file pytorch_partial_tagger-0.1.16.tar.gz.

File metadata

  • Download URL: pytorch_partial_tagger-0.1.16.tar.gz
  • Upload date:
  • Size: 24.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.5

File hashes

Hashes for pytorch_partial_tagger-0.1.16.tar.gz
  • SHA256: 601a6a71b976040bed84cd77763da6ad203b3615a0e065c35f385e9d53ee3cc0
  • MD5: df1d258777877105bfdfa86989d79022
  • BLAKE2b-256: 27a1363116563a51e96ab2243046f031e39de89dac706e6a13048992bf1e6e85


File details

Details for the file pytorch_partial_tagger-0.1.16-py3-none-any.whl.

File metadata

File hashes

Hashes for pytorch_partial_tagger-0.1.16-py3-none-any.whl
  • SHA256: 095e602c51f0fd82c0b4123ba14af0894ac018c498764d161ceb81d41cae3709
  • MD5: 46dc984fb7e3c14295173ca82fa4f597
  • BLAKE2b-256: c8de6812b5546f8e7aa8c83774fcb9051daa6afcc015aead6ba33db0d74be06f

