Sequence Tagger for Partially Annotated Dataset in PyTorch

pytorch-partial-tagger

This is a library for building a CRF tagger from a partially annotated dataset in PyTorch. You can build your own NER tagger from a dictionary alone. The algorithm of this tagger is based on Effland and Collins (2021).
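To illustrate what building annotations from a dictionary can look like, here is a minimal sketch that matches dictionary entries against raw text and emits (start, length, label) triples. The helper match_dictionary and the sample dictionary are hypothetical and are not part of pytorch-partial-tagger; the triples only mirror the start, length, and label fields that this library's tags carry.

```python
def match_dictionary(text, dictionary):
    """Return sorted (start, length, label) triples for every dictionary hit.

    Hypothetical helper for illustration; not part of pytorch-partial-tagger.
    """
    matches = []
    for surface, label in dictionary.items():
        start = text.find(surface)
        while start != -1:
            matches.append((start, len(surface), label))
            # Continue searching after the current hit.
            start = text.find(surface, start + 1)
    return sorted(matches)


text = "Tokyo is the capital of Japan."
dictionary = {"Tokyo": "LOC", "Japan": "LOC"}  # sample entries, assumed
spans = match_dictionary(text, dictionary)
print(spans)  # [(0, 5, 'LOC'), (24, 5, 'LOC')]
```

Text that the dictionary does not cover simply stays unannotated, which is the partially annotated setting this library targets.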

Usage

Import all dependencies first:

import torch

from partial_tagger.data import CharBasedTags
from partial_tagger.training import Trainer
from partial_tagger.utils import Metric, create_tag

Prepare your own datasets. Each item of a dataset must consist of a string and tags. The string is the text to annotate (text in the code below). The tags are a collection of tags (tags in the code below), where each tag has a start, a length, and a label. The start is the character position in the text where the tag begins. The length is the number of characters from the beginning of the tag to its end. The label is what you want to assign to the span of text defined by the start and the length.

# CharBasedTags and create_tag are already imported above.


text = "Tokyo is the capital of Japan."
tags = CharBasedTags(
    (
        create_tag(start=0, length=5, label="LOC"),  # Tag for Tokyo
        create_tag(start=24, length=5, label="LOC")  # Tag for Japan
    ),
    text
)
train_dataset = [(text, tags), ...]
validation_dataset = [...]
test_dataset = [...]
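Off-by-one errors in start and length are easy to make, so it can help to check which substring each tag actually covers before training. The sketch below does this with plain Python slicing; covered_text is a hypothetical helper, not a function of this library, and the triples simply repeat the start, length, and label values used above.

```python
def covered_text(text, start, length):
    """Return the substring covered by a tag with the given start and length.

    Hypothetical helper for illustration; not part of pytorch-partial-tagger.
    """
    if start < 0 or length <= 0 or start + length > len(text):
        raise ValueError(f"span ({start}, {length}) is outside the text")
    return text[start:start + length]


text = "Tokyo is the capital of Japan."
# (start, length, label) triples with the same values as the tags above
spans = [(0, 5, "LOC"), (24, 5, "LOC")]

surfaces = [covered_text(text, start, length) for start, length, _ in spans]
print(surfaces)  # ['Tokyo', 'Japan']
```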

Next, train your tagger and evaluate its performance.

To train your own tagger, initialize Trainer and pass the datasets to it. After training, the trainer returns a Recognizer object, which predicts character-based tags from given texts.

You can evaluate the performance of your tagger using Metric as shown below.

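# Use a GPU when one is available; otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")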

trainer = Trainer()
recognizer = trainer(train_dataset, validation_dataset, device)

# Split the test set into texts and gold tags (strict=True requires Python 3.10+).
texts, ground_truths = zip(*test_dataset, strict=True)

batch_size = 15
predictions = recognizer(texts, batch_size, device)

metric = Metric()
metric(predictions, ground_truths)

print(metric.get_scores())  # Display F1-score, Precision, Recall
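For intuition about the reported numbers, the following sketch computes span-level precision, recall, and F1 by exact match over (start, length, label) triples. It is an assumption that Metric scores predictions in a comparable way; span_scores and the returned key names are hypothetical and do not reproduce the library's implementation.

```python
def span_scores(predictions, ground_truths):
    """Micro-averaged precision, recall, and F1 over exact-match spans.

    Hypothetical helper for illustration; not the library's Metric.
    """
    true_positives = num_predicted = num_gold = 0
    for predicted, gold in zip(predictions, ground_truths):
        predicted, gold = set(predicted), set(gold)
        true_positives += len(predicted & gold)
        num_predicted += len(predicted)
        num_gold += len(gold)
    precision = true_positives / num_predicted if num_predicted else 0.0
    recall = true_positives / num_gold if num_gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1_score": f1}


# One sentence: two gold spans, two predicted spans, one of them correct.
gold = [[(0, 5, "LOC"), (24, 5, "LOC")]]
pred = [[(0, 5, "LOC"), (13, 7, "LOC")]]
scores = span_scores(pred, gold)
print(scores)  # {'precision': 0.5, 'recall': 0.5, 'f1_score': 0.5}
```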

Installation

pip install pytorch-partial-tagger

References

Thomas Effland and Michael Collins. 2021. Partially Supervised Named Entity Recognition via the Expected Entity Ratio Loss. Transactions of the Association for Computational Linguistics, 9:1320–1335.

Project details

This version: 0.1.6 (released Jun 4, 2023)
