Sequence Tagger for Partially Annotated Dataset in PyTorch

This is a CRF tagger for partially annotated datasets in PyTorch. It lets you train a CRF with the marginal log likelihood (Tsuboi et al., 2008), so sequences in which some tags are missing can still be used as training data. The implementation is based on Torch-Struct (Rush, 2020).

Usage

First, import the required modules.

import torch

from partial_tagger.crf.nn import CRF
from partial_tagger.crf import functional as F

Initialize the CRF with the number of tags.

num_tags = 2
crf = CRF(num_tags)

Prepare an incomplete tag sequence (a partial annotation) and convert it to a tag bitmap.
The tag bitmap is the target value for the CRF.

# 0 and 1 are known (annotated) tags
# -1 marks a position whose tag is unknown
incomplete_tags = torch.tensor([[0, 1, 0, 1, -1, -1, -1, 1, 0, 1]])

tag_bitmap = F.to_tag_bitmap(incomplete_tags, num_tags=num_tags, partial_index=-1)
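
To make the idea of a tag bitmap concrete, here is a minimal sketch in plain Python (no PyTorch, so it runs without dependencies). It is not the library's implementation: the function name `to_tag_bitmap_sketch` is hypothetical, and the layout (one row of booleans per position, all True where the tag is unknown) is an assumption about what such a bitmap typically encodes.

```python
def to_tag_bitmap_sketch(tags, num_tags, partial_index=-1):
    """Hypothetical illustration, not the library's implementation.

    Returns one row per position; row[t] is True when tag t is allowed there.
    """
    bitmap = []
    for tag in tags:
        if tag == partial_index:
            # Unknown position: every tag remains possible.
            bitmap.append([True] * num_tags)
        else:
            # Known position: only the annotated tag is allowed.
            bitmap.append([t == tag for t in range(num_tags)])
    return bitmap


incomplete_tags = [0, 1, 0, 1, -1, -1, -1, 1, 0, 1]
bitmap = to_tag_bitmap_sketch(incomplete_tags, num_tags=2)
print(bitmap[0])  # [True, False]  (known tag 0)
print(bitmap[4])  # [True, True]   (unknown tag)
```

Under this reading, an annotated position constrains the tagger to a single tag, while an unknown position places no constraint at all.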

Compute the marginal log likelihood from logits and negate it to obtain a loss.

batch_size = 1
sequence_length = 10
# Dummy logits
logits = torch.randn(batch_size, sequence_length, num_tags)

log_potentials = crf(logits)

loss = F.marginal_log_likelihood(log_potentials, tag_bitmap).sum().neg()
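
The quantity being computed can be sketched as follows: the marginal log likelihood of a partial annotation is the log of the total score of all tag sequences consistent with the known tags, minus the log of the total score of all tag sequences. The plain-Python sketch below illustrates this for a linear-chain CRF. It is not the library's code; all function names are hypothetical, and the score decomposition into per-position `emissions` and a single `transitions` matrix is an assumption made for illustration.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def constrained_log_partition(emissions, transitions, bitmap):
    """Forward algorithm restricted to the tags allowed by the bitmap."""
    num_tags = len(transitions)
    # Disallowed tags get a score of -inf, which removes them from the sum.
    alpha = [
        emissions[0][t] if bitmap[0][t] else -math.inf for t in range(num_tags)
    ]
    for i in range(1, len(emissions)):
        alpha = [
            logsumexp([alpha[p] + transitions[p][t] for p in range(num_tags)])
            + emissions[i][t]
            if bitmap[i][t]
            else -math.inf
            for t in range(num_tags)
        ]
    return logsumexp(alpha)


def marginal_log_likelihood_sketch(emissions, transitions, bitmap):
    """log(sum over consistent paths) - log(sum over all paths)."""
    everything_allowed = [[True] * len(transitions) for _ in emissions]
    return constrained_log_partition(
        emissions, transitions, bitmap
    ) - constrained_log_partition(emissions, transitions, everything_allowed)


# Three positions, two tags; the middle position is unannotated.
emissions = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.8]]
transitions = [[0.2, -0.1], [0.0, 0.4]]
bitmap = [[True, False], [True, True], [False, True]]

mll = marginal_log_likelihood_sketch(emissions, transitions, bitmap)
loss = -mll  # negate to get a quantity to minimize
```

When every position is unknown, the two terms coincide and the marginal log likelihood is zero; when every position is annotated, it reduces to the ordinary CRF log likelihood of that single sequence.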

Installation

To install this package:

pip install partial-tagger

References

  • Yuta Tsuboi, Hisashi Kashima, Shinsuke Mori, Hiroki Oda, and Yuji Matsumoto. 2008. Training Conditional Random Fields Using Incomplete Annotations. In Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008), pages 897–904, Manchester, UK. Coling 2008 Organizing Committee.
  • Alexander Rush. 2020. Torch-Struct: Deep Structured Prediction Library. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pages 335–342, Online. Association for Computational Linguistics.
