Sequence Tagger for Partially Annotated Dataset in PyTorch
This is a CRF tagger for partially annotated datasets in PyTorch. It lets you train a CRF with the marginal log likelihood (Tsuboi et al., 2008). The implementation is based on Rush (2020).
Usage
First, import the required modules.
import torch
from partial_tagger.crf.nn import CRF
from partial_tagger.crf import functional as F
Initialize the CRF by giving it the number of tags.
num_tags = 2
crf = CRF(num_tags)
Prepare an incomplete tag sequence (partial annotation) and convert it to a tag bitmap.
This tag bitmap represents the target value for the CRF.
# 0 and 1 are known (annotated) tag indices
# -1 indicates that the tag at that position is unknown
incomplete_tags = torch.tensor([[0, 1, 0, 1, -1, -1, -1, 1, 0, 1]])
tag_bitmap = F.to_tag_bitmap(incomplete_tags, num_tags=num_tags, partial_index=-1)
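To make the idea of a tag bitmap concrete, here is a minimal sketch in plain Python. It assumes a bitmap has one row per position and one boolean per tag, with every tag allowed wherever the annotation is unknown. The helper `to_bitmap_sketch` is hypothetical and written only for illustration; the exact tensor layout returned by `F.to_tag_bitmap` may differ.

```python
from typing import List

UNKNOWN = -1  # plays the same role as partial_index=-1 in the snippet above


def to_bitmap_sketch(tags: List[int], num_tags: int, unknown: int = UNKNOWN) -> List[List[bool]]:
    """Hypothetical helper: one row per position, True marks each tag still allowed."""
    bitmap = []
    for tag in tags:
        if tag == unknown:
            # No annotation: every tag remains possible at this position.
            row = [True] * num_tags
        else:
            # Annotated: only the given tag is allowed.
            row = [index == tag for index in range(num_tags)]
        bitmap.append(row)
    return bitmap


# Two annotated positions followed by one unknown position.
bitmap = to_bitmap_sketch([0, 1, -1], num_tags=2)
```

Under this assumption, a fully annotated sequence yields a one-hot bitmap, and a fully unknown sequence yields a bitmap that allows every tag everywhere.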
Compute the marginal log likelihood from the logits and negate it to obtain a loss.
batch_size = 1
sequence_length = 10
# Dummy logits
logits = torch.randn(batch_size, sequence_length, num_tags)
log_potentials = crf(logits)
loss = F.marginal_log_likelihood(log_potentials, tag_bitmap).sum().neg()
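The marginal log likelihood of Tsuboi et al. (2008) is the log of the total score of all tag paths consistent with the partial annotation, minus the log partition over all paths. The following sketch computes it in plain Python with the forward algorithm. It is not the library's implementation: the split into `unary` and `transition` scores, and all function names, are assumptions made for illustration.

```python
import math
from typing import List


def logsumexp(values: List[float]) -> float:
    """Numerically stable log(sum(exp(v))), ignoring -inf entries."""
    finite = [v for v in values if v != -math.inf]
    if not finite:
        return -math.inf
    peak = max(finite)
    return peak + math.log(sum(math.exp(v - peak) for v in finite))


def log_partition(unary, transition, allowed) -> float:
    """Forward algorithm over paths that only use tags where allowed[t][k] is True."""
    num_tags = len(transition)
    alpha = [unary[0][k] if allowed[0][k] else -math.inf for k in range(num_tags)]
    for t in range(1, len(unary)):
        alpha = [
            logsumexp([alpha[j] + transition[j][k] for j in range(num_tags)]) + unary[t][k]
            if allowed[t][k] else -math.inf
            for k in range(num_tags)
        ]
    return logsumexp(alpha)


def marginal_log_likelihood_sketch(unary, transition, bitmap) -> float:
    """log(score of paths consistent with bitmap) - log(score of all paths)."""
    everything = [[True] * len(row) for row in bitmap]
    return log_partition(unary, transition, bitmap) - log_partition(unary, transition, everything)


# Three positions, two tags, all scores zero, so every path is equally likely.
unary = [[0.0, 0.0] for _ in range(3)]
transition = [[0.0, 0.0], [0.0, 0.0]]
bitmap = [[True, False], [False, True], [True, True]]  # tags 0, 1, unknown
mll = marginal_log_likelihood_sketch(unary, transition, bitmap)
loss = -mll  # negate to get a quantity to minimize
```

With uniform scores, 2 of the 8 possible paths match the annotation, so `mll` equals `log(2/8)`. When every position is unknown, the two partitions coincide and the value is 0.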
Installation
To install this package:
pip install partial-tagger
References
- Yuta Tsuboi, Hisashi Kashima, Shinsuke Mori, Hiroki Oda, and Yuji Matsumoto. 2008. Training Conditional Random Fields Using Incomplete Annotations. In Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008), pages 897–904, Manchester, UK. Coling 2008 Organizing Committee.
- Alexander Rush. 2020. Torch-Struct: Deep Structured Prediction Library. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pages 335–342, Online. Association for Computational Linguistics.