Simple PyTorch-based tagger
Project description
pttagger is a simple PyTorch-based tagger that has the following features:
- stacked bi-directional RNN (GRU or LSTM)
- variable-sized mini-batches
- multiple inputs
Quick Start
Installation
Run this command in your terminal:
$ pip install pttagger
Pre-processing
Suppose that you have the following examples of named entity recognition:
Joe/B-PER Doe/I-PER goes/O to/O Japan/B-LOC ./O
Jane/B-PER Doe/I-PER belongs/O to/O Kyoto/B-ORG University/I-ORG ./O
…
First, construct a Dataset object from the examples like this:
from pttagger.dataset import Dataset
examples = [
    {'Xs': [['Joe', 'Doe', 'goes', 'to', 'Japan', '.']],
     'Y': ['B-PER', 'I-PER', 'O', 'O', 'B-LOC', 'O']},
    {'Xs': [['Jane', 'Doe', 'belongs', 'to', 'Kyoto', 'University', '.']],
     'Y': ['B-PER', 'I-PER', 'O', 'O', 'B-ORG', 'I-ORG', 'O']},
    ...
]
dataset = Dataset(examples)
You can also use multiple inputs as the value of Xs. In the following case, Xs contains not only word information but also POS information:
examples = [
    {'Xs': [['Joe', 'Doe', 'goes', 'to', 'Japan', '.'],
            ['NNP', 'NNP', 'VBZ', 'TO', 'NNP', '.']],
     'Y': ['B-PER', 'I-PER', 'O', 'O', 'B-LOC', 'O']},
    {'Xs': [['Jane', 'Doe', 'belongs', 'to', 'Kyoto', 'University', '.'],
            ['NNP', 'NNP', 'VBZ', 'TO', 'NNP', 'NNP', '.']],
     'Y': ['B-PER', 'I-PER', 'O', 'O', 'B-ORG', 'I-ORG', 'O']},
    ...
]
dataset = Dataset(examples)
Now, dataset has the following two indices:
x_to_index: e.g., [{'<PAD>': 0, '<UNK>': 1, 'Joe': 2, 'Doe': 3, ...}]
y_to_index: e.g., {'<PAD>': 0, '<UNK>': 1, 'B-PER': 2, 'I-PER': 3, ...}
If you use multiple inputs, x_to_index has indices for each input.
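For example, you can inspect these indices through the dataset attributes used in the training step below (a minimal sketch; the exact contents depend on your examples):

print(dataset.x_to_index[0])  # word index of the first input, e.g. {'<PAD>': 0, '<UNK>': 1, 'Joe': 2, ...}
print(dataset.y_to_index)     # tag index, e.g. {'<PAD>': 0, '<UNK>': 1, 'B-PER': 2, ...}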
Training
Construct a Model object and train it as follows:
from pttagger.model import Model
EMBEDDING_DIMS = [100] # if you use multiple inputs, set a dimension for each input
HIDDEN_DIMS = [10] # the same as above
x_set_sizes = [len(x_to_index) for x_to_index in dataset.x_to_index]
y_set_size = len(dataset.y_to_index)
model = Model(EMBEDDING_DIMS, HIDDEN_DIMS, x_set_sizes, y_set_size)
model.train(dataset)
You can also use the following parameters:
- use_lstm - If True, uses an LSTM; otherwise uses a GRU. Default: False
- num_layers - Number of recurrent layers. Default: 1
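For example, a two-layer LSTM tagger might be built like this (a minimal sketch; it assumes use_lstm and num_layers are keyword arguments of the Model constructor):

model = Model(EMBEDDING_DIMS, HIDDEN_DIMS, x_set_sizes, y_set_size,
              use_lstm=True,  # assumption: use an LSTM instead of the default GRU
              num_layers=2)   # assumption: stack two recurrent layers
model.train(dataset)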
Test
Predict tags for test examples like this:
test_examples = [
    {'Xs': [['Richard', 'Roe', 'comes', 'to', 'America', '.']],
     'Y': ['B-PER', 'I-PER', 'O', 'O', 'B-LOC', 'O']}
]
test_dataset = Dataset(test_examples)
results = model.test(test_dataset)
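To map the predictions back to tag strings, something like the following may help (a minimal sketch; it assumes results is a list of predicted tag-index sequences, one per test example, so check the actual return value of test in your version):

# Invert the tag index built from the training data.
index_to_y = {i: y for y, i in dataset.y_to_index.items()}
for predicted in results:
    # predicted is assumed to be a sequence of tag indices
    print([index_to_y[i] for i in predicted])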
Download files
Source Distribution
File details
Details for the file pttagger-0.1.1.tar.gz.
File metadata
- Download URL: pttagger-0.1.1.tar.gz
- Upload date:
- Size: 13.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: Python-urllib/3.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | 3004508163422ca35093effde4fd6d7997bb5bc9bfaf4971a958dfeed1635a66
MD5 | 463b04ad655047ba31121de9da194e34
BLAKE2b-256 | 7dc39962a15d5717c9844dbb8ac2f6c5c16f5cd36e031286f169df8baa38f6f7