Neural macine translation soft alignment visualisations for web and command line
Project description
NMT Attention Alignment Visualizations
An attention alignment visualization tool for command line and web.
A part of the web version was borrowed from Nematus utils
The Machine Translation Marathon 2019 tutorial can be found here
Usage
- Train a neural MT system (using Marian, Sockeye or Neural Monkey)
- Translate text and get word or subword level alignments
- Visualize the alignments
- in the command line standard output
- in a web browser (PHP required)
Requirements
-
Python 3.6 or newer
-
NLTK for BLEU calculation(requires Python versions 3.5, 3.6, 3.7, or 3.8)
-
Numpy
pip install numpy nltk
-
-
PHP 5.4 or newer (for web visualization)
How to get alignment files from NMT systems
-
Nematus (This worked for the Theano version. Not sure about the current Tensorflow version...)
- Run nematus/translate.py with the
--output_alignment
or-a
parameter
- Run nematus/translate.py with the
-
- In the training.ini file add
[alignment_saver] class=runners.word_alignment_runner.WordAlignmentRunner output_series="ali" encoder=<encoder> decoder=<decoder>
and add alignment_saver to the runners in main
runners=[<runner_greedy>, <alignment_saver>]
- In the translation.ini file in eval_data add
s_ali_out="out.alignment"
-
- Run marian-decoder with the parameter
--alignment soft
- If you use the transformer model architecture, you will need to train it with guided alignments. See the MT Marathon 2019 Tutorial for details about this.
- Run marian-decoder with the parameter
-
OpenNMT Run translate.lua to translate with the
-save_attention
parameter to save attentions to a file -
- Run sockeye.translate to translate with the
--output-type
parameter set totranslation_with_alignment_matrix
to save attentions to a file. - If you use the transformer model architecture, you will need to implement (copy in the correct places) code from this pull requst. Otherwise, Sockeye will return attention values of 0 for all tokens.
- Run sockeye.translate to translate with the
-
Other The easiest format to use (and convert to) is the one used by Nematus.
For each sentence the first row should be
<sentence id> ||| <target words> ||| <score> ||| <source words> ||| <number of source words> <number of target words>
After that follow
<number of target words> + 1 (for <EOS> token)
rows with<number of source words> + 1 (for <EOS> token)
columns with attention weights separated by spaces.After each sentence there should be an empty line.
Note that the values of
<sentence id>
,<score>
,<number of source words>
and<number of target words>
are actually ignored when creating visualizations, so they may as well all be 0.An example:
0 ||| Obama welcomes Netanyahu ||| 0 ||| Obama empfängt Net@@ any@@ ah@@ u ||| 7 4 0.723834 0.0471278 0.126415 0.000413103 0.000774298 0.000715227 0.10072 0.00572539 0.743366 0.0342341 0.000315413 0.00550132 0.00150629 0.209351 0.0122618 0.0073559 0.909192 0.000606444 0.00614908 0.00256837 0.0618667 0.00110054 0.0214063 0.0759918 0.000446028 0.104856 0.0435644 0.752634
Publications
If you use this tool, please cite the following paper:
Matīss Rikters, Mark Fishel, Ondřej Bojar (2017). "Visualizing Neural Machine Translation Attention and Confidence." In The Prague Bulletin of Mathematical Linguistics volume 109 (2017).
@inproceedings{Rikters-EtAl2017PBML,
author = {Rikters, Matīss and Fishel, Mark and Bojar, Ond\v{r}ej},
journal={The Prague Bulletin of Mathematical Linguistics},
volume={109},
pages = {1--12},
title = {{Visualizing Neural Machine Translation Attention and Confidence}},
address={Lisbon, Portugal},
year = {2017}
}
Examples
-
in the command line as shaded blocks. Example with Neural Monkey alignments (separate source and target subword unit files are required)
python process_alignments.py \ -i test_data/neuralmonkey/alignments.npy \ -o color \ -s test_data/neuralmonkey/src.en.bpe \ -t test_data/neuralmonkey/out.lv.bpe \ -f NeuralMonkey
-
the same with Nematus alignments (source and target subword units are in the same file)
python process_alignments.py \ -i test_data/nematus/alignments.txt \ -o color \ -f Nematus
-
in a text file as Unicode block elements
python process_alignments.py \ -i test_data/neuralmonkey/alignments.npy \ -o block \ -s test_data/neuralmonkey/src.en.bpe \ -t test_data/neuralmonkey/out.lv.bpe \ -f NeuralMonkey
or
python process_alignments.py \ -i test_data/neuralmonkey/alignments.npy \ -o block2 \ -s test_data/neuralmonkey/src.en.bpe \ -t test_data/neuralmonkey/out.lv.bpe \ -f NeuralMonkey
-
in the browser as links between words (demo here)
python process_alignments.py \ -i test_data/marian/marian.out.lv \ -s test_data/marian/marian.src.en \ -o web \ -f Marian
Parameters for process_alignments.py
Option | Description | Required | Possible Values | Default Value |
---|---|---|---|---|
-f | NMT framework where the alignments are from | No | 'NeuralMonkey', 'Nematus', 'Marian', 'OpenNMT', 'Sockeye' | 'NeuralMonkey' |
-i | input alignment file | Only if no configuration file is provided | Path to file | |
-s | source sentence subword units | For Neural Monkey or Marian | Path to file | |
-t | target sentence subword units | For Neural Monkey | Path to file | |
-o | output type | No | 'web', 'color', 'block', 'block2', 'compare' | 'web' |
-r | reference file for calculating BLEU score | No | Path to file | |
-n | Number of a specific sentence | No | Integer | -1 (show all) |
-c | configuration file | No | Path to file | |
-d | Combine subword units (De-BPE) | No | Integer | |
-v | NMT framework where the 2nd alignments are from | For output type'compare' | 'NeuralMonkey', 'Nematus', 'Marian', 'OpenNMT', 'Sockeye' | 'NeuralMonkey' |
-w | input file for the 2nd alignments | For output type'compare' | Path to file | |
-x | 2nd source sentence subword unit file | For output type'compare' and Neural Monkey or Marian | Path to file | |
-y | 2nd target sentence subword unit file | For output type'compare' and Neural Monkey | Path to file |
Configuration file
The parameters can be provided via configuration .ini file to have a smaller mess in the command line when calling the script.
Block | Option | Description |
---|---|---|
AlignmentsOne | From | NMT framework where the alignments are from |
AlignmentsOne | InputFile | input alignment file |
AlignmentsOne | SourceFile | source sentence subword units |
AlignmentsOne | TargetFile | target sentence subword units |
AlignmentsOne | ReferenceFile | reference file for calculating BLEU score |
Options | OutputType | output type |
Options | Number | Number of a specific sentence |
AlignmentsTwo | From | NMT framework where the 2nd alignments are from |
AlignmentsTwo | InputFile | input file for the 2nd alignments |
AlignmentsTwo | SourceFile | 2nd source sentence subword unit file |
AlignmentsTwo | TargetFile | 2nd target sentence subword unit file |
For example, create a config.ini file:
[AlignmentsOne]
InputFile: ./test_data/neuralmonkey/alignments.npy
SourceFile: ./test_data/neuralmonkey/src.en.bpe
TargetFile: ./test_data/neuralmonkey/out.lv.bpe
From: NeuralMonkey
[Options]
OutputType: color
And run:
python process_alignments.py -c config.ini
Screenshots
Color, Block, Block2
Web
Compare
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file softalignments-1.0.4.tar.gz
.
File metadata
- Download URL: softalignments-1.0.4.tar.gz
- Upload date:
- Size: 14.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/49.1.0 requests-toolbelt/0.9.1 tqdm/4.47.0 CPython/3.6.3
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 9c862354a2043517a3e291a049320141f1e85dd66df3c182a30c1b669c62403b |
|
MD5 | 1c1a581474ba064803565200f4464490 |
|
BLAKE2b-256 | 4c6545c4d899da83cb973e6c30c2e4ac06e66fc8d0fedad5eea045a2815b3279 |
File details
Details for the file softalignments-1.0.4-py3-none-any.whl
.
File metadata
- Download URL: softalignments-1.0.4-py3-none-any.whl
- Upload date:
- Size: 12.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/49.1.0 requests-toolbelt/0.9.1 tqdm/4.47.0 CPython/3.6.3
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | d81747bff59a29275a6f2cb1b34cc46577a03774e1087341ae40352c00b7ad4b |
|
MD5 | 4b6bfcbdc67bf5e60bdfc1652f669680 |
|
BLAKE2b-256 | 50937c9374a8bb0deb1fa054b2c0e30f586346e984636df9f242b2f93be53710 |