Seq2seq model with attention for automatic orthographic simplification
ortografix
Welcome to ortografix, a seq2seq model for automatic orthographic simplification, implemented with PyTorch 1.4.
Install
via pip:
pip3 install ortografix
or, after a git clone:
python3 setup.py install
Train
To train a model, run:
ortografix train \
--data /abs/path/to/training/data \
--model-type gru \
--shuffle \
--hidden-size 256 \
--num-layers 1 \
--bias \
--dropout 0 \
--learning-rate 0.01 \
--epochs 10 \
--print-every 100 \
--use-teacher-forcing \
--teacher-forcing-ratio 0.5 \
--output-dirpath /abs/path/to/output/directory/whereto/save/model \
--with-attention \
--character-based
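The `--use-teacher-forcing` and `--teacher-forcing-ratio` options can be illustrated with a minimal sketch. It assumes the common convention, used for instance in the PyTorch seq2seq tutorial, where the ratio is the probability that a training sequence is decoded with gold-standard inputs rather than the model's own predictions. The function `decoder_inputs`, the `predict_next` callback and the `<sos>` token are hypothetical names for illustration; ortografix's actual implementation may differ.

```python
import random


def decoder_inputs(target, predict_next, teacher_forcing_ratio=0.5,
                   start_token="<sos>"):
    """Return the tokens fed to the decoder at each step of one sequence.

    `predict_next` is a hypothetical stand-in for one decoder step: it maps
    the current input token to the predicted next token.
    """
    # Decide once per sequence whether to teacher-force (assumed convention).
    use_teacher_forcing = random.random() < teacher_forcing_ratio
    current = start_token
    fed = []
    for gold in target:
        fed.append(current)
        predicted = predict_next(current)
        # Teacher forcing feeds the gold token; otherwise feed the prediction.
        current = gold if use_teacher_forcing else predicted
    return fed
```

With a ratio of 1.0 the decoder always sees the gold tokens, and with a ratio of 0.0 it always consumes its own output, which is closer to inference-time behaviour.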
Test
Qualitative evaluation
To qualitatively evaluate the output of the model on a set of 10 randomly selected sentences from a given dev/test set, run:
ortografix evaluate \
--data /abs/path/to/test/data.txt \
--model /abs/path/to/model/directory/ \
--random 10
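As a rough illustration of what selecting 10 random sentences involves, the sketch below samples pairs without replacement from a list of lines. The tab-separated `source<TAB>target` layout and the `sample_pairs` helper are assumptions made for this example only; the data format ortografix expects is not specified here.

```python
import random


def sample_pairs(lines, k=10, seed=None):
    """Pick up to k distinct (source, target) pairs at random.

    Assumes each line holds 'source<TAB>target'; this format is hypothetical.
    """
    pairs = [tuple(line.rstrip("\n").split("\t", 1))
             for line in lines if "\t" in line]
    rng = random.Random(seed)
    # Never ask for more items than are available.
    return rng.sample(pairs, min(k, len(pairs)))
```

Passing a fixed `seed` makes the selection reproducible, which helps when comparing the output of two models on the same sentences.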
Quantitative evaluation
To quantitatively evaluate the output of the model on a given dev/test set, run:
ortografix evaluate \
--data /abs/path/to/test/data.txt \
--model /abs/path/to/model/directory
Quantitative evaluation will return:
- The sum of all edit (Levenshtein) distances computed across all test pairs
- The average edit distance computed across all test pairs
- The average normalized edit distance
- The average normalized edit similarity
All measures are computed via textdistance.
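The four measures can be sketched in plain Python to make their definitions concrete. This is not ortografix's code: it assumes each normalized distance is the edit distance divided by the length of the longer string, and that similarity is one minus that value. The names `levenshtein` and `evaluate_pairs` and the dictionary keys are hypothetical.

```python
def levenshtein(a, b):
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (char_a != char_b),  # substitution or match
            ))
        previous = current
    return previous[-1]


def evaluate_pairs(pairs):
    """Aggregate edit-distance measures over (prediction, gold) pairs."""
    if not pairs:
        raise ValueError("at least one pair is required")
    distances = [levenshtein(pred, gold) for pred, gold in pairs]
    # Assumed normalization: divide by the longer string's length.
    normalized = []
    for dist, (pred, gold) in zip(distances, pairs):
        longest = max(len(pred), len(gold))
        normalized.append(dist / longest if longest else 0.0)
    count = len(pairs)
    return {
        "total_distance": sum(distances),
        "avg_distance": sum(distances) / count,
        "avg_norm_distance": sum(normalized) / count,
        "avg_norm_similarity": sum(1.0 - n for n in normalized) / count,
    }
```

For the pairs `("kitten", "sitting")` and `("abc", "abc")`, the total distance is 3 and the average distance is 1.5, while the identical pair contributes a normalized distance of 0.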
File metadata
- Download URL: ortografix-0.7.0.tar.gz
- Upload date:
- Size: 14.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.12.1 pkginfo/1.5.0.1 requests/2.21.0 setuptools/46.1.3 requests-toolbelt/0.8.0 tqdm/4.45.0 CPython/3.6.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | 48206504d4663b9523b7b4ee86a46f3dbaa52b175c402ab294b7764e92a65a2c
MD5 | 592f46ac8b50a776b934b5852c24a910
BLAKE2b-256 | 521ca3cbef91ba292b1bece360c389e16ca7b0c0f7b299a670b94c07787db531