WFST for Ukrainian Inverse Text Normalization (ITN) based on NVIDIA NeMo and Pynini
WFST for Ukrainian ITN
A simple weighted finite-state transducer (WFST) for Ukrainian inverse text normalization (ITN), based on NVIDIA NeMo and Pynini.
Installation
pip install ukr-itn
Usage
from ukr.wfst import graph, apply_fst_text
apply_fst_text("це трапилося дві тисячі дев'ятнадцятого числа", graph) # це трапилося 2019 числа
apply_fst_text("мінус пять цілих одна десята відсотка", graph) # -5.1 %
apply_fst_text("двадцять дві тисячі сто один", graph) # 22101
From command line
echo "це трапилося дві тисячі дев'ятнадцятого числа" | python -m ukr
Will return це трапилося 2019-го числа
JSON output
For more advanced usage, you can get JSON output:
from ukr.wfst import json_graph, apply_fst_text
apply_fst_text("це трапилося дві тисячі дев'ятнадцятого числа", json_graph)
# >>> '[{"word": "це"}, {"word": "трапилося"}, {"ordinal": "2019"}, {"word": "числа"}]'
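The JSON string can be post-processed with the standard library. The sketch below is only an illustration: it reuses the sample output shown above instead of calling the package, and it assumes every token object carries exactly one key (such as `word` or `ordinal`) with a string value. The helper name `split_tokens` is hypothetical and not part of `ukr`.

```python
import json

# Sample output copied from the example above; in real use this string
# would come from apply_fst_text(text, json_graph).
raw = '[{"word": "це"}, {"word": "трапилося"}, {"ordinal": "2019"}, {"word": "числа"}]'


def split_tokens(raw_json):
    """Return (kind, value) pairs, assuming one key per token object."""
    tokens = json.loads(raw_json)
    return [next(iter(tok.items())) for tok in tokens]


pairs = split_tokens(raw)

# Keep only the tokens that were normalized (everything that is not a plain word).
normalized = [(kind, value) for kind, value in pairs if kind != "word"]

# Rebuild the plain text by joining the token values in order.
plain_text = " ".join(value for _, value in pairs)

print(normalized)  # [('ordinal', '2019')]
print(plain_text)  # це трапилося 2019 числа
```

Keeping the token kind next to its value makes it easy to treat normalized spans differently from ordinary words, for example to highlight them.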
How it works
There are two kinds of FST: taggers and verbalizers.
This is a tagger:
from ukr.wfst import classifyFst, apply_fst_text
apply_fst_text("мінус пять цілих одна десята відсотка", classifyFst.fst)
will return "measure { decimal { negative: "true" integer_part: "5" fractional_part: "1" } units: "%" }"
And this is a verbalizer:
from ukr.wfst import verbalizeFinalFst, apply_fst_text
apply_fst_text('measure { decimal { negative: "true" integer_part: "5" fractional_part: "1" } units: "%" }', verbalizeFinalFst.fst)
will return -5.1 %
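To make the split between the two stages concrete, here is a toy, pure-Python stand-in for the verbalizer step. It is not how the package works internally (the real verbalizer is an FST built with Pynini); it only shows that the tagged string already holds all the fields needed to produce the written form. The functions `parse_fields` and `render_measure` are hypothetical, and the flat regex parse assumes field names are not repeated across nesting levels.

```python
import re

# Tagger output copied from the example above.
tagged = 'measure { decimal { negative: "true" integer_part: "5" fractional_part: "1" } units: "%" }'


def parse_fields(tagged_text):
    """Collect every key: "value" pair, ignoring the nesting braces."""
    return dict(re.findall(r'(\w+):\s*"([^"]*)"', tagged_text))


def render_measure(fields):
    """Toy stand-in for a verbalizer: format the fields as written text."""
    sign = "-" if fields.get("negative") == "true" else ""
    number = fields["integer_part"]
    if "fractional_part" in fields:
        number += "." + fields["fractional_part"]
    return f'{sign}{number} {fields["units"]}'


fields = parse_fields(tagged)
print(render_measure(fields))  # -5.1 %
```

The tagger decides what each span is and which fields it has; the verbalizer only decides how those fields are written out.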
Download files
Source Distribution
ukr_itn-0.1.6.tar.gz (19.3 kB)
Built Distribution
ukr_itn-0.1.6-py3-none-any.whl (34.6 kB)
File details
Details for the file ukr_itn-0.1.6.tar.gz.
File metadata
- Download URL: ukr_itn-0.1.6.tar.gz
- Upload date:
- Size: 19.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.8.13
File hashes
Algorithm | Hash digest
---|---
SHA256 | 596d9d1ba0dee94834475dbaff491477461884e9c906a4acd431481845ba92f3
MD5 | 002eefa50782a87e5781e5a2194c7fb6
BLAKE2b-256 | cf992110a0b0d51ae5c83f7e6528d95877d29ed3069c4d5675ae52e4b9bb7141
File details
Details for the file ukr_itn-0.1.6-py3-none-any.whl.
File metadata
- Download URL: ukr_itn-0.1.6-py3-none-any.whl
- Upload date:
- Size: 34.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.8.13
File hashes
Algorithm | Hash digest
---|---
SHA256 | 7df2ba4764e39ed1db310fbe166017089c088f997f598520df0e1c46457b8b13
MD5 | 065395e37f8b42e4c66c1167ba44b182
BLAKE2b-256 | e6ea9812dde1cf223184eeb78b22a0ff724400d0037666e4537fc15923bcaebb