
WFST for Ukrainian Inverse Text Normalization (ITN) based on NVIDIA NeMo and Pynini

Project description

WFST for Ukrainian ITN

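A simple weighted finite-state transducer (WFST) grammar for Ukrainian inverse text normalization (ITN), built on NVIDIA NeMo and Pynini. ITN converts spoken-form text, such as speech recognition output, into written form, for example number words into digits.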

Installation

pip install ukr-itn

Usage

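from ukr.wfst import normalize

# "it happened on the two thousand nineteenth (day)"
normalize("це трапилося дві тисячі дев'ятнадцятого числа")  # це трапилося 2019 числа
# "minus five point one percent"
normalize("мінус пять цілих одна десята відсотка")  # -5.1 %
# "twenty-two thousand one hundred one"
normalize("двадцять дві тисячі сто один")  # 22101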

From the command line

echo "це трапилося дві тисячі дев'ятнадцятого числа" | python -m ukr

This prints: це трапилося 2019-го числа

Options:
  -h, --help     Show this help message and exit
  -j, --json     Return result as JSON
  -v, --verbose  Print original input and normalized to compare

JSON output

For more advanced use, pass json=True to get the tokens as a JSON string:

from ukr.wfst import normalize

normalize("це трапилося дві тисячі дев'ятнадцятого числа", json=True)
# >>> '[{"word": "це"}, {"word": "трапилося"}, {"ordinal": "2019"}, {"word": "числа"}]' 
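The JSON string can be consumed with the standard json module. Below is a minimal sketch that uses the output string shown above verbatim, assuming every token is a single-key object whose key is either "word" or a semantic class such as "ordinal". The helper names semantic_tokens and rebuild_text are hypothetical and not part of ukr-itn.

```python
import json
from typing import List, Tuple

# JSON output copied from the normalize(..., json=True) example above
raw = '[{"word": "це"}, {"word": "трапилося"}, {"ordinal": "2019"}, {"word": "числа"}]'


def semantic_tokens(raw_json: str) -> List[Tuple[str, str]]:
    """Return (class, value) pairs for every token that is not a plain word."""
    result = []
    for token in json.loads(raw_json):
        for cls, value in token.items():
            if cls != "word":
                result.append((cls, value))
    return result


def rebuild_text(raw_json: str) -> str:
    """Join all token values back into one space-separated string."""
    return " ".join(value for token in json.loads(raw_json) for value in token.values())


print(semantic_tokens(raw))  # [('ordinal', '2019')]
print(rebuild_text(raw))     # це трапилося 2019 числа
```

Working with the JSON form lets downstream code pick out only the normalized spans (dates, numbers, measures) without re-parsing the flat string.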

How it works

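There are two kinds of FSTs: taggers and verbalizers. A tagger (classifier) parses spoken-form text into semantic tokens such as measure or decimal; a verbalizer turns those tokens into their written form.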
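To make the two-stage split concrete, here is a toy pure-Python sketch of the same idea. It is not how ukr-itn is implemented (its grammars are weighted FSTs compiled with Pynini); the tiny lexicon and the function names tag and verbalize are hypothetical, and they only cover a handful of Ukrainian cardinals below 100.

```python
# Hypothetical toy lexicon of Ukrainian tens and units (a real grammar covers far more)
TENS = {"двадцять": 20, "тридцять": 30, "сорок": 40, "п'ятдесят": 50}
UNITS = {"один": 1, "одна": 1, "два": 2, "дві": 2, "три": 3, "чотири": 4, "п'ять": 5}


def tag(text: str) -> list:
    """Tagger stage: label each span as a plain word or a cardinal with its value."""
    tokens = []
    words = text.split()
    i = 0
    while i < len(words):
        w = words[i]
        if w in TENS:
            value = TENS[w]
            # Absorb a following unit word, e.g. "двадцять дві" -> 22
            if i + 1 < len(words) and words[i + 1] in UNITS:
                value += UNITS[words[i + 1]]
                i += 1
            tokens.append(("cardinal", str(value)))
        elif w in UNITS:
            tokens.append(("cardinal", str(UNITS[w])))
        else:
            tokens.append(("word", w))
        i += 1
    return tokens


def verbalize(tokens: list) -> str:
    """Verbalizer stage: write every token out in its surface (written) form."""
    return " ".join(value for _, value in tokens)


print(tag("маю двадцять дві книги"))
# [('word', 'маю'), ('cardinal', '22'), ('word', 'книги')]
print(verbalize(tag("маю двадцять дві книги")))  # маю 22 книги
```

Keeping the stages separate means the tagger only decides what a span is, while the verbalizer decides how it is written, so output formatting can change without touching the parsing grammar.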

This runs the tagger:

from ukr.wfst import classifyFst, apply_fst_text

apply_fst_text("мінус пять цілих одна десята відсотка", classifyFst.fst)  

It returns:

measure { decimal { negative: "true" integer_part: "5" fractional_part: "1" } units: "%" }
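The tagged output above is a serialized token in the NeMo style of name { field: "value" } blocks. A minimal sketch of a parser that turns such a string into nested Python dicts follows; parse_tagged and tokenize are hypothetical helpers, not part of ukr-itn, and the sketch assumes values contain no escaped quotes and each key appears at most once per block.

```python
import re

# A token is a quoted string, an identifier, or one of the punctuation marks { } :
TOKEN_RE = re.compile(r'"([^"]*)"|([A-Za-z_]+)|([{}:])')


def tokenize(text):
    """Split a serialized token string into (kind, value) pairs."""
    tokens = []
    for m in TOKEN_RE.finditer(text):
        if m.group(1) is not None:
            tokens.append(("str", m.group(1)))
        elif m.group(2) is not None:
            tokens.append(("name", m.group(2)))
        else:
            tokens.append(("punct", m.group(3)))
    return tokens


def parse_tagged(text):
    """Parse 'name { field: "value" ... }' into nested dicts."""
    tokens = tokenize(text)
    pos = 0

    def parse_block():
        nonlocal pos
        result = {}
        # Read fields until this block's closing brace (or the end of input)
        while pos < len(tokens) and tokens[pos] != ("punct", "}"):
            _, name = tokens[pos]
            pos += 1
            if tokens[pos] == ("punct", ":"):
                # Scalar field: name: "value"
                result[name] = tokens[pos + 1][1]
                pos += 2
            else:
                # Nested message: name { ... }
                pos += 1  # skip "{"
                result[name] = parse_block()
                pos += 1  # skip "}"
        return result

    return parse_block()


tagged = 'measure { decimal { negative: "true" integer_part: "5" fractional_part: "1" } units: "%" }'
print(parse_tagged(tagged))
# {'measure': {'decimal': {'negative': 'true', 'integer_part': '5', 'fractional_part': '1'}, 'units': '%'}}
```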

And this runs the verbalizer on that tagged string:

from ukr.wfst import verbalizeFinalFst, apply_fst_text

apply_fst_text('measure { decimal { negative: "true" integer_part: "5" fractional_part: "1" } units: "%" }', verbalizeFinalFst.fst)  

It returns -5.1 %
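Conceptually, the verbalizer reads the fields of a tagged token and lays them out in written form. Here is a minimal pure-Python sketch of that step for the measure/decimal token above, operating on the token already parsed into a dict. The dict layout and the function names verbalize_decimal and verbalize_measure are assumptions for illustration, not the library's internals (which are Pynini FSTs).

```python
def verbalize_decimal(decimal: dict) -> str:
    """Render a decimal token: optional minus sign, integer part, optional fraction."""
    sign = "-" if decimal.get("negative") == "true" else ""
    text = sign + decimal.get("integer_part", "0")
    if "fractional_part" in decimal:
        text += "." + decimal["fractional_part"]
    return text


def verbalize_measure(measure: dict) -> str:
    """Render a measure token as '<number> <units>', e.g. '-5.1 %'."""
    return verbalize_decimal(measure["decimal"]) + " " + measure["units"]


# The tagged token from the example above, written as a nested dict
token = {
    "measure": {
        "decimal": {"negative": "true", "integer_part": "5", "fractional_part": "1"},
        "units": "%",
    }
}
print(verbalize_measure(token["measure"]))  # -5.1 %
```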



Download files


Source Distribution

ukr_itn-0.1.8.tar.gz (25.3 kB)

Uploaded Source

Built Distribution

ukr_itn-0.1.8-py3-none-any.whl (40.3 kB)

Uploaded Python 3

File details

Details for the file ukr_itn-0.1.8.tar.gz.

File metadata

  • Download URL: ukr_itn-0.1.8.tar.gz
  • Upload date:
  • Size: 25.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.8.13

File hashes

Hashes for ukr_itn-0.1.8.tar.gz
Algorithm Hash digest
SHA256 ed73f2b0743df89df868f31ab49b9db46815d7ebd57eeccd114c5ade5745acb0
MD5 03b4412e0d73a6912c5c838a316ab6f8
BLAKE2b-256 a7c4385bd35cadd6f869ad56c7c8f61b9cf8eafa5efbb385906ad545cff44995


File details

Details for the file ukr_itn-0.1.8-py3-none-any.whl.

File metadata

  • Download URL: ukr_itn-0.1.8-py3-none-any.whl
  • Upload date:
  • Size: 40.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.8.13

File hashes

Hashes for ukr_itn-0.1.8-py3-none-any.whl
Algorithm Hash digest
SHA256 e2d2fe65acbb9e337f3e514e70f2ec43831ecfdd070c01757fe7fb798718207f
MD5 2eeca15a6c96ac7057dc287aa32aa4f7
BLAKE2b-256 92e6acf76ecca6b4b5b80dbb083777e6fd2582c34e5ae34c5c8ce9328ae5df92

