
WFST for Ukrainian Inverse Text Normalization (ITN) based on NVIDIA NeMo and Pynini


WFST for Ukrainian ITN

Simple WFST for Ukrainian ITN based on NVIDIA NeMo and Pynini

Installation

pip install ukr-itn

Usage

from ukr.wfst import graph, apply_fst_text

apply_fst_text("це трапилося дві тисячі дев'ятнадцятого числа", graph)  # це трапилося 2019 числа
apply_fst_text("мінус пять цілих одна десята відсотка", graph)  # -5.1 %
apply_fst_text("двадцять дві тисячі сто один", graph)  # 22101

From command line

echo "це трапилося дві тисячі дев'ятнадцятого числа" | python -m ukr

The command returns це трапилося 2019-го числа

JSON output

For more advanced usage, you can get JSON output:

from ukr.wfst import json_graph, apply_fst_text

apply_fst_text("це трапилося дві тисячі дев'ятнадцятого числа", json_graph)
# >>> '[{"word": "це"}, {"word": "трапилося"}, {"ordinal": "2019"}, {"word": "числа"}]' 
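The JSON output is a plain string, so it can be decoded with the standard library. The sketch below works on the example string shown above rather than calling the package, and it assumes every token is an object with exactly one key (the token type), as in that example. The helper name split_tokens is hypothetical.

```python
import json

# Example output copied from the call above (not produced by running the package here).
raw = '[{"word": "це"}, {"word": "трапилося"}, {"ordinal": "2019"}, {"word": "числа"}]'
tokens = json.loads(raw)


def split_tokens(tokens):
    """Flatten the token objects into (kind, value) pairs.

    Assumes each token object has a single key, as in the example output.
    """
    pairs = []
    for token in tokens:
        for kind, value in token.items():
            pairs.append((kind, value))
    return pairs


pairs = split_tokens(tokens)

# Keep only the tokens that were normalized, i.e. everything that is not a plain word.
normalized = [value for kind, value in pairs if kind != "word"]
print(pairs)
print(normalized)
```

Working with (kind, value) pairs makes it easy to post-process only certain token types, for example to reformat ordinals without touching ordinary words.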

How it works

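There are two kinds of FST: taggers and verbalizers. A tagger converts spoken-form text into a tagged intermediate representation, and a verbalizer renders that representation in written form.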

This is a tagger:

from ukr.wfst import classifyFst, apply_fst_text

apply_fst_text("мінус пять цілих одна десята відсотка", classifyFst.fst)  

will return "measure { decimal { negative: "true" integer_part: "5" fractional_part: "1" } units: "%" }"

And this is a verbalizer:

from ukr.wfst import verbalizeFinalFst, apply_fst_text

apply_fst_text('measure { decimal { negative: "true" integer_part: "5" fractional_part: "1" } units: "%" }', verbalizeFinalFst.fst)  

will return -5.1 %
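To make the two stages concrete, here is a toy, pure-Python imitation of the verbalizing step for the measure example above. It is not how the package works internally (the package uses WFSTs built with Pynini); it only parses the tagged string into nested dictionaries and renders it. The names parse_tagged and verbalize_measure are hypothetical, and the parser handles only the simple name { key: "value" } shape shown in this section.

```python
import re

# One token is either 'name {', 'key: "value"', or '}'.
TOKEN = re.compile(r'\s*(?:(\w+)\s*\{|(\w+):\s*"([^"]*)"|(\}))')


def parse_tagged(text):
    """Parse a tagged string such as 'measure { units: "%" }' into nested dicts."""
    root = {}
    stack = [root]
    text = text.strip()
    pos = 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected input at position {pos}")
        block, key, value, _close = match.groups()
        if block is not None:
            # Opening a nested block: attach it to the current block and descend.
            child = {}
            stack[-1][block] = child
            stack.append(child)
        elif key is not None:
            stack[-1][key] = value
        else:
            # Closing brace: return to the parent block.
            if len(stack) == 1:
                raise ValueError("unbalanced closing brace")
            stack.pop()
        pos = match.end()
    if len(stack) != 1:
        raise ValueError("unbalanced opening brace")
    return root


def verbalize_measure(tree):
    """Render a parsed 'measure' block with a decimal value in written form."""
    measure = tree["measure"]
    decimal = measure["decimal"]
    sign = "-" if decimal.get("negative") == "true" else ""
    number = decimal["integer_part"]
    if "fractional_part" in decimal:
        number += "." + decimal["fractional_part"]
    return f"{sign}{number} {measure['units']}"


tagged = 'measure { decimal { negative: "true" integer_part: "5" fractional_part: "1" } units: "%" }'
print(verbalize_measure(parse_tagged(tagged)))
```

The intermediate tagged form is what lets classification (deciding that the phrase is a measure with a negative decimal) stay separate from rendering (deciding how a measure is written).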

Download files


Source Distribution

ukr_itn-0.1.7.tar.gz (20.3 kB view details)

Uploaded Source

Built Distribution

ukr_itn-0.1.7-py3-none-any.whl (36.5 kB view details)

Uploaded Python 3

File details

Details for the file ukr_itn-0.1.7.tar.gz.

File metadata

  • Download URL: ukr_itn-0.1.7.tar.gz
  • Size: 20.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.8.13

File hashes

Hashes for ukr_itn-0.1.7.tar.gz
Algorithm Hash digest
SHA256 e16dfb1fdc1c8aade743783042b9ca441a52f4d4626eeed08d1354e5574282ed
MD5 5870aea93c04cc2a611697ba3e22abe2
BLAKE2b-256 d63fa96bcc51a47ecd212b438ba416887079385e7a5f132b990ed5f6fc0c4d74
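A downloaded file can be checked against the SHA256 digest listed above with the standard library. This is a minimal sketch: the helper names sha256_of and matches are hypothetical, and the commented usage assumes the archive has been saved to the current directory under its published name.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """Compare a file's SHA256 digest with an expected hex string."""
    return sha256_of(path) == expected_hex.strip().lower()


# Usage, assuming the source distribution was downloaded to the current directory:
# matches(
#     "ukr_itn-0.1.7.tar.gz",
#     "e16dfb1fdc1c8aade743783042b9ca441a52f4d4626eeed08d1354e5574282ed",
# )
```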


File details

Details for the file ukr_itn-0.1.7-py3-none-any.whl.

File metadata

  • Download URL: ukr_itn-0.1.7-py3-none-any.whl
  • Size: 36.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.8.13

File hashes

Hashes for ukr_itn-0.1.7-py3-none-any.whl
Algorithm Hash digest
SHA256 4f09b286a188121092a779e186b2b157f1ab71b055a7cbc37069f23c7a0e7a78
MD5 2d575800ba07ca1bec36513f9e884f53
BLAKE2b-256 7c0f51862740142bc3b7e2ddcfe22066d0daeb1b5b66d62dc6fba9abd4bdaabb

