Polish FST Inverse Text Normalization
pl_itn
Inverse Text Normalization (ITN) is the NLP task of converting the spoken form of a phrase to its written form, for example:
one two three -> 1 2 3
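To make the spoken-to-written mapping concrete, here is a toy sketch that rewrites English digit words with a lookup table. It only illustrates the task: `DIGIT_WORDS` and `toy_itn` are hypothetical names, and pl_itn itself relies on finite-state transducers rather than a dictionary lookup.

```python
# Toy illustration of inverse text normalization.
# Assumption: this is NOT how pl_itn works internally; it only shows the task.
DIGIT_WORDS = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}


def toy_itn(phrase: str) -> str:
    """Replace each spoken digit word with its digit; keep other tokens as-is."""
    return " ".join(DIGIT_WORDS.get(token, token) for token in phrase.split())


print(toy_itn("one two three"))  # prints: 1 2 3
```

A lookup table cannot handle context-dependent phrases such as times or dates, which is where grammar-based approaches like FSTs come in.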
pl_itn is an open-source Polish ITN Python library and REST API for practical applications.
This project is an implementation of NeMo Inverse Text Normalization for Polish.
Table of contents
Prerequisites
Setup
Usage
Documentation
Contributing
License
References
Prerequisites
For pynini
- A standards-compliant C++17 compiler (GCC >= 7 or Clang >= 700)
- A compatible recent version of OpenFst built with the grm extensions (see deps/install_openfst.md)
Setup
Install the prerequisites first, in particular OpenFst.
Install from PyPI
pip install pl_itn
Build from source
pip install .
Editable install for development
pip install -e .[dev]
Usage
Console app
usage: pl_itn [-h] (-t TEXT | -i) [--tagger TAGGER] [--verbalizer VERBALIZER] [--config CONFIG]
              [--log_level {debug,info}] [-d]

Inverse Text Normalization based on Finite State Transducers

options:
  -h, --help            show this help message and exit
  -t TEXT, --text TEXT  Input text
  -i, --interactive     If used, demo will process phrases from stdin interactively.
  --tagger TAGGER
  --verbalizer VERBALIZER
  --config CONFIG       Optionally provide yaml config with tagger and verbalizer paths.
  --log_level {debug,info}
  -d, --debug_mode      If used, process will be interrupted on runtime errors, else it will
                        return a step back value.
pl_itn -t "jest za pięć druga"
jest 01:55
pl_itn -t "drugi listopada dwa tysiące osiemnastego roku"
2 listopada 2018 roku
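For readers who want to script around the console app, the help output above can be mirrored with `argparse`. The following is a reconstruction from that help text only, not the project's actual source: the function name `build_parser` and the default values (argparse's own defaults) are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Sketch of a parser matching the documented pl_itn options (assumed, not the real source)."""
    parser = argparse.ArgumentParser(
        prog="pl_itn",
        description="Inverse Text Normalization based on Finite State Transducers",
    )
    # The usage line shows (-t TEXT | -i): exactly one input mode is required.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("-t", "--text", help="Input text")
    source.add_argument(
        "-i", "--interactive", action="store_true",
        help="Process phrases from stdin interactively.",
    )
    parser.add_argument("--tagger")
    parser.add_argument("--verbalizer")
    parser.add_argument(
        "--config",
        help="Optionally provide yaml config with tagger and verbalizer paths.",
    )
    parser.add_argument("--log_level", choices=["debug", "info"])
    parser.add_argument(
        "-d", "--debug_mode", action="store_true",
        help="Interrupt on runtime errors instead of falling back.",
    )
    return parser


args = build_parser().parse_args(["-t", "jest za pięć druga"])
print(args.text)
```

The mutually exclusive group reflects the `(-t TEXT | -i)` notation in the usage line: passing both, or neither, is rejected by the parser.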
Python
>>> from pl_itn import Normalizer
>>> normalizer = Normalizer()
>>> normalizer.normalize("za pięć dwunasta")
'11:55'
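When normalizing many phrases, it may help to wrap the call in a small helper. The sketch below assumes only what the session above shows, namely an object with a `normalize(text)` method that returns a string. The names `normalize_lines`, `SupportsNormalize`, and `UppercaseStub` are hypothetical, and the stub stands in for `Normalizer` so the snippet runs without pl_itn installed.

```python
from typing import Iterable, List, Protocol, Tuple


class SupportsNormalize(Protocol):
    """Anything exposing normalize(text) -> str, such as pl_itn's Normalizer."""

    def normalize(self, text: str) -> str: ...


def normalize_lines(
    normalizer: SupportsNormalize, lines: Iterable[str]
) -> List[Tuple[str, str]]:
    """Return (spoken, written) pairs, skipping blank lines."""
    results = []
    for line in lines:
        phrase = line.strip()
        if not phrase:
            continue  # ignore empty input rather than passing it to the normalizer
        results.append((phrase, normalizer.normalize(phrase)))
    return results


class UppercaseStub:
    """Stand-in used only to exercise the helper; it does not perform ITN."""

    def normalize(self, text: str) -> str:
        return text.upper()


for spoken, written in normalize_lines(UppercaseStub(), ["za pięć dwunasta", "", " abc "]):
    print(f"{spoken!r} -> {written!r}")
```

With the library installed, passing `Normalizer()` in place of the stub should work the same way, since the helper relies only on the `normalize` method.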
Documentation
Contributing
License
References
- K. Gorman. 2016. Pynini: A Python library for weighted finite-state grammar compilation. In Proc. ACL Workshop on Statistical NLP and Weighted Automata, 75-80.
- Y. Zhang, E. Bakhturina, K. Gorman, and B. Ginsburg. 2021. NeMo Inverse Text Normalization: From Development To Production.