A simple, deterministic, and extensible approach to inverse text normalization for numbers
A simple, deterministic, and extensible approach to inverse text normalization (ITN) for numbers.
Overview
This package converts raw spoken-form text (speech recognition output) into user-friendly written-form text. It works best for converting spoken numbers into numerical digits, or for other translation tasks that do not modify word ordering. A CSV file is provided to define the basic rules for transforming spoken tokens into written tokens, and extra pre/post-processing may be applied for more specific formatting requirements, e.g. dates, measurements, money, etc.
These examples were produced by running this script.
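To make the rule-driven idea from the overview concrete, here is a minimal, self-contained sketch of converting spoken number words into digits from a CSV rule table while leaving word order untouched. It is not this package's implementation or API: the `spoken,value,kind` schema, the `RULES_CSV` table, and the `load_rules` and `convert` helpers are all assumptions made up for illustration, and the CSV shipped with the package may look quite different.

```python
import csv
import io

# Hypothetical rule table (illustration only; not the package's real CSV schema).
RULES_CSV = """
spoken,value,kind
one,1,unit
two,2,unit
three,3,unit
five,5,unit
nine,9,unit
ten,10,unit
twenty,20,unit
thirty,30,unit
hundred,100,scale
thousand,1000,scale
"""


def load_rules(text):
    """Parse the CSV text into {spoken_token: (value, kind)}."""
    reader = csv.DictReader(io.StringIO(text.strip()))
    return {row["spoken"]: (int(row["value"]), row["kind"]) for row in reader}


def convert(text, rules):
    """Replace each run of known number words with its digits.

    Tokens that are not in the rule table pass through unchanged,
    so the word order of the input is preserved.
    """
    out = []
    total = 0        # value of completed groups (e.g. the thousands)
    current = 0      # value of the group being built
    in_number = False

    for token in text.split():
        rule = rules.get(token.lower())
        if rule is None:
            # A non-number token ends any number in progress.
            if in_number:
                out.append(str(total + current))
                total = current = 0
                in_number = False
            out.append(token)
            continue

        value, kind = rule
        in_number = True
        if kind == "unit":
            current += value
        elif value == 100:
            # "hundred" multiplies the current group ("two hundred" -> 200).
            current = max(current, 1) * 100
        else:
            # Larger scales close the current group ("three thousand" -> 3000).
            total += max(current, 1) * value
            current = 0

    if in_number:
        out.append(str(total + current))
    return " ".join(out)


rules = load_rules(RULES_CSV)
print(convert("i have twenty three apples", rules))
print(convert("three thousand two hundred five", rules))
```

Because the sketch simply sums adjacent unit words, it would mishandle digit-by-digit sequences such as phone numbers; cases like that are where the extra pre/post-processing mentioned above would come in.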
Installation
This package supports Python versions >= 3.7.
To install from PyPI:
pip install itnpy2
To install locally:
pip install -e .
Tests
To run tests, use pytest in the root folder of this repository:
pytest
Issues
This package has been verified on a limited set of test cases. If you find a translation mistake, feel free to open a pull request and update failing.csv with the input, the expected output, and the mistake; thanks!
Citation
If you find this work useful, please consider citing it.
@misc{hsu2022itn,
title = {A simple, deterministic, and extensible approach to inverse text normalization for numbers},
author = {Brandhsu},
howpublished = {https://github.com/barseghyanartur/itnpy},
year = {2022}
}