WeTextProcessing, including TN & ITN
Text Normalization & Inverse Text Normalization
1. How To Use
1.1 pip install
pip install WeTextProcessing
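# TN (text normalization): written form -> spoken form
from tn.chinese.normalizer import Normalizer
normalizer = Normalizer()
# "2.5平方电线" means "2.5 mm² wire"
print(normalizer.normalize("2.5平方电线"))
# ITN (inverse text normalization): spoken form -> written form
from itn.chinese.inverse_normalizer import InverseNormalizer
invnormalizer = InverseNormalizer()
# "二点五平方电线" is the same phrase with the number spelled out ("two point five")
print(invnormalizer.normalize("二点五平方电线"))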
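To see the kind of rewrite the example above performs without installing anything, here is a hypothetical toy; the names `toy_tn` and `toy_itn` are made up for illustration. It is not how WeTextProcessing works: the package builds finite-state grammars (such as the tagger graph mentioned in the acknowledgments) with Pynini and OpenFst, whereas this sketch only maps ASCII digits and the decimal point to Chinese numerals and back with regular expressions.

```python
# Hypothetical toy, NOT WeTextProcessing's FST-based grammars: it only handles
# non-negative decimals read digit by digit, e.g. "2.5" <-> "二点五".
import re

DIGITS = "零一二三四五六七八九"  # DIGITS[i] is the Chinese numeral for digit i
TO_SPOKEN = {str(i): ch for i, ch in enumerate(DIGITS)}
TO_SPOKEN["."] = "点"  # the decimal point is read as 点
TO_WRITTEN = {ch: d for d, ch in TO_SPOKEN.items()}  # inverse mapping for ITN


def toy_tn(text: str) -> str:
    """Spell out each ASCII number (optional decimal part) as Chinese numerals."""
    # [0-9] instead of \d, because \d also matches non-ASCII digits we cannot map.
    return re.sub(
        r"[0-9]+(?:\.[0-9]+)?",
        lambda m: "".join(TO_SPOKEN[c] for c in m.group()),
        text,
    )


def toy_itn(text: str) -> str:
    """Inverse of toy_tn: turn runs of Chinese numerals (with 点) back into digits."""
    pattern = f"[{DIGITS}]+(?:点[{DIGITS}]+)?"
    return re.sub(pattern, lambda m: "".join(TO_WRITTEN[c] for c in m.group()), text)


if __name__ == "__main__":
    print(toy_tn("2.5平方电线"))     # 二点五平方电线
    print(toy_itn("二点五平方电线"))  # 2.5平方电线
```

Its limitations show why real grammars are needed: it reads every digit separately, so "25" becomes "二五" rather than the idiomatic "二十五", and `toy_itn` would also rewrite the 一 in ordinary words such as 一起 ("together").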
1.2 Build from source
git clone https://github.com/wenet-e2e/WeTextProcessing.git
cd WeTextProcessing
python normalize.py --text "2.5平方电线"
python inverse_normalize.py --text "二点五平方电线"
2. TN Pipeline
Please refer to TN.README
3. ITN Pipeline
Please refer to ITN.README
Acknowledgments
- Thanks to the authors of foundational libraries such as OpenFst and Pynini.
- Thanks to the NeMo team and the NeMo open-source community.
- Thanks to Zhenxiang Ma, Jiayu Du, and the SpeechColab organization.
- Referred to Pynini for reading the FAR (FST archive) and printing the shortest path of a lattice in the C++ runtime.
- Referred to NeMo's TN for the data used to build the tagger graph.
- Referred to the ITN of chinese_text_normalization for the data used to build the tagger graph.
Download files
Source Distribution: WeTextProcessing-0.0.3.tar.gz (1.2 MB)
Built Distribution: WeTextProcessing-0.0.3-py3-none-any.whl
Hashes for WeTextProcessing-0.0.3-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 39986bead86dae0e63529dd95a0b810ca5bb8118e11d724ed0f8b2c66647de9b
MD5 | df358c0e0778ee41ffd01631ff965112
BLAKE2b-256 | cb5946691312b5cc29c3ec784d9131fc874f2328c47d12cb5408e3e0b7107a50
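To check a downloaded wheel against the digests listed for WeTextProcessing-0.0.3-py3-none-any.whl, a minimal sketch using only Python's standard `hashlib` could look like this. The helper names `new_hasher`, `file_digests`, and `verify` are hypothetical, and the sketch assumes the wheel has been saved locally under its original file name.

```python
import hashlib

# Digests copied from the hash table for WeTextProcessing-0.0.3-py3-none-any.whl.
EXPECTED = {
    "sha256": "39986bead86dae0e63529dd95a0b810ca5bb8118e11d724ed0f8b2c66647de9b",
    "md5": "df358c0e0778ee41ffd01631ff965112",
    "blake2b-256": "cb5946691312b5cc29c3ec784d9131fc874f2328c47d12cb5408e3e0b7107a50",
}


def new_hasher(name):
    """Return a fresh hashlib object for an algorithm name used in EXPECTED."""
    if name == "blake2b-256":
        # BLAKE2b-256 here means BLAKE2b with a 32-byte (256-bit) digest.
        return hashlib.blake2b(digest_size=32)
    return hashlib.new(name)


def file_digests(path, names=tuple(EXPECTED), chunk_size=1 << 16):
    """Hash the file in fixed-size chunks so it is never loaded into memory at once."""
    hashers = {name: new_hasher(name) for name in names}
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            for h in hashers.values():
                h.update(chunk)
    return {name: h.hexdigest() for name, h in hashers.items()}


def verify(path, expected=EXPECTED):
    """Return True only if every expected digest matches the file's actual digest."""
    actual = file_digests(path, names=tuple(expected))
    return all(actual[name] == digest.lower() for name, digest in expected.items())
```

Calling `verify("WeTextProcessing-0.0.3-py3-none-any.whl")` should return `True` for an intact download. Reading the file once and feeding every hasher from the same chunks keeps the check to a single pass; for SHA256 alone, `pip hash <file>` gives the same digest from the command line.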