
Chinese Text Normalization (for speech recognition and text-to-speech)


zh-normalization

Normalization of NSWs (non-standard words) in Chinese sentences

Supported NSW types

| NSW type | Raw | Normalized |
| --- | --- | --- |
| serial number | 电影中梁朝伟扮演的陈永仁的编号27149 | 电影中梁朝伟扮演的陈永仁的编号二七一四九 |
| cardinal | 这块黄金重达324.75克 | 这块黄金重达三百二十四点七五克 |
| cardinal | 我们班的最高总分为583分 | 我们班的最高总分为五百八十三分 |
| numeric range | 12~23 | 十二到二十三 |
| numeric range | -1.5~2 | 负一点五到二 |
| date | 她出生于86年8月18日,她弟弟出生于1995年3月1日 | 她出生于八六年八月十八日, 她弟弟出生于一九九五年三月一日 |
| time | 等会请在12:05请通知我 | 等会请在十二点零五分请通知我 |
| temperature | 今天的最低气温达到-10°C | 今天的最低气温达到零下十度 |
| fraction | 现场有7/12的观众投出了赞成票 | 现场有十二分之七的观众投出了赞成票 |
| percentage | 明天有62%的概率降雨 | 明天有百分之六十二的概率降雨 |
| money | 随便来几个价格12块5,34.5元,20.1万 | 随便来几个价格十二块五,三十四点五元,二十点一万 |
| telephone (landline) | 这是固话0421-33441122 | 这是固话零四二一三三四四一一二二 |
| telephone (mobile) | 这是手机+86 18544139121 | 这是手机八六一八五四四一三九一二一 |
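To illustrate how rule-based conversions like those in the table can work, here is a minimal regex sketch that reproduces the cardinal, numeric range, date, time, temperature, fraction, percentage and money rows. It is not this package's implementation: `normalize`, `int_to_chinese`, `number_to_chinese`, `RULES` and the rule order are hypothetical, and the sketch assumes ASCII digits, integers below 10**12, and exactly the surface forms shown above (`~` for ranges, `°C`, `%`, `a/b`, `hh:mm`, and 2- or 4-digit years before 年). Serial numbers and telephone numbers, which are read digit by digit, are not handled; the catch-all rule here would read them as cardinals.

```python
import re

DIGITS = "零一二三四五六七八九"
SMALL_UNITS = ["千", "百", "十", ""]  # place values inside one 4-digit group
BIG_UNITS = ["", "万", "亿"]          # multipliers for successive 4-digit groups
UNUM = r"[0-9]+(?:\.[0-9]+)?"         # unsigned integer or decimal
NUM = r"-?" + UNUM                    # optionally negative


def _group_to_chinese(group):
    """Read 1 <= group <= 9999, inserting 零 for zeros between nonzero digits."""
    out, zero_pending, started = [], False, False
    for ch, unit in zip(f"{group:04d}", SMALL_UNITS):
        if ch == "0":
            zero_pending = started  # zeros only matter after a nonzero digit
            continue
        if zero_pending:
            out.append("零")
        out.append(DIGITS[int(ch)] + unit)
        zero_pending, started = False, True
    return "".join(out)


def int_to_chinese(n):
    """Read a non-negative integer below 10**12 as a Chinese cardinal."""
    if not 0 <= n < 10**12:
        raise ValueError("this sketch only covers 0 <= n < 10**12")
    if n == 0:
        return "零"
    groups = []  # least significant 4-digit group first
    while n:
        groups.append(n % 10000)
        n //= 10000
    out, zero_pending = [], False
    for i in reversed(range(len(groups))):  # most significant group first
        g = groups[i]
        if g == 0:
            zero_pending = True
            continue
        if out and (zero_pending or g < 1000):
            out.append("零")  # any gap before this group is read as a single 零
        out.append(_group_to_chinese(g) + BIG_UNITS[i])
        zero_pending = False
    text = "".join(out)
    # A leading 10-19 is read 十X, not 一十X
    return text[1:] if text.startswith("一十") else text


def number_to_chinese(s):
    """Read '324.75' as 三百二十四点七五; a leading '-' becomes 负."""
    sign = "负" if s.startswith("-") else ""
    int_part, _, frac_part = s.lstrip("-").partition(".")
    text = int_to_chinese(int(int_part))
    if frac_part:
        # Digits after the decimal point are read one by one
        text += "点" + "".join(DIGITS[int(d)] for d in frac_part)
    return sign + text


def _time(m):
    hour, minute = int(m.group(1)), int(m.group(2))
    text = int_to_chinese(hour) + "点"
    if minute:
        text += ("零" if minute < 10 else "") + int_to_chinese(minute) + "分"
    return text


# Ordered (pattern, replacement) pairs: specific patterns first, plain numbers last
RULES = [
    (re.compile(r"(?<![0-9])([0-9]{1,2}):([0-9]{2})(?![0-9])"), _time),  # 12:05
    (re.compile(r"(?<![0-9])([0-9]{2}|[0-9]{4})年"),                      # 86年, 1995年
     lambda m: "".join(DIGITS[int(d)] for d in m.group(1)) + "年"),
    (re.compile(f"(-?)({UNUM})°C"),                                        # -10°C
     lambda m: ("零下" if m.group(1) else "") + number_to_chinese(m.group(2)) + "度"),
    (re.compile(f"({NUM})~({NUM})"),                                       # 12~23
     lambda m: number_to_chinese(m.group(1)) + "到" + number_to_chinese(m.group(2))),
    (re.compile(f"({UNUM})%"),                                             # 62%
     lambda m: "百分之" + number_to_chinese(m.group(1))),
    (re.compile(r"([0-9]+)/([0-9]+)"),                                     # 7/12
     lambda m: number_to_chinese(m.group(2)) + "分之" + number_to_chinese(m.group(1))),
    (re.compile(UNUM), lambda m: number_to_chinese(m.group(0))),           # cardinals, money
]


def normalize(text):
    for pattern, repl in RULES:
        text = pattern.sub(repl, text)
    return text


print(normalize("今天的最低气温达到-10°C"))  # 今天的最低气温达到零下十度
```

Rule order is the main design choice: the specific patterns run before the catch-all cardinal rule, so each number is consumed by its most specific reading. Money needs no rule of its own because 块, 元 and 万 are left in place around the cardinal reading.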

Usage

pip install zh-normalization

Run the following code to normalize a Chinese sentence:

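from zh_normalization import TextNormalizer

# Create the normalizer
normalizer = TextNormalizer()
text = "电影中梁朝伟扮演的陈永仁的编号27149!"
# normalize() returns the normalized sentences; join them back into one string
sents = normalizer.normalize(text)
new_text = ''.join(sents)
print(new_text)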

Output:

电影中梁朝伟扮演的陈永仁的编号二七幺四九!
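The serial-number row in the table reads 27149 as 二七一四九, while this output reads it as 二七幺四九, using 幺 (yāo), the colloquial reading of the digit 1 in digit strings. Below is a minimal sketch of digit-by-digit reading with that choice exposed as a flag; `read_digits` and `alt_one` are hypothetical names, not this package's API. It simply drops every non-digit character, which is enough for the serial-number and telephone rows of the table.

```python
DIGITS = "零一二三四五六七八九"


def read_digits(s, alt_one=False):
    """Read each ASCII digit of s separately, skipping '+', '-', spaces, etc.

    With alt_one=True the digit 1 is read as 幺 instead of 一.
    """
    text = "".join(DIGITS[int(ch)] for ch in s if "0" <= ch <= "9")
    if alt_one:
        # text contains only digit readings, so every 一 here came from a 1
        text = text.replace("一", "幺")
    return text


print(read_digits("27149"))                # 二七一四九 (as in the table)
print(read_digits("27149", alt_one=True))  # 二七幺四九 (as in the output above)
print(read_digits("+86 18544139121"))      # 八六一八五四四一三九一二一
```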

References

Pull request #658 of DeepSpeech



Download files


Source Distribution

zh_normalization-0.0.2.tar.gz (51.1 kB)

File details

Details for the file zh_normalization-0.0.2.tar.gz.

File metadata

  • Download URL: zh_normalization-0.0.2.tar.gz
  • Size: 51.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.5

File hashes

Hashes for zh_normalization-0.0.2.tar.gz

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 32929a18c1a93e2df1bc624aef3e5143be3d9b1464f12c1e6dedaee0c5392dc3 |
| MD5 | 074b91d80a0da8ecc2a96c951a6efe9b |
| BLAKE2b-256 | 44561cadfba8203d9502d578d2f0f9d26cd9d177b751157ffd88ce1a9c6b0d94 |

