WeTextProcessing Runtime

Python runtime for WeTextProcessing (does not depend on Pynini).

WeTextProcessing is a text processing library that provides text normalization (TN) and inverse text normalization (ITN) capabilities for Chinese, English and Japanese text. It uses Finite State Transducers (FSTs) for efficient text processing.
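To make the FST idea concrete, here is a toy transducer in plain Python. It is only an illustration of the concept and is not how WeTextProcessing builds or runs its grammars: the state names, the `step` and `transduce` functions, and the digit-by-digit reading are all assumptions made for this sketch.

```python
# Toy finite state transducer: reads one symbol at a time, emits one output
# per symbol, and keeps a small state so that "." is read as "点" only
# directly after a digit. Hypothetical sketch, not the library's grammar.
DIGIT_TO_ZH = dict(zip("0123456789", "零一二三四五六七八九"))


def step(state: str, symbol: str) -> tuple:
    """Return (next_state, output) for one input symbol."""
    if symbol in DIGIT_TO_ZH:
        return "in_number", DIGIT_TO_ZH[symbol]
    if symbol == "." and state == "in_number":
        return "after_point", "点"
    # Anything else passes through unchanged and resets the state.
    return "outside", symbol


def transduce(text: str) -> str:
    """Run the transducer over the whole input string."""
    state = "outside"
    output = []
    for symbol in text:
        state, emitted = step(state, symbol)
        output.append(emitted)
    return "".join(output)


if __name__ == "__main__":
    print(transduce("1.0"))
    print(transduce("简直666"))
```

Real TN grammars cover far more than this (number values, currencies, dates, context), so the sketch should be read as a picture of "state plus input symbol gives output symbol", nothing more.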

Features

Text Normalization (TN) for Chinese, English and Japanese
Inverse Text Normalization (ITN) for Chinese, English and Japanese
Traditional to Simplified Chinese conversion
Full-width to Half-width character conversion
Interjection removal
Punctuation removal
Out-of-vocabulary (OOV) word tagging
Erhua removal (for Chinese)
0-to-9 conversion (for Chinese and Japanese ITN)
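As an illustration of what full-width to half-width conversion means, the following stdlib-only sketch shifts the full-width ASCII block (U+FF01 to U+FF5E) down to its half-width counterparts and maps the ideographic space to a regular space. The function name `full_to_half` is hypothetical, and the exact character set that wetext converts may differ from this sketch.

```python
def full_to_half(text: str) -> str:
    """Map full-width ASCII variants to half-width characters.

    Conceptual sketch only; not taken from the wetext source.
    """
    converted = []
    for ch in text:
        code = ord(ch)
        if code == 0x3000:
            # Ideographic (full-width) space becomes an ASCII space.
            converted.append(" ")
        elif 0xFF01 <= code <= 0xFF5E:
            # Full-width forms sit exactly 0xFEE0 above their ASCII twins.
            converted.append(chr(code - 0xFEE0))
        else:
            # Everything else (including CJK text) is left untouched.
            converted.append(ch)
    return "".join(converted)


if __name__ == "__main__":
    print(full_to_half("ＡＢＣ１２３！"))
```

Characters outside that block, such as the ideographic full stop 。, are left as they are in this sketch.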

Installation

pip install wetext

Usage

Python API

Text Normalization (TN)

from wetext import Normalizer

# Chinese TN with erhua removal
normalizer = Normalizer(lang="zh", operator="tn", remove_erhua=True)
result = normalizer.normalize("你好 WeTextProcessing 1.0，全新版本儿，简直666")
print(result)  # 你好 WeTextProcessing 一点零，全新版本，简直六六六

# English TN
normalizer = Normalizer(lang="en", operator="tn")
result = normalizer.normalize("The price is $12.50, please pay now.")
print(result)  # The price is twelve point five dollars, please pay now.

Inverse Text Normalization (ITN)

from wetext import Normalizer

# Chinese ITN
normalizer = Normalizer(lang="zh", operator="itn", enable_0_to_9=False)
result = normalizer.normalize("你好 WeTextProcessing 一点零，全新版本儿，简直六六六，九和六")
print(result)  # 你好 WeTextProcessing 1.0，全新版本儿，简直666，九和六

# English ITN
normalizer = Normalizer(lang="en", operator="itn")
result = normalizer.normalize("twenty three dollars and fifty cents")
print(result)  # $23.50

Command Line Interface

# Basic usage
wetext "你好 WeTextProcessing 1.0，全新版本儿，简直666"

# With options
wetext --lang zh --operator tn --remove-erhua "你好 WeTextProcessing 1.0，全新版本儿，简直666"

# Convert traditional to simplified Chinese
wetext --traditional-to-simple "你好，這是測試。"

# Remove punctuation
wetext --remove-puncts "你好，這是測試。"

API Reference

Normalizer Class

Normalizer(
    lang: Literal["auto", "en", "zh", "ja"] = "auto",
    operator: Literal["tn", "itn"] = "tn",
    traditional_to_simple: bool = False,
    full_to_half: bool = False,
    remove_interjections: bool = False,
    remove_puncts: bool = False,
    tag_oov: bool = False,
    enable_0_to_9: bool = False,
    remove_erhua: bool = False,
)

Parameters

lang: The language of the text. Can be "auto", "en", "zh" or "ja". Default is "auto".
operator: The operator to use. Can be "tn" (text normalization) or "itn" (inverse text normalization). Default is "tn".
traditional_to_simple: Whether to convert traditional Chinese to simplified Chinese. Default is False.
full_to_half: Whether to convert full-width characters to half-width characters. Default is False.
remove_interjections: Whether to remove interjections. Default is False.
remove_puncts: Whether to remove punctuation. Default is False.
tag_oov: Whether to tag out-of-vocabulary words. Default is False.
enable_0_to_9: Whether to enable 0-to-9 conversion for ITN. Default is False.
remove_erhua: Whether to remove erhua for TN. Default is False.

Methods

normalize(text: str, lang: Optional[Literal["auto", "en", "zh", "ja"]] = None) -> str: Normalize the text.
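The `lang` argument of `normalize` suggests that one `Normalizer` can be reused across inputs, with the language chosen per call. The sketch below shows one way that might be used. The helper names `normalize_segments` and `demo` are hypothetical, and the assumption that a per-call `lang` overrides the constructor's `lang` is inferred from the signature, not confirmed by the documentation above.

```python
from typing import Iterable, List, Optional, Tuple


def normalize_segments(
    normalizer, segments: Iterable[Tuple[Optional[str], str]]
) -> List[str]:
    """Normalize (lang, text) pairs with a single normalizer object.

    `normalizer` is any object exposing normalize(text, lang=None), such as
    a wetext Normalizer. A lang of None defers to the normalizer's default.
    """
    results = []
    for lang, text in segments:
        if lang is None:
            results.append(normalizer.normalize(text))
        else:
            results.append(normalizer.normalize(text, lang=lang))
    return results


def demo() -> None:
    """Example wiring; requires `pip install wetext`."""
    from wetext import Normalizer

    normalizer = Normalizer(lang="auto", operator="tn")
    segments = [
        ("zh", "你好 WeTextProcessing 1.0，全新版本儿，简直666"),
        ("en", "The price is $12.50, please pay now."),
    ]
    for line in normalize_segments(normalizer, segments):
        print(line)
```

Constructing the `Normalizer` once and passing it in keeps the helper easy to test with a stand-in object.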

CLI Options

--lang, -l: Set the language. Choices are "auto", "en", "zh", "ja". Default is "auto".
--operator, -o: Set the operator. Choices are "tn", "itn". Default is "tn".
--traditional-to-simple: Convert traditional Chinese to simplified Chinese.
--full-to-half: Convert full-width characters to half-width characters.
--remove-interjections: Remove interjections.
--remove-puncts: Remove punctuation.
--tag-oov: Tag out-of-vocabulary words.
--enable-0-to-9: Enable 0-to-9 conversion.
--remove-erhua: Remove erhua.
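For scripting, the options above can be assembled into an argument list and passed to `subprocess`. This is a minimal sketch: the helper names `build_wetext_command` and `run_wetext` are hypothetical, only flags listed above are used, and it assumes the `wetext` command prints the normalized text to standard output.

```python
import shutil
import subprocess
from typing import List, Optional


def build_wetext_command(
    text: str,
    lang: str = "auto",
    operator: str = "tn",
    traditional_to_simple: bool = False,
    remove_puncts: bool = False,
    remove_erhua: bool = False,
) -> List[str]:
    """Build an argv list for the wetext CLI from a few of its options."""
    cmd = ["wetext", "--lang", lang, "--operator", operator]
    if traditional_to_simple:
        cmd.append("--traditional-to-simple")
    if remove_puncts:
        cmd.append("--remove-puncts")
    if remove_erhua:
        cmd.append("--remove-erhua")
    cmd.append(text)  # the input text is the positional argument
    return cmd


def run_wetext(text: str, **options) -> Optional[str]:
    """Run the CLI if it is on PATH; return its stdout, or None if missing."""
    if shutil.which("wetext") is None:
        return None
    completed = subprocess.run(
        build_wetext_command(text, **options),
        capture_output=True,
        text=True,
        check=True,
    )
    # Assumption: the normalized text is written to stdout.
    return completed.stdout.strip()
```

Passing the arguments as a list avoids shell quoting problems with non-ASCII input.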

License

Apache License 2.0

Release history

0.1.4 (this version): Jun 10, 2026
0.1.3: Jun 10, 2026
0.1.2: Nov 28, 2025
0.1.1: Nov 28, 2025
0.1.0: Sep 8, 2025
0.0.9: Aug 21, 2025
0.0.8: Jul 22, 2025
0.0.7: Jul 20, 2025
0.0.4: Mar 26, 2025
0.0.3: Jan 7, 2025
0.0.2: Jan 7, 2025
0.0.1: Jul 7, 2024

Download files

Source Distribution

wetext-0.1.4.tar.gz (1.8 MB), uploaded Jun 10, 2026, Source

Built Distribution

wetext-0.1.4-py3-none-any.whl (1.9 MB), uploaded Jun 10, 2026, Python 3

File details

Details for the file wetext-0.1.4.tar.gz.

File metadata

Download URL: wetext-0.1.4.tar.gz
Upload date: Jun 10, 2026
Size: 1.8 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for wetext-0.1.4.tar.gz
Algorithm	Hash digest
SHA256	`2e5ff3b323b3cb67b207ae196a94c998a21afbfc793b8f145befc56bbd8f2d8f`
MD5	`2c1af0442eea5fce3ac2410414150950`
BLAKE2b-256	`660dc34089120586d0727a845d83d9eb45bd8b39fe5f01ee1a3376d54c4413ef`

File details

Details for the file wetext-0.1.4-py3-none-any.whl.

File metadata

Download URL: wetext-0.1.4-py3-none-any.whl
Upload date: Jun 10, 2026
Size: 1.9 MB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for wetext-0.1.4-py3-none-any.whl
Algorithm	Hash digest
SHA256	`eb33daea87d69aa366fb541dd5427b90b03346614ba29ea8ed5af54a7a377045`
MD5	`b9f1a3e9beafd8c7141f1321f1c79992`
BLAKE2b-256	`053dc0c8cec32b2d44a7b94e5da3ee5e830ebda6e692db5b98d6048d81b40191`
