MandarinTamer
MandarinTamer is a Python library for converting Mandarin text between Simplified Chinese and Traditional Chinese, with a focus on the Taiwanese variant. It's designed to be accurate, flexible, and easy to use.
What Makes MandarinTamer Unique?
MandarinTamer stands out for its ability to convert text without requiring prior knowledge of the input script. It seamlessly handles Simplified, all forms of Traditional, or even mixed-script text, automatically transforming it into your desired script.
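To see why a converter does not need to be told the input script, consider the toy sketch below. It is not MandarinTamer's implementation; the `S2T` table and `to_traditional` function are hypothetical names made up for illustration. It assumes a tiny Simplified-to-Traditional lookup in which any character missing from the table (including one that is already Traditional) passes through unchanged, so Simplified, Traditional, and mixed input all end up in the same form.

```python
# Toy Simplified -> Traditional table (hypothetical; a real converter
# relies on far larger, curated dictionaries).
S2T = {"简": "簡", "体": "體", "汉": "漢", "语": "語"}


def to_traditional(text: str) -> str:
    """Map each character through S2T; characters not in the table are kept."""
    return "".join(S2T.get(ch, ch) for ch in text)


print(to_traditional("简体字"))  # Simplified input  -> 簡體字
print(to_traditional("簡體字"))  # Traditional input -> 簡體字
print(to_traditional("簡体字"))  # Mixed input       -> 簡體字
```

A character-by-character table like this breaks down when one Simplified character has several Traditional counterparts, which is where context-aware resolution becomes necessary.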
Key Features
- Simplified ↔ Taiwanese Traditional Conversion: Handle text transformation with precision, adhering to regional linguistic norms.
- AI-Powered Context Awareness: Uses sentence context with AI to intelligently resolve one-to-many mappings.
- Context-Free Accuracy: Achieves high accuracy without requiring metadata or prior knowledge of the input text.
- Modernization and Normalization: Optionally replace rare or archaic words with more commonly used equivalents.
- Open Source: Built for developers and researchers to adapt, enhance, and integrate into other projects.
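The one-to-many problem mentioned in the feature list can be illustrated with a small sketch. Note that this is not the AI-based approach described above: it swaps in a much simpler dictionary technique (longest phrase match first, single-character fallback) purely to show why context matters. The `PHRASES` and `CHARS` tables and the `convert` function are hypothetical and hold only a handful of entries.

```python
# Simplified 发 maps to 發 (emit, discover) or 髮 (hair);
# Simplified 后 maps to 後 (after) or 后 (empress).
# Phrase entries carry the context needed to pick the right form.
PHRASES = {"头发": "頭髮", "发现": "發現", "皇后": "皇后", "后来": "後來"}

# Single-character fallback: the most common mapping for each character.
CHARS = {"头": "頭", "发": "發", "现": "現", "后": "後", "来": "來"}


def convert(text: str) -> str:
    """Greedy longest-match conversion with a per-character fallback."""
    max_len = max(len(phrase) for phrase in PHRASES)
    out = []
    i = 0
    while i < len(text):
        # Try the longest phrase first, down to two characters.
        for n in range(max_len, 1, -1):
            chunk = text[i:i + n]
            if chunk in PHRASES:
                out.append(PHRASES[chunk])
                i += n
                break
        else:
            # No phrase matched here: fall back to the character table.
            out.append(CHARS.get(text[i], text[i]))
            i += 1
    return "".join(out)


print(convert("我发现头发"))  # 我發現頭髮
print(convert("皇后后来"))    # 皇后後來
```

A plain character table would turn 头发 into 頭發 and 皇后 into 皇後; the phrase entries are what prevent those errors. Dictionary matching only covers phrases someone has listed, which is one motivation for using sentence-level context instead.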
Why MandarinTamer?
Conventional conversion tools often fail to capture the nuances of regional variants like Taiwanese Traditional Chinese, or struggle with rare or outdated terms. MandarinTamer is designed to be a versatile tool for anyone in the Chinese linguistics field, whether you're a professor, translator, teacher, developer, or researcher. It offers precision and flexibility for applications ranging from localization to language education.
Get Started
Install MandarinTamer from PyPI:
pip install mandarin-tamer
Basic usage:
from mandarin_tamer import convert_mandarin_script

# Convert to Traditional (Taiwan)
trad = convert_mandarin_script("简体字", target_script="zh_tw")
print(trad)  # 簡體字

# Convert to Simplified
simp = convert_mandarin_script("繁體字", target_script="zh_cn")
print(simp)  # 繁体字

# Advanced options
text = convert_mandarin_script(
    "现代化的字",
    target_script="zh_tw",
    modernize=True,  # Replace archaic terms with modern ones
    normalize=True,  # Normalize character variants
    taiwanize=True,  # Use Taiwan-specific variants
    improved_one_to_many=True,  # Use improved one-to-many mapping
    ner_list=["人名"],  # List of NERs to include
    include_dicts={"name": ["name_dict.json"]},  # Include specific dictionaries
    exclude_lists={"name": ["name_exclude.json"]},  # Exclude specific dictionaries
)
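When converting many strings, it can help to wrap the call in a small helper. The sketch below is an assumption-laden convenience, not part of MandarinTamer: `convert_lines` is a hypothetical name, and it takes the converter as an argument so it relies only on the `convert_mandarin_script(text, target_script=...)` call shape shown above.

```python
from typing import Callable, Iterable, List


def convert_lines(
    lines: Iterable[str],
    convert: Callable[..., str],
    target_script: str = "zh_tw",
) -> List[str]:
    """Convert each non-blank line; blank lines pass through untouched."""
    return [
        convert(line, target_script=target_script) if line.strip() else line
        for line in lines
    ]


# With the library installed, pass the real function:
#   from mandarin_tamer import convert_mandarin_script
#   convert_lines(["简体字", "", "繁體字"], convert_mandarin_script)
```

Passing the converter in keeps the helper easy to test with a stand-in function and avoids importing the library at module load time.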
For more examples and detailed documentation, visit our GitHub repository or PyPI page.
Original Developers
- Jon Knebel (Virginia, USA) – Full stack engineer + language educator + independent researcher of linguistics and language learning psychology.
- Valeriu Celmare (Romania) – Full stack engineer with a focus on Django and Python.
Contributors
The dictionaries powering MandarinTamer have been made highly accurate for the top 10,000 Mandarin words, thanks to the contributions of professional translators from Taiwan, Hong Kong, and Mainland China. Special thanks to the following individuals for their valuable work in curating and verifying them:
Taiwan:
- Rita J. Lee (李佩蓉) – PhD in Chinese Literature; Taipei.
- Jamie Chang (張汝禎) – Taipei.
- Hsin Fang Wu – Taipei.
- 潘依依 (Elsie) – Expert in modern and classical Mandarin; Taipei; https://pse.is/754xk3
Mainland China:
- Zhou Yu
Hong Kong:
- Julia Yuen Ka Suen (袁嘉旋)
- Lok Yee Chan
Their dedication and expertise have been crucial in ensuring the accuracy and reliability of MandarinTamer.