MandarinTamer is a Python library for converting Mandarin text between Simplified Chinese and Traditional Chinese, with a focus on the Taiwanese variant. It's designed to be accurate, flexible, and easy to use.

MandarinTamer

What Makes MandarinTamer Unique?

MandarinTamer converts text without needing to know the input script in advance. It accepts Simplified, any form of Traditional, or mixed-script text and automatically converts it into the script you choose.
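One way to see why the input script need not be known in advance: if a conversion table only maps characters that differ from the target script, characters already in the target script pass through untouched, so Simplified, Traditional, and mixed input all converge on the same output. The sketch below illustrates that idea with a hypothetical five-entry table (`S2T`) and a hypothetical `to_script` function; it is not MandarinTamer's implementation or API. Real tables are much harder, because some characters are both a Simplified form and a legitimate Traditional character (后 is the Simplified form of 後, but also the Traditional character for "empress"), which is where context becomes necessary.

```python
# Hypothetical toy table: Simplified -> Traditional for characters whose mapping
# is unambiguous. Real tables are far larger and include one-to-many entries.
S2T = {"简": "簡", "体": "體", "国": "國", "门": "門", "这": "這"}
T2S = {trad: simp for simp, trad in S2T.items()}


def to_script(text: str, target: str) -> str:
    """Convert text to `target` ("zh_tw" or "zh_cn") without detecting its script.

    Characters that are not keys in the chosen table (including characters
    already in the target script) are copied through unchanged, so mixed
    input needs no special handling.
    """
    if target == "zh_tw":
        table = S2T
    elif target == "zh_cn":
        table = T2S
    else:
        raise ValueError(f"unsupported target: {target!r}")
    return "".join(table.get(ch, ch) for ch in text)


# Mixed-script input: 简 is Simplified, 體 is already Traditional, 字 is shared.
print(to_script("简體字", "zh_tw"))  # 簡體字
print(to_script("简體字", "zh_cn"))  # 简体字
```

Because the conversion is a pure pass-through for characters outside the table, running it twice gives the same result as running it once.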

Key Features

  • Simplified ↔ Taiwanese Traditional Conversion: Converts text in either direction with precision, following regional linguistic norms.
  • AI-Powered Context Awareness: Uses sentence context and AI to resolve one-to-many mappings, where one Simplified character corresponds to several Traditional ones (for example, 发 becomes 發 in 发展 but 髮 in 头发).
  • Context-Free Accuracy: Achieves high accuracy without requiring metadata or prior knowledge of the input text.
  • Modernization and Normalization: Optionally replaces rare or archaic words with more commonly used equivalents.
  • Open Source: Built for developers and researchers to adapt, enhance, and integrate into other projects.
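The one-to-many problem in the feature list can be illustrated with a classic dictionary technique, forward maximum matching: at each position, try the longest known phrase first and fall back to a single-character default only when no phrase matches. This is a swapped-in, non-AI technique shown for intuition only; according to the list above, MandarinTamer itself resolves these cases using sentence context with AI. The tables and the `convert_greedy` name are hypothetical.

```python
# Hypothetical phrase table: multi-character words fix the reading of 发,
# which is 發 ("emit, develop") in some words and 髮 ("hair") in others.
PHRASES = {"头发": "頭髮", "理发": "理髮", "发展": "發展"}
# Hypothetical single-character defaults, used only when no phrase matches.
CHARS = {"发": "發", "头": "頭"}
MAX_PHRASE_LEN = max(len(p) for p in PHRASES)


def convert_greedy(text: str) -> str:
    """Forward maximum matching: prefer the longest phrase at each position."""
    out = []
    i = 0
    while i < len(text):
        # Try phrase lengths from the longest down to 2 characters.
        for size in range(min(MAX_PHRASE_LEN, len(text) - i), 1, -1):
            chunk = text[i:i + size]
            if chunk in PHRASES:
                out.append(PHRASES[chunk])
                i += size
                break
        else:
            # No phrase matched here: map a single character (or keep it as-is).
            out.append(CHARS.get(text[i], text[i]))
            i += 1
    return "".join(out)


print(convert_greedy("我的头发"))  # 我的頭髮
print(convert_greedy("发展"))  # 發展
```

Greedy matching breaks down when phrase boundaries are ambiguous or a word is missing from the table, which is the kind of case a context-aware approach is meant to handle.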

Why MandarinTamer?

Conventional conversion tools often miss the nuances of regional variants such as Taiwanese Traditional Chinese, or struggle with rare and outdated terms. MandarinTamer is designed as a versatile tool for anyone in the field of Chinese linguistics, whether professor, translator, teacher, developer, or researcher, offering precision and flexibility for applications ranging from localization to language education.

Get Started

Install MandarinTamer from PyPI:

pip install mandarin-tamer
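To confirm the installation from Python (for example, in a setup or CI script), the standard library's `importlib.metadata` can report the installed distribution version. The helper name `installed_version` is illustrative and not part of MandarinTamer.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "mandarin-tamer") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


print(installed_version())
```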

Basic usage:

from mandarin_tamer import convert_mandarin_script

# Convert to Traditional (Taiwan)
trad = convert_mandarin_script("简体字", target_script="zh_tw")
print(trad)  # 簡體字

# Convert to Simplified
simp = convert_mandarin_script("繁體字", target_script="zh_cn")
print(simp)  # 繁体字

# Advanced options
text = convert_mandarin_script(
    "现代化的字",
    target_script="zh_tw",
    modernize=True,  # Replace archaic terms with modern ones
    normalize=True,  # Normalize character variants
    taiwanize=True,  # Use Taiwan-specific variants
    improved_one_to_many=True,  # Use improved one-to-many mapping
    ner_list=["人名"],  # List of NERs to include ("人名" = personal name)
    include_dicts={"name": ["name_dict.json"]},  # Include specific dictionaries
    exclude_lists={"name": ["name_exclude.json"]},  # Exclude specific dictionaries
)
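For longer documents, a minimal sketch is to split the text at blank lines and convert each paragraph separately, so that every call still receives whole sentences; this seems worth preserving given that the context-aware mode relies on sentence context. `convert_paragraphs` is a hypothetical helper, and it takes the converter as an argument so it can be exercised without the library installed.

```python
from typing import Callable

# Assumed shape of the converter: takes a string, returns the converted string.
Converter = Callable[[str], str]


def convert_paragraphs(text: str, convert: Converter) -> str:
    """Convert each blank-line-separated paragraph, keeping separators intact.

    Whitespace-only paragraphs are passed through without calling `convert`.
    """
    paragraphs = text.split("\n\n")
    return "\n\n".join(convert(p) if p.strip() else p for p in paragraphs)


# Stand-in converter for demonstration; swap in the real one when installed.
print(convert_paragraphs("ab\n\ncd", str.upper))
```

With MandarinTamer installed, a converter could be built as `partial(convert_mandarin_script, target_script="zh_tw")` (using `functools.partial`) and passed as `convert`. This assumes no context needed for a correct conversion crosses a blank line.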

For more examples and detailed documentation, see the GitHub repository (creolio/mandarinTamer) or the PyPI page.

Original Developers

  • Jon Knebel (Virginia, USA) – Full-stack engineer, language educator, and independent researcher of linguistics and language-learning psychology.
  • Valeriu Celmare (Romania) – Full-stack engineer with a focus on Django and Python.

Contributors

The dictionaries behind MandarinTamer have been refined for high accuracy on the 10,000 most common Mandarin words, thanks to contributions from professional translators in Taiwan, Hong Kong, and Mainland China. Special thanks to the following individuals for their work curating and verifying these dictionaries:

Taiwan:

  • Rita J. Lee (李佩蓉) – PhD in Chinese Literature; Taipei
  • Jamie Chang (張汝禎) – Taipei
  • Hsin Fang Wu – Taipei
  • 潘依依 (Elsie) – Expert in modern and classical Mandarin; Taipei; https://pse.is/754xk3

Mainland China:

  • Zhou Yu

Hong Kong:

  • Julia Yuen Ka Suen (袁嘉旋)
  • Lok Yee Chan

Their dedication and expertise have been crucial in ensuring the accuracy and reliability of MandarinTamer.


Download files

Source Distribution

mandarin_tamer-0.0.11.tar.gz (417.9 kB)

Uploaded Source

Built Distribution

mandarin_tamer-0.0.11-py3-none-any.whl (422.3 kB)

Uploaded Python 3
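The wheel's file name encodes its compatibility tags: `py3-none-any` means pure Python 3 code with no compiled ABI requirement, usable on any platform, so this single wheel serves every operating system. A minimal parser for the wheel naming convention (`{distribution}-{version}(-{build})?-{python}-{abi}-{platform}.whl`) is sketched below; `split_wheel_name` is an illustrative helper, and for real use the `packaging` library's `packaging.utils.parse_wheel_filename` is the more thorough option.

```python
from typing import NamedTuple, Optional


class WheelName(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python: str
    abi: str
    platform: str


def split_wheel_name(filename: str) -> WheelName:
    """Split a wheel file name into its dash-separated fields.

    Wheel names replace '-' in the distribution name with '_', so splitting
    on '-' is safe. The optional build tag makes the field count 5 or 6.
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        dist, ver, py, abi, plat = parts
        build = None
    elif len(parts) == 6:
        dist, ver, build, py, abi, plat = parts
    else:
        raise ValueError(f"unexpected number of fields in {filename!r}")
    return WheelName(dist, ver, build, py, abi, plat)


print(split_wheel_name("mandarin_tamer-0.0.11-py3-none-any.whl"))
# WheelName(distribution='mandarin_tamer', version='0.0.11', build=None, python='py3', abi='none', platform='any')
```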

File details

Details for the file mandarin_tamer-0.0.11.tar.gz.

File metadata

  • Download URL: mandarin_tamer-0.0.11.tar.gz
  • Upload date:
  • Size: 417.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for mandarin_tamer-0.0.11.tar.gz
Algorithm    Hash digest
SHA256       e2951b96984017fe311c5a1f4d6eb71cfaeb31ccd21881acb1069b0a7facae60
MD5          00ff08831b85af0678bfb45260e00c99
BLAKE2b-256  5974b45cc45e9c7796e579efb4360c289eb35d0b4b302719ba351dcf53af0f13
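To check a downloaded file against the published digests, hash it locally and compare. The sketch below streams the file through `hashlib.sha256` in chunks; the values in `EXPECTED_SHA256` are the SHA256 digests listed on this page for the 0.0.11 sdist and wheel, and `sha256_of` / `verify_download` are illustrative names, not part of any tool.

```python
import hashlib
from pathlib import Path
from typing import Mapping

# SHA256 digests published on this page for release 0.0.11.
EXPECTED_SHA256 = {
    "mandarin_tamer-0.0.11.tar.gz":
        "e2951b96984017fe311c5a1f4d6eb71cfaeb31ccd21881acb1069b0a7facae60",
    "mandarin_tamer-0.0.11-py3-none-any.whl":
        "522fedf0dd6b59102e7b06cdfe9fcaddcc200af9469fb0742e07f41681326000",
}


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash a file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path, expected: Mapping[str, str] = EXPECTED_SHA256) -> bool:
    """Return True if the file's SHA256 matches the published digest for its name."""
    want = expected.get(path.name)
    if want is None:
        raise KeyError(f"no published digest for {path.name}")
    return sha256_of(path) == want.lower()
```

pip can enforce the same check at install time: in hash-checking mode (`pip install --require-hashes -r requirements.txt`), a requirements line such as `mandarin-tamer==0.0.11 --hash=sha256:<digest>` (one `--hash` per published file) makes pip reject any download whose digest does not match.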

Provenance

The following attestation bundles were made for mandarin_tamer-0.0.11.tar.gz:

Publisher: release.yml on creolio/mandarinTamer

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file mandarin_tamer-0.0.11-py3-none-any.whl.

File metadata

  • Download URL: mandarin_tamer-0.0.11-py3-none-any.whl
  • Upload date:
  • Size: 422.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for mandarin_tamer-0.0.11-py3-none-any.whl
Algorithm    Hash digest
SHA256       522fedf0dd6b59102e7b06cdfe9fcaddcc200af9469fb0742e07f41681326000
MD5          e48f59c69cae063d8a8e6ea5a778a825
BLAKE2b-256  e44846776e9731df9921e4d5002b940c8019d7df20facbacb1f8d08eb535bff7

Provenance

The following attestation bundles were made for mandarin_tamer-0.0.11-py3-none-any.whl:

Publisher: release.yml on creolio/mandarinTamer

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.