
Zhon provides constants used in Chinese text processing.

Project description

Zhon is a Python library that provides constants commonly used in Chinese text processing.


Zhon's constants are ordinary strings, so they can be used directly with Python's re module. For example:

  • Find CJK characters in a string:

    >>> import re
    >>> import zhon.hanzi
    >>> re.findall('[%s]' % zhon.hanzi.characters, 'I broke a plate: 我打破了一个盘子.')
    ['我', '打', '破', '了', '一', '个', '盘', '子']
  • Validate Pinyin syllables, words, or sentences:

    >>> import re
    >>> import zhon.pinyin
    >>> re.findall(zhon.pinyin.syllable, 'Yuànzi lǐ tíngzhe yí liàng chē.', re.I)
    ['Yuàn', 'zi', 'lǐ', 'tíng', 'zhe', 'yí', 'liàng', 'chē']
    >>> re.findall(zhon.pinyin.word, 'Yuànzi lǐ tíngzhe yí liàng chē.', re.I)
    ['Yuànzi', 'lǐ', 'tíngzhe', 'yí', 'liàng', 'chē']
    >>> re.findall(zhon.pinyin.sentence, 'Yuànzi lǐ tíngzhe yí liàng chē.', re.I)
    ['Yuànzi lǐ tíngzhe yí liàng chē.']
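The first example works because zhon.hanzi.characters is meant to sit inside a regex character class, as the `'[%s]'` wrapper shows. Below is a minimal, self-contained sketch of the same technique. It assumes only the CJK Unified Ideographs block (U+4E00 to U+9FFF), which is a small subset of what zhon's constant covers, and the name CJK_RANGES is hypothetical.

```python
import re

# Hypothetical stand-in for zhon.hanzi.characters: only the CJK Unified
# Ideographs block (U+4E00-U+9FFF). zhon's real constant covers more ranges.
CJK_RANGES = '\u4e00-\u9fff'

# Wrapping the ranges in [...] builds a class that matches one character.
cjk_char = re.compile('[%s]' % CJK_RANGES)
# Adding + matches whole runs of consecutive characters instead.
cjk_run = re.compile('[%s]+' % CJK_RANGES)

text = 'I broke a plate: 我打破了一个盘子.'
chars = cjk_char.findall(text)
runs = cjk_run.findall(text)

print(chars)  # single characters, as in the zhon example
print(runs)   # one string per contiguous run of Hanzi
```

Adding `+` to the class returns whole runs of Hanzi rather than single characters, which is often handier when separating Chinese from Latin text in mixed strings.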


Features

  • Includes commonly used constants:
    • CJK characters and radicals

    • Chinese punctuation marks

    • Chinese sentence regular expression pattern

    • Pinyin vowels, consonants, lowercase, uppercase, and punctuation

    • Pinyin syllable, word, and sentence regular expression patterns

    • Zhuyin characters and marks

    • Zhuyin syllable regular expression pattern

    • CC-CEDICT characters

  • Runs on Python 2.7 and 3
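The punctuation and sentence constants follow the same idea of embedding a string in a regular expression. The sketch below splits Chinese text into sentences and strips punctuation using only the standard library. STOPS, PUNCT, and SENTENCE are hypothetical stand-ins, far smaller than zhon's own punctuation and sentence constants, and are not their actual definitions.

```python
import re

# Hypothetical, deliberately small set of sentence-ending marks:
# full-width full stop, exclamation mark, and question mark.
STOPS = '。！？'
# Hypothetical punctuation set: the stops plus a few common non-stops.
PUNCT = STOPS + '，、；：“”（）'

# A sentence: one or more non-stop characters followed by one or more
# stops, so a run such as "！？" stays attached to its sentence.
SENTENCE = re.compile('[^%s]+[%s]+' % (STOPS, STOPS))

text = '院子里停着一辆车。你看见了吗？看见了！'
sentences = SENTENCE.findall(text)

# Remove every character that belongs to the punctuation class.
no_punct = re.sub('[%s]' % PUNCT, '', text)

for s in sentences:
    print(s)
print(no_punct)
```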


Download files


Source Distribution

zhon-1.1.5.tar.gz (99.8 kB)
