Convert a Chinese sentence to Pinyin or Jyutping

Project description

Python module that converts a Chinese sentence, written in Simplified or Traditional characters, to Mandarin Pinyin or Cantonese Jyutping. Output uses diacritics (accented characters) by default. I designed this library to create Mandarin and Cantonese flashcards.

Want to support my work on this module? Become a supporter: https://www.patreon.com/lucw

Install

$ pip install pinyin_jyutping_sentence

Usage

>>> import pinyin_jyutping_sentence
>>> pinyin_jyutping_sentence.pinyin("提高口语")
'tígāo kǒuyǔ'
>>> pinyin_jyutping_sentence.jyutping("我出去攞野食")
'ngǒ cēothêoi ló jěsik'
# the tone_numbers argument can be used to disable diacritics
>>> pinyin_jyutping_sentence.pinyin("忘拿一些东西了", tone_numbers=True)
'wang4 na2 yi1xie1 dong1xi5 le5'
# the spaces argument adds a space between each syllable
>>> pinyin_jyutping_sentence.pinyin("忘拿一些东西了", tone_numbers=True, spaces=True)
'wang4 na2 yi1 xie1 dong1 xi5 le5'
>>> pinyin_jyutping_sentence.jyutping("有啲好貴", tone_numbers=True)
'jau5 di1 hou3 gwai3'

REST API

You can use the REST API at the following URLs (each request is shown with the JSON it returns):

http://api.prod.mandarincantonese.com/jyutping/我哋盪失咗
{"jyutping": "ngǒ déi dongsāt zó"}
http://api.prod.mandarincantonese.com/pinyin/办所有的事情
{"pinyin": "bàn suǒyǒu de shìqíng"}

# calling the API from Python
import requests

url = "http://api.prod.mandarincantonese.com/jyutping/我哋盪失咗"
response = requests.get(url)
response.raise_for_status()  # stop here if the server returned an HTTP error
print(response.json()["jyutping"])
# prints: ngǒ déi dongsāt zó
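
The URLs above carry the Chinese sentence directly in the path. HTTP clients such as requests generally percent-encode non-ASCII characters for you, but other tools may need an ASCII-only URL. The following is a minimal sketch of building one with the standard library. The helper name build_api_url is hypothetical (it is not part of this module), and it assumes the service accepts the percent-encoded form of the same path.

```python
from urllib.parse import quote

# Base URL taken from the examples above.
API_BASE = "http://api.prod.mandarincantonese.com"


def build_api_url(kind, sentence):
    """Return an ASCII-only URL for the given conversion kind and sentence.

    Hypothetical helper: `kind` is "pinyin" or "jyutping", matching the
    two URL paths shown above.
    """
    if kind not in ("pinyin", "jyutping"):
        raise ValueError("kind must be 'pinyin' or 'jyutping'")
    # safe="" also encodes "/" so the sentence always stays one path segment.
    return f"{API_BASE}/{kind}/{quote(sentence, safe='')}"


url = build_api_url("jyutping", "我哋盪失咗")
print(url)
```

The resulting string can be passed to requests.get() in place of the literal URL used above.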

Changelog

  • v0.8: embed MDBG CC-CEDICT for more accurate Pinyin conversions

  • v0.6: allow converting Traditional characters to Pinyin, and Simplified to Jyutping

Google Sheets add-on

This library is available in the form of a Google Sheets Add-on. You can read about it here: https://medium.com/@lucw/converting-chinese-characters-to-pinyin-or-jyutping-on-google-sheets-eb12cca669cb

How it works

The sentence is first tokenized with the Jieba library (https://github.com/fxsjy/jieba). Each word is then converted to Pinyin/Jyutping either as a whole, or character by character, using the CC-Canto dictionary (http://cantonese.org/about.html). The Jyutping diacritic conversion is not a standard; it was originally described here: http://www.cantonese.sheik.co.uk/phorum/read.php?1,127274,129006
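
The whole-word and character-by-character lookup can be illustrated with a small sketch. This is not the module's actual code. The two dictionaries below are hypothetical stand-ins for the embedded dictionary data, the function name convert_tokens is made up for illustration, and the token list is written by hand where the library would use Jieba's output (for example list(jieba.cut(sentence))).

```python
# Hypothetical miniature dictionaries; the real module loads far larger data.
WORD_READINGS = {"提高": "tígāo", "口语": "kǒuyǔ"}
CHAR_READINGS = {"提": "tí", "高": "gāo", "口": "kǒu", "语": "yǔ", "我": "wǒ"}


def convert_tokens(tokens):
    """Convert pre-segmented tokens to a romanized string.

    Each token is looked up as a whole word first. If it is missing,
    it is converted one character at a time, and characters with no
    known reading are passed through unchanged.
    """
    pieces = []
    for token in tokens:
        if token in WORD_READINGS:
            pieces.append(WORD_READINGS[token])
        else:
            pieces.append("".join(CHAR_READINGS.get(ch, ch) for ch in token))
    # Words are separated by spaces; syllables within a word are joined.
    return " ".join(pieces)


# Assumed segmentation, written by hand for this sketch.
tokens = ["我", "提高", "口语"]
print(convert_tokens(tokens))  # wǒ tígāo kǒuyǔ
```

Looking up whole words first matters because many characters have more than one reading, and the word-level entry records which reading applies in that context.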
