A Python library for quantitative tasks in Chinese historical linguistics.

SinoPy: Python Library for quantitative tasks in Chinese historical linguistics

SinoPy provides utility functions for users who work with Chinese dialect and Sino-Tibetan language data and who need to carry out tasks such as converting characters to Pinyin, analysing characters, or analysing readings in Chinese dialects and other Southeast Asian (SEA) languages.

If you use the library in your research, please cite it as:

List, Johann-Mattis (2018): SinoPy: Python Library for quantitative tasks in Chinese historical linguistics. Version 0.3.0. Jena: Max Planck Institute for the Science of Human History. DOI: https://zenodo.org/badge/latestdoi/30593438

SinoPy is intended as a plugin or add-on for LingPy. It offers utility functions for handling Chinese data in a broad range of contexts, from Chinese character readings to proposed readings in Middle Chinese and older stages of the language.

Quick Usage Examples

Convert Baxter's (1992) Middle Chinese transcription system to plain IPA (with tone marks).

>>> from sinopy import baxter2ipa
>>> baxter2ipa('bjang')
'bjaŋ¹'
>>> baxter2ipa('bjang', segmented=True)
['b', 'j', 'a', 'ŋ', '¹']

Convert Chinese characters to Pīnyīn:

>>> from sinopy import pinyin
>>> pinyin('我', variant='cantonese')
'ngo5'
>>> pinyin('我', variant='mandarin')
'wǒ'

Try to find a character by combining two characters:

>>> from sinopy import character_from_structure
>>> character_from_structure('+人我')
'俄'

More examples

At the moment, you may have difficulty finding a common idea behind SinoPy, as the collection of scripts is very diverse. The general topic, however, is the set of basic operations one frequently encounters when working with Chinese and SEA linguistic data.

But let's just look at a couple of examples:

>>> from sinopy import *
>>> char = "我"
>>> pinyin(char, variant="mandarin")
'wǒ'

So obviously, we can convert characters to Pīnyīn.

>>> is_chinese(char)
True
>>> is_chinese('b')
False

So the library also checks whether a character belongs to the Chinese Unicode range.

We also have a range of functions for handling Middle Chinese and related problems. For example:
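A range check of this kind can be sketched in plain Python. The snippet below is only an illustration and not SinoPy's implementation: the function name `in_cjk_unified_range` is hypothetical, and it assumes that "Chinese Unicode range" means the basic CJK Unified Ideographs block (U+4E00 to U+9FFF). The range that `is_chinese` actually uses may be wider.

```python
def in_cjk_unified_range(char):
    """Return True if a single character lies in U+4E00..U+9FFF.

    Hypothetical helper: the block boundaries are an assumption of this
    sketch, not taken from SinoPy.
    """
    return len(char) == 1 and "\u4e00" <= char <= "\u9fff"


print(in_cjk_unified_range("我"))
print(in_cjk_unified_range("b"))
```

Characters from the CJK extension blocks fall outside this range, so a production check would need to list further blocks.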

>>> parse_baxter('ngaH')
('ng', '', 'a', 'H')

So this function will read in a Middle Chinese string (as encoded in the system of Baxter 1992) and return its main constituents (initial, medial, final, and tone).
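To make the four-way split concrete, here is a simplified sketch of how such a parse could work. It is not SinoPy's parser: the name `split_baxter`, the inventory of initials, and the rule that any run of `j` and `w` after the initial counts as the medial are assumptions made for illustration. The real `parse_baxter` may treat individual syllables differently.

```python
import re

# Illustrative inventory of initials in Baxter's (1992) notation.
# This list is an assumption of the sketch, not taken from SinoPy.
INITIALS = [
    "p", "ph", "b", "m", "t", "th", "d", "n", "tr", "trh", "dr", "nr",
    "ts", "tsh", "dz", "s", "z", "tsr", "tsrh", "dzr", "sr", "zr",
    "tsy", "tsyh", "dzy", "ny", "sy", "zy", "y", "k", "kh", "g", "ng",
    "x", "h", "'", "l",
]

# Longest initials first, so that "ng" is tried before "n".
_ALTERNATION = "|".join(
    re.escape(i) for i in sorted(INITIALS, key=len, reverse=True)
)
_SYLLABLE = re.compile(
    r"^(?P<initial>" + _ALTERNATION + r")?"
    r"(?P<medial>[jw]*)"
    r"(?P<final>.*?)"
    r"(?P<tone>[XH]?)$"  # X = rising, H = departing; other tones unmarked
)


def split_baxter(reading):
    """Split a Baxter string into (initial, medial, final, tone)."""
    match = _SYLLABLE.match(reading)
    return tuple(
        match.group(name) or ""
        for name in ("initial", "medial", "final", "tone")
    )


print(split_baxter("ngaH"))
print(split_baxter("nojX"))
```

For `'ngaH'` this sketch produces the same tuple as the `parse_baxter` call shown above. Agreement on other syllables is not guaranteed.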

But we can also directly convert a character to its Middle Chinese reading:

>>> chars2baxter(char)
['ngaX']

Or we can retrieve a basic gloss.

>>> chars2gloss(char)
['our, us, i, me, my, we']

A rather complex function is sixtuple2baxter, which reads the classical six-character description of the Middle Chinese reading of a given character and yields the Middle Chinese value following Baxter's system. You can find many sixtuple readings in the DOC database (published with the Tower of Babel project).

>>> sixtuple2baxter('蟹開一上海泥')
['n', '', 'oj', 'X']
>>> chars2baxter('乃')
['nojX']

You can also try to retrieve the MC reading directly by passing two fǎnqiè characters, for example:

>>> fanqie2mch('海泥')
'xej'
>>> fanqie2mch('泥海')
'nojX'
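The two calls above follow the basic fǎnqiè principle: the first character supplies the initial, and the second supplies the rest of the syllable, including the tone. The sketch below illustrates only that principle on readings that have already been split into parts. The function name `combine_fanqie` is hypothetical, and the two readings (`nej` for 泥, `xojX` for 海) are assumed values chosen to be consistent with the outputs shown above. How `fanqie2mch` works internally is not described here, and real fǎnqiè resolution has to handle medials more carefully.

```python
def combine_fanqie(upper, lower):
    """Combine two pre-split readings by the basic fanqie rule.

    Both arguments are (initial, medial, final, tone) tuples. The result
    takes the initial from `upper` and everything else from `lower`.
    Simplified on purpose: the medial is always taken from `lower`.
    """
    initial = upper[0]
    _, medial, final, tone = lower
    return initial + medial + final + tone


# Assumed readings, consistent with the fanqie2mch outputs above.
ni = ("n", "", "ej", "")     # 泥
hai = ("x", "", "oj", "X")   # 海

print(combine_fanqie(ni, hai))
print(combine_fanqie(hai, ni))
```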

And if you don't like Baxter's MCH transcriptions, you can simply convert them to IPA:

>>> baxter2ipa('nojX')
'noj²'
>>> baxter2ipa('tsyang')
'ʨaŋ¹'
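The conversions shown so far can be reproduced with a very small replacement table, which gives a feel for what such a conversion involves. The sketch below is a toy, not SinoPy's `baxter2ipa`: the name `toy_baxter2ipa` is hypothetical, and the table contains only the correspondences visible in the examples in this README (`tsy` to `ʨ`, `ng` to `ŋ`, unmarked tone to `¹`, `X` to `²`). Anything else, including the departing tone `H`, is deliberately left out rather than guessed.

```python
# Only correspondences that appear in the README examples; everything
# else is outside the scope of this toy table.
SEGMENTS = [("tsy", "ʨ"), ("ng", "ŋ")]
TONES = {"X": "²"}


def toy_baxter2ipa(reading):
    """Convert a Baxter string to IPA using the tiny table above."""
    if reading.endswith("H"):
        raise ValueError("departing tone is not covered by this toy table")
    if reading and reading[-1] in TONES:
        body, tone = reading[:-1], TONES[reading[-1]]
    else:
        body, tone = reading, "¹"
    for source, target in SEGMENTS:
        body = body.replace(source, target)
    return body + tone


for example in ("bjang", "nojX", "tsyang"):
    print(example, toy_baxter2ipa(example))
```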

As a final important function, consider the parser for morphemes:

>>> parse_chinese_morphemes('ʨaŋ¹')
['ʨ', '-', 'a', 'ŋ', '¹']

The quintuple that the method returns splits the sequence into its five main constituents: initial, medial, nucleus, coda, and tone.

Project details

Release history

0.3.4 (this version): Dec 5, 2019
0.3.3: Nov 26, 2018
0.3.2: Nov 25, 2018
0.3.1: Aug 24, 2018
0.3.0: Aug 24, 2018

Download files

Source Distribution

sinopy-0.3.4.tar.gz (11.7 MB)

Uploaded Dec 5, 2019 Source

File details

Details for the file sinopy-0.3.4.tar.gz.

File metadata

Download URL: sinopy-0.3.4.tar.gz
Upload date: Dec 5, 2019
Size: 11.7 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/1.11.0 pkginfo/1.4.2 requests/2.20.0 setuptools/40.2.0 requests-toolbelt/0.8.0 tqdm/4.25.0 CPython/3.5.2

File hashes

Hashes for sinopy-0.3.4.tar.gz
Algorithm	Hash digest
SHA256	`f0f6d2d05cb739900e534b191c8262631b3fcd628de72914f9e93ff06ce37e8d`
MD5	`4679dbfbcb92abcee356d7c11232e0bd`
BLAKE2b-256	`fb4b059327435613b3e8f535ecd06609413b25c3b772ac2a8a622df2a8cb5536`
