query Unicode script metadata
Project description
A simple Python 3 module for querying script metadata from the Unicode
Character Database (UCD); see UAX #24.
| This module is useful for checking whether a text consists of Latin characters,
| Arabic, hiragana, kanji (han), and so on. It works for all scripts supported
| by the Unicode Character Database.

| This module is dumb and slow. If you need speed, you probably want to
| implement your own functions.
Sample usage:

::

    >>> import uniscripts
    >>> uniscripts.is_script('A', 'Latin')
    True

    # if you pass it a string, all characters must match
    >>> uniscripts.is_script('はるはあけぼの', 'Hiragana')
    True
    >>> uniscripts.is_script('はるはAkebono', 'Hiragana')
    False

    # ...but by default, it ignores 'Common' characters, such as punctuation.
    >>> uniscripts.is_script('はるは:あけぼの', 'Hiragana')
    True

    >>> uniscripts.is_script('中華人民共和国', 'Han') # 'Han' = kanji or hànzì
    True

    >>> uniscripts.which_scripts('z')
    ['Latin']
    >>> uniscripts.which_scripts('は')
    ['Hiragana']
    >>> uniscripts.which_scripts('ー') # U+30FC
    ['Common', 'Katakana', 'Hiragana', 'Hangul', 'Han', 'Bopomofo', 'Yi']

See the docstrings of ``is_script()`` and ``which_scripts()`` for details.
Source distribution: uniscripts-1.0.5.tar.gz (16.1 kB)