query Unicode script metadata
Project description
Simple Python 3 module to query Unicode UCD script metadata (see UAX #24).

| This module is useful for checking whether a text is written in Latin characters,
| Arabic, hiragana, kanji (han), and so on. It works for all scripts supported
| by the Unicode Character Database.

| This module is dumb and slow. If you need speed, you probably want to
| implement your own functions.
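
As a rough starting point for the "implement your own functions" route, here is a minimal sketch of a range-table lookup using binary search. It is not part of ``uniscripts``: the names ``script_of`` and ``all_in_script`` are hypothetical, and the table is a small hand-picked subset of code point ranges, not the full UCD ``Scripts.txt`` data.

```python
from bisect import bisect_right

# Hand-picked (start, end, script) ranges, sorted by start code point.
# ASSUMPTION: an illustrative subset only, NOT the full Scripts.txt data.
_RANGES = [
    (0x0041, 0x005A, "Latin"),     # A-Z
    (0x0061, 0x007A, "Latin"),     # a-z
    (0x3041, 0x3096, "Hiragana"),  # basic hiragana letters
    (0x30A1, 0x30FA, "Katakana"),  # basic katakana letters
    (0x4E00, 0x9FFF, "Han"),       # CJK Unified Ideographs block
]
_STARTS = [start for start, _end, _script in _RANGES]


def script_of(char):
    """Return the script of one character, or None if it is not in the table."""
    cp = ord(char)
    # Find the last range whose start is <= cp, then check its end.
    i = bisect_right(_STARTS, cp) - 1
    if i >= 0:
        _start, end, script = _RANGES[i]
        if cp <= end:
            return script
    return None


def all_in_script(text, script):
    """True if every character found in the table belongs to `script`.

    Characters missing from the table (punctuation, digits, ...) are
    skipped. This is a simplification of this sketch, not the module's
    actual handling of 'Common' characters.
    """
    found = (script_of(char) for char in text)
    return all(s == script for s in found if s is not None)
```

A real replacement would generate the table from ``Scripts.txt`` for the scripts it cares about; the lookup itself stays the same.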
Sample usage:

::

    >>> import uniscripts
    >>> uniscripts.is_script('A', 'Latin')
    True

    # if you pass it a string, all characters must match
    >>> uniscripts.is_script('はるはあけぼの', 'Hiragana')
    True
    >>> uniscripts.is_script('はるはAkebono', 'Hiragana')
    False

    # ...but by default, it ignores 'Common' characters, such as punctuation.
    >>> uniscripts.is_script('はるは:あけぼの', 'Hiragana')
    True

    >>> uniscripts.is_script('中華人民共和国', 'Han') # 'Han' = kanji or hànzì
    True

    >>> uniscripts.which_scripts('z')
    ['Latin']
    >>> uniscripts.which_scripts('は')
    ['Hiragana']
    >>> uniscripts.which_scripts('ー') # U+30FC
    ['Common', 'Katakana', 'Hiragana', 'Hangul', 'Han', 'Bopomofo', 'Yi']

See docstrings for ``is_script()``, ``which_scripts()``.
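
The samples above only show ``which_scripts()`` called on single characters. To summarize a longer text, one option is a small wrapper that calls a lookup function once per character and tallies the results. This is a sketch, not part of the module: ``script_histogram`` is a hypothetical name, and the lookup is passed in as a parameter so the helper does not assume anything about ``uniscripts`` beyond the calls shown above.

```python
from collections import Counter


def script_histogram(text, which_scripts):
    """Count how many characters of `text` are reported under each script.

    `which_scripts` is any callable that maps one character to a list of
    script names, for example uniscripts.which_scripts. A character that
    is reported under several scripts is counted once for each of them.
    """
    counts = Counter()
    for char in text:
        counts.update(which_scripts(char))
    return counts
```

With the package installed, it could be called as ``script_histogram(text, uniscripts.which_scripts)``; a text whose result has more than one key mixes scripts, or contains characters shared between scripts.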
Project details
Download files
Source Distribution
uniscripts-1.0.5.tar.gz (16.1 kB)
File details
Details for the file uniscripts-1.0.5.tar.gz.
File metadata
- Download URL: uniscripts-1.0.5.tar.gz
- Upload date:
- Size: 16.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest
---|---
SHA256 | ac2e3fab0db570466958eb759d4143b67cae12766484ea98aef8911b31d2e309
MD5 | 2c563c18afdaa96516292a2a984978e3
BLAKE2b-256 | 83af61a5d4ca443e0bac1ffc11481abdf486f5d1ddba4f665e1e5ab0a091debc