A character vomiting library: Unicode character sets for CJK, Thai, Vietnamese, and Perl uniprops.

Charguana

A library for "character vomiting".

Works on Python 3.10+ (tested through 3.14).

Install

pip install charguana

What's new in 0.2.0

  • get_charset(name) now returns a list instead of a generator. Use iter_charset(name) if you want the old lazy behavior.
  • all_in_charset(string, charset) added alongside islang: the former requires every character to match, while islang still returns true when any character matches.
  • is_in_charsets(ch, ranges) is now exposed at the top level (previously only in charguana.korean).
  • perluniprops props (IsAlpha, IsAlnum, IsLower, IsUpper, IsSo) and chinese_strokes are now loaded lazily on first access, so import charguana is cheap.
  • Multiple Vietnamese IME bug fixes (VNI U7*, O1..5, U1..5; Telex Uws/Owf/Os/Us families previously produced the wrong vowel).
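
The difference between the two membership checks, and between the eager and lazy accessors, can be sketched in plain Python. The names below (HIRAGANA, any_in, all_in, iter_chars) are hypothetical stand-ins written for this illustration, not charguana's own code; only the any/all and list/generator semantics come from the notes above.

```python
# Stand-in character set: the Hiragana Unicode block, U+3040..U+309F.
HIRAGANA = {chr(cp) for cp in range(0x3040, 0x30A0)}


def any_in(text, charset):
    """True if at least one character of text is in charset (islang-style)."""
    return any(ch in charset for ch in text)


def all_in(text, charset):
    """True only if every character of text is in charset (all_in_charset-style)."""
    return all(ch in charset for ch in text)


def iter_chars(start, end):
    """Lazily yield the characters for code points start..end inclusive."""
    for cp in range(start, end + 1):
        yield chr(cp)


mixed = "ひらがな and Latin"
print(any_in(mixed, HIRAGANA))       # True: some characters are hiragana
print(all_in(mixed, HIRAGANA))       # False: the Latin letters and spaces are not
print(all_in("ひらがな", HIRAGANA))  # True

lazy = iter_chars(0x3041, 0x3045)         # generator: nothing produced yet
eager = list(iter_chars(0x3041, 0x3045))  # list: indexable and reusable
print(eager[:2])   # slicing works on the list
print(next(lazy))  # the generator hands out one character at a time
```

In this sketch all_in returns True for an empty string, because all() over no items is True; whether the library treats empty input the same way is not stated in these notes.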

Usage

CJK characters:

>>> from charguana import get_charset

# Hiragana.
>>> ''.join(list(get_charset('hiragana')))
'\u3040ぁあぃいぅうぇえぉおかがきぎくぐけげこごさざしじすずせぜそぞただちぢっつづてでとどなにぬねのはばぱひびぴふぶぷへべぺほぼぽまみむめもゃやゅゆょよらりるれろゎわゐゑをんゔゕゖ\u3097\u3098゙゚゛゜ゝゞゟ'

# Katakana.
>>> ''.join(list(get_charset('katakana')))
'゠ァアィイゥウェエォオカガキギクグケゲコゴサザシジスズセゼソゾタダチヂッツヅテデトドナニヌネノハバパヒビピフブプヘベペホボポマミムメモャヤュユョヨラリルレロヮワヰヱヲンヴヵヶヷヸヹヺ・ーヽヾヿ'

# Bopomofo.
>>> ''.join(list(get_charset('bopomofo')))
'\u3100\u3101\u3102\u3103\u3104ㄅㄆㄇㄈㄉㄊㄋㄌㄍㄎㄏㄐㄑㄒㄓㄔㄕㄖㄗㄘㄙㄚㄛㄜㄝㄞㄟㄠㄡㄢㄣㄤㄥㄦㄧㄨㄩㄪㄫㄬㄭ\u312e\u312f'

# Punctuation.
>>> ''.join(list(get_charset('punctuation')))
'\u3000、。〃〄々〆〇〈〉《》「」『』【】〒〓〔〕〖〗〘〙〚〛〜〝〞〟〠〡〢〣〤〥〦〧〨〩〪〭〮〯〫〬〰〱〲〳〴〵〶〷〸〹〺〻〼〽〾〿'

# Romanji (the Halfwidth and Fullwidth Forms block).
>>> ''.join(list(get_charset('romanji')))
'\uff00!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~⦅⦆。「」、・ヲァィゥェォャュョッーアイウエオカキクケコサシスセソタチツテトナニヌネノハヒフヘホマミムメモヤユヨラリルレロワン゙゚ᅠᄀᄁᆪᄂᆬᆭᄃᄄᄅᆰᆱᆲᆳᆴᆵᄚᄆᄇᄈᄡᄉᄊᄋᄌᄍᄎᄏᄐᄑᄒ\uffbf\uffc0\uffc1ᅡᅢᅣᅤᅥᅦ\uffc8\uffc9ᅧᅨᅩᅪᅫᅬ\uffd0\uffd1ᅭᅮᅯᅰᅱᅲ\uffd8\uffd9ᅳᅴᅵ\uffdd\uffde\uffdf¢£¬ ̄¦¥₩\uffe7│←↑→↓■○\uffef'
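
The script-specific sets above line up with Unicode blocks: the hiragana output runs from U+3040 to U+309F and the katakana output from U+30A0 to U+30FF. As a rough sketch of how such a set can be produced with nothing but the standard library (block_chars is a hypothetical helper name, not part of charguana):

```python
import unicodedata


def block_chars(start, end):
    """Return every character with a code point in start..end inclusive."""
    return [chr(cp) for cp in range(start, end + 1)]


hiragana = block_chars(0x3040, 0x309F)
katakana = block_chars(0x30A0, 0x30FF)

print(len(hiragana), len(katakana))  # 96 96
print(hiragana[1], katakana[1])      # ぁ ァ

# Code points with no assigned character (such as U+3040) have no Unicode
# name; repr() shows them as escapes like '\u3040' instead of a glyph.
unassigned = [f"U+{ord(ch):04X}" for ch in hiragana if not unicodedata.name(ch, "")]
print(unassigned)
```

Which code points count as unassigned depends on the Unicode version bundled with the interpreter, so the last list may change between Python releases.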


# Chinese.
>>> from charguana import tradify, simplify, chinese_strokes
>>> get_charset('chinese') == get_charset('zh')
True
>>> get_charset('zh') == get_charset('cn')
True
>>> get_charset('simplified_chinese')[:10]
['锕', '皑', '蔼', '碍', '爱', '嗳', '嫒', '瑷', '暧', '霭']
>>> get_charset('traditional_chinese')[:10]
['錒', '皚', '藹', '礙', '愛', '噯', '嬡', '璦', '曖', '靄']
>>> simplify('錒')
'锕'
>>> tradify('锕')
'錒'
>>> chinese_strokes['绝']
9
>>> chinese_strokes['絕']
12
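
Character-by-character conversion of the kind simplify and tradify perform can be sketched with str.translate. The pairs below are taken from the characters shown above; the table and the function names to_simplified and to_traditional are hypothetical, and far smaller than whatever data the library ships.

```python
# Traditional -> simplified pairs for characters that appear in the examples above.
PAIRS = {"錒": "锕", "皚": "皑", "愛": "爱", "絕": "绝"}

T2S = str.maketrans(PAIRS)
S2T = str.maketrans({simp: trad for trad, simp in PAIRS.items()})


def to_simplified(text):
    """Replace each known traditional character; leave everything else as is."""
    return text.translate(T2S)


def to_traditional(text):
    """Inverse lookup. Real data is not one-to-one, so this is only a sketch."""
    return text.translate(S2T)


print(to_simplified("愛 and 絕"))  # 爱 and 绝
print(to_traditional("锕"))        # 錒
```

A plain inverse table only works while every simplified character maps back to exactly one traditional character, which is not true in general, so a real converter needs more context than this.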

# Japanese.
>>> ''.join(list(get_charset('japanese'))) == ''.join(list(get_charset('ja')))
True
>>> ''.join(list(get_charset('ja'))) == ''.join(list(get_charset('jp')))
True

# Korean.
>>> ''.join(list(get_charset('korean'))) == ''.join(list(get_charset('ko'))) == ''.join(list(get_charset('kr')))
True

# All Chinese, Korean, Japanese and Romanji.
>>> ''.join(list(get_charset('cjk')))

Perluniprops Characters:

>>> from charguana import get_charset

# Open Punctuation.
>>> ''.join(get_charset('Open_Punctuation'))
'([{༺༼᚛‚„⁅⁽₍〈❨❪❬❮❰❲❴⟅⟦⟨⟪⟬⟮⦃⦅⦇⦉⦋⦍⦏⦑⦓⦕⦗⧘⧚⧼⸢⸤⸦⸨〈《「『【〔〖〘〚〝﴾︗︵︷︹︻︽︿﹁﹃﹇﹙﹛﹝([{⦅「'

# Close Punctuation.
>>> ''.join(get_charset('Close_Punctuation'))
')]}༻༽᚜⁆⁾₎〉❩❫❭❯❱❳❵⟆⟧⟩⟫⟭⟯⦄⦆⦈⦊⦌⦎⦐⦒⦔⦖⦘⧙⧛⧽⸣⸥⸧⸩〉》」』】〕〗〙〛〞〟﴿︘︶︸︺︼︾﹀﹂﹄﹈﹚﹜﹞)]}⦆」'

# Currency Symbols.
>>> ''.join(get_charset('Currency_Symbol'))
'$¢£¤¥֏؋৲৳৻૱௹฿៛₠₡₢₣₤₥₦₧₨₩₪₫€₭₮₯₰₱₲₳₴₵₶₷₸₹₺꠸﷼﹩$¢£¥₩'

# Numbers.
>>> ''.join(list(get_charset('IsN'))[:50])
'0123456789²³¹¼½¾٠١٢٣٤٥٦٧٨٩۰۱۲۳۴۵۶۷۸۹߀߁߂߃߄߅߆߇߈߉०१२३'

# Alphabetic.
>>> ''.join(list(get_charset('IsAlpha'))[:50])
'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwx'

# Lowercase.
>>> ''.join(list(get_charset('IsLower'))[:50])
'abcdefghijklmnopqrstuvwxyzªµºßàáâãäåæçèéêëìíîïðñòó'

# Uppercase.
>>> ''.join(list(get_charset('IsUpper'))[:50])
'ABCDEFGHIJKLMNOPQRSTUVWXYZÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØ'

# Alphanumeric.
>>> ''.join(list(get_charset('IsAlnum'))[:50])
'0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmn'
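
Sets such as Open_Punctuation, Close_Punctuation and Currency_Symbol correspond to the Unicode general categories Ps, Pe and Sc. As a point of comparison, the standard library's unicodedata module can build similar sets. This is a sketch and not how charguana stores its data; chars_in_category is a hypothetical helper, and the results depend on the Unicode version bundled with the interpreter, so they may differ from the outputs above.

```python
import unicodedata


def chars_in_category(category, limit=0x10000):
    """Collect characters below `limit` whose general category matches."""
    return [
        chr(cp)
        for cp in range(limit)
        if unicodedata.category(chr(cp)) == category
    ]


currency = chars_in_category("Sc")     # currency symbols
open_punct = chars_in_category("Ps")   # opening brackets and quotes
close_punct = chars_in_category("Pe")  # closing brackets and quotes

print("".join(currency[:5]))     # $¢£¤¥
print("".join(open_punct[:3]))   # ([{
print("".join(close_punct[:3]))  # )]}
print(unicodedata.unidata_version)  # Unicode version used for the lookup
```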

Thai

# Thai.
>>> ''.join(list(get_charset('thai')))[:50]
'กขฃคฅฆงจฉชซฌญฎฏฐฑฒณดตถทธนบปผฝพฟภมยรฤลฦวศษสหฬอฮ\u0e7f฿ะั'
# Thai consonants.
>>> from charguana import get_charset_ranges
>>> from charguana.thai import thai_consonants
>>> list(get_charset_ranges([thai_consonants]))[:10]
['ก', 'ข', 'ฃ', 'ค', 'ฅ', 'ฆ', 'ง', 'จ', 'ฉ', 'ช']
# Thai vowels.
>>> from charguana.thai import thai_vowels_1, thai_vowels_2
>>> list(get_charset_ranges([thai_vowels_1, thai_vowels_2]))[:10]
['ะ', 'ั', 'า', 'ำ', 'ิ', 'ี', 'ึ', 'ื', 'ุ', 'ู']
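
A range-based description of a character set, as consumed by get_charset_ranges and is_in_charsets, can be pictured as a list of inclusive code point pairs. The exact shape of thai_consonants is not shown here, so the (first, last) tuples and the helper names expand_ranges and in_ranges below are assumptions made for illustration only.

```python
# Assumed representation: inclusive (first, last) code point pairs.
THAI_CONSONANTS = (0x0E01, 0x0E2E)  # ก .. ฮ
THAI_DIGITS = (0x0E50, 0x0E59)      # ๐ .. ๙


def expand_ranges(ranges):
    """Yield every character covered by the given inclusive ranges."""
    for first, last in ranges:
        for cp in range(first, last + 1):
            yield chr(cp)


def in_ranges(ch, ranges):
    """True if the single character ch falls inside any of the ranges."""
    return any(first <= ord(ch) <= last for first, last in ranges)


print(list(expand_ranges([THAI_CONSONANTS]))[:10])
print(in_ranges("ก", [THAI_CONSONANTS, THAI_DIGITS]))  # True
print(in_ranges("a", [THAI_CONSONANTS, THAI_DIGITS]))  # False
```

Checking membership against the ranges avoids materialising the whole set, which matters little for Thai but more for large sets such as CJK ideographs.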

Vietnamese

# Vietnamese
>>> from charguana import get_charset
>>> ''.join(list(get_charset('viet'))[:50])
'AĂÂBCChDĐEÊGGhGiHIKKhLMNNgNghNhOÔƠPPhQRSTThTrUƯVXYFJWZaăâbcchd'

# Vietnamese tones.
>>> from charguana.viet import viet_tones
>>> viet_tones.huyen
'̀'
>>> 'o' + viet_tones.huyen
'ò'
>>> 'o' + viet_tones.sac
'ó'
>>> 'o' + viet_tones.hoi
'ỏ'
>>> 'o' + viet_tones.nga
'õ'
>>> 'o' + viet_tones.nang
'ọ'
>>> 'o' + viet_tones.ngang
'o'
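
Each tone above is a Unicode combining mark, so 'o' + viet_tones.huyen is a two-code-point string that only renders as a single glyph. When one precomposed character is needed (for comparison, lookup or length checks), the standard library's unicodedata.normalize can compose it. The mark values below are the standard Unicode combining characters and are assumed to match the library's tones; they are written out so the sketch runs without charguana installed.

```python
import unicodedata

# Combining marks assumed to correspond to the tones shown above.
TONES = {
    "huyen": "\u0300",  # combining grave accent
    "sac": "\u0301",    # combining acute accent
    "hoi": "\u0309",    # combining hook above
    "nga": "\u0303",    # combining tilde
    "nang": "\u0323",   # combining dot below
}

decomposed = "o" + TONES["huyen"]
composed = unicodedata.normalize("NFC", decomposed)

print(len(decomposed), len(composed))  # 2 1
print(decomposed == composed)          # False, although both render as ò
print(composed == "\u00f2")            # True

# Compose every tone onto the same base vowel.
for name, mark in TONES.items():
    print(name, unicodedata.normalize("NFC", "o" + mark))
```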

# Vietnamese consonants.
>>> from charguana.viet import viet_consonants
>>> list(viet_consonants)
['A', 'Ă', 'Â', 'B', 'C', 'Ch', 'D', 'Đ', 'E', 'Ê', 'G', 'Gh', 'Gi', 'H', 'I', 'K', 'Kh', 'L', 'M', 'N', 'Ng', 'Ngh', 'Nh', 'O', 'Ô', 'Ơ', 'P', 'Ph', 'Q', 'R', 'S', 'T', 'Th', 'Tr', 'U', 'Ư', 'V', 'X', 'Y', 'F', 'J', 'W', 'Z', 'a', 'ă', 'â', 'b', 'c', 'ch', 'd', 'đ', 'e', 'ê', 'g', 'gh', 'gi', 'h', 'i', 'k', 'kh', 'l', 'm', 'n', 'ng', 'ngh', 'nh', 'o', 'ô', 'ơ', 'p', 'ph', 'q', 'r', 's', 't', 'th', 'tr', 'u', 'ư', 'v', 'x', 'y', 'f', 'j', 'w', 'z']

# Vietnamese vowels with diacritics.
>>> from charguana.viet import a, a6, a8
>>> a
['A', 'Á', 'À', 'Ả', 'Ã', 'Ạ', 'a', 'á', 'à', 'ả', 'ã', 'ạ']
>>> a6
['Â', 'Ấ', 'Ầ', 'Ẩ', 'Ẫ', 'Ậ', 'â', 'ấ', 'ầ', 'ẩ', 'ẫ', 'ậ']
>>> a8
['Ă', 'Ắ', 'Ằ', 'Ẳ', 'Ẵ', 'Ặ', 'ă', 'ắ', 'ằ', 'ẳ', 'ẵ', 'ặ']

# All Vietnamese tone marks at once.
>>> from charguana.viet import viet_tones
>>> viet_tones
Tones(ngang='', huyen='̀', sac='́', hoi='̉', nga='̃', nang='̣')

# Vietnamese IME.
>>> from charguana.viet import viet_ime
>>> viet_ime('Nguye64n Tra62n Anh Thu7')
'Nguyễn Trần Anh Thư'
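# IME typo: 'u8' is not in the VNI mapping.
>>> viet_ime('Nguye64n Tra62n Anh Thu8')  # Unchecked: the unknown sequence passes through.
'Nguyễn Trần Anh Thu8'
>>> viet_ime('Nguye64n Tra62n Anh Thu8', raise_keyerror=True)  # Checked: raises on the unknown sequence.
Traceback (most recent call last):
  ...
KeyError: 'u8'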
# Telex
>>> viet_ime('Nguyeefn Traafn Anh Thuw', mapping='telex')
'Nguyền Trần Anh Thư'
# Shortcut for the Telex IME with functools.partial.
>>> from functools import partial
>>> from charguana.viet import viet_ime
>>> telex_ime = partial(viet_ime, mapping='telex')
>>> telex_ime('Nguyeefn Traafn Anh Thuw')
'Nguyền Trần Anh Thư'
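
To make the VNI examples above easier to follow, here is a minimal sketch of the idea behind such an input method, assuming the usual VNI conventions (6 = circumflex, 7 = horn, 8 = breve, 1 to 5 = tones). It is not charguana's implementation: vni_sketch, its strict flag and the tables are hypothetical, the đ key (d9) is left out, and digits are only recognised directly after the vowel.

```python
import re
import unicodedata

# VNI digit -> combining mark, per the usual VNI conventions.
SHAPES = {"6": "\u0302", "7": "\u031b", "8": "\u0306"}  # circumflex, horn, breve
TONES = {"1": "\u0301", "2": "\u0300", "3": "\u0309", "4": "\u0303", "5": "\u0323"}
# Vowel + shape digit combinations that exist in Vietnamese.
VALID_SHAPES = {"a6", "e6", "o6", "o7", "u7", "a8"}

# A vowel, then an optional shape digit, then an optional tone digit.
PATTERN = re.compile(r"([AEIOUYaeiouy])([678]?)([1-5]?)")


def vni_sketch(text, strict=False):
    """Convert VNI-style digit sequences to composed Vietnamese letters."""

    def convert(match):
        vowel, shape, tone = match.groups()
        key = vowel.lower() + shape
        if shape and key not in VALID_SHAPES:
            if strict:
                raise KeyError(key)
            return match.group(0)  # leave the unknown sequence untouched
        marks = SHAPES.get(shape, "") + TONES.get(tone, "")
        # NFC folds the base letter and its marks into one code point.
        return unicodedata.normalize("NFC", vowel + marks)

    return PATTERN.sub(convert, text)


print(vni_sketch("Nguye64n Tra62n Anh Thu7"))  # Nguyễn Trần Anh Thư
print(vni_sketch("Thu8"))                      # Thu8
```

Building on combining marks plus NFC keeps the tables small; an explicit lookup table of every vowel and tone combination would be an equally reasonable design and makes invalid input easier to detect.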
