Functions for working with Vietnamese text
Installation
To get the latest stable release from PyPI:
pip install viet_text_tools
Usage
normalize_diacritics()
Normalizes the diacritics of a Vietnamese word. By default the return value is in composed (NFC) form:
normalize_diacritics('nghìên') == 'nghiền'
Pass new_style=True to use new-style tone placement:
normalize_diacritics('thủy', new_style=True) == 'thuỷ'
Pass decomposed=True to return the string in decomposed (NFD) form:
len(normalize_diacritics('thủy')) == 4
len(normalize_diacritics('thủy', decomposed=True)) == 5
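The difference in length comes from how Unicode represents the tone mark. The sketch below uses only the standard library's unicodedata module (not viet_text_tools itself) to show the two forms of the same word; the variable names are illustrative.

```python
import unicodedata

# 'thủy' written with an escape so the starting form is unambiguous:
# U+1EE7 is the single precomposed character "u with hook above".
word = "th\u1ee7y"

# NFC keeps (or builds) precomposed characters where they exist.
composed = unicodedata.normalize("NFC", word)

# NFD splits each accented letter into a base letter plus combining marks,
# here "u" followed by U+0309 COMBINING HOOK ABOVE.
decomposed = unicodedata.normalize("NFD", word)

print(len(composed), len(decomposed))
print([unicodedata.name(ch) for ch in decomposed])

# The two forms are canonically equivalent but compare unequal as Python
# strings, so normalize both sides to one form before comparing.
print(composed == decomposed)
print(unicodedata.normalize("NFC", decomposed) == composed)
```

Both strings render identically, which is why it helps to settle on one form before comparing or storing text.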
vietnamese_sort_key()
A key function for use with sorted() that sorts Vietnamese text in the correct collation order. Plain sorted() compares code points, which puts accented letters after the unaccented ones:
words = ['anh', 'ba', 'áo', 'cắt', 'cá', 'cả']
sorted(words) == ['anh', 'ba', 'cá', 'cả', 'cắt', 'áo']
sorted(words, key=vietnamese_sort_key) == ['anh', 'áo', 'ba', 'cả', 'cá', 'cắt']
vietnamese_case_insensitive_sort_key()
Same as vietnamese_sort_key() but case-insensitive.
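To illustrate the idea behind such a key function, here is a minimal standard-library sketch. It is not the implementation used by viet_text_tools; the name sketch_sort_key and the ranking scheme are assumptions made for this example. It ranks base letters by the Vietnamese alphabet (a, ă, â, b, c, ...), breaks ties by tone in the conventional order (no tone, huyền, hỏi, ngã, sắc, nặng), and lowercases its input, so it behaves like a case-insensitive key.

```python
import unicodedata

# Vietnamese alphabet in order: a ă â b c d đ e ê g h i k l m n o ô ơ p q r s t u ư v x y.
# Escapes are used so every entry is a single precomposed code point.
ALPHABET = "a\u0103\u00e2bcd\u0111e\u00eaghiklmno\u00f4\u01a1pqrstu\u01b0vxy"
LETTER_RANK = {ch: i for i, ch in enumerate(ALPHABET)}

# Combining tone marks: grave (huyền), hook above (hỏi), tilde (ngã),
# acute (sắc), dot below (nặng). Rank 0 is reserved for "no tone".
TONE_MARKS = "\u0300\u0309\u0303\u0301\u0323"
TONE_RANK = {mark: i + 1 for i, mark in enumerate(TONE_MARKS)}


def sketch_sort_key(word):
    """Hypothetical key: letter ranks first, tone ranks as a tie-breaker."""
    letters, tones = [], []
    for ch in unicodedata.normalize("NFD", word.lower()):
        if ch in TONE_RANK and tones:
            # A tone mark applies to the most recent letter.
            tones[-1] = TONE_RANK[ch]
        elif unicodedata.combining(ch) and letters:
            # Breve, circumflex, and horn form distinct letters (ă, â, ơ, ...),
            # so fold them back into the base letter.
            letters[-1] = unicodedata.normalize("NFC", letters[-1] + ch)
        else:
            letters.append(ch)
            tones.append(0)
    # Characters outside the alphabet sort after it, by code point.
    ranks = [LETTER_RANK.get(l, len(ALPHABET) + ord(l[0])) for l in letters]
    return (ranks, tones)


# anh, ba, áo, cắt, cá, cả
words = ["anh", "ba", "\u00e1o", "c\u1eaft", "c\u00e1", "c\u1ea3"]
print(sorted(words, key=sketch_sort_key))
```

On the six example words this sketch reproduces the order shown for vietnamese_sort_key() above. It compares whole strings letter by letter and makes no attempt to handle multi-word phrases specially, so results for other input may differ from the library's.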