
Functions for working with Vietnamese text


Installation

To get the latest stable release from PyPI:

pip install viet_text_tools

Usage

normalize_diacritics()

Normalizes the placement of diacritics in a Vietnamese word. By default, the return value is in composed (NFC) form:

normalize_diacritics('nghìên') == 'nghiền'

Pass new_style=True to use new-style tone placement:

normalize_diacritics('thủy', new_style=True) == 'thuỷ'

Pass decomposed=True to return the string in decomposed (NFD) form:

len(normalize_diacritics('thủy')) == 4
len(normalize_diacritics('thủy', decomposed=True)) == 5
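
The length difference comes from Unicode normalization itself and can be reproduced with the standard library alone. The following is a minimal sketch that uses only Python's unicodedata module, not viet_text_tools; the variable names are illustrative.

```python
import unicodedata

# Start from the composed (NFC) form: t, h, ủ, y.
composed = unicodedata.normalize("NFC", "thủy")

# NFD splits ủ into the base letter u plus a combining hook above.
decomposed = unicodedata.normalize("NFD", composed)

composed_len = len(composed)
decomposed_len = len(decomposed)

# Unicode names of the code points in the decomposed string.
names = [unicodedata.name(ch) for ch in decomposed]

print(composed_len, decomposed_len)
print(names)
```

Both strings render identically, so comparing or measuring Vietnamese text is only reliable when both sides use the same normalization form.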

vietnamese_sort_key()

A key function for use with sorted() that sorts Vietnamese text in the correct collation order:

words = ['anh', 'ba', 'áo', 'cắt', 'cá', 'cả']
sorted(words) == ['anh', 'ba', 'cá', 'cả', 'cắt', 'áo']
sorted(words, key=vietnamese_sort_key) == ['anh', 'áo', 'ba', 'cả', 'cá', 'cắt']
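
To illustrate why the default sort misplaces 'áo' and what a Vietnamese-aware key has to do, here is a minimal standard-library sketch. It is not the implementation used by viet_text_tools. The names ALPHABET, TONE_MARKS, and sketch_sort_key are hypothetical, and the sketch assumes lowercase input and the conventional tone order (no tone, huyền, hỏi, ngã, sắc, nặng).

```python
import unicodedata

# Assumed Vietnamese alphabet order; ă, â, đ, ê, ô, ơ, ư count as separate letters.
ALPHABET = unicodedata.normalize("NFC", "aăâbcdđeêghiklmnoôơpqrstuưvxy")

# Combining tone marks in assumed collation order:
# grave (huyền), hook above (hỏi), tilde (ngã), acute (sắc), dot below (nặng).
TONE_MARKS = "\u0300\u0309\u0303\u0301\u0323"


def sketch_sort_key(word):
    """Hypothetical key: compare base letters first, then tones."""
    letters, tones = [], []
    for ch in unicodedata.normalize("NFC", word):
        tone = 0
        rest = []
        # Split the character into its tone mark (if any) and everything else.
        for c in unicodedata.normalize("NFD", ch):
            if c in TONE_MARKS:
                tone = TONE_MARKS.index(c) + 1
            else:
                rest.append(c)
        # Recompose without the tone so that ắ becomes ă, ề becomes ê, and so on.
        base = unicodedata.normalize("NFC", "".join(rest))
        idx = ALPHABET.find(base)
        # Characters outside the alphabet sort after it, by code point.
        letters.append(idx if idx >= 0 else len(ALPHABET) + ord(base[0]))
        tones.append(tone)
    return (letters, tones)


words = ["anh", "ba", "áo", "cắt", "cá", "cả"]
result = sorted(words, key=sketch_sort_key)
print(result)
```

The default sort compares code points, so every accented letter lands after plain ASCII. The sketch instead ranks each letter by its alphabet position and uses the tone only as a tie-breaker.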

vietnamese_case_insensitive_sort_key()

Same as vietnamese_sort_key() but case-insensitive.


