Some helpful Unicode string processing tools in native Python

Project description


This project was designed to provide a map from Unicode block ranges to locale/language codes.

It uses the CLDR project as its reference.
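
To make the idea of mapping code point ranges to language codes concrete, here is a minimal sketch in plain Python. The `RANGES` table and the helpers `langs_for_char` and `langs_for_string` are hypothetical, hand-written for illustration only. The package itself generates its data from CLDR and may organise it quite differently.

```python
# Hypothetical, hand-written ranges for illustration only.
# These three rows are assumptions, not data taken from the package.
RANGES = [
    (0x0041, 0x005A, {"en"}),  # Latin capital letters A-Z
    (0x0061, 0x007A, {"en"}),  # Latin small letters a-z
    (0x0410, 0x044F, {"ru"}),  # Cyrillic letters А-я (ё/Ё lie outside this range)
]


def langs_for_char(ch):
    """Return the set of language codes whose ranges contain ch."""
    code_point = ord(ch)
    langs = set()
    for start, end, codes in RANGES:
        if start <= code_point <= end:
            langs |= codes
    return langs


def langs_for_string(text):
    """Return the union of language codes over all characters of text."""
    result = set()
    for ch in text:
        result |= langs_for_char(ch)
    return result
```

With this toy table, `langs_for_char('ф')` yields `{'ru'}`, while a digit such as `'1'` matches no range and yields an empty set.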


Check out the latest CLDR release:

svn co ./cldr-release-34


pip install -r requirements/dev.txt

Then run the tests:


The tests also populate the cldr_language_helpers/data directory. Alternatively, you can run only the tests that populate the data:

pytest -sm generate


from cldr_language_helpers.annotator import StringAnnotator

assert StringAnnotator('123').char_types_by_index == [{'numbers'}, {'numbers'}, {'numbers'}]
assert 'ru_RU' in StringAnnotator('ф').langs_by_index[0]
assert {'ru_RU', 'en_US', 'en', 'ru'}.issubset(StringAnnotator('йцу 123 qwe LOL').all_langs)

stats = StringAnnotator('qwe йцу').lang_stats
assert stats['ru_RU'] == 3
assert stats['ru'] == 3
assert stats['en'] == 3
assert stats['space'] == 1

assert 'en' in StringAnnotator('somesortof123').langs_intersection

assert StringAnnotator('somesortof123').char_types_intersection == set()
assert StringAnnotator('somesortof').char_types_intersection == {'auxiliary', 'main'}
assert StringAnnotator(' ').char_types_intersection == {'space'}

assert StringAnnotator('что-то everything как-то lol !').split_by_lang_intersection() == \
       ['что-то', ' ', 'everything', ' ', 'как-то', ' ', 'lol', ' ', '!']
assert StringAnnotator('somesortof123').split_by_lang_intersection() == ['somesortof123']

assert StringAnnotator('qwe, 123!!!').split_by_char_type() == \
       ['qwe', ',', ' ', '123', '!!!']

assert StringAnnotator('йцу 123 qwe LOL').has_langs('ru', 'en')

assert StringAnnotator().has_langs_throughout('ru') is False
assert StringAnnotator('йцу 123').has_langs_throughout('ru') is True
assert StringAnnotator('йцу 123').has_langs_throughout('ru_RU') is True
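
The `split_by_char_type()` result shown above can be approximated with the standard library alone. The sketch below is not the package's implementation: `coarse_type` and `split_by_coarse_type` are hypothetical names, and the four categories are an assumption based on `unicodedata` general categories rather than on CLDR data.

```python
import unicodedata
from itertools import groupby


def coarse_type(ch):
    """Classify a character into a coarse, assumed category."""
    if ch.isspace():
        return "space"
    category = unicodedata.category(ch)
    if category.startswith("L"):
        return "letter"
    if category.startswith("N"):
        return "number"
    return "other"  # punctuation, symbols, and everything else


def split_by_coarse_type(text):
    """Split text into maximal runs of characters sharing a coarse type."""
    return ["".join(run) for _, run in groupby(text, key=coarse_type)]


assert split_by_coarse_type('qwe, 123!!!') == ['qwe', ',', ' ', '123', '!!!']
```

This sketch knows nothing about languages, so it cannot answer questions such as `has_langs` or `has_langs_throughout`. Those require range-to-language data of the kind the package builds from CLDR.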


Files for cldr-language-helpers, version 1.2.0
Filename                                      Size      File type  Python version
cldr_language_helpers-1.2.0-py3-none-any.whl  327.2 kB  Wheel      3.7
cldr-language-helpers-1.2.0.tar.gz            314.2 kB  Source     None
