Some helpful Unicode string processing tools in native Python

Project description

cldr-language-helpers

This project provides a map from Unicode block ranges to locale/language codes.

It uses the Unicode CLDR project (cldr.unicode.org) as its reference.
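
To make the idea of a range-to-language map concrete, here is a minimal sketch. It is not the package's data or implementation: the `SCRIPT_RANGES` table and the `langs_for_char` helper are hypothetical names with a few hand-written ranges, whereas the real package derives its data from CLDR.

```python
# Hypothetical, hand-written table for illustration only.
# Each entry: (first code point, last code point, language codes).
SCRIPT_RANGES = [
    (0x0041, 0x005A, {"en"}),  # Latin capital letters A-Z
    (0x0061, 0x007A, {"en"}),  # Latin small letters a-z
    (0x0410, 0x044F, {"ru"}),  # Cyrillic letters А-я (basic Russian alphabet, no Ё/ё)
]


def langs_for_char(ch):
    """Return the set of language codes whose ranges contain the character."""
    code_point = ord(ch)
    langs = set()
    for start, end, codes in SCRIPT_RANGES:
        if start <= code_point <= end:
            langs |= codes
    return langs


print(langs_for_char("ф"))  # Cyrillic letter, falls in the third range
print(langs_for_char("q"))  # Latin letter, falls in the second range
print(langs_for_char("1"))  # digit, matches no range in this toy table
```

A linear scan is enough for a toy table like this; a larger table would more likely use sorted ranges with binary search.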

Build

Clone the latest cldr release:

svn co http://www.unicode.org/repos/cldr/tags/release-34/ ./cldr-release-34

Tests

pip install -r requirements/dev.txt

Then run the tests:

pytest

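The tests also populate the cldr_language_helpers/data directory. Alternatively, you can run only the tests that populate the data (-m generate selects tests by the generate marker, -s disables output capturing):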

pytest -sm generate

API

from cldr_language_helpers.annotator import StringAnnotator

assert StringAnnotator('123').char_types_by_index == [{'numbers'}, {'numbers'}, {'numbers'}]
assert 'ru_RU' in StringAnnotator('ф').langs_by_index[0]
assert {'ru_RU', 'en_US', 'en', 'ru'}.issubset(StringAnnotator('йцу 123 qwe LOL').all_langs)

stats = StringAnnotator('qwe йцу').lang_stats
assert stats['ru_RU'] == 3
assert stats['ru'] == 3
assert stats['en'] == 3
assert stats['space'] == 1

assert 'en' in StringAnnotator('somesortof123').langs_intersection

assert StringAnnotator('somesortof123').char_types_intersection == set()
assert StringAnnotator('somesortof').char_types_intersection == {'auxiliary', 'main'}
assert StringAnnotator(' ').char_types_intersection == {'space'}

assert StringAnnotator('что-то everything как-то lol !').split_by_lang_intersection() == \
       ['что-то', ' ', 'everything', ' ', 'как-то', ' ', 'lol', ' ', '!']
assert StringAnnotator('somesortof123').split_by_lang_intersection() == ['somesortof123']


assert StringAnnotator('qwe, 123!!!').split_by_char_type() == \
       ['qwe', ',', ' ', '123', '!!!']

assert StringAnnotator('йцу 123 qwe LOL').has_langs('ru', 'en')

assert StringAnnotator().has_langs_throughout('ru') is False
assert StringAnnotator('йцу 123').has_langs_throughout('ru') is True
assert StringAnnotator('йцу 123').has_langs_throughout('ru_RU') is True
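
For readers curious how splitting by character type can work in principle, the following is a rough standard-library sketch. It is not the package's implementation and does not use CLDR data: `coarse_type` and `split_by_coarse_type` are hypothetical helpers, and the type labels are assumptions based on Unicode general categories rather than the `main`/`auxiliary` types shown in the examples above.

```python
import unicodedata
from itertools import groupby


def coarse_type(ch):
    """Map a character to a coarse, illustrative type label."""
    category = unicodedata.category(ch)  # e.g. 'Ll', 'Nd', 'Po', 'Zs'
    if category.startswith("L"):
        return "letters"
    if category.startswith("N"):
        return "numbers"
    if category.startswith("Z") or ch.isspace():
        return "space"
    if category.startswith("P"):
        return "punctuation"
    return "other"


def split_by_coarse_type(text):
    """Split text into runs of consecutive characters sharing a coarse type."""
    return ["".join(run) for _, run in groupby(text, key=coarse_type)]


print(split_by_coarse_type("qwe, 123!!!"))
```

Because `groupby` only merges adjacent items with the same key, adjacent punctuation marks of different kinds would still be merged into one run here; the real package may draw these boundaries differently.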

Source Distribution

cldr-language-helpers-1.2.0.tar.gz (314.2 kB)

Built Distribution

cldr_language_helpers-1.2.0-py3-none-any.whl (327.2 kB)
