Some helpful Unicode string processing tools in native Python
cldr-language-helpers
This project provides a map from Unicode block ranges to locale/language codes.
It uses the CLDR project (cldr.unicode.org) as its reference.
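To illustrate what a range-to-language map looks like, here is a minimal sketch. It assumes a tiny hand-written table; the `RANGES` table and the `langs_for_char` helper are hypothetical names for illustration and are not the package's generated data or API.

```python
from bisect import bisect_right

# Hypothetical toy table: (first code point, last code point, language codes).
# These ranges are illustrative only, not the data generated from CLDR.
RANGES = [
    (0x0041, 0x005A, {"en"}),  # A-Z
    (0x0061, 0x007A, {"en"}),  # a-z
    (0x0410, 0x044F, {"ru"}),  # basic Russian letters, excluding Ё/ё
]
# Sorted range starts, so a lookup is a binary search instead of a scan.
STARTS = [start for start, _end, _langs in RANGES]


def langs_for_char(char):
    """Return the language codes whose range contains `char` (may be empty)."""
    code_point = ord(char)
    index = bisect_right(STARTS, code_point) - 1
    if index >= 0:
        start, end, langs = RANGES[index]
        if start <= code_point <= end:
            return set(langs)
    return set()


print(langs_for_char("q"))  # {'en'}
print(langs_for_char("ф"))  # {'ru'}
print(langs_for_char("1"))  # set()
```

A real table built from CLDR would map a character to many locales at once; the sketch only shows the lookup idea.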
Build
Check out the CLDR release (the command below uses release-34):
svn co http://www.unicode.org/repos/cldr/tags/release-34/ ./cldr-release-34
Tests
Install the development requirements:
pip install -r requirements/dev.txt
Then run the tests:
pytest
The tests also populate the cldr_language_helpers/data directory. Alternatively, run only the tests that populate it:
pytest -sm generate
API
from cldr_language_helpers.annotator import StringAnnotator

# Character types and languages for each character position
assert StringAnnotator('123').char_types_by_index == [{'numbers'}, {'numbers'}, {'numbers'}]
assert 'ru_RU' in StringAnnotator('ф').langs_by_index[0]

# All languages found anywhere in the string
assert {'ru_RU', 'en_US', 'en', 'ru'}.issubset(StringAnnotator('йцу 123 qwe LOL').all_langs)

# Character counts per language and per character type
stats = StringAnnotator('qwe йцу').lang_stats
assert stats['ru_RU'] == 3
assert stats['ru'] == 3
assert stats['en'] == 3
assert stats['space'] == 1

# Languages and character types shared by every character
assert 'en' in StringAnnotator('somesortof123').langs_intersection
assert StringAnnotator('somesortof123').char_types_intersection == set()
assert StringAnnotator('somesortof').char_types_intersection == {'auxiliary', 'main'}
assert StringAnnotator(' ').char_types_intersection == {'space'}

# Splitting a string into runs
assert StringAnnotator('что-то everything как-то lol !').split_by_lang_intersection() == \
    ['что-то', ' ', 'everything', ' ', 'как-то', ' ', 'lol', ' ', '!']
assert StringAnnotator('somesortof123').split_by_lang_intersection() == ['somesortof123']
assert StringAnnotator('qwe, 123!!!').split_by_char_type() == \
    ['qwe', ',', ' ', '123', '!!!']

# Language checks
assert StringAnnotator('йцу 123 qwe LOL').has_langs('ru', 'en')
assert StringAnnotator().has_langs_throughout('ru') is False
assert StringAnnotator('йцу 123').has_langs_throughout('ru') is True
assert StringAnnotator('йцу 123').has_langs_throughout('ru_RU') is True
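For readers who want a feel for how splitting by character type can work, the following is a rough standard-library sketch. It is not the package's implementation: it substitutes Unicode general categories (via `unicodedata`) for the CLDR-derived types, and `coarse_type` and `split_by_coarse_type` are hypothetical helper names.

```python
from itertools import groupby
import unicodedata


def coarse_type(char):
    """Classify a character by the first letter of its Unicode general category."""
    category = unicodedata.category(char)
    if category.startswith("L"):
        return "letter"
    if category.startswith("N"):
        return "number"
    if category.startswith("Z") or char.isspace():
        return "space"
    if category.startswith("P"):
        return "punctuation"
    return "other"


def split_by_coarse_type(text):
    """Split `text` into maximal runs of characters that share a coarse type."""
    return ["".join(group) for _type, group in groupby(text, key=coarse_type)]


print(split_by_coarse_type("qwe, 123!!!"))  # ['qwe', ',', ' ', '123', '!!!']
```

Because the sketch uses general categories instead of CLDR exemplar sets, its results can differ from `split_by_char_type()` on other inputs.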
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
cldr-language-helpers-1.2.0.tar.gz (314.2 kB)
Built Distribution
cldr_language_helpers-1.2.0-py3-none-any.whl (327.2 kB)
File details
Details for the file cldr-language-helpers-1.2.0.tar.gz.
File metadata
- Download URL: cldr-language-helpers-1.2.0.tar.gz
- Upload date:
- Size: 314.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: Python-urllib/3.7
File hashes
Algorithm | Hash digest
---|---
SHA256 | 14be0c35c6b5d5934635aa429fc60e9bb0fde3259838ace751429feed8d53578
MD5 | 611eacb44c634acc8af7601444e317cf
BLAKE2b-256 | 7c9f7dcfdcfe9ab7232d3e7c5b0575806182c00c4a76499e8f05c2a0e41ed948
File details
Details for the file cldr_language_helpers-1.2.0-py3-none-any.whl.
File metadata
- Download URL: cldr_language_helpers-1.2.0-py3-none-any.whl
- Upload date:
- Size: 327.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: Python-urllib/3.7
File hashes
Algorithm | Hash digest
---|---
SHA256 | f181ec1412fe67290420680274aadb4f264c84beb2bfc20c86fb485888d3cf73
MD5 | 50684b2b131a02fab5423927c7d44f25
BLAKE2b-256 | ee0e2423b2cb6a5eec77fc65dec952455ec4a4ede749886f395e19af1ffa6bc7