Some helpful Unicode string processing tools in native Python
cldr-language-helpers
This project was designed to provide a map from Unicode block ranges to locale/language codes.
It uses the cldr.unicode.org project as its reference.
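To make the idea of a range-to-language map concrete, here is a minimal sketch. It is not the package's implementation: the `RANGES` table and the `langs_for_char` helper are hypothetical names with a few hand-written rows, whereas the real data is generated from CLDR.

```python
from bisect import bisect_right

# Hypothetical, hand-written table: (first code point, last code point, language codes).
# The rows are sorted by start and do not overlap; they are for illustration only.
RANGES = [
    (0x0041, 0x005A, {"en"}),  # Latin capital letters A-Z
    (0x0061, 0x007A, {"en"}),  # Latin small letters a-z
    (0x0410, 0x044F, {"ru"}),  # basic Cyrillic letters
]
STARTS = [start for start, _, _ in RANGES]


def langs_for_char(ch):
    """Return the language codes whose range contains ch, or an empty set."""
    code = ord(ch)
    # Find the last range that starts at or before this code point.
    i = bisect_right(STARTS, code) - 1
    if i >= 0:
        start, end, langs = RANGES[i]
        if start <= code <= end:
            return set(langs)
    return set()
```

With this toy table, `langs_for_char('ф')` gives `{'ru'}`, while a digit or a space gives an empty set because no row covers it.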
Build
Clone the latest cldr release:
svn co http://www.unicode.org/repos/cldr/tags/release-34/ ./cldr-release-34
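CLDR locale files (for example those under common/main in the checkout) list exemplar characters per locale in `exemplarCharacters` elements. The sketch below shows one way such an element could be read. It is an assumption that this project consumes the data in this way; `SAMPLE_LDML` and `exemplar_sets` are hypothetical names, and the character lists are placeholders rather than real CLDR data.

```python
import xml.etree.ElementTree as ET

# A shortened, hand-written fragment in the shape of a CLDR locale file.
# The character lists are placeholders, not copied from CLDR.
SAMPLE_LDML = """
<ldml>
  <characters>
    <exemplarCharacters>[a b c]</exemplarCharacters>
    <exemplarCharacters type="auxiliary">[x y z]</exemplarCharacters>
    <exemplarCharacters type="numbers">[0 1 2]</exemplarCharacters>
  </characters>
</ldml>
"""


def exemplar_sets(ldml_text):
    """Map exemplar type ('main' when the attribute is absent) to a set of characters.

    Simplification: only space-separated single characters are handled.
    Full UnicodeSet syntax (ranges, escapes, multi-character sequences) is not.
    """
    root = ET.fromstring(ldml_text)
    result = {}
    for node in root.iter("exemplarCharacters"):
        kind = node.get("type", "main")
        body = (node.text or "").strip().strip("[]")
        result[kind] = {tok for tok in body.split() if len(tok) == 1}
    return result
```

The keys produced here (`main`, `auxiliary`, `numbers`) line up with the character type names that appear in the API examples, which is why this shape was chosen for the sketch.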
Tests
pip install -r requirements/dev.txt
Then run the tests:
pytest
Running the tests also populates the cldr_language_helpers/data directory. Alternatively, you can run only the populate tests:
pytest -sm generate
API
from cldr_language_helpers.annotator import StringAnnotator
assert StringAnnotator('123').char_types_by_index == [{'numbers'}, {'numbers'}, {'numbers'}]
assert 'ru_RU' in StringAnnotator('ф').langs_by_index[0]
assert {'ru_RU', 'en_US', 'en', 'ru'}.issubset(StringAnnotator('йцу 123 qwe LOL').all_langs)
stats = StringAnnotator('qwe йцу').lang_stats
assert stats['ru_RU'] == 3
assert stats['ru'] == 3
assert stats['en'] == 3
assert stats['space'] == 1
assert 'en' in StringAnnotator('somesortof123').langs_intersection
assert StringAnnotator('somesortof123').char_types_intersection == set()
assert StringAnnotator('somesortof').char_types_intersection == {'auxiliary', 'main'}
assert StringAnnotator(' ').char_types_intersection == {'space'}
assert StringAnnotator('что-то everything как-то lol !').split_by_lang_intersection() == \
['что-то', ' ', 'everything', ' ', 'как-то', ' ', 'lol', ' ', '!']
assert StringAnnotator('somesortof123').split_by_lang_intersection() == ['somesortof123']
assert StringAnnotator('qwe, 123!!!').split_by_char_type() == \
['qwe', ',', ' ', '123', '!!!']
assert StringAnnotator('йцу 123 qwe LOL').has_langs('ru', 'en')
assert StringAnnotator().has_langs_throughout('ru') is False
assert StringAnnotator('йцу 123').has_langs_throughout('ru') is True
assert StringAnnotator('йцу 123').has_langs_throughout('ru_RU') is True
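For readers who want a feel for what splitting by character type involves, the following is a rough standard-library stand-in. It is not how `StringAnnotator` works internally: it groups characters by their major Unicode general category from `unicodedata` instead of CLDR exemplar types, and `coarse_type` and `split_by_coarse_type` are hypothetical names.

```python
import unicodedata
from itertools import groupby


def coarse_type(ch):
    """Major Unicode category of ch: 'L' letter, 'N' number, 'P' punctuation, 'Z' separator."""
    return unicodedata.category(ch)[0]


def split_by_coarse_type(text):
    """Split text into maximal runs of characters that share a major category."""
    return ["".join(run) for _, run in groupby(text, key=coarse_type)]
```

For the input `'qwe, 123!!!'` this stand-in yields the same pieces as the `split_by_char_type()` example above, although the two approaches can differ on other inputs.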
File details
Details for the file cldr-language-helpers-1.2.0.tar.gz.
File metadata
- Download URL: cldr-language-helpers-1.2.0.tar.gz
- Upload date:
- Size: 314.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: Python-urllib/3.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 14be0c35c6b5d5934635aa429fc60e9bb0fde3259838ace751429feed8d53578 |
| MD5 | 611eacb44c634acc8af7601444e317cf |
| BLAKE2b-256 | 7c9f7dcfdcfe9ab7232d3e7c5b0575806182c00c4a76499e8f05c2a0e41ed948 |
File details
Details for the file cldr_language_helpers-1.2.0-py3-none-any.whl.
File metadata
- Download URL: cldr_language_helpers-1.2.0-py3-none-any.whl
- Upload date:
- Size: 327.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: Python-urllib/3.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f181ec1412fe67290420680274aadb4f264c84beb2bfc20c86fb485888d3cf73 |
| MD5 | 50684b2b131a02fab5423927c7d44f25 |
| BLAKE2b-256 | ee0e2423b2cb6a5eec77fc65dec952455ec4a4ede749886f395e19af1ffa6bc7 |