Q – Rainer Schwarzbach’s Text Utilities
Text conversion and transcoding utilities
Installation from PyPI
pip install qrstu
Installation in a virtual environment is strongly recommended.
Usage
reduce
The reduce module can be used to reduce Unicode text in Latin script to ASCII-encodable Unicode text, similar to Unidecode but taking a different approach (i.e. mostly wrapping functionality from the standard library module unicodedata). Unlike Unidecode, which also transliterates characters from non-Latin scripts, reduce stubbornly refuses to handle these.
You can, however, specify an optional errors= argument in the reduce.reduce_text() call, which is passed to the internally used codecs.encode() function, thus taking advantage of the codecs module's error handling.
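To give a rough idea of what leaning on unicodedata can look like, here is a minimal sketch that uses only the standard library. It is not qrstu's implementation: strip_marks is a hypothetical helper invented for this illustration. It decomposes the text with NFKD normalization and drops the combining marks.

```python
import unicodedata


def strip_marks(text: str) -> str:
    """Decompose text (NFKD) and drop combining marks.

    Hypothetical helper for illustration only; not part of qrstu.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    # unicodedata.combining() returns 0 for non-combining characters
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_marks("kožušček"))  # kozuscek
print(strip_marks("30 \U0001d5c4\U0001d5c6/\U0001d5c1"))  # 30 km/h
print(strip_marks("Hà Nội"))  # Ha Noi
```

Letters without a Unicode decomposition, such as đ, pass through this sketch unchanged, so a plain decomposition is not sufficient on its own for text like the Vietnamese example.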
>>> from qrstu import reduce
>>> # Vietnamese text
>>> reduce.reduce_text("Chào mừng đến với Hà Nội!")
'Chao mung dhen voi Ha Noi!'
>>>
>>> # Trying the Unidecode examples …
>>> reduce.reduce_text('kožušček')
'kozuscek'
>>> reduce.reduce_text('30 \U0001d5c4\U0001d5c6/\U0001d5c1')
'30 km/h'
>>> reduce.reduce_text('\u5317\u4EB0')
Traceback (most recent call last):
  File "…/qrstu/src/qrstu/reduce.py", line 354, in reduce_text
    chunk = translations[character.nfc]
            ~~~~~~~~~~~~^^^^^^^^^^^^^^^
KeyError: '北'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "…/qrstu/src/qrstu/reduce.py", line 276, in reduce
    collector.append(PRESET_CHARACTER_REDUCTIONS[codepoint])
                     ~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^
KeyError: 21271

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "…/qrstu/src/qrstu/reduce.py", line 356, in reduce_text
    chunk = character.reduce(errors=errors)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "…/qrstu/src/qrstu/reduce.py", line 278, in reduce
    encoded = codecs.encode(
              ^^^^^^^^^^^^^^
UnicodeEncodeError: 'ascii' codec can't encode character '\u5317' in position 0: ordinal not in range(128)
>>> reduce.reduce_text('\u5317\u4EB0', errors="ignore")
''
>>> reduce.reduce_text('\u5317\u4EB0', errors="replace")
'??'
>>> reduce.reduce_text('\u5317\u4EB0', errors="backslashreplace")
'\\u5317\\u4eb0'
>>> reduce.reduce_text('\u5317\u4EB0', errors="xmlcharrefreplace")
'&#21271;&#20144;'
>>> reduce.reduce_text('\u5317\u4EB0', errors="namereplace")
'\\N{CJK UNIFIED IDEOGRAPH-5317}\\N{CJK UNIFIED IDEOGRAPH-4EB0}'
>>>
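The error handlers used in the session above are the standard ones of the codecs module, so their effect can be reproduced without qrstu. The following sketch assumes nothing about qrstu's internals beyond the codecs.encode() call visible in the traceback; encode_ascii is a hypothetical helper name.

```python
import codecs


def encode_ascii(text: str, errors: str = "strict") -> str:
    """Encode text to ASCII with the given error handler, return it as str.

    Hypothetical helper for illustration only; not part of qrstu.
    """
    return codecs.encode(text, "ascii", errors=errors).decode("ascii")


sample = "\u5317\u4eb0"
for handler in (
    "ignore",
    "replace",
    "backslashreplace",
    "xmlcharrefreplace",
    "namereplace",
):
    print(f"{handler}: {encode_ascii(sample, handler)!r}")

# The default handler, "strict", raises instead of substituting.
try:
    encode_ascii(sample)
except UnicodeEncodeError as error:
    print(f"strict: {error.reason}")
```

Custom handlers registered with codecs.register_error() can be selected by name in codecs.encode() in the same way.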
Further reading
Please see the documentation at https://blackstream-x.gitlab.io/qrstu for detailed usage information.
If you found a bug or have a feature suggestion, please open an issue in the project's issue tracker.