Q – Rainer Schwarzbach’s Text Utilities

Text conversion and transcoding utilities

Installation from PyPI

pip install qrstu

Installation in a virtual environment is strongly recommended.

Usage

reduce

The reduce module reduces Unicode text in Latin script to ASCII-encodable Unicode text. It is similar to Unidecode but takes a different approach, mostly wrapping functionality from the standard library module unicodedata. Unlike Unidecode, which also transliterates characters from non-Latin scripts, reduce stubbornly refuses to handle these.

You can, however, specify an optional errors= argument in the reduce.reduce_text() call. It is passed to the internally used codecs.encode() function, so you can take advantage of the codecs module's error handling.

>>> from qrstu import reduce
>>> # Vietnamese text
>>> reduce.reduce_text("Chào mừng đến với Hà Nội!")
'Chao mung dhen voi Ha Noi!'
>>>
>>> # Trying the Unidecode examples …
>>> reduce.reduce_text('kožušček')
'kozuscek'
>>> reduce.reduce_text('30 \U0001d5c4\U0001d5c6/\U0001d5c1')
'30 km/h'
>>> reduce.reduce_text('\u5317\u4EB0')
Traceback (most recent call last):
  File "…/qrstu/src/qrstu/reduce.py", line 354, in reduce_text
    chunk = translations[character.nfc]
            ~~~~~~~~~~~~^^^^^^^^^^^^^^^
KeyError: '北'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "…/qrstu/src/qrstu/reduce.py", line 276, in reduce
    collector.append(PRESET_CHARACTER_REDUCTIONS[codepoint])
                     ~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^
KeyError: 21271

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "…/qrstu/src/qrstu/reduce.py", line 356, in reduce_text
    chunk = character.reduce(errors=errors)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "…/qrstu/src/qrstu/reduce.py", line 278, in reduce
    encoded = codecs.encode(
              ^^^^^^^^^^^^^^
UnicodeEncodeError: 'ascii' codec can't encode character '\u5317' in position 0: ordinal not in range(128)
>>> reduce.reduce_text('\u5317\u4EB0', errors="ignore")
''
>>> reduce.reduce_text('\u5317\u4EB0', errors="replace")
'??'
>>> reduce.reduce_text('\u5317\u4EB0', errors="backslashreplace")
'\\u5317\\u4eb0'
>>> reduce.reduce_text('\u5317\u4EB0', errors="xmlcharrefreplace")
'&#21271;&#20144;'
>>> reduce.reduce_text('\u5317\u4EB0', errors="namereplace")
'\\N{CJK UNIFIED IDEOGRAPH-5317}\\N{CJK UNIFIED IDEOGRAPH-4EB0}'
>>>
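
To illustrate the general idea behind this approach, here is a minimal standard-library sketch. It is not qrstu's implementation: the function name sketch_reduce is hypothetical, and the choice of NFKD normalization followed by dropping combining marks is an assumption about what "wrapping unicodedata" might look like. Only the final step, passing errors to codecs.encode(), is taken from the description above.

```python
import codecs
import unicodedata


def sketch_reduce(text: str, errors: str = "strict") -> str:
    """Hypothetical stand-in for illustration, not qrstu's code."""
    # NFKD splits accented letters into a base letter plus combining marks
    # and maps compatibility characters (e.g. mathematical letters) to
    # their plain equivalents.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks and keep the base characters.
    stripped = "".join(
        char for char in decomposed if not unicodedata.combining(char)
    )
    # Anything still outside ASCII is left to the codecs error handler.
    return codecs.encode(stripped, "ascii", errors).decode("ascii")


print(sketch_reduce("kožušček"))                            # kozuscek
print(sketch_reduce("30 \U0001d5c4\U0001d5c6/\U0001d5c1"))  # 30 km/h
print(sketch_reduce("\u5317\u4eb0", errors="replace"))      # ??
```

Characters without a Unicode decomposition, such as đ, would fall through to the error handler in this sketch. The session above shows qrstu reducing it to dh instead, so the library evidently does more than plain normalization.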

Further reading

Please see the documentation at https://blackstream-x.gitlab.io/qrstu for detailed usage information.

If you found a bug or have a feature suggestion, please open an issue in the project's issue tracker.
