character normalization (especially for Latin letters - linguistic purposes)

pip install normaltext

The lookup function simplifies character analysis, suggests replacements, and speeds up repeated lookups through memoization. It is useful for character normalization, text processing, or any task where character properties and substitutions matter.

Tested against Windows 10 / Python 3.10 / Anaconda

The lookup function is aimed at developers and anyone else working on text processing or character manipulation. Given a single character, it returns information about that character and suggests a replacement based on the criteria described below.

Character Information:

The function retrieves the name of the character using unicodedata.name() and provides a sorted list of words representing the character name. This can be useful for analyzing and understanding the properties of a character.
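The idea of turning a character name into a sorted word list can be illustrated with the standard library alone. This is a minimal sketch, not the package's code: the helper name name_words is hypothetical, and the choice to return an empty list for unnamed characters is an assumption.

```python
import unicodedata


def name_words(char: str) -> list[str]:
    """Return the words of a character's Unicode name, sorted alphabetically.

    Hypothetical helper for illustration. Characters without a Unicode
    name (for example control characters) yield an empty list.
    """
    name = unicodedata.name(char, "")  # "" is the fallback for unnamed characters
    return sorted(name.split())


print(name_words("é"))
# ['ACUTE', 'E', 'LATIN', 'LETTER', 'SMALL', 'WITH']
```

Words such as LATIN, SMALL, and LETTER describe the script, case, and kind of character, which is the sort of property a normalizer can act on.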

Suggested Replacement:

The function suggests a replacement for the character based on the provided criteria. By considering factors like case sensitivity, printability, and capitalization, the function offers a recommended substitution. This can be beneficial when you need to transform or normalize characters in a specific context.
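One way such a suggestion could be derived is from the Unicode name itself. The sketch below is an assumption about the general approach, not the package's implementation: the function suggest and its rules are hypothetical, it handles only Latin letters whose name contains a single base letter, and it falls back to a default replacement for everything else.

```python
import string
import unicodedata


def suggest(char: str, replace: str = "x") -> str:
    """Suggest an ASCII stand-in for a single character (illustrative only)."""
    if char in string.printable:
        return char  # already plain ASCII, keep as-is
    words = unicodedata.name(char, "").split()
    if "LATIN" in words and "LETTER" in words:
        idx = words.index("LETTER") + 1
        # Names like "LATIN SMALL LETTER E WITH ACUTE" carry the base letter
        # right after the word LETTER.
        if idx < len(words) and len(words[idx]) == 1:
            base = words[idx]
            return base.lower() if "SMALL" in words else base
    return replace  # no single base letter found, use the default


print("".join(suggest(c) for c in "kožušček"))  # kozuscek
```

In this sketch, characters such as ß ("LATIN SMALL LETTER SHARP S") have no single base letter in their name and therefore receive the default replacement; the package may treat them differently.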

Memoization and Performance:

The function utilizes the functools.lru_cache decorator, which caches the results of previous function calls. This means that if the function is called multiple times with the same character, the result is retrieved from the cache instead of recomputing it. This caching mechanism can significantly improve the performance of the function when there are repetitive or redundant character lookups.
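The effect of functools.lru_cache on repeated character lookups can be observed directly through its cache statistics. The following is a small standalone sketch; cached_name is a hypothetical function used only to demonstrate the decorator, not part of the package.

```python
import unicodedata
from functools import lru_cache


@lru_cache(maxsize=None)
def cached_name(char: str) -> str:
    """Look up a character's Unicode name; repeated calls hit the cache."""
    return unicodedata.name(char, "")


# Seven lookups, but only three distinct characters: "é", "t", and " ".
for ch in "été été":
    cached_name(ch)

info = cached_name.cache_info()
print(info.hits, info.misses)  # 4 3
```

Natural-language text repeats a small set of characters very often, so nearly all lookups after the first few are served from the cache.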

Flexibility:

The lookup function provides optional parameters that allow customization of its behavior. The case_sens parameter determines whether case sensitivity is considered for replacements. The replace parameter allows setting a default replacement character. The add_to_printable parameter enables the addition of extra uppercase characters to the set of printable characters. These options provide flexibility to adapt the function to different requirements and use cases.

from normaltext import lookup
sen = "Montréal, über, 12.89, Mère, Françoise, noël, 889"
norm = ''.join([lookup(k, case_sens=True, replace='x', add_to_printable='')['suggested'] for k in sen])
print(norm)
#########################
sen2 = 'kožušček'
norm2 = ''.join([lookup(k, case_sens=True, replace='x', add_to_printable='')['suggested'] for k in sen2])
print(norm2)
#########################
sen3 = "Falsches Üben von Xylophonmusik quält jeden größeren Zwerg."
# Note: umlauts are reduced to the base vowel (ü -> u), not transliterated (ü -> ue).
norm3 = ''.join([lookup(k, case_sens=True, replace='x', add_to_printable='')['suggested'] for k in sen3])
print(norm3)
#########################
sen4 = "cætera"
norm4 = ''.join([lookup(k, case_sens=True, replace='x', add_to_printable='ae')['suggested'] for k in sen4])  # the ligature æ comes out as "ae"
print(norm4)

Output:

Montreal, uber, 12.89, Mere, Francoise, noel, 889
kozuscek
Falsches Uben von Xylophonmusik qualt jeden groseren Zwerg.
caetera
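The examples above repeat the same join-over-characters pattern, which could be wrapped in a small helper. This is a sketch under stated assumptions: normalize_text is a hypothetical name, and fake_lookup is a stand-in that only mimics the shape of the result used above (a mapping with a 'suggested' key) so the snippet runs without the package installed.

```python
from typing import Callable, Mapping


def normalize_text(text: str, lookup_fn: Callable[..., Mapping[str, str]], **options) -> str:
    """Apply a per-character lookup to every character and join the suggestions."""
    return "".join(lookup_fn(ch, **options)["suggested"] for ch in text)


def fake_lookup(char: str, **options) -> dict:
    """Stand-in for normaltext's lookup: maps é to e, leaves the rest unchanged."""
    return {"suggested": "e" if char == "é" else char}


print(normalize_text("Montréal", fake_lookup))  # Montreal
```

With the package installed, the real function would be passed instead, for example normalize_text(sen, lookup, case_sens=True, replace='x', add_to_printable='').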

Version: 0.10
