
Substitute alternative spellings of native characters (e.g. German umlauts [ae, oe, ue] etc. [ss]) with their correct versions (ä, ö, ü, ß).


betterletter

In a given text, replaces alternative spellings of native characters with their proper spellings[^1]:

demo

Installation

pip install betterletter

Usage

The package installs a Python script of the same name. If the Python script directory is on your $PATH, you can invoke betterletter directly instead of the usual python -m betterletter:

$ betterletter -h
usage: betterletter [-h] [-c] [-f] [-r] [-g] [-d] [--debug] {de}

Tool to replace alternative spellings of native characters (e.g. German
umlauts [ä, ö, ü] etc. [ß]) with the proper native characters. For example,
this problem occurs when no proper keyboard layout was available. This program
is dictionary-based to check if replacements are valid words. By default,
reads from STDIN and writes to STDOUT.

positional arguments:
  {de}             Text language to work with, in ISO 639-1 format.

options:
  -h, --help       show this help message and exit
  -c, --clipboard  Read from and write back to clipboard instead of
                   STDIN/STDOUT.
  -f, --force      Force substitutions and return the text version with the
                   maximum number of substitutions, even if they are illegal
                   words (useful for names).
  -r, --reverse    Reverse mode, where all native characters are simply
                   replaced by their alternative spellings.
  -g, --gui        Stop and open a GUI prompt for confirmation before
                   finishing.
  -d, --diff       Print a diff view of the substitutions to stderr.
  --debug          Output detailed logging information.

Usage Examples

Normal usage:

$ echo 'Hoeflich fragen waere angebracht!' | betterletter de
Höflich fragen wäre angebracht!

Reverse it:

$ echo 'Höflich fragen wäre angebracht!' | betterletter --reverse de
Hoeflich fragen waere angebracht!

A diff view is useful for longer texts and for confirming correctness. The diff is written to STDERR, so it won't interfere with further redirection of STDOUT:

$ echo 'Hoeflich fragen waere angebracht!' | betterletter --diff de 2> diff.txt
Höflich fragen wäre angebracht!
$ cat diff.txt
- Hoeflich fragen waere angebracht!
?  ^^              ^^
+ Höflich fragen wäre angebracht!
?  ^              ^
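The diff layout shown above resembles what Python's standard `difflib.ndiff` produces. Whether betterletter uses that function internally is an assumption; the following is only a minimal sketch of how such a view can be generated and sent to STDERR. The function name `render_diff` is made up for illustration.

```python
import difflib
import sys


def render_diff(before: str, after: str) -> list[str]:
    """Return ndiff-style lines comparing two single-line texts."""
    # ndiff compares sequences of lines; hint lines start with "? ".
    # Hint lines carry a trailing newline, which is stripped here.
    return [line.rstrip("\n") for line in difflib.ndiff([before], [after])]


diff_lines = render_diff(
    "Hoeflich fragen waere angebracht!",
    "Höflich fragen wäre angebracht!",
)
# Writing to STDERR keeps STDOUT free for the processed text itself.
print("\n".join(diff_lines), file=sys.stderr)
```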

The tool may be coerced into working with names:

$ # A name won't be in the dictionary:
$ echo 'Sehr geehrte Frau Huebenstetter, ...' | betterletter de
Sehr geehrte Frau Huebenstetter, ...
$ # But we can force it to work:
$ echo 'Sehr geehrte Frau Huebenstetter, ...' | betterletter --force de
Sehr geehrte Frau Hübenstetter, ...

Clipboard-based workflows are also possible:

# Nothing happens: clipboard is read and written to silently.
# Paste the processed version from your clipboard.
$ betterletter --clipboard de

Background

For example, German native characters and their corresponding alternative spellings (e.g. when no proper keyboard layout is at hand, or ASCII is used) are:

| Native Character | Alternative Spelling |
| ---------------- | -------------------- |
| Ä/ä              | Ae/ae                |
| Ö/ö              | Oe/oe                |
| Ü/ü              | Ue/ue                |
| ẞ/ß              | SS/ss                |

These pairings are recorded here.

Going from left to right is simple: replace all native characters with their alternative spellings, minding case. That use case is also supported by this tool (reverse flag).
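As a rough illustration of this simple direction, the sketch below maps each native character to its alternative spelling using the pairings from the table. It is not the tool's actual code; the names `NATIVE_TO_ALTERNATIVE` and `to_alternative_spelling` are hypothetical, and all-caps words get no special treatment (Ü always becomes Ue).

```python
# Pairings taken from the table above (native -> alternative spelling).
NATIVE_TO_ALTERNATIVE = {
    "Ä": "Ae", "ä": "ae",
    "Ö": "Oe", "ö": "oe",
    "Ü": "Ue", "ü": "ue",
    "ẞ": "SS", "ß": "ss",
}


def to_alternative_spelling(text: str) -> str:
    """Replace every native character with its alternative spelling."""
    # Characters without an entry in the mapping are passed through as-is.
    return "".join(NATIVE_TO_ALTERNATIVE.get(char, char) for char in text)


print(to_alternative_spelling("Höflich fragen wäre angebracht!"))
# Hoeflich fragen waere angebracht!
```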

The other direction is much less straightforward: there exist countless words for which alternative spellings occur somewhere as a pattern, yet replacing them with the corresponding native character would be wrong:

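| Character | Correct Spelling | Wrong Spelling |
| --------- | ---------------- | -------------- |
| Ä         | Aerodynamik      | Ärodynamik     |
| Ä         | Israel           | Isräl          |
| Ä         | Schufaeintrag    | Schufäintrag   |
| Ö         | Koeffizient      | Köffizient     |
| Ö         | Dominoeffekt     | Dominöffekt    |
| Ö         | Poet             | Pöt            |
| Ü         | Abenteuer        | Abenteür       |
| Ü         | Mauer            | Maür           |
| Ü         | Steuerung        | Steürung       |
| ß         | Messgerät        | Meßgerät       |
| ß         | Messe            | Meße           |
| ß         | Abschluss        | Abschluß       |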

These are just a few fairly common examples.

As such, this tool relies on a dictionary lookup; see also the containing directory.
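The idea of a dictionary-checked substitution can be sketched as follows. This is an illustrative approximation, not betterletter's implementation: `KNOWN_WORDS` is a tiny made-up stand-in for the real word list, the function names are hypothetical, and the `force` parameter merely mimics the documented behaviour of the `--force` flag (return the version with the maximum number of substitutions).

```python
import re
from itertools import product

ALTERNATIVE_TO_NATIVE = {
    "Ae": "Ä", "ae": "ä",
    "Oe": "Ö", "oe": "ö",
    "Ue": "Ü", "ue": "ü",
    "SS": "ẞ", "ss": "ß",
}
PATTERN = re.compile("|".join(ALTERNATIVE_TO_NATIVE))

# Hypothetical miniature dictionary; the real tool ships a full word list.
KNOWN_WORDS = {"Höflich", "wäre", "Mauer", "Poet", "Koeffizient"}


def candidates(word):
    """Yield each spelling obtained by replacing any subset of matches."""
    matches = list(PATTERN.finditer(word))
    # The last combination (all True) replaces every match.
    for choices in product((False, True), repeat=len(matches)):
        parts, last = [], 0
        for match, replace in zip(matches, choices):
            parts.append(word[last:match.start()])
            found = match.group()
            parts.append(ALTERNATIVE_TO_NATIVE[found] if replace else found)
            last = match.end()
        parts.append(word[last:])
        yield "".join(parts)


def substitute(word, force=False):
    """Return a dictionary-approved candidate, else the word unchanged."""
    options = [c for c in candidates(word) if c != word]
    for option in options:
        if option in KNOWN_WORDS:
            return option
    if force and options:
        return options[-1]  # every match replaced, legal or not
    return word


print(substitute("Hoeflich"))                    # Höflich
print(substitute("Mauer"))                       # Mauer
print(substitute("Huebenstetter", force=True))   # Hübenstetter
```

Trying every subset of matches grows exponentially with the number of matches in a word, which is tolerable here because single words rarely contain more than a few.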

Long-form samples

See also the tests.

de

The input:

Ueberhaupt braeuchte es mal einen Teststring. Saetze ohne Bedeutung, aber mit vielen Umlauten. DRPFA-Angehoerige gehoeren haeufig nicht dazu. Bindestrich-Woerter spraechen Baende ueber Fehler. Doppelgaenger-Doppelgaenger sind doppelt droelfzig. Oder Uemlaeuten? Auslaeuten? Leute gaebe es, wuerde man meinen. Ueble Nachrede ist naechtens nicht erlaubt. Erlaube man dieses, waere es schoen uebertrieben. Busse muesste geloest werden, bevor Gruesse zum Gruss kommen. Busse sind Geraete, die womoeglich schnell fuehren. Voegel sind aehnlich zu Oel. Hierfuer ist fuer den droegen Poebel zu beachten, dass Anmassungen zu Gehoerverlust fuehren koennen. Stroemelschnoesseldaemel!

is turned into:

Überhaupt bräuchte es mal einen Teststring. Sätze ohne Bedeutung, aber mit vielen Umlauten. DRPFA-Angehörige gehören häufig nicht dazu. Bindestrich-Wörter sprächen Bände über Fehler. Doppelgänger-Doppelgänger sind doppelt droelfzig. Oder Uemlaeuten? Auslaeuten? Leute gäbe es, würde man meinen. Üble Nachrede ist nächtens nicht erlaubt. Erlaube man dieses, wäre es schön übertrieben. Buße müsste gelöst werden, bevor Grüße zum Gruß kommen. Buße sind Geräte, die womöglich schnell führen. Vögel sind ähnlich zu Öl. Hierfür ist für den drögen Pöbel zu beachten, dass Anmaßungen zu Gehörverlust führen können. Stroemelschnoesseldaemel!


Note that some corrections are out of scope for this little script, e.g.:

Busse

In German, Busse and Buße are two words with vastly different meanings (buses and penance, respectively). Unfortunately, both map to the same alternative spelling, Busse. The tool sees Busse (intended as exactly that, with no wish to change it), notices that Buße is a legal substitution, and therefore makes it. The tool has no awareness of context.

Turning substitutions like these off would mean the tool would no longer emit Buße, ever. This could be as undesirable as the current behaviour. There seems to be no easy solution.
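The underlying problem is that the mapping to alternative spellings is not invertible: distinct words can collapse to the same string. The sketch below makes that visible by grouping a small, hand-picked word list by alternative spelling. The word list and all names are assumptions for illustration only.

```python
from collections import defaultdict

NATIVE_TO_ALTERNATIVE = {
    "Ä": "Ae", "ä": "ae",
    "Ö": "Oe", "ö": "oe",
    "Ü": "Ue", "ü": "ue",
    "ẞ": "SS", "ß": "ss",
}


def to_alternative(word: str) -> str:
    """Spell a word using alternative spellings only."""
    return "".join(NATIVE_TO_ALTERNATIVE.get(char, char) for char in word)


# Hand-picked sample; Masse/Maße is another pair of this kind.
words = ["Busse", "Buße", "Masse", "Maße", "Mauer"]

by_alternative = defaultdict(set)
for word in words:
    by_alternative[to_alternative(word)].add(word)

# Keep only alternative spellings shared by more than one word.
ambiguous = {alt: group for alt, group in by_alternative.items() if len(group) > 1}
print(ambiguous)
```

Any alternative spelling that appears in `ambiguous` cannot be resolved by a dictionary check alone; doing so would require knowledge of the surrounding context.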

Development

This project uses poetry for dependency management. Refer to the poetry config file for more info (e.g. the required Python modules to install if you don't want to deal with poetry).

Using poetry, from the project root, run:

# Installs virtual environment according to lock file (if available in repo),
# otherwise pyproject.toml:
poetry install
# Run command within that environment:
poetry run python -m betterletter -h

Development tasks are all run through poetry, within the context of the virtual environment.

Run just (without arguments) for more available commands related to development.

AutoHotKey

This tool can be integrated with AutoHotKey, allowing you to use it at the touch of a button. You can set up a keyboard shortcut that runs the tool in-place, quickly replacing what you need without leaving your text editing environment.

The AutoHotKey file is here and requires AutoHotKey v2 (check out commits 7dd68f9 and earlier for the AHK v1.1 script).

Follow this guide to have the script launch on boot automatically.

AHK tray icon generated using https://favicon.io/favicon-generator/.

[^1]: In this demo, Ctrl + C and Ctrl + V are inserted automatically by the AutoHotKey script. The user only selects the desired text and presses the hotkey, amounting to two keystrokes. The delay between the Ctrl + C and Ctrl + V keystrokes in the above demo is the script actually doing its work. First, the script reads a dictionary from disk. Relative to the input, this cost is constant (O(1)): it scales with dictionary size, not input size. Sadly, it takes comparatively long and dominates the runtime for short texts. The processing itself scales acceptably with longer inputs (linearly, O(n)), and very long inputs are required before it takes longer than the initial dictionary I/O. Hence, this script could run very fast if it were (re-)designed as a daemon, with the dictionary preloaded in memory.
