Substitute alternative spellings of native characters (e.g. ae, oe, ue and ss for the German umlauts and ß) with their proper versions (ä, ö, ü, ß).
betterletter
In a given text, replaces alternative spellings of native characters with their proper spellings.
For example, German native characters and their corresponding alternative spellings (e.g. when no proper keyboard layout is at hand, or ASCII is used) are:
Native Character | Alternative Spelling |
---|---|
Ä/ä | Ae/ae |
Ö/ö | Oe/oe |
Ü/ü | Ue/ue |
ẞ/ß | SS/ss |
These pairings are recorded here.
Going from left to right is simple: replace all native characters with their alternative spellings, minding case.
That use case is also supported by this tool (reverse flag).
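The left-to-right direction can be illustrated with a minimal sketch. It is not the tool's actual implementation: the name `to_alternative` and the mapping constant are made up for illustration, and case handling is simplified (an uppercase umlaut always becomes a capital letter followed by a lowercase `e`, even inside an all-caps word).

```python
# Native character -> alternative spelling, taken from the table above.
# Hypothetical names; this is not betterletter's own code.
NATIVE_TO_ALTERNATIVE = {
    "ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue", "ẞ": "SS",
}

# str.maketrans accepts single-character keys mapped to replacement strings.
_TABLE = str.maketrans(NATIVE_TO_ALTERNATIVE)


def to_alternative(text: str) -> str:
    """Replace every native character with its alternative spelling."""
    return text.translate(_TABLE)


print(to_alternative("Höflich fragen!"))  # Hoeflich fragen!
```

No dictionary is needed in this direction, because every native character has exactly one alternative spelling.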
The other direction is much less straightforward: there exist countless words for which alternative spellings occur somewhere as a pattern, yet replacing them with the corresponding native character would be wrong:
Character | Correct Spelling | Wrong Spelling |
---|---|---|
Ä | Aerodynamik | Ärodynamik |
Ä | Israel | Isräl |
Ä | Schufaeintrag | Schufäintrag |
Ö | Koeffizient | Köffizient |
Ö | Dominoeffekt | Dominöffekt |
Ö | Poet | Pöt |
Ü | Abenteuer | Abenteür |
Ü | Mauer | Maür |
Ü | Steuerung | Steürung |
ß | Messgerät | Meßgerät |
ß | Messe | Meße |
ß | Abschluss | Abschluß |
to name just a few fairly common examples.
As such, this tool is based on a dictionary lookup; see also the containing directory.
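The idea of a dictionary lookup can be sketched as follows. This is an assumption about how such a lookup might work, not a description of the tool's internals: `KNOWN_WORDS` is a tiny, hypothetical stand-in for a real word list, the function names are invented, and all-caps spellings are ignored for brevity. For each word, every combination of "substitute" and "keep" is tried, and a candidate is accepted only if the word list contains it.

```python
import itertools
import re

# Alternative spelling -> native character (all-caps variants omitted for brevity).
ALTERNATIVE_TO_NATIVE = {
    "ae": "ä", "oe": "ö", "ue": "ü", "ss": "ß",
    "Ae": "Ä", "Oe": "Ö", "Ue": "Ü",
}
PATTERN = re.compile("|".join(ALTERNATIVE_TO_NATIVE))

# Hypothetical, tiny stand-in for a real dictionary (stored in lowercase).
KNOWN_WORDS = {"höflich", "mauer", "poet", "buße", "grüße", "steuerung"}


def substitute_word(word: str) -> str:
    """Return a substituted spelling if the word list knows it, else the word itself."""
    matches = list(PATTERN.finditer(word))
    # Every combination of substitute/keep per match, most substitutions first.
    choices = sorted(
        itertools.product([True, False], repeat=len(matches)),
        key=sum,
        reverse=True,
    )
    for choice in choices:
        candidate = word
        # Replace from right to left so earlier match positions stay valid.
        for match, use in reversed(list(zip(matches, choice))):
            if use:
                native = ALTERNATIVE_TO_NATIVE[match.group()]
                candidate = candidate[: match.start()] + native + candidate[match.end():]
        if candidate != word and candidate.lower() in KNOWN_WORDS:
            return candidate
    return word


def substitute_text(text: str) -> str:
    """Apply substitute_word to every word in the text."""
    return re.sub(r"\w+", lambda m: substitute_word(m.group()), text)


print(substitute_text("Hoeflich fragen!"))  # Höflich fragen!
print(substitute_text("Mauer"))             # Mauer
```

Because `Maür` and `Pöt` are not in the word list, `Mauer` and `Poet` are left alone, while `Gruesse` becomes `Grüße`.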
Examples
See also the tests.
de
The input:
Ueberhaupt braeuchte es mal einen Teststring. Saetze ohne Bedeutung, aber mit vielen Umlauten. DRPFA-Angehoerige gehoeren haeufig nicht dazu. Bindestrich-Woerter spraechen Baende ueber Fehler. Doppelgaenger-Doppelgaenger sind doppelt droelfzig. Oder Uemlaeuten? Auslaeuten? Leute gaebe es, wuerde man meinen. Ueble Nachrede ist naechtens nicht erlaubt. Erlaube man dieses, waere es schoen uebertrieben. Busse muesste geloest werden, bevor Gruesse zum Gruss kommen. Busse sind Geraete, die womoeglich schnell fuehren. Voegel sind aehnlich zu Oel. Hierfuer ist fuer den droegen Poebel zu beachten, dass Anmassungen zu Gehoerverlust fuehren koennen. Stroemelschnoesseldaemel!
is turned into:
Überhaupt bräuchte es mal einen Teststring. Sätze ohne Bedeutung, aber mit vielen Umlauten. DRPFA-Angehörige gehören häufig nicht dazu. Bindestrich-Wörter sprächen Bände über Fehler. Doppelgänger-Doppelgänger sind doppelt droelfzig. Oder Uemlaeuten? Auslaeuten? Leute gäbe es, würde man meinen. Üble Nachrede ist nächtens nicht erlaubt. Erlaube man dieses, wäre es schön übertrieben. Buße müsste gelöst werden, bevor Grüße zum Gruß kommen. Buße sind Geräte, die womöglich schnell führen. Vögel sind ähnlich zu Öl. Hierfür ist für den drögen Pöbel zu beachten, dass Anmaßungen zu Gehörverlust führen können. Stroemelschnoesseldaemel!
Note that some corrections are out of scope for this little script, e.g.:
Busse
In German, Busse and Buße are two words of vastly different meaning (buses and penance, respectively). Unfortunately, they map to the same alternative spelling, Busse. The tool sees Busse (meaning just that, with no intent of changing it), notices Buße is a legal substitution, and therefore makes it. The tool has no awareness of context.
Turning substitutions like these off would mean the tool would no longer emit Buße, ever. This could be as undesirable as the current behaviour. There seems to be no easy resolution.
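To get a feeling for how many words are affected, one could scan a word list for such collisions. The following is only a sketch under assumptions: `ambiguous_pairs` is a hypothetical helper, not part of this tool, and `WORDS` is a made-up sample rather than the tool's dictionary. It reports every word whose alternative spelling is itself a legitimate word.

```python
# Native character -> alternative spelling, as in the table further up.
_TABLE = str.maketrans({
    "ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
})


def ambiguous_pairs(words):
    """Return (alternative, native) pairs where both spellings are known words."""
    known = set(words)
    pairs = []
    for word in sorted(known):
        alternative = word.translate(_TABLE)
        if alternative != word and alternative in known:
            pairs.append((alternative, word))
    return pairs


# Made-up sample; a real run would use a full word list.
WORDS = ["Busse", "Buße", "Masse", "Maße", "Mauer", "Gruß"]
print(ambiguous_pairs(WORDS))  # [('Busse', 'Buße'), ('Masse', 'Maße')]
```

Such a list does not resolve the ambiguity, but it could be used to flag affected words for manual review.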
Running
Prerequisites
Ideally, run the project in its own virtual environment (good, albeit annoying, practice).
This project uses poetry for dependency management.
Refer to the poetry config file for more info (e.g. the required Python modules to install if you don't want to deal with poetry).
Using poetry, from the project root, run:
# Installs virtual environment according to lock file (if available in repo),
# otherwise pyproject.toml:
poetry install
# Run command within that environment:
poetry run python -m betterletter -h
Usage
Usage help (invoke from this project's root) will display all options:
poetry run python -m betterletter -h
The tool can read from STDIN (outputting to STDOUT), or work with the clipboard (overwriting its contents with a corrected version). This allows, for example:
$ cat test.txt
Hoeflich fragen!
$ cat test.txt | poetry run python -m betterletter de
Höflich fragen!
# Reverse mode:
$ cat test.txt | poetry run python -m betterletter de | poetry run python -m betterletter -r de
Hoeflich fragen!
or
poetry run python -m betterletter -c de
# Nothing happens: clipboard is read and written to silently.
After installing the package (see below), these calls should work system-wide.
Development
Development tasks are all run through poetry, within the context of the virtual environment.
The latter is created through poetry install and then accessed through either poetry run <command> or poetry shell.
Run make (without arguments) for more available commands.
AutoHotKey
This tool can be integrated with AutoHotKey, allowing you to use it at the touch of a button. This can be used to set up a keyboard shortcut to run this tool in-place, quickly replacing what you need without leaving your text editing environment.
I was unable to use poetry commands for this, getting:
The system cannot find the file specified.
It works with plain python invocations.
Thanks to SetWorkingDir, we do not have to install the module system-wide.
However, the requirements need to be installed and available to the plain python command.
The AutoHotKey file is here.
Follow this guide to have the script launch on boot automatically.