Skip to main content

Python library used for diacritic manipulation.

Project description

Diacritics Library

Code Style: Black Imports: isort PRs welcome

This library is used for adding, and removing diacritics from strings.

Getting started

Start by importing the module:

import dcl

DCL currently supports a multitude of diacritics:

  • acute
  • breve
  • caron
  • cedilla
  • grave
  • interpunct
  • macron
  • ogonek
  • ring
  • ring_and_acute
  • slash
  • stroke
  • stroke_and_acute
  • tilde
  • tittle
  • umlaut/diaresis
  • umlaut_and_macron

Each accent has their own attribute which is directly accessible from the dcl module.

dcl.acute('a')
>>> 'á'

These attributes return a Character object, which is essentially just a handy "wrapper" around our diacritic, which we can use to access various attributes to retrieve further information about the diacritic we're focusing on.

char = dcl.ogonek('a')

repr(char)
>>> "<ogonek 'ą'>"

char.character  # the same as str(char)
>>> 'ą'

char.diacritic  # some return <unprintable>
>>> '˛'

char.diacritic_name
>>> 'ogonek'

char.raw  # returns the raw representation of our character
>>> '\U00000105'

char.raw_diacritic 
>>> '\U000002db'

Some functions can't take certain letters. For example, the letter h cannot take a cedilla diacritic. In this case, an exception is raised named DiacriticError. You can access this exception via dcl.errors.DiacriticError.

from dcl.errors import DiacriticError

try:
    char = dcl.cedilla('h')
except DiacriticError as e:
    print(e)
else:
    print(repr(char))

>>> 'Character h cannot take a cedilla diacritic'

If you want to, you may also use the DiacriticApplicant object from dcl.objects. The functions you see above use this object too, and it's virtually the same principle, except from the fact that we use properties to get the diacritic, and the class simply holds the string and it's properties. Alas with the functions above, this object also returns the same Character object through it's properties.

from dcl.objects import DiacriticApplicant

da = DiacriticApplicant('a')
repr(da.ogonek)
>>> "<ogonek 'ą'>"

There is also the clean_diacritics function, accessible straight from the dcl module. This function allows us to completely clean a string from any diacritics.

dcl.clean_diacritics("Krëûšàdå")
>>> 'Kreusada'

dcl.clean_diacritics("Café")
>>> 'Cafe'

Along with this function, there's also count_diacritics, get_diacritics and has_diacritics.

The has_diacritics function simply checks if the string contains a character with a diacritic.

dcl.has_diacritics("Café")
>>> True

dcl.has_diacritics("dcl")
>>> False

The get_diacritics function is used to get all the diacritics in a string. It returns a dictionary. For each diacritic in the string, the key will show the diacritic's index in the string, and the value will show the Character representation.

dcl.get_diacritics("Café")
>>> {3: <acute 'é'>}

dcl.get_diacritics("Krëûšàdå")
>>> {2: <umlaut 'ë'>, 3: <circumflex 'û'>, 4: <caron 'š'>, 5: <grave 'à'>, 7: <ring 'å'>}

The count_diacritics function counts the number of diacritics in a string. The actual implementation of this simply returns the dictionary length from get_diacritics.

dcl.count_diacritics("Café")
>>> 1

Creating an end user program

Creating a program would be pretty simple for this, and I'd love to be able to help you out with a base idea. Have a look at this for example:

import dcl
import string

from dcl.errors import DiacriticError

char = str(input("Enter a character: "))
if not char in string.ascii_letters:
    print("Please enter a letter from a-Z.")
else:
    accent = str(input("Enter an accent, you can choose from the following: " + ", ".join(dcl.diacritic_list)))
    if not dcl.isdiacritictype(accent):
        print("That was not a valid accent.")
    else:
        try:
            function = getattr(dcl, accent)  # or dcl.objects.DiacriticApplicant
            output = function(char)
        except DiacriticError as e:
            print(e)
        else:
            print(str(output))

It's worth checking if the provided accent is a diacritic type. If it is, then you can use getattr. Without checking, the user could provide a default global such as __file__.

You can also create a program which can remove diacritics from a string. It's made easy!

import dcl

string = str(input("Enter the string which you want to be cleared from diacritics: "))
print("Here is your cleaned string: " + dcl.clean_diacritics(string))

Or perhaps your program wants to count the number of diacritics contained within your string.

import dcl

string = str(input("This program will count the number of diacritics contained in your input. Enter a string: "))
count = dcl.count_diacritics(string)
if count == 1:
    grammar = "is"
else:
    grammar = "are"
print(f"There {grammar} {count} diacritics/accent in your string.")

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

dcl-1.0.0.tar.gz (10.2 kB view details)

Uploaded Source

Built Distribution

dcl-1.0.0-py3-none-any.whl (10.5 kB view details)

Uploaded Python 3

File details

Details for the file dcl-1.0.0.tar.gz.

File metadata

  • Download URL: dcl-1.0.0.tar.gz
  • Upload date:
  • Size: 10.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.6.3 pkginfo/1.7.1 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.48.0 CPython/3.8.5

File hashes

Hashes for dcl-1.0.0.tar.gz
Algorithm Hash digest
SHA256 00c26ee0c033a432a1c7f77b3077af5bd2a4600b7e08d37a6ff6709427cacbaf
MD5 54d57cabeb1c9b674352a79a72bdb4b7
BLAKE2b-256 af8e682f835a67b1c780ff223ac7639d682b38bd9b0e732ad5d9e3c7e986e895

See more details on using hashes here.

File details

Details for the file dcl-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: dcl-1.0.0-py3-none-any.whl
  • Upload date:
  • Size: 10.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.6.3 pkginfo/1.7.1 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.48.0 CPython/3.8.5

File hashes

Hashes for dcl-1.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 7d8d23dea98fe35e66bdb77c50a8a469107905e12aef980517d615780c4edfda
MD5 1583fe05bfc01fafc035c057ce0336f6
BLAKE2b-256 9dca1a398a2d59b1d7a983872dce17791a33c8866aae8a9e19130c8ab8a4761c

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page