Compound word splitter for enchant-supported languages

Compound Word Splitter (cwsplit) for any language supported by enchant.


Make sure you have an enchant dictionary installed for the language you want to split.

You can check the list of installed dictionaries by running:

import enchant
print(enchant.list_languages())

Check the pyenchant and enchant links for more info.


Import module:

from cwsplit import split

For German (default):

split('rindfleisch')
# ['rind', 'fleisch']

For English:

split('blackboard', 'en_en')
# ['black', 'board']


from cwsplit import load_dict
# ['black', 'board']

Sometimes the word is misspelled or simply does not exist. By default, any part that is not found in the dictionary is split into single characters until a longer known word shows up.

A positive effect of this behaviour is that connecting letters, such as the ‘s’ in überwachungsaufgaben, are isolated.

On the other hand, imagine a non-existent word such as gibberishfleisch: it is decompounded into the words gib, b, e, r, i, s, h and fleisch.

split('gibberishfleisch', language='de_de')
# ['gib', 'b', 'e', 'r', 'i', 's', 'h', 'fleisch']

This does not look good at all. This is why you can set the shortest word size, so that consecutive shorter words are concatenated. For example, let’s define the shortest word as 4 characters long:

split('gibberishfleisch', language='de_de', min_word_size=4)
# ['gibberish', 'fleisch']

Now we get two words gibberish and fleisch, which is something you would expect.

This will not affect the correct words that have a connecting ‘s’.

For example:

split('übertragungsgesetz', min_word_size=4)
# ['übertragung', 's', 'gesetz']

remains correct.
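
The effect of min_word_size described above can be sketched as a post-processing step that joins each run of consecutive pieces shorter than the limit. This is only an illustration consistent with the two examples, not the package's actual code; the helper name merge_short_runs is hypothetical.

```python
from itertools import groupby


def merge_short_runs(parts, min_word_size):
    """Join each run of consecutive pieces shorter than min_word_size.

    Hypothetical helper, not part of cwsplit.
    """
    merged = []
    # groupby yields maximal runs of pieces that share the same "is short" flag.
    for is_short, run in groupby(parts, key=lambda p: len(p) < min_word_size):
        run = list(run)
        if is_short:
            # A run of short pieces collapses into one string;
            # a run of length one (a lone 's') stays unchanged.
            merged.append("".join(run))
        else:
            merged.extend(run)
    return merged


print(merge_short_runs(['gib', 'b', 'e', 'r', 'i', 's', 'h', 'fleisch'], 4))
print(merge_short_runs(['übertragung', 's', 'gesetz'], 4))
```

A lone connecting ‘s’ forms a run of one piece, so joining it changes nothing, which matches the übertragungsgesetz example.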


This is a very simple recursive algorithm that looks for the longest word inside the provided word by checking whether it exists in the enchant dictionary. The output is always returned as a list of strings. If no shorter words are found, the input word is returned as a single-element list.
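
A minimal sketch of such a recursive longest-word lookup follows. It is an assumption-laden illustration, not the package's implementation: the function name split_longest is hypothetical, and a plain Python set stands in for the enchant dictionary so the example runs without enchant installed. Unlike cwsplit, this sketch returns unmatched remainders whole instead of splitting them into characters.

```python
def split_longest(word, known_words):
    """Recursively split `word` around its longest known proper substring.

    Hypothetical helper; `known_words` is a stand-in for a dictionary lookup.
    """
    # Try candidate substrings from longest to shortest, scanning left to right.
    for size in range(len(word) - 1, 1, -1):
        for start in range(len(word) - size + 1):
            candidate = word[start:start + size]
            if candidate in known_words:
                left, right = word[:start], word[start + size:]
                parts = []
                if left:
                    parts += split_longest(left, known_words)
                parts.append(candidate)
                if right:
                    parts += split_longest(right, known_words)
                return parts
    # No shorter known word found: return the input as a single-element list.
    return [word]


words = {'rind', 'fleisch', 'übertragung', 'gesetz'}
print(split_longest('rindfleisch', words))
print(split_longest('übertragungsgesetz', words))
print(split_longest('xyz', words))
```

With pyenchant, the membership test could be replaced by the check method of an enchant.Dict object for the chosen language.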


The upload script uses pandoc to convert the README to rst format, which is needed to create the package. Make sure you have pandoc installed if you plan to use the script.

Files for cwsplit, version 0.4.1
cwsplit-0.4.1.tar.gz (3.2 kB), source distribution
