Dutch Noun Pluralizer in Python

Generates plural and singular nouns in a very imperfect way using CyHunspell and OpenTaal dictionaries and word lists. Why imperfect? Because the Dutch language is full of exceptions.

The algorithm is based on the document "Basismorfologie. Het meervoud in het Nederlands" (in Dutch) from the Université catholique de Louvain.

Note: I'm a .NET developer who writes Python in my free time. I'm not a linguist; I just work for a Dutch company. Hence, this is bound to be a very imperfect way of doing this. If you have good ideas, I welcome them: just open an issue.
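To make the "full of exceptions" point concrete, here is a minimal sketch of two textbook Dutch spelling rules for plurals ending in -en. It is not the algorithm this package uses. The function name `naive_plural` and the choice of rules are assumptions made purely for illustration.

```python
VOWELS = "aeiou"

def naive_plural(word: str) -> str:
    """Hypothetical helper: apply two simplified spelling rules, then add -en."""
    # Doubled vowel + single final consonant: drop one vowel,
    # and soften a final s/f to z/v (kaas -> kazen, boom -> bomen).
    if (len(word) >= 3 and word[-2] in "aeou"
            and word[-3] == word[-2] and word[-1] not in VOWELS):
        final = {"s": "z", "f": "v"}.get(word[-1], word[-1])
        return word[:-3] + word[-2] + final + "en"
    # Single vowel between consonants: double the final consonant
    # (bal -> ballen).
    if (len(word) >= 3 and word[-1] not in VOWELS
            and word[-2] in VOWELS and word[-3] not in VOWELS):
        return word + word[-1] + "en"
    # Everything else: just append -en (boek -> boeken).
    return word + "en"

print(naive_plural("kaas"))   # kazen
print(naive_plural("album"))  # albummen, which is wrong: the plural is albums
```

Rules like these get the regular cases right and fail on loanwords such as album, which is why a dictionary check on the rule-based guess is needed.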

Installation

Install from PyPI with pip:

pip install dutch-pluralizer

Note on Windows 10
This package depends on CyHunspell. To use it on Windows 10, you might need to install Build Tools for Visual Studio 2019 and choose the Windows 10 C++ SDK option.

Note on Linux installations
If you don't want to build Hunspell yourself, please check how to install Hunspell on your Linux distribution.

CLI usage

The project can be used as a CLI tool:

usage: dutch_pluralizer [-h] [-p] [-s] [-pa] [-v] word

Generates Dutch plural and singular nouns in a very imperfect way using Hunspell     
dictionaries. Why imperfect? Because the Dutch language is full of exceptions.       

positional arguments:
  word                  The word.

optional arguments:
  -h, --help            show this help message and exit
  -p, --pluralize       pluralizes the word.
  -s, --singularize     singularizes the word.
  -pa, --pluralize_advanced
                        shows advanced pluralization output.
  -v, --verbose         Shows an error message when a word could not be processed.   

API

The API can be used like this:

from dutch_pluralizer import pluralize, singularize

# pluralize returns the result or None
assert pluralize("kaas") == "kazen"
assert pluralize("kazen") is None

# singularize returns the result or None
assert singularize("kazen") == "kaas"
assert singularize("kaas") is None

Advanced pluralization will give you more options:

from dutch_pluralizer import pluralize_advanced

adv = pluralize_advanced("album")

# the plural
assert adv.plural == 'albums'

# what the algorithm (without Hunspell) created;
# it is probably not correct, which is why Hunspell
# is run on it afterwards. Think of it as preprocessing:
assert adv.algorithmic_plural == 'alba'

# indicates that the end result was found in Hunspell
assert adv.hunspell_spelled

# the plural was found by replacing the
# ending 'a' with 'ums'
assert adv.switched_ending_from == 'a'
assert adv.switched_ending_to == 'ums'

# suggestions given by Hunspell when the algorithmic
# result was processed:
assert adv.suggestions == (
    'Alba',
    'aba',
    'balba',
    'albe',
    'alia',
    'alla',
    'alma',
    'alfa',
    'Elba',
)
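The fields above can be combined into a short explanation of how a plural was derived. The following is only a sketch: the helper `explain` is hypothetical, and the stand-in object simply mirrors the attribute values shown for "album" so that the snippet runs without Hunspell installed.

```python
from types import SimpleNamespace

def explain(adv) -> str:
    """Hypothetical helper: summarize an advanced pluralization result."""
    if adv.plural == adv.algorithmic_plural:
        return f"{adv.plural}: rule-based guess accepted as is"
    return (f"{adv.plural}: rule-based guess '{adv.algorithmic_plural}' "
            f"corrected by switching ending '{adv.switched_ending_from}' "
            f"to '{adv.switched_ending_to}'")

# Stand-in for pluralize_advanced("album"), using the values shown above.
sample = SimpleNamespace(
    plural="albums",
    algorithmic_plural="alba",
    hunspell_spelled=True,
    switched_ending_from="a",
    switched_ending_to="ums",
)

print(explain(sample))
```

With the package installed, the same helper should accept the object returned by `pluralize_advanced`, assuming it exposes these attributes as shown.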

Add custom words to the dictionary:

from dutch_pluralizer import pluralize, singularize
from dutch_pluralizer.speller import ensure_hunspell_nl

def test_readme_example_3():

    # the default dictionary does not know these words,
    # as they are not Dutch
    assert pluralize("fibulatie") is None
    assert singularize("fibulaties") is None

    # add the words to the dictionary
    h = ensure_hunspell_nl()
    h.add("fibulatie")
    h.add("fibulaties")

    # check again
    assert pluralize("fibulatie", speller=h) == "fibulaties"
    assert singularize("fibulaties", speller=h) == "fibulatie"

Help!? The result is not correct

I told you it was imperfect! Here is what this package can and cannot do:

  • We cannot discover words that are not recognized by Hunspell.
  • We can only process nouns (Dutch: zelfstandige naamwoorden).
  • We can only return a single result, even though some words are ambiguous: the singular of graven can be either graaf or graf. Such cases are currently not supported.
  • We can add words; just open a ticket on GitHub. Please provide some evidence for why the word should be added (such as a VanDale.nl result).
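Until ambiguous words are supported, one possible workaround on the caller's side is a small override table placed in front of the singularizer. This is a sketch only: `AMBIGUOUS_SINGULARS` and `singular_candidates` are hypothetical names and not part of this package, and a stand-in function replaces `singularize` so the snippet runs on its own.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical override table maintained by the caller.
AMBIGUOUS_SINGULARS: Dict[str, Tuple[str, ...]] = {
    "graven": ("graaf", "graf"),
}

def singular_candidates(
    word: str,
    singularize_fn: Callable[[str], Optional[str]],
) -> Tuple[str, ...]:
    """Return every known singular for `word`, or an empty tuple."""
    if word in AMBIGUOUS_SINGULARS:
        return AMBIGUOUS_SINGULARS[word]
    result = singularize_fn(word)
    return (result,) if result is not None else ()

# Stand-in for dutch_pluralizer.singularize: returns a result or None.
fake_singularize = {"kazen": "kaas"}.get

print(singular_candidates("graven", fake_singularize))  # ('graaf', 'graf')
print(singular_candidates("kazen", fake_singularize))   # ('kaas',)
print(singular_candidates("kaas", fake_singularize))    # ()
```

With the package installed, pass `singularize` from `dutch_pluralizer` instead of the stand-in.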

Development

If you want to contribute, please consult the local development page.

Download files

Source distribution: dutch-pluralizer-0.0.41.tar.gz (1.7 MB)

Built distribution: dutch_pluralizer-0.0.41-py3-none-any.whl (1.7 MB, Python 3)
