
Generates Dutch plural and singular nouns in a very imperfect way using Hunspell dictionaries. Why imperfect? Because the Dutch language is full of exceptions.


Dutch Noun Pluralizer in Python

Generates plural and singular nouns in a very imperfect way using CyHunspell and OpenTaal dictionaries and word lists. Why imperfect? Because the Dutch language is full of exceptions.

The algorithm is based on the document "Basismorfologie. Het meervoud in het Nederlands" (Dutch) of the Université catholique de Louvain.
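Here is a rough illustration of what a purely rule-based first pass can look like. This minimal sketch covers a few textbook Dutch plural rules:

- -s after unstressed -el, -em, -en, -er and the diminutive -je.
- 's after a final a, i, o, u or y.
- -en with vowel and consonant spelling changes otherwise.

The function naive_plural and its rule set are assumptions for illustration only. This is not the package's algorithm, which follows the Louvain document.

```python
# Hypothetical sketch: a few textbook Dutch plural rules.
# This is NOT dutch_pluralizer's algorithm and it ignores most exceptions
# (e.g. weg -> wegen, kind -> kinderen, been -> benen).

VOWELS = "aeiou"
DOUBLE_VOWELS = ("aa", "ee", "oo", "uu")
LONG_DIGRAPHS = ("ie", "ui", "ei", "ij", "ou", "au", "eu", "oe")
S_ENDINGS = ("el", "em", "en", "er", "je")  # unstressed endings and diminutives


def naive_plural(noun: str) -> str:
    word = noun.lower()
    if not word:
        raise ValueError("empty word")
    if word.endswith(S_ENDINGS):        # tafel -> tafels, huisje -> huisjes
        return word + "s"
    if word.endswith("ee"):             # zee -> zeeën
        return word + "ën"
    if word.endswith("e"):              # tante -> tantes
        return word + "s"
    if word[-1] in "aiouy":             # auto -> auto's, baby -> baby's
        return word + "'s"

    stem, last = word[:-1], word[-1]    # final vowels were handled above
    if stem.endswith(DOUBLE_VOWELS):
        # The syllable becomes open, so one vowel letter is dropped: boom -> bomen.
        stem, long_vowel = stem[:-1], True
    else:
        long_vowel = stem.endswith(LONG_DIGRAPHS)

    if long_vowel:
        # After a long vowel, s and f become voiced: kaas -> kazen, brief -> brieven.
        last = {"s": "z", "f": "v"}.get(last, last)
        return stem + last + "en"
    if stem and stem[-1] in VOWELS and (len(stem) < 2 or stem[-2] not in VOWELS):
        # Short vowel before a single consonant: double the consonant (kat -> katten).
        return stem + last + last + "en"
    return word + "en"                  # kers -> kersen
```

Note that naive_plural("album") returns albummen instead of albums. That is exactly the kind of wrong guess a dictionary check has to catch.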

Note: I'm a .NET developer who writes Python in my free time. I'm not a linguist; I just work for a Dutch company. Hence, this is bound to be a very imperfect way of doing this. If you have good ideas, they are welcome: just open an issue.

Installation

Install with pip:

pip install dutch-pluralizer

Note on Windows 10
This package depends on CyHunspell. On Windows 10, installing it may require the Build Tools for Visual Studio 2019 with the Windows 10 C++ SDK option selected.

Note on other Linux installations
Please check how to install Hunspell on your Linux distribution if you don't want to build Hunspell yourself.
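After installing, you can check whether both the package and its Hunspell binding are importable. The sketch below uses importlib.util.find_spec. The module name hunspell is an assumption based on how CyHunspell is usually imported (from hunspell import Hunspell). check_installation is a hypothetical helper.

```python
import importlib.util

# Import names to check. "hunspell" is assumed to be the module provided by
# CyHunspell; adjust it if your installation uses a different name.
MODULES = ("dutch_pluralizer", "hunspell")


def check_installation(modules=MODULES) -> dict:
    """Report which top-level modules can be located, without importing them."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, found in check_installation().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

find_spec only locates a module; it does not import it. A CyHunspell build that is present but fails to load its native library would therefore still be reported as found. An actual import remains the final test.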

CLI usage

The project can be used as a CLI tool:

usage: dutch_pluralizer [-h] [-p] [-s] [-pa] [-v] word

Generates Dutch plural and singular nouns in a very imperfect way using Hunspell     
dictionaries. Why imperfect? Because the Dutch language is full of exceptions.       

positional arguments:
  word                  The word.

optional arguments:
  -h, --help            show this help message and exit
  -p, --pluralize       pluralizes the word.
  -s, --singularize     singularizes the word.
  -pa, --pluralize_advanced
                        shows advanced pluralization output.
  -v, --verbose         Shows an error message when a word could not be processed.   
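The options above map naturally onto Python's argparse. The following hypothetical sketch reproduces the documented flags; it is not the package's own CLI source, and build_parser is an illustrative name.

```python
import argparse

DESCRIPTION = (
    "Generates Dutch plural and singular nouns in a very imperfect way using "
    "Hunspell dictionaries. Why imperfect? Because the Dutch language is full "
    "of exceptions."
)


def build_parser() -> argparse.ArgumentParser:
    """Declare the documented arguments; every flag is a simple on/off switch."""
    parser = argparse.ArgumentParser(prog="dutch_pluralizer", description=DESCRIPTION)
    parser.add_argument("word", help="The word.")
    parser.add_argument("-p", "--pluralize", action="store_true",
                        help="pluralizes the word.")
    parser.add_argument("-s", "--singularize", action="store_true",
                        help="singularizes the word.")
    parser.add_argument("-pa", "--pluralize_advanced", action="store_true",
                        help="shows advanced pluralization output.")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="Shows an error message when a word could not be processed.")
    return parser
```

Because -pa is registered as its own option string, argparse matches it exactly instead of reading it as -p followed by another flag.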

API

The API can be used like this:

from dutch_pluralizer import pluralize, singularize

# pluralize will return the result or None
assert pluralize("kaas") == "kazen"
assert pluralize("kazen") is None

# singularize will return the result or None
assert singularize("kazen") == "kaas"
assert singularize("kaas") is None
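Both functions return None when they cannot produce a result, so they can also be combined to guess whether a word is singular or plural. This minimal sketch assumes, as the asserts above suggest, that pluralize returns None for a word that is already plural. grammatical_number is a hypothetical helper. The functions are passed in as arguments so they can be replaced by fakes in tests.

```python
from typing import Callable, Optional

Converter = Callable[[str], Optional[str]]


def grammatical_number(word: str, pluralize: Converter, singularize: Converter) -> str:
    """Return 'singular', 'plural' or 'unknown' depending on which call succeeds.

    Relies on the documented contract that each function returns None
    when it cannot produce a result.
    """
    if pluralize(word) is not None:
        return "singular"
    if singularize(word) is not None:
        return "plural"
    return "unknown"
```

With the real functions, the asserts above imply that grammatical_number("kaas", pluralize, singularize) reports singular and that kazen is reported as plural. A word accepted by both functions would be reported as singular, because the first check wins.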

Advanced pluralization gives you more insight into how the plural was produced:

from dutch_pluralizer import pluralize_advanced

adv = pluralize_advanced("album")

# the plural
assert adv.plural == 'albums'

# what the algorithm created (before Hunspell) is probably not
# correct; that's why Hunspell is applied to it afterwards.
# Think of it as a preprocessing step:
assert adv.algorithmic_plural == 'alba'

# indicates that the end result was found in Hunspell
assert adv.hunspell_spelled

# the plural was found by replacing the ending
# 'a' with 'ums'
assert adv.switched_ending_from == 'a'
assert adv.switched_ending_to == 'ums'

# suggestions given by Hunspell when the algorithmic
# result was processed:
assert adv.suggestions == ('Alba',
                           'aba',
                           'balba',
                           'albe',
                           'alia',
                           'alla',
                           'alma',
                           'alfa',
                           'Elba')
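The fields above suggest a two-step process. First an algorithmic guess is checked with Hunspell; if the guess is rejected, ending swaps are tried as a fallback. This minimal sketch models that fallback and accepts any is_correct(word) -> bool predicate. resolve_plural, PluralResult and the ENDING_SWITCHES table are hypothetical names, not the package's internals.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical ending swaps; the package's actual list is not documented here.
ENDING_SWITCHES: Sequence[Tuple[str, str]] = (("a", "ums"), ("en", "s"))


@dataclass
class PluralResult:
    algorithmic_plural: str
    plural: Optional[str] = None
    switched_ending_from: Optional[str] = None
    switched_ending_to: Optional[str] = None


def resolve_plural(
    algorithmic_plural: str,
    is_correct: Callable[[str], bool],
    switches: Sequence[Tuple[str, str]] = ENDING_SWITCHES,
) -> PluralResult:
    result = PluralResult(algorithmic_plural)
    # Accept the algorithmic guess if the spell checker already knows it.
    if is_correct(algorithmic_plural):
        result.plural = algorithmic_plural
        return result
    # Otherwise try each ending swap and keep the first accepted spelling.
    for old, new in switches:
        if algorithmic_plural.endswith(old):
            candidate = algorithmic_plural[: len(algorithmic_plural) - len(old)] + new
            if is_correct(candidate):
                result.plural = candidate
                result.switched_ending_from, result.switched_ending_to = old, new
                break
    return result
```

ensure_hunspell_nl() may return a CyHunspell Hunspell object; its add method suggests so. If it does, its spell method could serve as is_correct. Trying the swaps in a fixed order and stopping at the first accepted spelling keeps the result deterministic, at the cost of ignoring any later match.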

Add custom words to the dictionary:

from dutch_pluralizer import pluralize, singularize
from dutch_pluralizer.speller import ensure_hunspell_nl

def test_readme_example_3():

    # the default dictionary does not recognize these words,
    # as they are not Dutch
    assert pluralize("fibulatie") is None
    assert singularize("fibulaties") is None

    # add the words to the dictionary
    h = ensure_hunspell_nl()
    h.add("fibulatie")
    h.add("fibulaties")

    # check again
    assert pluralize("fibulatie", speller=h) == "fibulaties"
    assert singularize("fibulaties", speller=h) == "fibulatie"
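With more than a few custom words, it can be convenient to keep them in a text file. This minimal sketch assumes a hypothetical format of one word per line, with # marking comments. add_custom_words only needs an add callable, such as the bound method h.add from the example above.

```python
from typing import Callable, Iterable, List


def read_word_list(lines: Iterable[str]) -> List[str]:
    """One word per line; blank lines and lines starting with '#' are skipped."""
    words = []
    for line in lines:
        word = line.strip()
        if word and not word.startswith("#"):
            words.append(word)
    return words


def add_custom_words(add: Callable[[str], object], lines: Iterable[str]) -> int:
    """Pass every word to `add` (e.g. h.add) and return how many were added."""
    words = read_word_list(lines)
    for word in words:
        add(word)
    return len(words)
```

An open text file is itself an iterable of lines. A file named custom_words.txt (a hypothetical name) could therefore be loaded with add_custom_words(h.add, f) inside a with open("custom_words.txt", encoding="utf-8") as f: block.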

Help!? The result is not correct

I told you it was imperfect! Here is what this package can and cannot do:

  • We cannot discover words that are not recognized by Hunspell
  • We can only process nouns (Dutch: zelfstandige naamwoorden)
  • We can only return a single result, even though some words have more than one valid answer: the singular of graven can be either graaf or graf. Such cases are currently not supported.
  • We can add words: just open an issue on GitHub. Please provide some evidence of why the word should be added (such as a VanDale.nl entry).
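To illustrate the ambiguity, this minimal sketch generates several singular candidates for a regular -en plural and then filters them with a spell checker. singular_candidates and the is_correct predicate are hypothetical; the package itself does not offer this.

```python
from typing import Callable, List

VOWELS = "aeiou"


def singular_candidates(plural: str, is_correct: Callable[[str], bool]) -> List[str]:
    """Return every plausible singular of an -en plural that the speller accepts.

    Only the plain -en pattern is handled (no -s, -'s, -eren, ...).
    """
    if not plural.endswith("en") or len(plural) < 4:
        return []
    stem = plural[:-2]                        # graven -> grav, katten -> katt
    candidates = []
    if stem[-1] == stem[-2] and stem[-1] not in VOWELS:
        candidates.append(stem[:-1])          # doubled consonant: katt -> kat
    else:
        last = {"v": "f", "z": "s"}.get(stem[-1], stem[-1])  # devoice: v -> f, z -> s
        base = stem[:-1]
        if base and base[-1] in VOWELS and (len(base) < 2 or base[-2] not in VOWELS):
            # An open syllable hides whether the vowel was long or short,
            # so try the long spelling too: graaf next to graf.
            candidates.append(base + base[-1] + last)
        candidates.append(base + last)
    # Keep the order, drop duplicates, and let the speller decide.
    return [c for c in dict.fromkeys(candidates) if is_correct(c)]
```

With a full Dutch dictionary, kazen would yield both kaas and kas, even though the plural of kas is kassen. A spell checker alone cannot tell which reading is right.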

Development

If you want to contribute, please consult the local development page.

Download files

Download the file for your platform.

Source Distribution

dutch-pluralizer-0.0.41.tar.gz (1.7 MB)

Built Distribution

dutch_pluralizer-0.0.41-py3-none-any.whl (1.7 MB)

File details

Details for the file dutch-pluralizer-0.0.41.tar.gz.

File metadata

  • Download URL: dutch-pluralizer-0.0.41.tar.gz
  • Upload date:
  • Size: 1.7 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/50.3.2 requests-toolbelt/0.9.1 tqdm/4.56.2 CPython/3.8.6

File hashes

Hashes for dutch-pluralizer-0.0.41.tar.gz
Algorithm Hash digest
SHA256 004c46f7cd3e94291b1cb83a6a6ea9e5f590b857551fb016afe67552023c8b46
MD5 83bf2375114ce328c07f96d8e4bfcd42
BLAKE2b-256 33aeb8512467cabd8814551a05c6e94386e11d5bbee6613b0a590fa41eccb00b
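This minimal sketch uses Python's hashlib to compare a downloaded file with the SHA256 digest listed above. The local file path is an assumption.

```python
import hashlib
from pathlib import Path

# Digest copied from the table above for dutch-pluralizer-0.0.41.tar.gz.
EXPECTED_SHA256 = "004c46f7cd3e94291b1cb83a6a6ea9e5f590b857551fb016afe67552023c8b46"


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash the file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """True when the file's SHA256 matches the expected hex digest."""
    return sha256_of(path) == expected.lower()
```

For example, verify(Path("dutch-pluralizer-0.0.41.tar.gz")) assumes the file sits in the current directory. pip can also enforce digests itself in hash-checking mode, via --hash=sha256:<digest> entries in a requirements file.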


File details

Details for the file dutch_pluralizer-0.0.41-py3-none-any.whl.

File metadata

  • Download URL: dutch_pluralizer-0.0.41-py3-none-any.whl
  • Upload date:
  • Size: 1.7 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/50.3.2 requests-toolbelt/0.9.1 tqdm/4.56.2 CPython/3.8.6

File hashes

Hashes for dutch_pluralizer-0.0.41-py3-none-any.whl
Algorithm Hash digest
SHA256 898d3346864569bb6278ecb269677981d9d9bab0929905da71fd4dc667120199
MD5 833f9cfc549db36bd58df8574c75389c
BLAKE2b-256 5eb3d0b4ff964307fc2b9114483fb62037dcc18936bab7fbd5ab4d0575c5f64c

