Dutch Noun Pluralizer in Python
Generates plural and singular nouns in a very imperfect way using CyHunspell and OpenTaal dictionaries and word lists. Why imperfect? Because the Dutch language is full of exceptions.
The algorithm is based on the document "Basismorfologie. Het meervoud in het Nederlands" (in Dutch) from the Université catholique de Louvain.
Note: I'm a .NET developer who does Python in my free time. I'm not a linguist, I just work for a Dutch company. Hence: this must be a very imperfect way of doing this. If you have good ideas, I welcome them, just open an issue.
Installation
Install from PyPI with pip:
pip install dutch-pluralizer
Note on Windows 10
This package depends on CyHunspell. To use it on Windows 10, you might need to install Build Tools for Visual Studio 2019 and choose the Windows 10 C++ SDK option.
Note on Linux installations
Please check how you can install Hunspell on your Linux distribution if you don't want to build Hunspell yourself.
CLI usage
The project can be used as a CLI tool:
usage: dutch_pluralizer [-h] [-p] [-s] [-pa] [-v] word

Generates Dutch plural and singular nouns in a very imperfect way using Hunspell
dictionaries. Why imperfect? Because the Dutch language is full of exceptions.

positional arguments:
  word                  The word.

optional arguments:
  -h, --help            show this help message and exit
  -p, --pluralize       pluralizes the word.
  -s, --singularize     singularizes the word.
  -pa, --pluralize_advanced
                        shows advanced pluralization output.
  -v, --verbose         Shows an error message when a word could not be processed.
API
The API can be used like this:
from dutch_pluralizer import pluralize, singularize
# pluralize will return the result, or None when no plural is found
assert pluralize("kaas") == "kazen"
assert pluralize("kazen") is None
# singularize will return the result, or None when no singular is found
assert singularize("kazen") == "kaas"
assert singularize("kaas") is None
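Because both functions return None when nothing is found, calling code usually needs a fallback. The following is a minimal sketch of one way to handle that. `pluralize_or_keep` and `fake_pluralize` are hypothetical names that are not part of the package; the helper takes the pluralize function as an argument so the sketch runs without Hunspell installed.

```python
from typing import Callable, Optional


def pluralize_or_keep(word: str, pluralize_fn: Callable[[str], Optional[str]]) -> str:
    """Return the plural if pluralize_fn finds one, otherwise the word unchanged."""
    plural = pluralize_fn(word)
    return word if plural is None else plural


# Stand-in for dutch_pluralizer.pluralize (assumption: same "result or None" contract).
def fake_pluralize(word: str) -> Optional[str]:
    return {"kaas": "kazen"}.get(word)


print(pluralize_or_keep("kaas", fake_pluralize))   # kazen
print(pluralize_or_keep("kazen", fake_pluralize))  # kazen (no plural found, kept as-is)
```

With the real package installed, pass `pluralize` imported from `dutch_pluralizer` instead of the stand-in.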
Advanced pluralization will give you more options:
from dutch_pluralizer import pluralize_advanced
adv = pluralize_advanced("album")
# the plural
assert adv.plural == 'albums'
# what the algorithm (without Hunspell) created; it is
# probably not correct, which is why Hunspell is run
# on it. Think of it as a preprocessing step:
assert adv.algorithmic_plural == 'alba'
# indicates that the end result was found in Hunspell
assert adv.hunspell_spelled
# the plural was found by replacing the ending
# 'a' with 'ums'
assert adv.switched_ending_from == 'a'
assert adv.switched_ending_to == 'ums'
# suggestions given by Hunspell when the algorithmic
# result was processed:
assert adv.suggestions == ('Alba',
                           'aba',
                           'balba',
                           'albe',
                           'alia',
                           'alla',
                           'alma',
                           'alfa',
                           'Elba')
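The fields above suggest a two-step idea: build a candidate by rule, then let a spell checker accept it or try a different ending. Here is a rough sketch of that idea only, not the package's actual code. The function name, the replacement table and the toy lexicon are all assumptions made for illustration, and a plain Python set stands in for Hunspell.

```python
from typing import Callable, Iterable, Optional, Tuple

Replacement = Tuple[str, str]


def first_spelled_variant(
    candidate: str,
    replacements: Iterable[Replacement],
    is_spelled: Callable[[str], bool],
) -> Tuple[Optional[str], Optional[Replacement]]:
    """Return (word, replacement_used); both are None when nothing is accepted."""
    # Accept the rule-based candidate right away if the checker knows it.
    if is_spelled(candidate):
        return candidate, None
    # Otherwise swap the ending and re-check each variant in order.
    for old, new in replacements:
        if old and candidate.endswith(old):
            variant = candidate[: -len(old)] + new
            if is_spelled(variant):
                return variant, (old, new)
    return None, None


TOY_LEXICON = {"albums", "kazen"}   # stand-in for a Hunspell dictionary
ENDING_SWAPS = [("a", "ums")]       # hypothetical replacement table

result = first_spelled_variant("alba", ENDING_SWAPS, lambda w: w in TOY_LEXICON)
print(result)  # ('albums', ('a', 'ums'))
```

The returned pair plays the same role as `switched_ending_from` and `switched_ending_to` in the advanced output.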
Add custom words to the dictionary:
from dutch_pluralizer import pluralize, singularize
from dutch_pluralizer.speller import ensure_hunspell_nl

def test_readme_example_3():
    # the default dictionary does not understand these words,
    # as they are not Dutch
    assert pluralize("fibulatie") is None
    assert singularize("fibulaties") is None
    # add the words to the dictionary
    h = ensure_hunspell_nl()
    h.add("fibulatie")
    h.add("fibulaties")
    # check again, passing the extended speller explicitly
    assert pluralize("fibulatie", speller=h) == "fibulaties"
    assert singularize("fibulaties", speller=h) == "fibulatie"
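When you have more than a couple of custom words, a small loop keeps this tidy. The sketch below relies only on the `add()` method shown above. `add_words` and `RecordingSpeller` are hypothetical names that are not part of the package; the recording class exists just so the example runs without Hunspell.

```python
from typing import Iterable, List


def add_words(speller, words: Iterable[str]) -> int:
    """Call speller.add() once per distinct, non-empty word; return how many were added."""
    count = 0
    # dict.fromkeys drops duplicates while keeping the original order.
    for word in dict.fromkeys(w.strip() for w in words):
        if word:
            speller.add(word)
            count += 1
    return count


class RecordingSpeller:
    """Stand-in exposing the same add() method, used only for this demonstration."""

    def __init__(self) -> None:
        self.added: List[str] = []

    def add(self, word: str) -> None:
        self.added.append(word)


speller = RecordingSpeller()
print(add_words(speller, ["fibulatie", "fibulaties", "fibulatie", ""]))  # 2
print(speller.added)  # ['fibulatie', 'fibulaties']
```

With the real package, the object returned by `ensure_hunspell_nl()` would take the place of `RecordingSpeller()`, and the same object is then passed as `speller=` to `pluralize` and `singularize`.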
Help!? The result is not correct
I told you it was imperfect! There is stuff this package can and cannot do:
- We cannot discover words that are not recognized by Hunspell
- We can only process nouns (Dutch: zelfstandige naamwoorden)
- We can only return a single result, even though some words are ambiguous: the singular of graven can be either graaf or graf. Cases like this are not currently supported.
- We can add words, just open up a ticket on GitHub. Please make sure you provide some evidence on why the word should be added (like a VanDale.nl result).
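To see why a word like graven is ambiguous, here is a toy sketch that undoes an -en plural in two ways and keeps every candidate found in a word list. It is not the package's algorithm: the function name, the two rules and the tiny lexicon are assumptions made for illustration, and many Dutch patterns (such as doubled consonants in bommen) are deliberately ignored.

```python
from typing import List, Set


def singular_candidates(plural: str, lexicon: Set[str]) -> List[str]:
    """Return all known singular candidates for a plural ending in -en."""
    if not plural.endswith("en"):
        return []
    stem = plural[:-2]
    # A final v or z in the stem is written f or s at the end of a word.
    if stem.endswith("v"):
        stem = stem[:-1] + "f"
    elif stem.endswith("z"):
        stem = stem[:-1] + "s"
    candidates = [stem]
    # A single vowel before the final consonant may be written double
    # in the singular (kazen -> kaas), so try that spelling as well.
    if len(stem) >= 3 and stem[-2] in "aeou" and stem[-3] not in "aeiou":
        candidates.append(stem[:-1] + stem[-2] + stem[-1])
    return [c for c in candidates if c in lexicon]


TOY_LEXICON = {"graaf", "graf", "kaas"}  # stand-in for a real dictionary

print(singular_candidates("graven", TOY_LEXICON))  # ['graf', 'graaf']
print(singular_candidates("kazen", TOY_LEXICON))   # ['kaas']
```

Both graf and graaf survive the dictionary check, so a function that returns a single string has to pick one of them.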
Development
If you want to contribute to local development, please consult the local development page.