Library for word declensions

These details have not been verified by PyPI

Project links

Development Status
- 5 - Production/Stable
Intended Audience
- Developers
- Science/Research
License
- OSI Approved :: MIT License

Project description

Declensor library

Using the dclua.py library, you can train declension models and decline words. Declension works by replacing the word's suffix with one that matches the morphological properties you want the word to have. The topics below explain how it works.

Morphological vector

A morphological vector describes the morphological properties of a lexeme: the number at each coordinate encodes one property. You can define your own vectors for your language; the structure below is the one suggested for Ukrainian.

Noun vectors

Noun vectors have 2 coordinates: [number][case]. The table shows what each value means.

Coordinate	Number	Case
0	-	Nominative
1	Singular	Genitive
2	Plural	Dative
3	-	Accusative
4	-	Instrumental
5	-	Locative
6	-	Vocative

The base-form (infinitive) suffix is stored at [0][0].
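To make the noun table concrete, here is a small helper that turns grammatical names into a noun vector. It is a hypothetical sketch, not part of dclua: the names NUMBER, CASE, BASE_FORM and noun_vector are illustrative, and the index values are transcribed from the table above.

```python
# Hypothetical helper, not part of dclua; values are copied from the noun table.
# Index 0 of the number coordinate is empty in the table; (0, 0) holds the base form.
NUMBER = {"singular": 1, "plural": 2}
CASE = {
    "nominative": 0,
    "genitive": 1,
    "dative": 2,
    "accusative": 3,
    "instrumental": 4,
    "locative": 5,
    "vocative": 6,
}
BASE_FORM = (0, 0)


def noun_vector(number, case):
    """Return the (number, case) tuple for the given grammatical names."""
    return (NUMBER[number], CASE[case])


print(noun_vector("singular", "dative"))  # (1, 2)
```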

Verb vectors

Verb vectors have 4 coordinates: [tense][person][number][gender].

Coordinate	Tense	Person	Number	Gender
0	-	First	Singular	Masculine
1	Present	Second	Plural	Feminine
2	Future	Third	-	Neuter
3	Past	-	-	-

The infinitive suffix is stored at [0][0][0][0].

Adjective vectors

Adjective vectors have 3 coordinates: [gender][number][person].

Coordinate	Gender	Number	Person
0	-	Singular	First
1	Masculine	Plural	Second
2	Feminine	-	Third
3	Neuter	-	-

The base-form (infinitive) suffix is stored at [0][0][0].
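The three layouts differ both in the number of coordinates and in how many values each coordinate can take, so it is easy to pass a vector built for the wrong part of speech. A hypothetical check like the one below (SHAPES and is_valid_vector are illustrative names, not dclua API) can catch that; the sizes are read directly off the noun, verb, and adjective tables above.

```python
# Hypothetical validator, not part of dclua.
# Each tuple lists how many values a coordinate can take, per the tables above.
SHAPES = {
    "noun": (3, 7),            # [number][case]
    "verb": (4, 3, 2, 3),      # [tense][person][number][gender]
    "adjective": (4, 2, 3),    # [gender][number][person]
}


def is_valid_vector(part_of_speech, vector):
    """True if `vector` has the right length and every coordinate is in range."""
    shape = SHAPES[part_of_speech]
    return len(vector) == len(shape) and all(
        0 <= value < size for value, size in zip(vector, shape)
    )


print(is_valid_vector("verb", (0, 0, 0, 0)))  # True: the infinitive slot
print(is_valid_vector("noun", (1, 1, 1)))     # False: nouns have 2 coordinates
```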

Declension rule

A declension rule is a multidimensional array of declined suffixes, indexed by morphological vectors. You can create a rule for a word like this:

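import dclua

rule = dclua.DeclenseTrainer.analyze({
  (0,0): 'усмішка',   # base form
  (1,0): 'усмішка',   # singular, nominative
  (1,1): 'усмішки',   # singular, genitive
  (1,2): 'усмішці',   # singular, dative
  #...
  (2,6): 'усмішки'    # plural, vocative
})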

Now the rule will look like this:

rule[0][0] == 'ка'
rule[1][0] == 'ка'
rule[1][1] == 'ки'
rule[1][2] == 'ці'
#...
rule[2][6] == 'ки'

Words with different suffixes decline differently, so you need to create a rule for each suffix pattern you want to handle later.

The analyze method also accepts a minsize argument, which sets the minimum length of the produced suffix.
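The rule shown above can be reproduced by stripping the longest prefix shared by all forms of the word. The sketch below assumes that is roughly what analyze does, and assumes minsize means "every produced suffix keeps at least minsize characters". analyze_sketch is a hypothetical name; it uses the standard-library os.path.commonprefix, not dclua, and returns a nested list so it can be indexed as rule[1][2] like the documented rule.

```python
import os


def analyze_sketch(forms, minsize=0):
    """Hypothetical sketch of suffix extraction; not dclua's actual code.

    `forms` maps morphological vectors (tuples) to full word forms. Returns a
    nested list indexed by the vector coordinates; missing cells are None.
    """
    words = list(forms.values())
    stem = os.path.commonprefix(words)
    # Assumed meaning of minsize: shorten the stem so each suffix keeps >= minsize chars.
    cut = min(len(stem), max(min(len(w) for w in words) - minsize, 0))
    suffixes = {vec: word[cut:] for vec, word in forms.items()}

    ndim = len(next(iter(suffixes)))
    shape = [max(vec[d] for vec in suffixes) + 1 for d in range(ndim)]

    def build(prefix):
        # Recurse one coordinate at a time until a full vector is formed.
        if len(prefix) == ndim:
            return suffixes.get(tuple(prefix))
        return [build(prefix + [i]) for i in range(shape[len(prefix)])]

    return build([])


rule = analyze_sketch({
    (0, 0): 'усмішка',
    (1, 0): 'усмішка',
    (1, 1): 'усмішки',
    (1, 2): 'усмішці',
    (2, 6): 'усмішки',
})
print(rule[1][2], rule[2][6])  # ці ки
```

The stem here is 'усміш' because 'усмішці' already differs from the other forms at the fifth letter, which is why the suffixes are two letters long.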

Word declension

Once you have a model (a bundle of rules for different suffixes), you can use it to decline words. The syntax is as follows:

Declensor.declense(str word, tuple newmporph, tuple morphology=None)

Suppose the variable model contains rules for all the suffixes you need. Then you can decline words as follows:

>>> import dclua
>>> dcl = dclua.Declensor(model)
>>> dcl.declense('сонцю', (1,1))
'сонця'

The morphology vector of the given word is recognized automatically, which may take some time because the appropriate rule has to be found in the model. If you already know the morphology of the word you want to decline, pass it as the morphology argument:

>>> dcl = dclua.Declensor(model)
>>> dcl.declense('сонцю', (1,1), morphology=(1,2))
'сонця'
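The idea behind declense can be illustrated with a hypothetical sketch: find a rule whose suffix for some vector matches the end of the word, cut that suffix off, and append the suffix stored at the new vector. declense_sketch is an illustrative name, not dclua's code; for brevity each rule here is a dict keyed by vector tuples instead of a nested array, and preferring the longest matching suffix is an assumed tie-breaker that the real library may not use.

```python
def declense_sketch(word, newmorph, model, morphology=None):
    """Hypothetical sketch of suffix replacement; not dclua's actual code.

    `model` is assumed to be a list of rules, each a dict mapping a
    morphological vector (tuple) to a suffix string.
    """
    best = None  # (length of matched suffix, rule it came from)
    for rule in model:
        if newmorph not in rule:
            continue
        # A known morphology narrows the search to one vector; otherwise try them all.
        vectors = [morphology] if morphology is not None else list(rule)
        for vec in vectors:
            suffix = rule.get(vec)
            if suffix is not None and word.endswith(suffix):
                if best is None or len(suffix) > best[0]:
                    best = (len(suffix), rule)
    if best is None:
        raise ValueError("no rule matches %r" % word)
    length, rule = best
    # Slice by length instead of word[:-length], which would be empty for length 0.
    return word[:len(word) - length] + rule[newmorph]


sun = {(0, 0): 'е', (1, 0): 'е', (1, 1): 'я', (1, 2): 'ю'}
smile = {(0, 0): 'ка', (1, 0): 'ка', (1, 1): 'ки', (1, 2): 'ці'}
print(declense_sketch('сонцю', (1, 1), [smile, sun]))  # сонця
```

Passing morphology skips the scan over every vector, which mirrors why the documented morphology argument speeds things up.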

Train your model

To train your model, you can start from the template in template.py in this directory.
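template.py is not reproduced here, so the following is only a hypothetical outline of training in the same spirit: extract a suffix rule from each full paradigm and keep each distinct rule once, so words that decline alike share a rule. train_sketch is an illustrative name, not dclua API, and rules are plain dicts keyed by vector tuples.

```python
import os


def train_sketch(paradigms):
    """Hypothetical sketch: turn full paradigms into a model of unique suffix rules.

    Each paradigm maps morphological vectors (tuples) to full word forms.
    """
    model = []
    for forms in paradigms:
        # Strip the prefix shared by all forms; what remains are the suffixes.
        stem = os.path.commonprefix(list(forms.values()))
        rule = {vec: word[len(stem):] for vec, word in forms.items()}
        if rule not in model:  # words with identical suffix patterns share one rule
            model.append(rule)
    return model


model = train_sketch([
    {(0, 0): 'усмішка', (1, 0): 'усмішка', (1, 1): 'усмішки', (1, 2): 'усмішці'},
    {(0, 0): 'дівчинка', (1, 0): 'дівчинка', (1, 1): 'дівчинки', (1, 2): 'дівчинці'},
    {(0, 0): 'сонце', (1, 0): 'сонце', (1, 1): 'сонця', (1, 2): 'сонцю'},
])
print(len(model))  # 2
```

Here 'усмішка' and 'дівчинка' produce the same rule (ка, ка, ки, ці), so three paradigms yield two rules.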

Generalizing model

Sometimes suffixes in a model come in slight variations. For example, in aab and aac only the last letter differs. You can define groups of letters that may vary in such cases and generalize your model according to these groups. Example:

>>> import dclua
>>> dclua.DeclenseTrainer.generalizeModel(
...     model=[
...         [["она"], ["они"], ["онів"]],
...         [["ова"], ["ови"], ["овів"]],
...     ],
...     groups=[
...         ["н", "в", "п", "м"]
...     ],
...     threshold=.3
... )
[
    [
        [['она'], ['они'], ['онів']],
        [['ова'], ['ови'], ['овів']],
        [['опа'], ['опи'], ['опів']],
        [['ома'], ['оми'], ['омів']]
    ]
]

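The threshold parameter is the ratio between the number of rules that can be generalized to a group and the size of that group. It defaults to .3, so if fewer than .3 * size_of_group rules fit a group, they won't be generalized. In the example above, 2 rules fit a group of 4 letters, and 2 is at least .3 * 4 = 1.2, so the model is generalized.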
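One plausible way to implement this generalization is sketched below, under stated assumptions: rules that are identical except for a single position, where each holds a letter from the same group, form a cluster; if the cluster has at least threshold * len(group) rules, variants for every letter of the group are added. generalize_sketch is a hypothetical name, not dclua's code, and unlike the documented output it returns a flat list of rules without the extra outer list.

```python
def generalize_sketch(model, groups, threshold=.3):
    """Hypothetical sketch of letter-group generalization; not dclua's code.

    Each rule is a list of cells, each cell a list of suffix strings, as in the
    generalizeModel example. Returns a flat list of rules.
    """
    def strings(rule):
        return [s for cell in rule for s in cell]

    def substitute(rule, pos, letter):
        # Replace the character at `pos` in every suffix of the rule.
        return [[s[:pos] + letter + s[pos + 1:] for s in cell] for cell in rule]

    result = list(model)
    for group in groups:
        # (position, rule with that position removed) -> rules sharing that shape
        clusters = {}
        for rule in model:
            flat = strings(rule)
            if not flat:
                continue
            for pos in range(min(len(s) for s in flat)):
                letters = {s[pos] for s in flat}
                if len(letters) == 1 and next(iter(letters)) in group:
                    shape = (pos, tuple(tuple(s[:pos] + s[pos + 1:] for s in cell)
                                        for cell in rule))
                    clusters.setdefault(shape, []).append(rule)
        for (pos, _), members in clusters.items():
            if len(members) < threshold * len(group):
                continue  # too few rules fit this group, so leave them as they are
            for letter in group:
                candidate = substitute(members[0], pos, letter)
                if candidate not in result:
                    result.append(candidate)
    return result


rules = [[["она"], ["они"], ["онів"]], [["ова"], ["ови"], ["овів"]]]
for rule in generalize_sketch(rules, [["н", "в", "п", "м"]]):
    print(rule)
```

With only one of the two rules, the cluster has 1 member, which is below 1.2, so the sketch returns the model unchanged.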


Release history

2.0 (this version): May 24, 2019
1.1: May 19, 2019
1.0: May 18, 2019

Download files


Source Distributions

No source distribution files are available for this release.

Built Distribution


dclua-2.0-py2.py3-none-any.whl (8.8 kB)

Uploaded May 24, 2019 for Python 2 and Python 3

File details

Details for the file dclua-2.0-py2.py3-none-any.whl.

File metadata

Download URL: dclua-2.0-py2.py3-none-any.whl
Upload date: May 24, 2019
Size: 8.8 kB
Tags: Python 2, Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/1.12.1 pkginfo/1.5.0.1 requests/2.20.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.31.0 CPython/3.6.4

File hashes

Hashes for dclua-2.0-py2.py3-none-any.whl
Algorithm	Hash digest
SHA256	`595c022835a40e3e8461941b359d284634e5d3d524f5550e2a136337785518d8`
MD5	`13575a5af68cf1605665eeab67023612`
BLAKE2b-256	`76fba82d5f3132c736b69d44c5d20d5713a2a1375f0099da673998c98f31d535`

