Paradigm learning and paradigm prediction
NB!
This is Språkbanken's unofficial version of the paradigmextract library. The main version can be found here.
---*--- ---*--- ---*---
The software collection in this repository is related to a body of scientific work on paradigm learning and paradigm prediction, of which the following publication is the latest one. See the reference list for previous work.
[Forsberg, M.; Hulden, M. (2016). Learning Transducer Models for Morphological Analysis from Example Inflections. In Proceedings of StatFSM. Association for Computational Linguistics.](http://anthology.aclweb.org/W16-2405)
Quick reference
Paradigm learning: pextract.py
Description
Extract paradigmatic representations from input inflection tables. See Section 2 in Forsberg and Hulden (2016) for details.
Example
$ python src/pextract.py < data/es_verb_train.txt > es_verb.p
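As a rough illustration of the idea behind paradigm extraction, the sketch below computes the longest common subsequence (LCS) shared by the forms of an inflection table, which is the material that the cited line of work abstracts into variables. It is a toy sketch under stated assumptions, not the code in pextract.py: the function names `lcs` and `common_subsequence` are hypothetical, and folding a pairwise LCS over the forms is a simplification that is not guaranteed to find the longest subsequence common to all forms.

```python
from functools import reduce


def lcs(a, b):
    """Return one longest common subsequence of strings a and b."""
    # table[i][j] holds the LCS length of the suffixes a[i:] and b[j:].
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) - 1, -1, -1):
        for j in range(len(b) - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])

    # Walk the table from the start to recover one optimal subsequence.
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return "".join(out)


def common_subsequence(forms):
    """Fold the pairwise LCS over all forms (a simplification, see above)."""
    return reduce(lcs, forms)


if __name__ == "__main__":
    print(common_subsequence(["ring", "rang", "rung"]))
```

For the toy table ring, rang, rung the shared material is r, n, g; the parts that differ between forms (i, a, u) are what stays fixed in each slot of the paradigm while the shared parts become variables.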
Non-probabilistic morphological analyzers: morphanalyzer.py
Description
Create a foma-compatible morphological analyzer from a paradigm file. The analyzer is non-probabilistic.
Options:
-o
recreate original data (all vars must be exactly instantiated as seen in training data)
-c
constrain variables by generalizing (default pvalue = 0.05)
-u
unconstrained (all variables are defined as ?+)
-p
use together with -c
-s
keep different analyzers separate instead of merging with priority union (may be necessary for some analyzers)
-n
name of binary foma file to compile to
Any combination of the above may be used. The analyzers are combined by priority union, e.g. -o -c -u would yield an analyzer [ Goriginal .P. Gconstrained .P. Gunconstrained ].
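The effect of combining analyzers by priority union can be sketched in plain Python. This is only a model of the behaviour, assuming that priority is decided on the input word: an analyzer later in the chain is consulted only for words that no earlier analyzer accepts. It does not use foma, and the helper names (`exact_analyzer`, `pattern_analyzer`, `priority_union`) and the bracketed tags are hypothetical, made up for illustration.

```python
import re


def exact_analyzer(table):
    """Analyzer that only knows the word forms listed in `table`."""
    def analyze(word):
        return set(table.get(word, ()))
    return analyze


def pattern_analyzer(regex, template):
    """Analyzer whose variable part is unconstrained (like ?+), via a regex."""
    compiled = re.compile(regex)

    def analyze(word):
        match = compiled.fullmatch(word)
        return {match.expand(template)} if match else set()
    return analyze


def priority_union(*analyzers):
    """Return the result of the first analyzer that accepts the word."""
    def analyze(word):
        for analyzer in analyzers:
            result = analyzer(word)
            if result:
                return result
        return set()
    return analyze


if __name__ == "__main__":
    # Hypothetical stand-ins for Goriginal and Gunconstrained.
    g_original = exact_analyzer({"cojo": {"coger[seen]"}})
    g_unconstrained = pattern_analyzer(r"(.+)o", r"\g<1>er[guess]")
    grammar = priority_union(g_original, g_unconstrained)

    print(grammar("cojo"))  # answered by the first analyzer only
    print(grammar("como"))  # falls through to the unconstrained analyzer
```

Note that the unconstrained stand-in would also accept "cojo", but its guess is masked because the more specific analyzer already covers that word.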
Example
$ python src/morphanalyzer.py -o -c es_verb.p > es_verb.foma
Probabilistic morphological analyzers: morphparser.py
Description
Create a probabilistic morphological analyzer from a paradigm file.
Reads one or more whitespace-separated words from STDIN and
returns the most plausible analysis for the set in the format:
SCORE NAME_OF_PARADIGM VARIABLES WORDFORM1:BASEFORM,MSD#WORDFORM2:BASEFORM,MSD...
Flags:
-k num
print the k best analyses
-t
print the entire table for the best analysis
-d
print debug info
-n num
use an nth-order n-gram model for selecting the best paradigm (an n-gram model over the variables in the paradigm is used)
Example
$ echo "coger cojo" | python morphparser.py ./../paradigms/spanish_verbs.p -k 1 -t
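A downstream script might want to split an output line of the form SCORE NAME_OF_PARADIGM VARIABLES WORDFORM1:BASEFORM,MSD#WORDFORM2:BASEFORM,MSD... into its parts. The sketch below is one way to do that, under assumptions the description above does not confirm: that the line has exactly four whitespace-separated fields, that the score parses as a float, and that the VARIABLES field contains no whitespace and can be kept as an opaque string. The names `Analysis` and `parse_analysis_line` are hypothetical, and the sample line is made up rather than real morphparser.py output.

```python
from typing import List, NamedTuple, Tuple


class Analysis(NamedTuple):
    score: float
    paradigm: str
    variables: str  # kept as an opaque string; its inner format is not assumed
    forms: List[Tuple[str, str, str]]  # (wordform, baseform, msd)


def parse_analysis_line(line):
    """Parse one output line, assuming four whitespace-separated fields."""
    score, paradigm, variables, table = line.split()
    forms = []
    for entry in table.split("#"):
        wordform, rest = entry.split(":", 1)
        # Split on the first comma only, in case the MSD itself has commas.
        baseform, msd = rest.split(",", 1)
        forms.append((wordform, baseform, msd))
    return Analysis(float(score), paradigm, variables, forms)


if __name__ == "__main__":
    # Made-up sample line in the documented shape, not real tool output.
    sample = "-12.5 p_coger 1=co coger:coger,inf#cojo:coger,1sg"
    print(parse_analysis_line(sample))
```

If the real output deviates from these assumptions (for example, variables separated by spaces), `line.split()` will raise a `ValueError` on unpacking, which at least makes the mismatch visible.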