Lookup FOMA FSTs
FST Lookup
Implements lookup for FOMA format finite state transducers.
Supports Python 3.5 and up.
Install
pip install fst-lookup
Usage
Import the library, and load an FST from a file:
Hint: Test this module by downloading the eat FST!
>>> from fst_lookup import FST
>>> fst = FST.from_file('eat.fomabin')
Assumed format of the FSTs
fst_lookup assumes that the lower label corresponds to the surface form, while the upper label corresponds to the lemma and its linguistic tags and features. For example, your LEXC file will look something like this (note what is on each side of the colon, :):
Multichar_Symbols +N +Sg +Pl
Lexicon Root
cow+N+Sg:cow #;
cow+N+Pl:cows #;
goose+N+Sg:goose #;
goose+N+Pl:geese #;
sheep+N+Sg:sheep #;
sheep+N+Pl:sheep #;
If your FST has labels on the opposite sides, you must invert the net before loading it into fst_lookup.
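To make the upper/lower convention concrete, here is a toy sketch in plain Python. It does not use fst_lookup or FOMA at all: it treats the LEXC entries above as simple `upper:lower` string pairs and indexes them in both directions. The names `parse_entries`, `build_tables`, and `invert` are hypothetical helpers invented for this illustration, and a real FST is a state machine rather than a lookup table.

```python
from collections import defaultdict

# The LEXC entries from above, reduced to "upper:lower" pairs.
LEXC_ENTRIES = """
cow+N+Sg:cow
cow+N+Pl:cows
goose+N+Sg:goose
goose+N+Pl:geese
sheep+N+Sg:sheep
sheep+N+Pl:sheep
"""


def parse_entries(text):
    """Return a list of (upper, lower) pairs from 'upper:lower' lines."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        upper, lower = line.split(":", 1)
        pairs.append((upper, lower))
    return pairs


def build_tables(pairs):
    """Index the pairs in both directions."""
    lower_to_upper = defaultdict(list)  # surface form -> analyses
    upper_to_lower = defaultdict(list)  # analysis -> surface forms
    for upper, lower in pairs:
        lower_to_upper[lower].append(upper)
        upper_to_lower[upper].append(lower)
    return lower_to_upper, upper_to_lower


def invert(pairs):
    """Swap the two sides of every pair (a toy analogue of inverting a net)."""
    return [(lower, upper) for upper, lower in pairs]


pairs = parse_entries(LEXC_ENTRIES)
analyses, surface_forms = build_tables(pairs)

print(analyses["sheep"])             # ['sheep+N+Sg', 'sheep+N+Pl']
print(surface_forms["goose+N+Pl"])   # ['geese']
```

If the sides were swapped in the source (surface form on the left of the colon), `invert(pairs)` would restore the orientation this sketch expects before building the tables.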
Analyze a word form
To analyze a form (take a word form and get its linguistic analyses), call the analyze() function:
def analyze(self, surface_form: str) -> Iterator[Analysis]
This will yield all possible linguistic analyses produced by the FST.
An analysis is a tuple of strings. The strings are either linguistic tags, or the lemma (base form of the word).
FST.analyze() is a generator, so you must call list() to get a list.
>>> list(sorted(fst.analyze('eats')))
[('eat', '+N', '+Mass'),
('eat', '+V', '+3P', '+Sg')]
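The shape of an analysis tuple can be illustrated without the library. The sketch below splits an upper-side string such as `eat+V+3P+Sg` into a lemma followed by tags. It assumes every tag starts with `+` and that the lemma itself contains no `+`; `split_analysis` is a hypothetical helper, not part of fst_lookup, and the library may derive its tuples differently.

```python
import re


def split_analysis(analysis):
    """Split 'eat+V+3P+Sg' into ('eat', '+V', '+3P', '+Sg').

    Each piece is an optional leading '+' followed by a run of
    characters that are not '+'.
    """
    return tuple(re.findall(r"\+?[^+]+", analysis))


print(split_analysis("eat+V+3P+Sg"))  # ('eat', '+V', '+3P', '+Sg')
print(split_analysis("eat+N+Mass"))   # ('eat', '+N', '+Mass')
```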
Generate a word form
To generate a form (take a linguistic analysis and get its concrete word forms), call the generate() function:
def generate(self, analysis: str) -> Iterator[str]
FST.generate() is a Python generator, so you must call list() to get a list.
>>> list(fst.generate('eat+V+Past'))
['ate']
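Note that analyze() yields tuples while generate() takes a single string. A minimal sketch of bridging the two is shown below, assuming the tags in the tuple keep their leading `+` (as in the analyze() output shown earlier) so that plain concatenation reproduces the `eat+V+Past` form. `analysis_to_string` is a hypothetical helper, not part of fst_lookup.

```python
def analysis_to_string(analysis):
    """Join a tuple such as ('eat', '+V', '+Past') into 'eat+V+Past'."""
    return "".join(analysis)


print(analysis_to_string(("eat", "+V", "+Past")))  # eat+V+Past
```

The resulting string could then be passed to generate(), for example `list(fst.generate(analysis_to_string(('eat', '+V', '+Past'))))`.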
License
Copyright © 2019 Eddie Antonio Santos. Released under the terms of the Apache license. See LICENSE for more info.