Lookup FOMA FSTs
FST Lookup
Implements lookup for FOMA format finite state transducers.
Supports Python 3.5 and up.
Install
pip install fst-lookup
Usage
Import the library, and load an FST from a file:
Hint: Test this module by downloading the eat FST!
>>> from fst_lookup import FST
>>> fst = FST.from_file('eat.fomabin')
Assumed format of the FSTs
fst_lookup assumes that the lower label corresponds to the surface form, while the upper label corresponds to the lemma and the linguistic tags and features. For example, your LEXC will look something like this (note what is on each side of the colon, :):
Multichar_Symbols +N +Sg +Pl
Lexicon Root
cow+N+Sg:cow #;
cow+N+Pl:cows #;
goose+N+Sg:goose #;
goose+N+Pl:geese #;
sheep+N+Sg:sheep #;
sheep+N+Pl:sheep #;
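To make the upper/lower convention concrete, here is a toy model of the LEXC lexicon above as a plain list of (upper, lower) pairs, with no FST involved. Analysis finds the upper sides for a given lower (surface) form, and generation does the reverse. This is only an illustrative sketch that assumes exactly the six entries above. It is not how fst_lookup is implemented, and the names LEXICON, toy_analyze, and toy_generate are hypothetical.

```python
from typing import Iterator

# Hypothetical toy data: (upper, lower) pairs copied from the LEXC example.
# upper = lemma + tags, lower = surface form.
LEXICON = [
    ("cow+N+Sg", "cow"),
    ("cow+N+Pl", "cows"),
    ("goose+N+Sg", "goose"),
    ("goose+N+Pl", "geese"),
    ("sheep+N+Sg", "sheep"),
    ("sheep+N+Pl", "sheep"),
]


def toy_analyze(surface_form: str) -> Iterator[str]:
    """Yield every upper side whose lower side equals the surface form."""
    for upper, lower in LEXICON:
        if lower == surface_form:
            yield upper


def toy_generate(analysis: str) -> Iterator[str]:
    """Yield every lower side whose upper side equals the analysis."""
    for upper, lower in LEXICON:
        if upper == analysis:
            yield lower


print(sorted(toy_analyze("sheep")))      # ['sheep+N+Pl', 'sheep+N+Sg']
print(list(toy_generate("goose+N+Pl")))  # ['geese']
```

Note that "sheep" is ambiguous: one surface form maps to two analyses, which is why analysis naturally yields several results rather than returning exactly one.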
If your FST has its labels on the opposite sides (i.e., the upper label corresponds to the surface form and the lower label corresponds to the lemma and linguistic tags), then instantiate the FST by providing the labels="invert" keyword argument:
fst = FST.from_file('eat-inverted.fomabin', labels="invert")
Hint: FSTs originating from the HFST suite are often inverted, so if .generate() or .analyze() aren't working correctly, try loading the FST inverted first!
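Conceptually, inverting swaps which side of each label pair is treated as the surface form. The sketch below shows that idea on a hypothetical list of (upper, lower) pairs from an FST compiled the "other way round", with the surface form on the upper side. It is an illustration of the concept only, not the library's implementation; swap_sides and inverted are hypothetical names.

```python
from typing import List, Tuple

Pair = Tuple[str, str]


def swap_sides(pairs: List[Pair]) -> List[Pair]:
    """Return (lower, upper) for each (upper, lower) pair."""
    return [(lower, upper) for upper, lower in pairs]


# Hypothetical inverted FST: surface form on the upper side,
# lemma and tags on the lower side.
inverted = [("geese", "goose+N+Pl"), ("goose", "goose+N+Sg")]

# After swapping, upper = lemma + tags and lower = surface form,
# which is the convention fst_lookup assumes by default.
fixed = swap_sides(inverted)
print(fixed)  # [('goose+N+Pl', 'geese'), ('goose+N+Sg', 'goose')]
```

Swapping twice restores the original pairs, so inversion is its own inverse.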
Analyze a word form
To analyze a form (take a word form and get its linguistic analyses), call the analyze() method:
def analyze(self, surface_form: str) -> Iterator[Analysis]
This will yield all possible linguistic analyses produced by the FST.
An analysis is a tuple of strings. The strings are either linguistic tags, or the lemma (base form of the word).
FST.analyze() is a generator, so you must call list() to get a list.
>>> sorted(fst.analyze('eats'))
[('eat', '+N', '+Mass'),
('eat', '+V', '+3P', '+Sg')]
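Since each analysis is a tuple whose strings are either the lemma or tags, a common next step is to separate the two. This sketch assumes, as in every example in this README, that tags start with "+"; other FSTs may mark tags differently, so check your own FST before relying on it. The helper split_analysis is hypothetical and not part of fst_lookup.

```python
from typing import List, Tuple

Analysis = Tuple[str, ...]


def split_analysis(analysis: Analysis) -> Tuple[List[str], List[str]]:
    """Split an analysis into (non-tag strings, tags).

    Assumption: tags start with '+', as in this README's examples.
    Order within each list follows the order in the tuple.
    """
    lemmas = [part for part in analysis if not part.startswith("+")]
    tags = [part for part in analysis if part.startswith("+")]
    return lemmas, tags


# Shaped like one of the results of fst.analyze('eats') above.
lemmas, tags = split_analysis(("eat", "+V", "+3P", "+Sg"))
print(lemmas)  # ['eat']
print(tags)    # ['+V', '+3P', '+Sg']
```

Filtering by prefix rather than by position avoids assuming that the lemma always comes first in the tuple.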
Generate a word form
To generate a form (take a linguistic analysis and get its concrete word forms), call the generate() method:
def generate(self, analysis: str) -> Iterator[str]
FST.generate() is a Python generator, so you must call list() to get a list.
>>> list(fst.generate('eat+V+Past'))
['ate']
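Note that analyze() yields tuples while generate() takes a single string. Judging from the examples here (('eat', '+V', '+3P', '+Sg') versus 'eat+V+Past'), the string form appears to be the tuple's elements concatenated with no separator. Below is a minimal sketch of that conversion, assuming this holds for your FST; analysis_to_string is a hypothetical helper, not part of the library.

```python
from typing import Tuple

Analysis = Tuple[str, ...]


def analysis_to_string(analysis: Analysis) -> str:
    """Join an analysis tuple into the string form that generate() takes.

    Assumption: the string form is the plain concatenation of the parts.
    """
    return "".join(analysis)


print(analysis_to_string(("eat", "+V", "+Past")))  # eat+V+Past

# With a loaded FST (not run here), a round trip might look like:
# for analysis in fst.analyze('eats'):
#     print(list(fst.generate(analysis_to_string(analysis))))
```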
License
Copyright © 2019 Eddie Antonio Santos. Released under the terms of the Apache license. See LICENSE for more info.