Lookup Foma FSTs
FST Lookup
Implements lookup for Foma finite state transducers.
Supports Python 3.5 and up.
Install
pip install fst-lookup
Usage
Import the library, and load an FST from a file:
Hint: Test this module by downloading the eat FST!
>>> from fst_lookup import FST
>>> fst = FST.from_file('eat.fomabin')
Assumed format of the FSTs
fst_lookup assumes that the lower label corresponds to the surface form, while the upper label corresponds to the lemma plus its linguistic tags and features. For example, your LEXC will look something like this (note what is on each side of the colon, :):
Multichar_Symbols +N +Sg +Pl
Lexicon Root
cow+N+Sg:cow #;
cow+N+Pl:cows #;
goose+N+Sg:goose #;
goose+N+Pl:geese #;
sheep+N+Sg:sheep #;
sheep+N+Pl:sheep #;
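To make the upper/lower convention concrete, here is a minimal sketch that reads the entries above into two plain dictionaries. It is a toy stand-in, not how fst_lookup works internally; the names `LEXC_ENTRIES` and `parse_entries` are hypothetical and exist only for this illustration.

```python
from collections import defaultdict

# The Root lexicon entries from the example above, one "upper:lower #;" per line.
LEXC_ENTRIES = """
cow+N+Sg:cow #;
cow+N+Pl:cows #;
goose+N+Sg:goose #;
goose+N+Pl:geese #;
sheep+N+Sg:sheep #;
sheep+N+Pl:sheep #;
"""


def parse_entries(text):
    """Build (surface -> analyses, analysis -> surfaces) from LEXC-style lines."""
    analyze_table = defaultdict(list)
    generate_table = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        pair = line.split()[0]          # drop the continuation class "#;"
        upper, lower = pair.split(":")  # upper = lemma + tags, lower = surface form
        analyze_table[lower].append(upper)
        generate_table[upper].append(lower)
    return dict(analyze_table), dict(generate_table)


analyze_table, generate_table = parse_entries(LEXC_ENTRIES)
print(analyze_table["sheep"])        # ['sheep+N+Sg', 'sheep+N+Pl']
print(generate_table["goose+N+Pl"])  # ['geese']
```

Analysis goes from the lower side to the upper side, and generation goes the other way. A single surface form such as "sheep" can therefore map to more than one analysis.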
If your FST has labels on the opposite sides (that is, the upper label corresponds to the surface form and the lower label corresponds to the lemma and linguistic tags), then instantiate the FST by providing the labels="invert" keyword argument:
fst = FST.from_file('eat-inverted.fomabin', labels="invert")
Hint: FSTs originating from the HFST suite are often inverted, so try loading the FST inverted first if .generate() or .analyze() aren't working correctly!
Analyze a word form
To analyze a form (take a word form, and get its linguistic analyses), call the analyze() function:
def analyze(self, surface_form: str) -> Iterator[Analysis]
This will yield all possible linguistic analyses produced by the FST.
An analysis is a tuple of strings. The strings are either linguistic tags, or the lemma (base form of the word).
FST.analyze() is a generator, so you must call list() to get a list.
>>> list(sorted(fst.analyze('eats')))
[('eat', '+N', '+Mass'),
('eat', '+V', '+3P', '+Sg')]
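Since each analysis is a flat tuple, it is often convenient to separate the lemma from the tags. The following is a minimal sketch, assuming that tags always begin with "+" as they do in the output above; `split_analysis` is a hypothetical helper, not part of fst_lookup.

```python
def split_analysis(analysis):
    """Split an analysis tuple into (lemma, tags).

    Assumes every tag starts with '+' and that everything else belongs
    to the lemma. Other FSTs may use a different tag convention.
    """
    lemma = ''.join(part for part in analysis if not part.startswith('+'))
    tags = [part for part in analysis if part.startswith('+')]
    return lemma, tags


# The two analyses of 'eats' shown above.
analyses = [('eat', '+N', '+Mass'), ('eat', '+V', '+3P', '+Sg')]
for analysis in analyses:
    print(split_analysis(analysis))
# ('eat', ['+N', '+Mass'])
# ('eat', ['+V', '+3P', '+Sg'])
```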
Generate a word form
To generate a form (take a linguistic analysis, and get its concrete word forms), call the generate() function:
def generate(self, analysis: str) -> Iterator[str]
FST.generate() is a Python generator, so you must call list() to get a list.
>>> list(fst.generate('eat+V+Past'))
['ate']
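Because generate() takes a single string and returns a generator, two small helpers can be handy: one to build the analysis string from a lemma and tags, and one to take only the first generated form without building a list. Both `to_analysis_string` and `first_form` are hypothetical names used for this sketch; the iterator passed in at the end is a stand-in for the result of fst.generate().

```python
def to_analysis_string(lemma, tags):
    """Concatenate a lemma and its tags, e.g. 'eat' + ['+V', '+Past']."""
    return lemma + ''.join(tags)


def first_form(forms, default=None):
    """Return the first item of an iterable of word forms, or `default` if empty."""
    return next(iter(forms), default)


analysis = to_analysis_string('eat', ['+V', '+Past'])
print(analysis)  # eat+V+Past

# With the real library this would be: first_form(fst.generate(analysis))
print(first_form(iter(['ate'])))  # ate
print(first_form(iter([])))       # None
```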
Contributing
If you plan to contribute code, it is recommended you use Poetry. Fork and clone this repository, then install development dependencies by typing:
poetry install
Then, do all your development within a virtual environment, managed by Poetry:
poetry shell
Type-checking
This project uses mypy to check static types. To invoke it on this package, type the following:
mypy -p fst_lookup
Running tests
This project's tests are run with pytest:
C Extension
Building the C extension is handled in build.py.
To disable building the C extension, add the following line to .env:
export FST_LOOKUP_BUILD_EXT=False
(by default, this is True).
To enable debugging flags while working on the C extension, add the following line to .env:
export FST_LOOKUP_DEBUG=TRUE
(by default, this is False).
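For readers unfamiliar with boolean environment flags, here is a rough sketch of how such values could be read in Python. This is not taken from build.py, whose actual parsing may differ; `env_flag` is a hypothetical helper, and the case-insensitive comparison is an assumption.

```python
import os


def env_flag(name, default, environ=None):
    """Read a boolean flag from the environment (hypothetical helper).

    Assumes a case-insensitive match on common "truthy" spellings;
    build.py may interpret these variables differently.
    """
    environ = os.environ if environ is None else environ
    raw = environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


# A plain dict stands in for the real environment here.
settings = {"FST_LOOKUP_BUILD_EXT": "False", "FST_LOOKUP_DEBUG": "TRUE"}
print(env_flag("FST_LOOKUP_BUILD_EXT", True, settings))  # False
print(env_flag("FST_LOOKUP_DEBUG", False, settings))     # True
print(env_flag("FST_LOOKUP_DEBUG", False, {}))           # False
```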
Fixtures
If you are creating or modifying existing test fixtures (i.e., mostly pre-built FSTs used for testing), you will need the following dependencies:
- GNU make
- Foma
Fixtures are stored in tests/data/. Here, you will use make to compile all pre-built FSTs from source:
make
License
Copyright © 2019–2021 National Research Council Canada.
Licensed under the MIT license.