
Lookup Foma FSTs


FST Lookup


Implements lookup for Foma finite state transducers.

Supports Python 3.5 and up.

Install

pip install fst-lookup

Usage

Import the library, and load an FST from a file:

Hint: Test this module by downloading the eat FST!

>>> from fst_lookup import FST
>>> fst = FST.from_file('eat.fomabin')

Assumed format of the FSTs

fst_lookup assumes that the lower label corresponds to the surface form, while the upper label corresponds to the lemma and the linguistic tags and features. For example, your LEXC will look something like this (note what is on each side of the colon, :):

Multichar_Symbols +N +Sg +Pl
LEXICON Root
    cow+N+Sg:cow #;
    cow+N+Pl:cows #;
    goose+N+Sg:goose #;
    goose+N+Pl:geese #;
    sheep+N+Sg:sheep #;
    sheep+N+Pl:sheep #;
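To make the orientation concrete, here is a toy sketch (not fst_lookup code) that treats the LEXC entries above as a plain list of (upper, lower) pairs. Analysis matches on the lower side and returns the upper side; generation goes the other way. A real FST encodes the same relation as a network of arcs rather than a list, and the function names here are hypothetical.

```python
from typing import Iterator, List, Tuple

# Each LEXC entry "upper:lower" from the example above, as an (upper, lower) pair.
# This list stands in for a compiled FST; it is an illustration only.
LEXICON: List[Tuple[str, str]] = [
    ("cow+N+Sg", "cow"),
    ("cow+N+Pl", "cows"),
    ("goose+N+Sg", "goose"),
    ("goose+N+Pl", "geese"),
    ("sheep+N+Sg", "sheep"),
    ("sheep+N+Pl", "sheep"),
]


def toy_analyze(surface_form: str) -> Iterator[str]:
    """Match on the lower (surface) side, yield the upper (lemma + tags) side."""
    for upper, lower in LEXICON:
        if lower == surface_form:
            yield upper


def toy_generate(analysis: str) -> Iterator[str]:
    """Match on the upper (lemma + tags) side, yield the lower (surface) side."""
    for upper, lower in LEXICON:
        if upper == analysis:
            yield lower


print(list(toy_analyze("sheep")))        # ['sheep+N+Sg', 'sheep+N+Pl']
print(list(toy_generate("goose+N+Pl")))  # ['geese']
```

Note that "sheep" is ambiguous, so analysis produces two results; this is why lookup returns a sequence of results rather than a single string.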

If your FST has labels on the opposite sides (i.e., the upper label corresponds to the surface form and the lower label corresponds to the lemma and linguistic tags), then instantiate the FST by providing the labels="invert" keyword argument:

fst = FST.from_file('eat-inverted.fomabin', labels="invert")

Hint: FSTs originating from the HFST suite are often inverted, so if .generate() or .analyze() aren't working correctly, try loading the FST inverted first!
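Conceptually, inverting a transducer swaps the two labels on every arc, so that what was the upper side is read as the surface side. The sketch below shows that idea on a hypothetical list of arcs; the `Arc` layout is an assumption for illustration and is not fst_lookup's internal representation.

```python
from typing import List, NamedTuple


class Arc(NamedTuple):
    # Hypothetical arc layout: source state, upper label, lower label, target state.
    source: int
    upper: str
    lower: str
    target: int


def invert(arcs: List[Arc]) -> List[Arc]:
    """Swap the upper and lower label of every arc; states are left unchanged."""
    return [arc._replace(upper=arc.lower, lower=arc.upper) for arc in arcs]


# Arcs whose surface symbol ("s") sits on the upper side, as in an inverted FST.
arcs = [Arc(0, "e", "e", 1), Arc(1, "s", "+Pl", 2)]
print(invert(arcs))
# [Arc(source=0, upper='e', lower='e', target=1), Arc(source=1, upper='+Pl', lower='s', target=2)]
```

Inverting twice gives back the original arcs, which is why a single labels="invert" switch is enough to handle either orientation.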

Analyze a word form

To analyze a form (take a word form and get its linguistic analyses), call the analyze() method:

def analyze(self, surface_form: str) -> Iterator[Analysis]

This will yield all possible linguistic analyses produced by the FST.
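A transducer can yield several analyses because more than one path may accept the same surface form. The following is a simplified sketch of such a lookup as a depth-first search. It assumes a toy transducer stored as a dict of (lower, upper, target) arcs, uses `""` for epsilon, and assumes there are no epsilon cycles; it is not fst_lookup's actual implementation, and `lookup`, `ARCS`, and `FINALS` are hypothetical names.

```python
from typing import Dict, Iterator, List, Set, Tuple

# Toy transducer: state -> list of (lower, upper, target) arcs; "" is epsilon.
Arcs = Dict[int, List[Tuple[str, str, int]]]

ARCS: Arcs = {
    0: [("c", "c", 1), ("s", "s", 5)],
    1: [("o", "o", 2)],
    2: [("w", "w", 3)],
    3: [("", "+N", 4)],
    4: [("", "+Sg", 9), ("s", "+Pl", 9)],  # cow / cows
    5: [("h", "h", 6)],
    6: [("e", "e", 7)],
    7: [("e", "e", 8)],
    8: [("p", "p", 10)],
    10: [("", "+N", 11)],
    11: [("", "+Sg", 9), ("", "+Pl", 9)],  # sheep is ambiguous
}
FINALS: Set[int] = {9}


def lookup(arcs: Arcs, finals: Set[int], surface: str) -> Iterator[str]:
    """Yield the upper side of every path whose lower side spells `surface`."""

    def walk(state: int, rest: str, output: str) -> Iterator[str]:
        # All input consumed in a final state: one complete analysis.
        if not rest and state in finals:
            yield output
        for lower, upper, target in arcs.get(state, []):
            # rest.startswith("") is always True, so epsilon consumes nothing.
            if rest.startswith(lower):
                yield from walk(target, rest[len(lower):], output + upper)

    yield from walk(0, surface, "")


print(list(lookup(ARCS, FINALS, "sheep")))  # ['sheep+N+Sg', 'sheep+N+Pl']
print(list(lookup(ARCS, FINALS, "cows")))   # ['cow+N+Pl']
```

Writing the search as a generator means callers only pay for the paths they actually consume.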

An analysis is a tuple of strings. The strings are either linguistic tags, or the lemma (base form of the word).
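One way to picture how such a tuple relates to the upper-side string is to keep each multicharacter symbol (tag) as its own element and join runs of ordinary characters into the lemma. The hypothetical helper below does this with a longest-match scan; it is not part of fst_lookup's API, and it assumes the set of multicharacter symbols is known (here, the ones declared in the LEXC example).

```python
from typing import Iterable, List, Tuple


def split_analysis(upper: str, multichar_symbols: Iterable[str]) -> Tuple[str, ...]:
    """Split an upper-side string into (lemma, tag, tag, ...).

    Multicharacter symbols are matched longest-first; runs of ordinary
    characters between them are joined into a single string (the lemma).
    """
    symbols = sorted(multichar_symbols, key=len, reverse=True)
    parts: List[str] = []
    chars = ""  # accumulates ordinary characters
    i = 0
    while i < len(upper):
        match = next((s for s in symbols if upper.startswith(s, i)), None)
        if match is None:
            chars += upper[i]
            i += 1
            continue
        if chars:  # flush the lemma before the tag
            parts.append(chars)
            chars = ""
        parts.append(match)
        i += len(match)
    if chars:
        parts.append(chars)
    return tuple(parts)


print(split_analysis("goose+N+Pl", ["+N", "+Sg", "+Pl"]))  # ('goose', '+N', '+Pl')
```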

FST.analyze() is a generator, so you must call list() to get a list.

>>> sorted(fst.analyze('eats'))
[('eat', '+N', '+Mass'), ('eat', '+V', '+3P', '+Sg')]
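Because analyze() is a generator, you don't have to build the full list if one result is enough. The sketch below assumes only the documented behaviour that analyze() yields tuples of strings; `first_analysis` is a hypothetical helper, and `StubFST` is a stand-in so the example runs without an FST file. Which analysis comes first depends on the FST, so sort the results if you need a stable order.

```python
from typing import Iterator, Optional, Tuple

Analysis = Tuple[str, ...]


def first_analysis(fst, surface_form: str) -> Optional[Analysis]:
    """Return the first analysis yielded by fst.analyze(), or None if there is none.

    next() stops after the first result instead of enumerating every path.
    """
    return next(iter(fst.analyze(surface_form)), None)


class StubFST:
    """Stand-in for a loaded FST, yielding the analyses shown above."""

    def analyze(self, surface_form: str) -> Iterator[Analysis]:
        if surface_form == "eats":
            yield ("eat", "+V", "+3P", "+Sg")
            yield ("eat", "+N", "+Mass")


print(first_analysis(StubFST(), "eats"))  # ('eat', '+V', '+3P', '+Sg')
print(first_analysis(StubFST(), "xyz"))   # None
```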

Generate a word form

To generate a form (take a linguistic analysis and get its concrete word forms), call the generate() method:

def generate(self, analysis: str) -> Iterator[str]

FST.generate() is a Python generator, so you must call list() to get a list.

>>> list(fst.generate('eat+V+Past'))
['ate']
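As the signatures show, analyze() yields tuples while generate() takes a single string. Assuming the string form is simply the tuple's pieces concatenated (as 'eat' + '+V' + '+Past' = 'eat+V+Past' suggests), a round trip can be sketched like this; `analysis_to_str` is a hypothetical helper, not part of fst_lookup.

```python
from typing import Tuple


def analysis_to_str(analysis: Tuple[str, ...]) -> str:
    """Concatenate (lemma, tag, ...) into the string form generate() accepts."""
    return "".join(analysis)


print(analysis_to_str(("eat", "+V", "+Past")))  # eat+V+Past

# With a loaded FST (not run here), regenerate every surface form of 'eats':
# for analysis in fst.analyze('eats'):
#     print(analysis, list(fst.generate(analysis_to_str(analysis))))
```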

Contributing

If you plan to contribute code, we recommend using Poetry. Fork and clone this repository, then install the development dependencies by typing:

poetry install

Then, do all your development within a virtual environment, managed by Poetry:

poetry shell

Type-checking

This project uses mypy to check static types. To invoke it on this package, type the following:

mypy -p fst_lookup

Running tests

To run this project's tests, use pytest:

poetry run pytest

C Extension

Building the C extension is handled in build.py.

To disable building the C extension, add the following line to .env:

export FST_LOOKUP_BUILD_EXT=False

(by default, this is True).

To enable debugging flags while working on the C extension, add the following line to .env:

export FST_LOOKUP_DEBUG=TRUE

(by default, this is False).
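The two examples above spell their booleans differently (False and TRUE). If you write your own script that reads these flags, a case-insensitive check is one way to accept both spellings. The helper below is a hypothetical sketch and is not taken from build.py; check build.py for the values it actually accepts.

```python
import os


def env_flag(name: str, default: bool) -> bool:
    """Read a boolean environment variable such as FST_LOOKUP_BUILD_EXT.

    Hypothetical helper: "true", "1", "yes" (any case) mean True;
    "false", "0", "no" mean False; an unset variable falls back to `default`.
    """
    value = os.environ.get(name)
    if value is None:
        return default
    value = value.strip().lower()
    if value in ("true", "1", "yes"):
        return True
    if value in ("false", "0", "no"):
        return False
    raise ValueError("{}={!r} is not a recognized boolean".format(name, value))


build_ext = env_flag("FST_LOOKUP_BUILD_EXT", default=True)
debug = env_flag("FST_LOOKUP_DEBUG", default=False)
```

Raising on unrecognized values, rather than silently treating them as False, makes a typo in .env visible.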

Fixtures

If you are creating or modifying existing test fixtures (i.e., mostly pre-built FSTs used for testing), you will need the following dependencies:

Fixtures are stored in tests/data/. From that directory, use make to compile all pre-built FSTs from source:

make

License

Copyright © 2019–2021 National Research Council Canada.

Licensed under the MIT license.



Download files


Source Distribution

fst_lookup-2024.7.3.tar.gz (17.6 kB view details)

Uploaded Source

Built Distribution

fst_lookup-2024.7.3-cp38-cp38-macosx_14_0_arm64.whl (18.4 kB view details)

Uploaded CPython 3.8 macOS 14.0+ ARM64

File details

Details for the file fst_lookup-2024.7.3.tar.gz.

File metadata

  • Download URL: fst_lookup-2024.7.3.tar.gz
  • Upload date:
  • Size: 17.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.3 CPython/3.12.4 Darwin/23.4.0

File hashes

Hashes for fst_lookup-2024.7.3.tar.gz
Algorithm Hash digest
SHA256 fe5907921c868c4872985ac9644babb11177d4ce01090265d2f45172e5ee4701
MD5 81d7a466bc67de54d84edf42d0878464
BLAKE2b-256 ba6932254dd69be5fa111e2323433c22066478fae3d4e6b347fa19e660355474


File details

Details for the file fst_lookup-2024.7.3-cp38-cp38-macosx_14_0_arm64.whl.

File metadata

File hashes

Hashes for fst_lookup-2024.7.3-cp38-cp38-macosx_14_0_arm64.whl
Algorithm Hash digest
SHA256 6324a0cb6f45a79251a54da277a60c07915cf66c1dbec41d7da7645fa99f5d4d
MD5 1112bbc69e6c0b8362507e8b7f411c55
BLAKE2b-256 c7e98e31f377ab3398b134208d98857219c7a0d83ca1b45ad3aadb9661d0e3b1

