FSTmorph

This repository houses a set of language-neutral tools for converting CSVs to a finite-state transducer (FST), as well as for testing that FST via YAML files.

User Instructions
The csv2lexc.py script
- JSON configuration files
- The external lexical database
Citation

User Instructions

Examples are given for Ojibwe, with the relevant language data accessible in other repos. However, this FST-generating code is intended to be applicable for other Algonquian languages and beyond -- if you have the necessary spreadsheets for your target language, it should be compatible with this code!

Getting set up to build the FST

Will need updating after repo restructuring!

You will need to install the foma compiler and flookup program (which is part of the foma toolkit). On Mac or Linux, the easiest way to install is via homebrew. Just use the command brew install foma. Alternatively, there are other installation instructions here (including for Windows users).

Note for Windows users: In addition to the page given above, we found these instructions useful for installing. Also, ensure that the directory you add to your PATH immediately contains foma.exe and flookup.exe. For example, if the path to foma.exe is C:\Program Files (x86)\Foma\win32\foma.exe, then add C:\Program Files (x86)\Foma\win32 (not C:\Program Files (x86)\Foma\) to your PATH.
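Before moving on, it can help to confirm that both executables are actually reachable from your PATH. The following is a minimal sketch using only the Python standard library; the helper name `find_foma_tools` is hypothetical and not part of this project.

```python
import shutil


def find_foma_tools(tools=("foma", "flookup")):
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    # shutil.which honours PATHEXT on Windows, so "foma" also matches foma.exe.
    return {tool: shutil.which(tool) for tool in tools}


if __name__ == "__main__":
    for tool, path in find_foma_tools().items():
        print(f"{tool}: {path if path else 'NOT FOUND on PATH'}")
```

If either tool is reported as not found, revisit the PATH entry described above before continuing.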

This project uses poetry to manage python requirements. These can be installed as follows:

Navigate to ParserTools/csv2fst.
Create a virtual environment (using the python command that works on your system e.g., python instead of python3 if needed):

python3 -m venv myvenv

Activate the virtual environment.
On Mac or Linux, run:

source myvenv/bin/activate

On Windows (in a Bash-style shell such as Git Bash), run:

source myvenv/Scripts/activate

Update pip and setuptools:

pip3 install -U pip setuptools

Install poetry:

pip3 install poetry

Use poetry to install the project into the virtual environment:

poetry install

Now, the dependencies are set up!

Eventually, when you're done running the virtual environment, you can close it via:

deactivate

Note: The Makefile in this directory begins with some variables you can change if you run into errors. For example, you can change the command for running python (on Windows, we had to change this from python3 to python).

If you're working towards building the example FST for Ojibwe, once you're done with installing these prerequisites, carry on with the instructions in OjibweMorph. But make sure you keep the virtual environment you set up here up and running!

Just change directories so that you end up back in your local copy of OjibweMorph, ready to run commands there, with this virtual environment still active. For example, if you installed OjibweMorph and ParserTools in the same directory, just use the command cd ../../OjibweMorph/ (because you're currently in ParserTools/csv2fst/).
If you get a ModuleNotFoundError when running commands in OjibweMorph, it may be that you are no longer in the virtual environment (and so the modules installed in the virtual environment are not accessible).
You can always check that the name of the virtual environment still appears in your command line prompt to know if it is active.
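Besides looking at the prompt, you can ask Python itself whether it is running from a virtual environment. This is a small sketch rather than part of the project; the function name `in_virtual_environment` is hypothetical.

```python
import sys


def in_virtual_environment():
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtual_environment())
    print("interpreter prefix:", sys.prefix)
```

If this reports that no virtual environment is active, re-run the activation command from the setup steps before retrying the command that raised ModuleNotFoundError.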

Building the FST

The FST is built using a Makefile. Before building, there are three variables within the Makefile which must be set to point to the right directory locations:

MORPHOLOGYSRCDIR points to a directory that contains most of the morphological information needed to build the FST. The example directory (for Border Lakes Ojibwe) is OjibweMorph.
LEMMAS_DIR points to a directory that contains CSVs listing all the lemmas that will be used to build the FST. An example directory (for Border Lakes Ojibwe) is OjibweLexicon/OPD.
- This variable can also be set to a list of directories (each containing CSVs to be used), separated by a comma.
OUTPUT_DIR points to the directory where all files generated will go. Note that in order for the testing code to work (i.e., when running make check), this must be a relative path (for some reason).

You should go into the Makefile and edit the values of these variables so that the correct directory is specified. Once complete, you can run make all (or just make) to build the FST (e.g., ojibwe.fomabin). This will create a directory generated which contains the FST, lexc files and XFST rules. Once complete, you can use the FST -- check here for an example.

Alternatively, rather than editing the Makefile contents, you can just specify the directory paths when you call make all. For example:

make all MORPHOLOGYSRCDIR=~/Documents/OjibweMorph LEMMAS_DIR=~/Documents/OjibweLexicon/OPD OUTPUT_DIR=../../OjibweMorph/FST

These variables should also be set when running make clean.
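If you call make from a script rather than typing the command, the variable overrides can be assembled programmatically. The sketch below only builds the argument list; `build_make_command` is a hypothetical helper, and it expands the tilde itself because no shell is involved when the list is passed to `subprocess.run`.

```python
import os
import shlex


def build_make_command(target, morphology_src_dir, lemmas_dirs, output_dir):
    """Return the argument list for a `make` call that overrides the three variables."""
    if os.path.isabs(os.path.expanduser(output_dir)):
        # The README notes that the tests only work with a relative OUTPUT_DIR.
        raise ValueError("OUTPUT_DIR must be a relative path for `make check` to work")
    return [
        "make",
        target,
        f"MORPHOLOGYSRCDIR={os.path.expanduser(morphology_src_dir)}",
        # Several lemma directories are passed as one comma-separated value.
        "LEMMAS_DIR=" + ",".join(os.path.expanduser(d) for d in lemmas_dirs),
        f"OUTPUT_DIR={output_dir}",
    ]


if __name__ == "__main__":
    command = build_make_command(
        "all",
        "~/Documents/OjibweMorph",
        ["~/Documents/OjibweLexicon/OPD"],
        "../../OjibweMorph/FST",
    )
    print(shlex.join(command))
```

To actually run the build, pass the returned list to `subprocess.run(command, check=True)` from the directory that contains the Makefile.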

Additionally, there are two other variables that must be specified if you're going to run the tests for the FST:

SPREADSHEETS_FOR_YAML_DIR points to a directory which contains CSVs for running the YAML tests. An example directory (for Border Lakes Ojibwe) is OjibweLexicon/OPD/for_yaml.
PARADIGM_MAPS_DIR points to a directory which contains paradigm maps created for sorting words into different categories, but used here because they contain a complete list of 'Class' values that become the test sections (e.g., VAI_V, VAI_VV, etc.). An example directory (for Border Lakes Ojibwe) is OjibweLexicon/resources.

So when running make check to run the FST tests, you either need to edit all five variables right in the Makefile, or supply their values when calling make check:

make check MORPHOLOGYSRCDIR=~/Documents/OjibweMorph LEMMAS_DIR=~/Documents/OjibweLexicon/OPD SPREADSHEETS_FOR_YAML_DIR=~/Documents/OjibweLexicon/OPD/for_yaml PARADIGM_MAPS_DIR=~/Documents/OjibweLexicon/resources OUTPUT_DIR=../../OjibweMorph/FST

Also written into the Makefile are the expected names of many of these files (e.g., the paradigm map file for the verb tests of the FST being called VERBS_paradigm_map.csv), so if any of these names differ, the Makefile will have to be updated accordingly.
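Because a wrong directory or an unexpected file name only surfaces once make is running, a quick pre-flight check can save time. The following sketch is an assumption-laden illustration, not part of the Makefile: `missing_inputs` is a hypothetical helper, and the only file name it checks is the `VERBS_paradigm_map.csv` example mentioned above. OUTPUT_DIR is left out on purpose, since it may not exist before the first build.

```python
from pathlib import Path


def missing_inputs(variables, expected_files=None):
    """Return human-readable problems found in the Makefile variable settings.

    `variables` maps a Makefile variable name to a directory path (or a
    comma-separated list of paths); `expected_files` maps a variable name
    to file names that should exist in each of its directories.
    """
    problems = []
    for name, value in variables.items():
        for raw in value.split(","):
            directory = Path(raw).expanduser()
            if not directory.is_dir():
                problems.append(f"{name}: {directory} is not a directory")
                continue
            for filename in (expected_files or {}).get(name, []):
                if not (directory / filename).is_file():
                    problems.append(f"{name}: {filename} not found in {directory}")
    return problems


if __name__ == "__main__":
    settings = {
        "MORPHOLOGYSRCDIR": "~/Documents/OjibweMorph",
        "LEMMAS_DIR": "~/Documents/OjibweLexicon/OPD",
        "SPREADSHEETS_FOR_YAML_DIR": "~/Documents/OjibweLexicon/OPD/for_yaml",
        "PARADIGM_MAPS_DIR": "~/Documents/OjibweLexicon/resources",
    }
    expected = {"PARADIGM_MAPS_DIR": ["VERBS_paradigm_map.csv"]}
    for problem in missing_inputs(settings, expected):
        print(problem)
```

An empty result means every listed directory exists and contains the files you asked about; it does not guarantee that the Makefile's other expected file names are present.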

Running the YAML Tests

The code for running tests based on the generated YAML files comes from giella-core. A version of their morph-test.py script is included in this repo (as run_yaml_tests.py), modified to customize the .log output format.

Run the tests with make check. This generates log files, including:

paradigm-test.log: A smaller set of tests covering the noun and verb spreadsheets in OjibweMorph.
opd-test.log: A larger set of tests covering an external lexical resource, the OPD.

The `csv2lexc.py` script

Usage: csv2lexc.py [OPTIONS]

Options:
  --config-file TEXT              JSON config file  [required]
  --lexc-path TEXT                Directory where output lexc files are stored [required]
  --read-lexical-database BOOLEAN
                                  Whether to include lexemes from an external
                                  lexicon database
  --help                          Show this message and exit.
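The options above can also be assembled from Python, for example when generating lexc files for several configuration files in a loop. This is a sketch under assumptions: `csv2lexc_command` is a hypothetical helper, the default script location `csv2lexc.py` must be adjusted to wherever the script lives in your checkout, and passing `True` or `False` as the value of `--read-lexical-database` is assumed to be accepted.

```python
import shlex
import sys


def csv2lexc_command(config_file, lexc_path, read_lexical_database=None,
                     script="csv2lexc.py"):
    """Build the argument list for one csv2lexc.py run."""
    command = [sys.executable, script,
               "--config-file", config_file,
               "--lexc-path", lexc_path]
    if read_lexical_database is not None:
        # The option is declared as BOOLEAN, so it takes an explicit value.
        command += ["--read-lexical-database", str(read_lexical_database)]
    return command


if __name__ == "__main__":
    command = csv2lexc_command("verbs.json", "generated",
                               read_lexical_database=True)
    print(shlex.join(command))
```

Passing the returned list to `subprocess.run(command, check=True)` would execute the script with the currently active interpreter, which keeps the call inside the virtual environment.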

JSON configuration files

You need to specify a JSON configuration file which controls the generation of lexc files. See verbs.json in the OjibweMorph repository for an example.

You need to specify the following parameters:

Parameter	Description	Example
`comments`	Comments.	`"This is a comment"`
`source_path`	Directory where source CSV files reside.	`"~/Documents/OjibweMorph/VerbSpreadsheets/"`
`regular_csv_files`	List of CSV files which contain regular paradigms (please omit `.csv` suffix)	`["VAI_IND","VTA_CNJ",...]`
`irregular_csv_files`	List of CSV files which contain irregular paradigms (please omit `.csv` suffix)	`["VAI_IRR"]`
`lexical_database`	External CSV lexical database file.	`"VERBS.csv"`
`regular_lexc_file`	Filename for generated lexc file containing regular paradigms.	`"ojibwe_verbs_regular.lexc"`
`irregular_lexc_file`	Filename for generated lexc file containing irregular paradigms.	`"ojibwe_verbs_irregular.lexc"`
`morph_features`	This field specifies the order in which morphological features are realized in FST output fields. The elements in the list have to match columns of the source CSV files.	`["Paradigm", "Order", "Negation", "Mode", "Subject", "Object"]`
`missing_tag_marker`	Tag which indicates missing values of features in the source CSV files. E.g. intransitive verbs won't have an object, and this tag is used to mark that fact.	`"NA"`
`missing_form_marker`	Tag which indicates paradigm gaps	`"MISSING"`
`multichar_symbols`	List of multi-character symbols which are used in the source CSV files	`["i1", "w1"]`
`pre_element_tag`	A tag which is used to indicate the position of pre-elements like preverbs and prenouns in the lexc files	`"[PREVERB]"`
`pv_source_path`	This should point to your `PVSpreadsheets` directory	`"~/Documents/OjibweMorph/PVSpreadsheets/"`
`template_path`	Path to jinja2 templates. Note that (for some reason) this file path cannot use a tilde symbol.	`"/Users/YourName/Documents/OjibweMorph/templates"`
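To make the table concrete, here is a sketch that assembles a configuration from the example values above, checks it, and prints it as JSON. It is an illustration, not the project's own validation: `check_config` is a hypothetical helper, every key in the table is assumed to be required, and the `regular_csv_files` list is shortened to the two names visible in the table's example.

```python
import json

EXAMPLE_CONFIG = {
    "comments": "This is a comment",
    "source_path": "~/Documents/OjibweMorph/VerbSpreadsheets/",
    # Shortened: the example in the table lists more files than these two.
    "regular_csv_files": ["VAI_IND", "VTA_CNJ"],
    "irregular_csv_files": ["VAI_IRR"],
    "lexical_database": "VERBS.csv",
    "regular_lexc_file": "ojibwe_verbs_regular.lexc",
    "irregular_lexc_file": "ojibwe_verbs_irregular.lexc",
    "morph_features": ["Paradigm", "Order", "Negation", "Mode", "Subject", "Object"],
    "missing_tag_marker": "NA",
    "missing_form_marker": "MISSING",
    "multichar_symbols": ["i1", "w1"],
    "pre_element_tag": "[PREVERB]",
    "pv_source_path": "~/Documents/OjibweMorph/PVSpreadsheets/",
    "template_path": "/Users/YourName/Documents/OjibweMorph/templates",
}

# Assumption: all parameters listed in the table must be present.
REQUIRED_KEYS = tuple(EXAMPLE_CONFIG)


def check_config(config):
    """Return a list of problems found in a csv2lexc configuration dict."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in config]
    if str(config.get("template_path", "")).startswith("~"):
        # The table notes that template_path cannot use a tilde.
        problems.append("template_path must not start with a tilde")
    return problems


if __name__ == "__main__":
    for problem in check_config(EXAMPLE_CONFIG):
        print("problem:", problem)
    print(json.dumps(EXAMPLE_CONFIG, indent=2))
```

The printed JSON can be saved to a file and passed to `csv2lexc.py` via `--config-file` once the paths are adapted to your machine.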

The external lexical database

Additional lexemes are supplied in CSV files. For example, here are the first few rows of the lexical database file of verbs for Border Lakes Ojibwe, sourced from the Ojibwe People's Dictionary (OPD):

Lemma,Stem,Paradigm,Class,Translation,Source
aazhoogaadebi,aazhoogaadebi,VAI,VAI_V,NONE,https://ojibwe.lib.umn.edu/main-entry/aazhoogaadebi-vai
aayaazhoogaadebi,aayaazhoogaadebi,VAI,VAI_V,NONE,https://ojibwe.lib.umn.edu/main-entry/aazhoogaadebi-vai
aazhooshkaa,aazhooshkaa,VAI,VAI_VV,NONE,https://ojibwe.lib.umn.edu/main-entry/aazhooshkaa-vai

The lemma and stem must agree with the forms in the source spreadsheets where applicable (stored in source_path as specified above).

In the JSON configuration file, the path to the lexical database is supplied under the key lexical_database.
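When preparing a lexical database for another language, it is useful to see how lemmas are distributed over the 'Class' values. The sketch below reads the sample rows shown above with the standard `csv` module; `lemmas_by_class` is a hypothetical helper and not how `csv2lexc.py` itself reads the file.

```python
import csv
import io
from collections import defaultdict

SAMPLE = """\
Lemma,Stem,Paradigm,Class,Translation,Source
aazhoogaadebi,aazhoogaadebi,VAI,VAI_V,NONE,https://ojibwe.lib.umn.edu/main-entry/aazhoogaadebi-vai
aayaazhoogaadebi,aayaazhoogaadebi,VAI,VAI_V,NONE,https://ojibwe.lib.umn.edu/main-entry/aazhoogaadebi-vai
aazhooshkaa,aazhooshkaa,VAI,VAI_VV,NONE,https://ojibwe.lib.umn.edu/main-entry/aazhooshkaa-vai
"""


def lemmas_by_class(csv_file):
    """Group the Lemma column of a lexical database by its Class column."""
    groups = defaultdict(list)
    for row in csv.DictReader(csv_file):
        groups[row["Class"]].append(row["Lemma"])
    return dict(groups)


if __name__ == "__main__":
    for class_name, lemmas in lemmas_by_class(io.StringIO(SAMPLE)).items():
        print(class_name, lemmas)
```

For the three sample rows this groups `aazhoogaadebi` and `aayaazhoogaadebi` under `VAI_V` and `aazhooshkaa` under `VAI_VV`. To inspect a real file, open it with `open(path, newline="", encoding="utf-8")` and pass the file object instead of the `io.StringIO` wrapper.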

Citation

To cite this work or the contents of the repository in an academic work, please use the following:

Hammerly, C., Livesay, N., Arppe, A., Stacey, A., & Silfverberg, M. (Submitted). OjibweMorph: An approachable morphological parser for Ojibwe.

Project details

Release history

0.1.2: Jul 21, 2025
0.1.1: Jul 1, 2025
0.1.0 (this version): Jun 29, 2025

Download files

Source Distribution

fstmorph-0.1.0.tar.gz (38.1 kB)

Uploaded Jun 29, 2025 Source

Built Distribution

fstmorph-0.1.0-py3-none-any.whl (40.6 kB)

Uploaded Jun 29, 2025 Python 3

File details

Details for the file fstmorph-0.1.0.tar.gz.

File metadata

Download URL: fstmorph-0.1.0.tar.gz
Upload date: Jun 29, 2025
Size: 38.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: poetry/2.1.2 CPython/3.10.7 Darwin/21.6.0

File hashes

Hashes for fstmorph-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`f1436408648c634cf0d7ce1e745837505235a8b37a48f2e664a05d65b101e961`
MD5	`80c92251bde62388b0980c5c1c53ffb0`
BLAKE2b-256	`4a04300d071f2b83244633abda4aca7afab3110bbe9b6f1538c720fc8b3dcfd8`

File details

Details for the file fstmorph-0.1.0-py3-none-any.whl.

File metadata

Download URL: fstmorph-0.1.0-py3-none-any.whl
Upload date: Jun 29, 2025
Size: 40.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: poetry/2.1.2 CPython/3.10.7 Darwin/21.6.0

File hashes

Hashes for fstmorph-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`26468f800092a831fdb573207caa48ae3f546da89455c91050a5ce0ec50586da`
MD5	`f280f360fe7444f8883454bb3676c705`
BLAKE2b-256	`fc92e549c53d6520756234982f422ea2120b3bbc6c20c95c4c0d69d006d54fcd`
