Tools for simplified two-level morphology

twol: Compiler and other tools for two-level morphology

These tools can be installed from PyPI with the usual commands, for example:

pip3 install --user twol

For more instructions, see the TWOL Wiki: https://github.com/koskenni/twol/wiki

NOTE: The programs are under development and some of them may not work as expected. Furthermore, the command-line parameters may still change, so you should check the names and meanings of the parameters with the "-h" option.

This repository contains various tools for Simplified Two-level morphology, which is a revised form of the original two-level morphology as implemented in hfst-twolc (see https://github.com/hfst/hfst/wiki/HfstTwolc). The tools are implemented in Python, and many of them use the HFST finite-state transducer tools, especially their Python version (see https://github.com/hfst/python).

The tools in this repository include:

A compiler, twol.py or twol-comp, which reads in a set of examples and a grammar file containing two-level rules. The compiler parses the rules, compiles them and tests them against the examples. The compiler can write the compiled rules as binary finite-state transducers into a file which can be used with the HFST command-line tools.
Methods for aligning words or stems. These are useful for defining underlying representations of lexical entries. Morphophonemes in the entries are a result of the alignment process.
Documentation of the methods and the programs. The source text for the documentation is in the docs directory, and a human-readable set of interlinked documents is available at Readthedocs (https://pytwolc.readthedocs.io).

twol-comp: Compiler and rule tester for two-level rules

The compiler relies on a well-chosen set of examples against which the rules are tested immediately. Rules have no significance until the examples exist. One way to produce a set of two-level examples is to use the alignment programs described later in this file. The idea of the simplified two-level model is described in https://pytwolc.readthedocs.io/en/latest/intro.html, and the use of the twol-comp program is described in https://pytwolc.readthedocs.io/en/latest/compiletest.html and in some other chapters there.
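
To make the idea of testing a rule against examples concrete, here is a minimal plain-Python sketch. It is not how twol-comp works internally (the compiler builds finite-state transducers), and the names parse_example and violations, the example notation (space-separated symbol pairs written morphophoneme:surface, with identical pairs abbreviated to a single symbol) and the toy rule are all assumptions made for illustration. The sketch checks a context restriction: a given pair may occur only when it is surrounded by a given left and right context.

```python
def parse_example(text):
    """Split 'k a u p {pØ}:Ø a n' into (morphophoneme, surface) pairs.

    A token without ':' stands for an identical pair, e.g. 'k' means k:k.
    """
    pairs = []
    for token in text.split():
        upper, sep, lower = token.partition(":")
        pairs.append((upper, lower if sep else upper))
    return pairs


def violations(examples, centre, left, right):
    """Return the examples in which `centre` occurs outside left _ right."""
    failed = []
    for text in examples:
        pairs = parse_example(text)
        for k, pair in enumerate(pairs):
            if pair != centre:
                continue
            left_ok = pairs[max(0, k - len(left)):k] == left
            right_ok = pairs[k + 1:k + 1 + len(right)] == right
            if not (left_ok and right_ok):
                failed.append(text)
                break
    return failed


# Toy data: Finnish kauppa ~ kaupan, with a hypothetical morphophoneme {pØ}.
examples = ["k a u p {pØ}:p a", "k a u p {pØ}:Ø a n"]

# Toy rule: {pØ}:Ø may occur only when followed by "a n".
centre = ("{pØ}", "Ø")
right_context = [("a", "a"), ("n", "n")]

print(violations(examples, centre, [], right_context))
```

Both examples satisfy the toy rule, so the list of violations is empty. Adding an example such as "k a u p {pØ}:Ø a" would be reported, because the deletion then occurs without the required right context.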

Letter by letter alignment of words

This part provides methods for careful letter-by-letter alignment, e.g. of cognate words in historical linguistics or of different stems of a word. Alignment adds zero symbols where necessary in order to match words or stems that differ in length. Alignment is particularly important in two-level morphology because the alignment determines what morphophonemes there will be.

In the present context, alignment is the process of inserting zero symbols into the words so that the letters or phonemes in corresponding positions are phonologically as similar as possible. For example, the Finnish word "kieli" and the Estonian word "keel" can be aligned by inserting a zero symbol 'Ø':

k i e l i
k e e l Ø

Now there are three pairs of identical phonemes (k:k, e:e and l:l), one modification of a vowel (i:e) and one deletion of a word-final vowel (i:Ø).
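
The following sketch shows one simple way such an alignment can be computed. It is a plain dynamic-programming aligner with hand-picked costs, not the weighted finite-state method used by the programs in this package. The function names, the vowel set and the cost values are assumptions chosen only so that the example above comes out as shown.

```python
ZERO = "Ø"
VOWELS = set("aeiouyäöüõ")  # assumed vowel inventory for the toy metric


def pair_cost(a, b):
    """Toy similarity metric: identical letters are free, vowel-vowel and
    consonant-consonant pairs are cheap, vowel-consonant pairs are costly."""
    if a == b:
        return 0
    if (a in VOWELS) == (b in VOWELS):
        return 1
    return 10


def align(first, second, gap_cost=2):
    """Align two words by inserting ZERO symbols at minimal total cost."""
    n, m = len(first), len(second)
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i * gap_cost
    for j in range(1, m + 1):
        cost[0][j] = j * gap_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = min(
                cost[i - 1][j - 1] + pair_cost(first[i - 1], second[j - 1]),
                cost[i - 1][j] + gap_cost,   # letter of `first` against zero
                cost[i][j - 1] + gap_cost,   # zero against letter of `second`
            )
    # Trace back from the end to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and cost[i][j] ==
                cost[i - 1][j - 1] + pair_cost(first[i - 1], second[j - 1])):
            top.append(first[i - 1])
            bottom.append(second[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + gap_cost:
            top.append(first[i - 1])
            bottom.append(ZERO)
            i -= 1
        else:
            top.append(ZERO)
            bottom.append(second[j - 1])
            j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom))


finnish, estonian = align("kieli", "keel")
print(" ".join(finnish))
print(" ".join(estonian))
```

With these costs the cheapest alignment of "kieli" and "keel" pairs i with e and deletes the final i, which reproduces the two rows shown above. A realistic metric would weigh phonological features rather than just the vowel/consonant distinction.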

There are stand-alone Python 3 programs which can be used from the command line for aligning individual words:

twol-aligner and metrics.py with which you can compare cognate words of two languages. The latter reads an alphabet definition and writes a weighted finite-state transducer (WFST) which the former program needs for the concrete alignment.
twol-multialign compares two or more corresponding words or morphs and aligns them.

There is a suite of stand-alone programs for building morphophonemic representations of morphemes. The input consists of inflected word forms given as a table whose cells contain word forms with the morph boundaries marked. The forms with the same stem are given as a row of the table, and the different forms correspond to the columns of the table. The programs are:

twol-table2words reads in a table in CSV (Comma Separated Values) format and writes it in a one word form per line CSV format.
twol-words2zerofilled reads in the output of the above program and aligns the morphs, i.e. stems of the same lexeme with each other and alternate forms of affixes of the same grammatical form with each other. The aligned result is a table in which the morphs with optimally inserted zeros appear as an additional column of the CSV file.
twol-zerofilled2raw reads in the output of the above program and produces an additional column which contains raw morphophonemic forms of each morpheme.
twol-raw2named reads in the output of the above program and a table of user-given shorter names for some raw morphophonemes and writes out the examples as two-level symbol pairs, one example per line. The examples now consist of a sequence of symbol pairs where the first component of a pair is the morphophoneme and the second component is the surface character. This file is used by the two-level compiler in conjunction with the rules which the linguist now can start to design.
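
The last two steps can be illustrated with a small sketch. It does not use the programs above or their file formats. The function names, the brace notation for raw morphophonemes and the abbreviation of identical pairs to a single symbol are assumptions made for illustration. Given zero-filled variants of the same morph, each position where the variants differ becomes a raw morphophoneme, and each variant is then written out as a sequence of morphophoneme:surface pairs.

```python
def raw_morphophonemes(zero_filled_morphs):
    """Combine zero-filled variants of one morph position by position."""
    lengths = {len(morph) for morph in zero_filled_morphs}
    if len(lengths) != 1:
        raise ValueError("zero-filled morphs must have equal length")
    symbols = []
    for column in zip(*zero_filled_morphs):
        distinct = list(dict.fromkeys(column))  # unique, in first-seen order
        if len(distinct) == 1:
            symbols.append(distinct[0])
        else:
            symbols.append("{" + "".join(distinct) + "}")
    return symbols


def pair_string(morphophonemes, surface):
    """Write one example as space-separated morphophoneme:surface pairs."""
    tokens = []
    for upper, lower in zip(morphophonemes, surface):
        tokens.append(upper if upper == lower else f"{upper}:{lower}")
    return " ".join(tokens)


# Toy data: two zero-filled stems of Finnish "kauppa" (kauppa ~ kaupa-).
stems = ["kauppa", "kaupØa"]
morphophonemic = raw_morphophonemes(stems)
for stem in stems:
    print(pair_string(morphophonemic, stem))
```

Here the fifth position alternates between p and Ø, so it becomes the raw morphophoneme {pØ}. A table of shorter names would then let the linguist rename such raw symbols before writing rules.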

More information on these programs can be found at https://pytwolc.readthedocs.io/en/latest/morphophon.html and by starting the programs with the --help option.

Licenses

The programs in this project are written by Kimmo Koskenniemi alone and he has the copyright to these programs. The programs are free software according to the GNU General Public License Version 3, 29 June 2007, see LICENSE.txt in this repository or https://www.gnu.org/licenses/gpl-3.0.en.html for the full text of the license.

The file tyveb-n-stems.text is derived from a file available at the Institute of Estonian Language (IEL) https://www.eki.ee/tarkvara/perlmorf/tyvebaas.pmf. The license for the file can be seen at https://www.eki.ee/eki/licence.html

Release history

This version

0.9.1

Dec 2, 2025

0.9

Nov 11, 2025

0.8.0

Nov 5, 2023

0.7.7

Dec 7, 2021

0.7.4

Mar 3, 2021

0.7.3

Feb 22, 2021

0.7.1

Feb 11, 2021

0.7.0

Feb 11, 2021

0.6.5

Feb 11, 2021

0.6.4

Nov 1, 2020

0.6.1

Feb 21, 2020

0.6

Feb 20, 2020

0.5.1

Feb 12, 2020

0.5

Feb 12, 2020

0.5.dev2 pre-release

Feb 12, 2020

0.5.dev1 pre-release

Feb 8, 2020

0.4

Feb 7, 2020

0.3

Feb 7, 2020

0.2.dev3 pre-release

Feb 7, 2020

0.1

Feb 2, 2020

0.1.dev2 pre-release

Jan 28, 2020

0.1.dev1 pre-release

Jan 28, 2020

0.0.23.dev0 pre-release

Jan 28, 2020

0.0.22

Jan 28, 2020

0.0.18

Jan 21, 2020

0.0.15

Jan 19, 2020

0.0.14

Jan 19, 2020

0.0.13

Jan 19, 2020

0.0.12

Jan 18, 2020

0.0.11

Jan 18, 2020

0.0.10

Jan 18, 2020

0.0.9

Jan 18, 2020

0.0.8

Jan 17, 2020

0.0.7

Jan 17, 2020

0.0.6

Jan 16, 2020

0.0.5

Jan 16, 2020

Download files

Source Distribution

twol-0.9.1.tar.gz (68.5 kB view details)

Uploaded Dec 2, 2025 Source

Built Distribution

twol-0.9.1-py3-none-any.whl (78.4 kB view details)

Uploaded Dec 2, 2025 Python 3

File details

Details for the file twol-0.9.1.tar.gz.

File metadata

Download URL: twol-0.9.1.tar.gz
Upload date: Dec 2, 2025
Size: 68.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.12

File hashes

Hashes for twol-0.9.1.tar.gz
Algorithm	Hash digest
SHA256	`65e6eff8a66c27d94aa0372e5fb76d8541bd50063281dda437b102a014c0b0c1`
MD5	`e0639ad0808c82972ed0857510431d18`
BLAKE2b-256	`748eebad26b401a4ceee42319370f99fb511edce7368b6aa9862b1992cebfe04`

File details

Details for the file twol-0.9.1-py3-none-any.whl.

File metadata

Download URL: twol-0.9.1-py3-none-any.whl
Upload date: Dec 2, 2025
Size: 78.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.12

File hashes

Hashes for twol-0.9.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`324f8095a5bda5fd8c29b0e78a48506add1c021c2dbfc8bd434786f928124178`
MD5	`b40eb8d2093e76a2035d992d152f0cc4`
BLAKE2b-256	`53f24062c9e5022314c582ac833ceefc2a8fd18aed95474567f2a4512aab47d7`
