FstStr
FstStr is a small library providing a string-oriented Python interface to OpenFST. It is built on the pywrapfst library that is distributed with OpenFST.
Usage
FstStr includes several kinds of functions that make working with strings in OpenFST more convenient: functions for defining SymbolTables, for applying FSTs to strings, and for carrying out the individual steps of that application.
Working with symbols and SymbolTables
SymbolTables define a mapping between integer indices and the input/output alphabet of an FST. An example alphabet for English, EN_SYMB, is included in fststr.
>>> from fststr import fststr
>>> fststr.EN_SYMB
['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', '-', "'", "'", '+Known', '+Guess', '<other>', '<c>', '<v>']
To convert this alphabet to a symbol table, use symbols_table_from_alphabet:
>>> st = fststr.symbols_table_from_alphabet(fststr.EN_SYMB)
This symbol table can then be passed to an FST compiler as the input and output symbol tables of an FST.
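Conceptually, a symbol table is a two-way mapping between symbols and integer labels. The sketch below models that idea with a plain Python dict; `symbol_table_sketch` is a hypothetical helper, not fststr's implementation (which returns a pywrapfst SymbolTable). Reserving label 0 for `<epsilon>` follows OpenFST's convention that label 0 is epsilon; that fststr adds `<epsilon>` automatically is an assumption, suggested by the `<epsilon>` label used later in this README without appearing in the alphabet.

```python
def symbol_table_sketch(alphabet):
    """Map each symbol to an integer label (hypothetical helper).

    fststr.symbols_table_from_alphabet returns a pywrapfst SymbolTable;
    this dict only models the idea. Reserving 0 for <epsilon> follows
    OpenFST's convention; that fststr does the same is an assumption.
    """
    table = {"<epsilon>": 0}
    for symbol in alphabet:
        if symbol not in table:  # a repeated symbol keeps its first label
            table[symbol] = len(table)
    return table


st = symbol_table_sketch(["a", "b", "c", "<other>"])
print(st)  # {'<epsilon>': 0, 'a': 1, 'b': 2, 'c': 3, '<other>': 4}

# The inverse direction, label -> symbol:
print({label: sym for sym, label in st.items()}[2])  # b
```

Skipping repeated symbols matters for alphabets such as EN_SYMB, whose printed form lists "'" twice.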
Compiling and manipulating FSTs
FstStr currently provides no abstraction over the process of defining and compiling FSTs, but it does provide some functions for manipulating FSTs once they are compiled. To compile an FST, instantiate a compiler, using the symbol table st for both input and output:
>>> import pywrapfst as fst
>>> compiler = fst.Compiler(isymbols=st, osymbols=st, keep_isymbols=True, keep_osymbols=True)
The resulting object, compiler, is a file-like object. You pass it a transition table by writing to it, and you compile the FST described by that table by calling its compile method:
>>> print('0 1 a b\n1 2 b c\n2 3 c d\n3', file=compiler)
>>> abc2bcd = compiler.compile()
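The string written to the compiler is in OpenFST's AT&T-style text format: an arc line lists the source state, destination state, input label and output label (optionally followed by a weight), and a line holding only a state number (optionally with a weight) marks that state as final. The pure-Python parser below only illustrates that format; `parse_att` is a hypothetical helper, not part of fststr or pywrapfst, and it ignores weights.

```python
def parse_att(text):
    """Parse transducer lines in AT&T text format into (arcs, finals).

    Hypothetical helper for illustration; weights are read but discarded.
    """
    arcs, finals = [], set()
    for line in text.splitlines():
        fields = line.split()  # fields may be separated by spaces or tabs
        if not fields:
            continue  # ignore blank lines
        if len(fields) in (1, 2):
            # "state [weight]": the state is final
            finals.add(int(fields[0]))
        elif len(fields) in (4, 5):
            # "src dst ilabel olabel [weight]": one arc
            src, dst, ilabel, olabel = fields[:4]
            arcs.append((int(src), int(dst), ilabel, olabel))
        else:
            raise ValueError(f"unexpected line: {line!r}")
    return arcs, finals


arcs, finals = parse_att("0 1 a b\n1 2 b c\n2 3 c d\n3")
# Input tape and output tape of the single path through abc2bcd:
print("".join(a[2] for a in arcs), "->", "".join(a[3] for a in arcs))  # abc -> bcd
print(finals)  # {3}
```

Reading the three arcs in order shows where the name abc2bcd comes from: the input labels spell abc and the output labels spell bcd.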
Some shortcuts are often taken when defining FSTs. One is to use “other” as a label on arcs, meaning that there is a transition labeled x:x for every symbol x that does not already label an outgoing arc from that state. This relieves the author of the FST from the tedious and error-prone process of defining these arcs manually. OpenFST does not support this notation directly, but fststr provides a function, expand_other_symbols, that takes an FST containing the symbol <other> and mutates it so that each arc labeled <other> is paralleled by the implied arcs.
Consider the following example:
>>> alphabet = ['A', 'a', 'b', 'c', '<other>']
>>> st = fststr.symbols_table_from_alphabet(alphabet)
>>> compiler = fst.Compiler(isymbols=st, osymbols=st, keep_isymbols=True, keep_osymbols=True)
>>> compiler.write('0 1 a A\n0 1 <other> <other>\n1\n')
>>> other = compiler.compile()
>>> print(other.__str__().decode('utf-8'))
0 1 a A
0 1 <other> <other>
1
>>> fststr.expand_other_symbols(other)
>>> print(other.__str__().decode('utf-8'))
0 1 a A
0 1 <other> <other>
0 1 A A
0 1 b b
0 1 c c
1
Note that the arc labeled <other> is not deleted, but this does not matter as long as the input string does not contain the symbol <other> itself.
Other, similar wildcard symbols can be defined and used following the example of <other>.
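The expansion can be sketched in pure Python on a list of (src, dst, ilabel, olabel) arcs. `expand_other_sketch` is a hypothetical analogue of fststr's expand_other_symbols: it returns a new list rather than mutating a pywrapfst FST, and it assumes that "already used" means already used as an input label on an outgoing arc of the same state.

```python
from collections import defaultdict


def expand_other_sketch(arcs, alphabet, other="<other>"):
    """Return arcs plus the x:x arcs implied by each <other>:<other> arc.

    Hypothetical analogue of fststr.expand_other_symbols: it works on a
    list of (src, dst, ilabel, olabel) tuples and returns a new list,
    whereas fststr mutates a pywrapfst FST in place.
    """
    # Input labels already leaving each state (assumption: input side only).
    used = defaultdict(set)
    for src, _dst, ilabel, _olabel in arcs:
        used[src].add(ilabel)

    expanded = list(arcs)  # the <other> arcs themselves are kept
    for src, dst, ilabel, olabel in arcs:
        if ilabel == other and olabel == other:
            for x in alphabet:
                if x != other and x not in used[src]:
                    expanded.append((src, dst, x, x))
    return expanded


arcs = [(0, 1, "a", "A"), (0, 1, "<other>", "<other>")]
for arc in expand_other_sketch(arcs, ["A", "a", "b", "c", "<other>"]):
    print(*arc)
# 0 1 a A
# 0 1 <other> <other>
# 0 1 A A
# 0 1 b b
# 0 1 c c
```

With the alphabet from the doctest above, the sketch produces the same five arcs that expand_other_symbols printed there.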
Application
Once you have an FST, you can apply it to a string. Under the hood, this is a four-step process:
- Convert the string to a list of symbols, and the list of symbols to a linear-chain automaton
- Compose this automaton with the FST
- Extract the unique paths through the resulting lattice
- Convert the output labels of these paths back to strings
FstStr provides functions for each of these steps, and also a single convenience function, apply, that performs all of them. This lets the programmer simply take a string, apply an FST to it, and get back the resulting strings.
>>> st = fststr.symbols_table_from_alphabet(['a', 'b', 'c', 'd', '<other>'])
>>> compiler = fst.Compiler(isymbols=st, osymbols=st, keep_isymbols=True, keep_osymbols=True)
>>> compiler.write('0 1 a <epsilon>\n0 1 <other> <other>\n1\n')
>>> del_a = compiler.compile()
>>> fststr.expand_other_symbols(del_a)
>>> fststr.apply('a', del_a)
['']
>>> fststr.apply('b', del_a)
['b']
>>> fststr.apply('c', del_a)
['c']
>>> fststr.apply('d', del_a)
['d']
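The application steps can be mimicked in pure Python. Rather than building a linear-chain automaton and calling a composition routine, the sketch below walks the FST directly along the input symbols, which for an unweighted FST should yield the same set of output strings as composing and reading off the paths. `apply_sketch` is hypothetical and is not fststr's apply: it ignores weights, assumes state 0 is the start state, treats `<epsilon>` as the empty label, and represents the FST as a list of (src, dst, ilabel, olabel) arcs.

```python
from collections import defaultdict

EPS = "<epsilon>"


def apply_sketch(symbols, arcs, finals, start=0):
    """Return the sorted, unique output strings of every path from `start`
    to a final state that consumes exactly `symbols` on the input side.

    Hypothetical stand-in for fststr.apply; weights are ignored.
    """
    out = defaultdict(list)  # state -> [(ilabel, olabel, dst), ...]
    for src, dst, ilabel, olabel in arcs:
        out[src].append((ilabel, olabel, dst))
    results = set()

    def walk(state, pos, emitted, seen):
        if pos == len(symbols) and state in finals:
            results.add("".join(emitted))
        for ilabel, olabel, dst in out[state]:
            piece = [] if olabel == EPS else [olabel]  # epsilon output emits nothing
            if ilabel == EPS:
                # Consumes no input; `seen` stops the walk looping on epsilon cycles.
                if (dst, pos) not in seen:
                    walk(dst, pos, emitted + piece, seen | {(dst, pos)})
            elif pos < len(symbols) and ilabel == symbols[pos]:
                walk(dst, pos + 1, emitted + piece, {(dst, pos + 1)})

    walk(start, 0, [], {(start, 0)})
    return sorted(results)


# del_a from the doctest above, written out by hand after expanding <other>:
del_a = [(0, 1, "a", EPS), (0, 1, "<other>", "<other>"),
         (0, 1, "b", "b"), (0, 1, "c", "c"), (0, 1, "d", "d")]
for s in ["a", "b", "c", "d"]:
    print(repr(s), apply_sketch(list(s), del_a, finals={1}))
```

Splitting the string with list(s) only works because every symbol in this alphabet is a single character. The literal <other> arc matches only an input symbol spelled <other>, which is why it is harmless after expansion.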
Example
Examples are in examples/FSTs. Here we examine e-insertion.txt. This FST takes morphologically segmented input such as fox<^>s<#> and outputs foxes<#>.
Each line of the file describes either a final state or an arc:
- The first line, 0, marks q0 as a final state.
- The second line, 0 0 <other> <other>, is an arc from q0 to q0 with the label <other>:<other>.
- The fifth line, 0 1 z z, is an arc from q0 to q1 with the label z:z.
See e-insertion.py for an example of how to run the FST.
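The e-insertion input fox<^>s<#> shows why the first application step, converting a string to symbols, is more than splitting it into characters: <^> and <#> must each be read as a single symbol. How fststr tokenizes is not described here, so the sketch below assumes a greedy longest-match split against the alphabet; `tokenize` and `linear_chain` are hypothetical helpers, and the example alphabet is an assumption rather than the symbol table used by e-insertion.py.

```python
def tokenize(text, alphabet):
    """Split text into alphabet symbols, preferring the longest match.

    Hypothetical helper; greedy longest match is an assumed strategy.
    """
    # Longest symbols first, so a multi-character symbol such as "<^>"
    # wins over any shorter symbol that is a prefix of it. Empty strings
    # are dropped because they would never advance the position.
    candidates = sorted({s for s in alphabet if s}, key=len, reverse=True)
    symbols, i = [], 0
    while i < len(text):
        for sym in candidates:
            if text.startswith(sym, i):
                symbols.append(sym)
                i += len(sym)
                break
        else:
            raise ValueError(f"no alphabet symbol matches at {text[i:]!r}")
    return symbols


def linear_chain(symbols):
    """Linear-chain acceptor: arc i -> i+1 labeled sym:sym; the last state is final."""
    arcs = [(i, i + 1, sym, sym) for i, sym in enumerate(symbols)]
    return arcs, {len(symbols)}


alphabet = list("abcdefghijklmnopqrstuvwxyz") + ["<^>", "<#>"]
symbols = tokenize("fox<^>s<#>", alphabet)
print(symbols)  # ['f', 'o', 'x', '<^>', 's', '<#>']
arcs, finals = linear_chain(symbols)
print(len(arcs), finals)  # 6 {6}
```

Greedy longest match is a common choice for this kind of alphabet, but it can fail when a short symbol must be chosen over a longer one to complete a valid split; an alphabet like the one above avoids that case.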
File details
Details for the file fststr-0.5.tar.gz.
File metadata
- Download URL: fststr-0.5.tar.gz
- Size: 7.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/0.0.0 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.7
File hashes
Algorithm | Hash digest
---|---
SHA256 | 1d154095e3a649a5c2850397de690a02aaf807c38d441df7eee4e8d97d4d58b1
MD5 | 9667fd5a5f40da995c5a29b9c742f667
BLAKE2b-256 | b87c3ec8eb0e5b562fc79c0a8719bdc223c61c04fead36f1660f40565254290e
File details
Details for the file fststr-0.5-py3-none-any.whl.
File metadata
- Download URL: fststr-0.5-py3-none-any.whl
- Size: 8.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/0.0.0 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.7
File hashes
Algorithm | Hash digest
---|---
SHA256 | fac6ae04178143a131e45fc90a2c37be9cdf9c211de3471866f416abd44ba05a
MD5 | 8c4e2bb4150589408a6d92a66a382be8
BLAKE2b-256 | 1655aeb05f1a5ba8ee6abfb6d2c05d5bbdc1b95c50218a200bcb6d4768ff4111