Collection of utilities for parsing natural languages using context-free grammars
Project description
Suomilog is a toolkit for parsing text with context-free grammars that have embedded morphological information. Its intended use is parsing Finnish sentences, and it includes a module that can parse and generate inflected Finnish sentences using libvoikko as back-end.
Context-free grammars
The suomilog module is used for parsing context-free grammars. Suomilog grammars are context-free grammars that have additional morphological information. Below is an example.
.FEATURE ::= .HUMAN{+gen} nimi{$}
.FEATURE ::= .HUMAN{+gen} ikä{$}
.HUMAN ::= mies{$}
.HUMAN ::= nainen{$}
.HUMAN ::= ihminen{$}
.HUMAN ::= .ADJECTIVE{$} .HUMAN{$}
.ADJECTIVE ::= ahkera{$}
.ADJECTIVE ::= kaunis{$}
.ADJECTIVE ::= erittäin .ADJECTIVE{$}
In Suomilog grammars nonterminals are marked with a dot and all other symbols are terminals.
Morphological information is written in braces after a symbol. For example, kaunis{+gen} would match the word kauniin. A dollar sign $ means that the morphological information given to the rule's left-hand side is passed forward to that symbol. So .HUMAN{+gen}, for example, matches miehen, naisen, ihmisen, ahkeran miehen, and so on.
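How the dollar sign passes forms down a derivation can be sketched in plain Python. The following is an illustration only, not Suomilog's implementation: the grammar is transcribed by hand from the example above, `GRAMMAR` and `expand` are hypothetical names, and a symbol without braces (erittäin) is assumed to carry no morphological information.

```python
from itertools import product

# The example grammar, transcribed by hand. Each right-hand-side symbol is a
# (name, bits) pair; bits is a tuple of form names, or "$" for "pass forward".
GRAMMAR = {
    ".FEATURE": [
        [(".HUMAN", ("+gen",)), ("nimi", "$")],
        [(".HUMAN", ("+gen",)), ("ikä", "$")],
    ],
    ".HUMAN": [
        [("mies", "$")],
        [("nainen", "$")],
        [("ihminen", "$")],
        [(".ADJECTIVE", "$"), (".HUMAN", "$")],
    ],
    ".ADJECTIVE": [
        [("ahkera", "$")],
        [("kaunis", "$")],
        [("erittäin", ()), (".ADJECTIVE", "$")],  # assumption: no braces = no forms
    ],
}


def expand(symbol, bits, depth):
    """Yield derivations of `symbol` as lists of (lemma, bits) pairs.

    `depth` bounds the number of nested rule applications so that the
    recursive rules terminate.
    """
    if not symbol.startswith("."):
        # Terminal: keep the lemma together with the forms it must appear in.
        yield [(symbol, bits)]
        return
    if depth == 0:
        return
    for rhs in GRAMMAR[symbol]:
        options = []
        for name, child_bits in rhs:
            # "$" inherits the forms requested from the left-hand side.
            resolved = bits if child_bits == "$" else child_bits
            options.append(list(expand(name, resolved, depth - 1)))
        for combo in product(*options):
            yield [pair for part in combo for pair in part]


for derivation in expand(".HUMAN", ("+gen",), 2):
    print(" ".join("%s{%s}" % (lemma, ",".join(b)) for lemma, b in derivation))
```

Each printed line lists lemmas with the forms they must take; with `+gen` requested, the pair ahkera and mies both carry `+gen`, which corresponds to the surface form ahkeran miehen mentioned above.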
The braces can contain multiple form names separated with commas. For example, tehdä{+inf3,+gen} would match tekemisen. In code these form names are called “bits”.
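To make the notation concrete, here is a minimal sketch of reading one rule line into a data structure. It is not the parser that Suomilog itself uses; `Symbol`, `SYMBOL_RE` and `parse_rule` are hypothetical names introduced for illustration, and the sketch assumes that symbols are separated by whitespace and that braces never nest.

```python
import re
from typing import NamedTuple


class Symbol(NamedTuple):
    name: str          # e.g. ".HUMAN" or "nimi"
    nonterminal: bool  # True when the name starts with a dot
    bits: tuple        # form names from the braces; ("$",) means "pass forward"


# A symbol name followed by an optional {...} group.
SYMBOL_RE = re.compile(r"(?P<name>[^\s{}]+)(?:\{(?P<bits>[^{}]*)\})?")


def parse_rule(line):
    """Split 'LHS ::= sym{bits} sym{bits} ...' into (lhs, [Symbol, ...])."""
    lhs, rhs = (part.strip() for part in line.split("::=", 1))
    symbols = []
    for match in SYMBOL_RE.finditer(rhs):
        name = match.group("name")
        raw = match.group("bits")
        # Several form names may be separated with commas inside the braces.
        bits = tuple(b.strip() for b in raw.split(",") if b.strip()) if raw else ()
        symbols.append(Symbol(name, name.startswith("."), bits))
    return lhs, symbols


lhs, symbols = parse_rule(".FEATURE ::= .HUMAN{+gen} nimi{$}")
print(lhs, symbols)
```

A symbol such as tehdä{+inf3,+gen} comes out with `bits == ("+inf3", "+gen")`, and a bare word such as erittäin comes out with an empty tuple.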
For example usage, see the examples/ folder.
Finnish morphology
The suomilog.finnish module contains tools for Finnish morphological parsing and generation. It uses pypykko as its back-end.
The function suomilog.finnish.tokenize(text) splits a text into tokens and returns the morphological analyses of each word:
import suomilog.finnish as f
f.tokenize("kissa käveli kadulla")
# outputs:
[
Token('kissa', [('kissa', {'', '+sg+nom', ':noun', '«kissa»', 'kissa:noun', 'kissa:', '+sg', '+nom'})]),
Token('käveli', [('kävellä', {'', '+3sg', '«käveli»', 'kävellä:', ':verb', 'kävellä:verb', '+past'})]),
Token('kadulla', [('katu', {'', ':noun', 'katu:', '«kadulla»', '+ade', '+sg', 'katu:noun'})])
]
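The bit sets in this output are what a grammar symbol such as katu{+ade} would be compared against. The sketch below shows one plausible way to do that comparison; it is an assumption about the matching logic, not code from Suomilog. The analyses are copied from the output above as plain (lemma, bits) pairs rather than Token objects, and `ANALYSES` and `matches` are hypothetical names.

```python
# Analyses copied from the tokenize() output above as plain (lemma, bits) pairs.
ANALYSES = {
    "kissa": [("kissa", {"", "+sg+nom", ":noun", "«kissa»", "kissa:noun",
                         "kissa:", "+sg", "+nom"})],
    "käveli": [("kävellä", {"", "+3sg", "«käveli»", "kävellä:", ":verb",
                            "kävellä:verb", "+past"})],
    "kadulla": [("katu", {"", ":noun", "katu:", "«kadulla»", "+ade", "+sg",
                          "katu:noun"})],
}


def matches(analyses, lemma, required_bits):
    """Return True if some analysis has the lemma and all required bits.

    Assumption: a symbol like katu{+ade} matches a word when at least one
    analysis has the lemma 'katu' and its bit set contains '+ade'.
    """
    required = set(required_bits)
    return any(l == lemma and required <= bits for l, bits in analyses)


print(matches(ANALYSES["kadulla"], "katu", ["+ade"]))
print(matches(ANALYSES["kadulla"], "katu", ["+gen"]))
```

The first call reports a match because `+ade` is among the bits of kadulla; the second does not, because `+gen` is absent.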
The function suomilog.finnish.inflect_nominal(word, plural, case) is used to inflect nouns, adjectives and numerals:
import suomilog.finnish as f
print(f.inflect_nominal("kissa", "+pl", "+par")) # kissoja
File details
Details for the file suomilog-1.0.tar.gz.
File metadata
- Download URL: suomilog-1.0.tar.gz
- Upload date:
- Size: 9.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.10.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f12a1de634efa5d2765c2837d462b4a4f8a8927c4844f2058ab1847c88409fdc |
| MD5 | d1edc7df7390c63133e005aa1a36cb29 |
| BLAKE2b-256 | f031c972b1068170565dc3d72e601f5094a8c4eac56c2f694d9df78a65d7a8d7 |
File details
Details for the file suomilog-1.0-py3-none-any.whl.
File metadata
- Download URL: suomilog-1.0-py3-none-any.whl
- Upload date:
- Size: 11.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.10.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 203171c0446abfc9f7173f7e68a554d50ced8ad8f0eba87605d03c264db93633 |
| MD5 | 6e42df32e9018d673d4423d0527c7569 |
| BLAKE2b-256 | bbfb6d4c92c03961b82ea11b11645cab23bb05217a229ae96018fda019c69ff2 |