Collection of utilities for parsing natural languages using context-free grammars

Project description

Suomilog is a toolkit for parsing context-free grammars that have embedded morphological information. Its intended use is parsing Finnish sentences, and it includes a module that can parse and generate inflected Finnish sentences using libvoikko as its back-end.

Context-free grammars

The suomilog module is used for parsing context-free grammars. Suomilog grammars are context-free grammars that have additional morphological information. Below is an example.

.FEATURE ::= .HUMAN{+gen} nimi{$}
.FEATURE ::= .HUMAN{+gen} ikä{$}

.HUMAN ::= mies{$}
.HUMAN ::= nainen{$}
.HUMAN ::= ihminen{$}
.HUMAN ::= .ADJECTIVE{$} .HUMAN{$}

.ADJECTIVE ::= ahkera{$}
.ADJECTIVE ::= kaunis{$}
.ADJECTIVE ::= erittäin .ADJECTIVE{$}

In Suomilog grammars nonterminals are marked with a dot and all other symbols are terminals.

Morphological information is written in braces after a symbol. For example, kaunis{+gen} would match the word kauniin. A dollar sign ($) means that the morphological information is passed forward. So .HUMAN{+gen}, for example, matches miehen, naisen, ihmisen, ahkeran miehen, and so on.
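To make the effect of $ concrete, here is a minimal sketch of a toy generator for the example grammar. It is not Suomilog's implementation: the RULES encoding, the expand and inflect functions, and the hand-written GENITIVE table are all hypothetical stand-ins, and a real system would ask a morphological back-end for the inflected forms.

```python
# Hypothetical genitive table standing in for a morphological back-end.
GENITIVE = {
    "mies": "miehen",
    "nainen": "naisen",
    "ihminen": "ihmisen",
    "ahkera": "ahkeran",
    "kaunis": "kauniin",
}

# Each rule is a list of (symbol, spec) pairs mirroring the grammar above.
# spec is "$" (pass the parent's form forward), an explicit form such as
# "+gen", or None when the grammar gives no braces.
RULES = {
    ".FEATURE": [
        [(".HUMAN", "+gen"), ("nimi", "$")],
        [(".HUMAN", "+gen"), ("ikä", "$")],
    ],
    ".HUMAN": [
        [("mies", "$")],
        [("nainen", "$")],
        [("ihminen", "$")],
        [(".ADJECTIVE", "$"), (".HUMAN", "$")],
    ],
    ".ADJECTIVE": [
        [("ahkera", "$")],
        [("kaunis", "$")],
        [("erittäin", None), (".ADJECTIVE", "$")],
    ],
}


def inflect(word, form):
    """Return word in the requested form; only the genitive is tabulated."""
    if form == "+gen":
        return GENITIVE[word]  # raises KeyError for words not in the toy table
    return word  # no form requested: keep the base form


def expand(symbol, form, depth):
    """Yield every phrase derivable from symbol, limited to a recursion depth."""
    if not symbol.startswith("."):
        # Terminals are inflected in whatever form reached them.
        yield inflect(symbol, form)
        return
    if depth == 0:
        return
    for rule in RULES[symbol]:
        phrases = [[]]
        for child, spec in rule:
            # "$" forwards the form requested of the parent to the child.
            child_form = form if spec == "$" else spec
            phrases = [
                p + [c]
                for p in phrases
                for c in expand(child, child_form, depth - 1)
            ]
        for p in phrases:
            yield " ".join(p)


for phrase in expand(".HUMAN", "+gen", 2):
    print(phrase)
```

Because erittäin has no braces, it stays uninflected while the adjective after it still receives the genitive through $. The depth limit is only there to keep the recursive rules from expanding forever.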

The braces can contain multiple form names separated with commas. For example, tehdä{+inf3,+gen} would match tekemisen. In code these form names are called “bits”.
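The notation described above (a leading dot for nonterminals, braces holding comma-separated bits, and $ for forwarding) can be read with a few lines of Python. The following is only a sketch of the syntax, not Suomilog's own parser: Symbol, parse_symbol and parse_rule are hypothetical names, and the sketch assumes that symbols are separated by whitespace and that no whitespace appears inside the braces.

```python
import re
from typing import NamedTuple


class Symbol(NamedTuple):
    name: str             # e.g. ".HUMAN" or "nimi"
    is_nonterminal: bool  # True when the name starts with a dot
    bits: frozenset       # form names such as "+gen"
    passes_forward: bool  # True when the braces contain "$"


# A name followed by an optional {...} group.
SYMBOL_RE = re.compile(r"^(?P<name>[^{}\s]+)(?:\{(?P<bits>[^{}]*)\})?$")


def parse_symbol(text):
    """Split one grammar symbol into its name and morphological bits."""
    match = SYMBOL_RE.match(text)
    if match is None:
        raise ValueError(f"not a valid symbol: {text!r}")
    raw = match.group("bits") or ""
    parts = [p.strip() for p in raw.split(",") if p.strip()]
    name = match.group("name")
    return Symbol(
        name=name,
        is_nonterminal=name.startswith("."),
        bits=frozenset(p for p in parts if p != "$"),
        passes_forward="$" in parts,
    )


def parse_rule(line):
    """Split 'LEFT ::= sym1 sym2 ...' into the left side and parsed symbols."""
    left, right = line.split("::=")
    return left.strip(), [parse_symbol(s) for s in right.split()]


print(parse_rule(".FEATURE ::= .HUMAN{+gen} nimi{$}"))
print(parse_symbol("tehdä{+inf3,+gen}"))
```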

For example usage, see the examples/ folder.

Finnish morphology

The suomilog.finnish module contains tools for Finnish morphological parsing and generation. It uses pypykko as its back-end.

The function suomilog.finnish.tokenize(text) is used to tokenize words:

import suomilog.finnish as f
f.tokenize("kissa käveli kadulla")
# outputs:
[
    Token('kissa', [('kissa', {'', '+sg+nom', ':noun', '«kissa»', 'kissa:noun', 'kissa:', '+sg', '+nom'})]),
    Token('käveli', [('kävellä', {'', '+3sg', '«käveli»', 'kävellä:', ':verb', 'kävellä:verb', '+past'})]),
    Token('kadulla', [('katu', {'', ':noun', 'katu:', '«kadulla»', '+ade', '+sg', 'katu:noun'})])
]
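Each reading in the output above pairs a base form with a set of bits, so a grammar constraint such as katu{+ade} can be thought of as a subset test on that set. The sketch below illustrates the idea with plain tuples copied from the output; it does not use the library's Token class, matching_lemmas is a hypothetical helper, and whether Suomilog matches bits exactly this way is an assumption.

```python
# Plain-tuple stand-ins copied from the tokenizer output above.
READINGS = {
    "kissa": [("kissa", {"", "+sg+nom", ":noun", "«kissa»", "kissa:noun",
                         "kissa:", "+sg", "+nom"})],
    "käveli": [("kävellä", {"", "+3sg", "«käveli»", "kävellä:", ":verb",
                            "kävellä:verb", "+past"})],
    "kadulla": [("katu", {"", ":noun", "katu:", "«kadulla»", "+ade", "+sg",
                          "katu:noun"})],
}


def matching_lemmas(readings, required_bits):
    """Return the base forms whose bit set contains every required bit."""
    return [lemma for lemma, bits in readings if required_bits <= bits]


print(matching_lemmas(READINGS["kadulla"], {"+ade"}))
print(matching_lemmas(READINGS["kadulla"], {"+gen"}))
print(matching_lemmas(READINGS["käveli"], {":verb", "+past"}))
```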

The function suomilog.finnish.inflect_nominal(word, plural, case) is used to inflect nouns, adjectives and numerals:

import suomilog.finnish as f
print(f.inflect_nominal("kissa", "+pl", "+par")) # kissoja
