Collection of utilities for parsing natural languages using context-free grammars

Project description

Suomilog is a toolkit for parsing context-free grammars that have embedded morphological information. Its intended use is parsing Finnish sentences, and it includes a module that can parse and generate inflected Finnish sentences using libvoikko as its back-end.

Context-free grammars

The suomilog module is used for parsing context-free grammars. Suomilog grammars are context-free grammars that have additional morphological information. Below is an example.

.FEATURE ::= .HUMAN{+gen} nimi{$}
.FEATURE ::= .HUMAN{+gen} ikä{$}

.HUMAN ::= mies{$}
.HUMAN ::= nainen{$}
.HUMAN ::= ihminen{$}
.HUMAN ::= .ADJECTIVE{$} .HUMAN{$}

.ADJECTIVE ::= ahkera{$}
.ADJECTIVE ::= kaunis{$}
.ADJECTIVE ::= erittäin .ADJECTIVE{$}

In Suomilog grammars nonterminals are marked with a dot and all other symbols are terminals.
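To make the rule notation concrete, here is a minimal sketch that splits one rule line into its left-hand side and its right-hand-side symbols. It is not Suomilog's own parser: `parse_rule` and `SYMBOL_RE` are hypothetical names, and the sketch assumes that symbols are separated by whitespace and that braces contain no spaces, as in the example grammar above.

```python
import re

# Hypothetical helper, not part of suomilog: a symbol is a name optionally
# followed by a brace group such as {+gen} or {$}.
SYMBOL_RE = re.compile(r"^(?P<name>[^{}]+?)(?:\{(?P<bits>[^{}]*)\})?$")

def parse_rule(line):
    """Return (left_hand_side, [(symbol, is_nonterminal, bits), ...])."""
    lhs, rhs = line.split("::=", 1)
    lhs = lhs.strip()
    if not lhs.startswith("."):
        raise ValueError("left-hand side must be a nonterminal: " + lhs)
    symbols = []
    for part in rhs.split():
        match = SYMBOL_RE.match(part)
        if match is None:
            raise ValueError("malformed symbol: " + part)
        name = match.group("name")
        bits_text = match.group("bits")
        # A missing or empty brace group means no morphological information.
        bits = bits_text.split(",") if bits_text else []
        # Nonterminals are the symbols whose name starts with a dot.
        symbols.append((name, name.startswith("."), bits))
    return lhs, symbols

lhs, symbols = parse_rule(".FEATURE ::= .HUMAN{+gen} nimi{$}")
print(lhs)      # .FEATURE
print(symbols)  # [('.HUMAN', True, ['+gen']), ('nimi', False, ['$'])]
```

A symbol without braces, such as erittäin in the last .ADJECTIVE rule, comes back with an empty list of form names.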

Morphological information is written in braces after a symbol. For example, kaunis{+gen} would match the word kauniin. A dollar sign $ means that the morphological information is passed forward. So .HUMAN{+gen}, for example, matches miehen, naisen, ihmisen, ahkeran miehen, and so on.

The braces can contain multiple form names separated with commas. For example, tehdä{+inf3,+gen} would match tekemisen. In code these form names are called “bits”.
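The way $ forwards form names can be illustrated with a small sketch. It assumes that a $ inside the braces is replaced by whatever bits the enclosing nonterminal was asked for. The function `expand_bits` is a hypothetical helper written for this illustration and is not taken from Suomilog's source.

```python
def expand_bits(own_bits, inherited_bits):
    """Replace every "$" in own_bits with the bits passed down from the parent."""
    result = set()
    for bit in own_bits:
        if bit == "$":
            # "$" stands for everything the parent symbol was asked for.
            result |= set(inherited_bits)
        else:
            result.add(bit)
    return result

# Assumption: .FEATURE itself is requested without any bits.
feature_bits = set()
# .FEATURE ::= .HUMAN{+gen} nimi{$}  ->  .HUMAN is asked for +gen
human_bits = expand_bits(["+gen"], feature_bits)
# .HUMAN ::= mies{$}  ->  mies receives what .HUMAN was asked for
mies_bits = expand_bits(["$"], human_bits)
print(mies_bits)  # {'+gen'}

# Several form names in one brace group, as in tehdä{+inf3,+gen}
print(sorted(expand_bits(["+inf3", "+gen"], set())))  # ['+gen', '+inf3']
```

Under this reading, mies{$} inside .HUMAN{+gen} ends up requiring the genitive, which is consistent with .HUMAN{+gen} matching miehen.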

For example usage, see the examples/ folder.

Finnish morphology

The suomilog.finnish module contains tools for Finnish morphological parsing and generation. It uses pypykko as its back-end.

The function suomilog.finnish.tokenize(text) is used to tokenize words:

import suomilog.finnish as f
f.tokenize("kissa käveli kadulla")
# outputs:
[
    Token('kissa', [('kissa', {'', '+sg+nom', ':noun', '«kissa»', 'kissa:noun', 'kissa:', '+sg', '+nom'})]),
    Token('käveli', [('kävellä', {'', '+3sg', '«käveli»', 'kävellä:', ':verb', 'kävellä:verb', '+past'})]),
    Token('kadulla', [('katu', {'', ':noun', 'katu:', '«kadulla»', '+ade', '+sg', 'katu:noun'})])
]
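Each token carries a list of alternatives, and each alternative pairs a base form with a set of bits. The sketch below shows one way such sets could be checked against required form names. It works on plain tuples copied from the output above, because the attribute names of the Token class are not shown here; `matching_lemmas` is a hypothetical helper, and treating a match as a subset test is an assumption.

```python
# Analyses of "kadulla", copied from the output above as plain
# (lemma, bits) tuples. No Token attributes are used.
kadulla_alternatives = [
    ("katu", {"", ":noun", "katu:", "«kadulla»", "+ade", "+sg", "katu:noun"}),
]

def matching_lemmas(alternatives, required_bits):
    """Return the lemmas whose bit set contains every required bit."""
    required = set(required_bits)
    return [lemma for lemma, bits in alternatives if required <= bits]

print(matching_lemmas(kadulla_alternatives, ["+sg", "+ade"]))  # ['katu']
print(matching_lemmas(kadulla_alternatives, ["+gen"]))         # []
```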

The function suomilog.finnish.inflect_nominal(word, plural, case) is used to inflect nouns, adjectives and numerals:

import suomilog.finnish as f
print(f.inflect_nominal("kissa", "+pl", "+par")) # kissoja
