Rule-based morphological analysis for Ossetic (Iron)

Ossetic (Iron) morphological analyzer

This is a rule-based morphological analyzer for Ossetic (oss). It relies on a formalized description of the morphology of literary Ossetic, which follows the Iron dialect, and uses uniparser-morph for parsing. It performs full morphological analysis of Ossetic words (lemmatization, POS tagging, grammatical tagging).

How to use

Python package

The analyzer is available as a Python package. If you want to analyze Ossetic texts in Python, install the module:

pip3 install uniparser-ossetic

Import the module and create an instance of the OsseticAnalyzer class. Set mode='strict' if you are going to process text in standard orthography, or mode='nodiacritics' if you expect the ӕ character to be misrepresented in some words (either as an identical-looking Latin character or as ае). After that, you can either parse tokens or lists of tokens with analyze_words(), or parse a frequency list with analyze_wordlist(). Here is a simple example:

from uniparser_ossetic import OsseticAnalyzer
a = OsseticAnalyzer(mode='strict')

# The parser is initialized before first use, so expect
# some delay on the first call (usually several seconds)
analyses = a.analyze_words('ӕвзаджы')

# You will get a list of Wordform objects.
# The analysis attributes are stored in each object's
# properties as string values, e.g.:
for ana in analyses:
    print(ana.wf, ana.lemma, ana.gramm)

# You can also pass lists (even nested lists) and specify
# output format ('xml', 'json' or 'conll')
# If you pass a list, you will get a list of analyses
# with the same structure
analyses = a.analyze_words([['Фӕлӕ'], ['Ӕз', 'дӕ', 'уарзын', '.']],
                           format='xml')
analyses = a.analyze_words([['Фӕлӕ'], ['Ӕз', 'дӕ', 'уарзын', '.']],
                           format='conll')
analyses = a.analyze_words(['ӕвзаджы', [['Фӕлӕ'], ['Ӕз', 'дӕ', 'уарзын', '.']]],
                           format='json')
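
Once you have a list of analyses, you will often want to keep only those carrying a particular grammatical tag. The following is a minimal sketch, not part of the package: it assumes that the gramm property is a comma-separated string of tags, and the helper filter_by_tag, the stand-in objects and the tag names in them are purely illustrative placeholders rather than real analyzer output.

```python
from types import SimpleNamespace


def filter_by_tag(analyses, tag):
    """Keep the analyses whose gramm string contains the given tag.

    Assumes gramm is a comma-separated string of tags.
    """
    return [ana for ana in analyses
            if tag in (t.strip() for t in ana.gramm.split(','))]


# Stand-in objects with made-up values; real Wordform objects
# returned by analyze_words() would be passed in the same way.
demo = [
    SimpleNamespace(wf='word', lemma='word', gramm='N,sg,gen'),
    SimpleNamespace(wf='word', lemma='word', gramm='V,prs'),
]

nouns = filter_by_tag(demo, 'N')
for ana in nouns:
    print(ana.wf, ana.lemma, ana.gramm)
```

Splitting the string before comparing avoids accidental substring matches between tags that share a prefix.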

Refer to the uniparser-morph documentation for the full list of options.
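
If you would rather keep mode='strict' but your input sometimes contains the Latin look-alike of ӕ, one option is to normalize the tokens yourself before analysis. The sketch below is independent of the analyzer; the helper name normalize_ae is hypothetical. It only maps the Latin letters to their Cyrillic counterparts and deliberately leaves the ае digraph alone, since a blind replacement might alter words that genuinely contain that sequence.

```python
# Latin look-alikes mapped to the Cyrillic letters used in
# standard Ossetic orthography.
_AE_TABLE = str.maketrans({
    '\u00e6': '\u04d5',  # Latin small ae -> Cyrillic small ligature a ie
    '\u00c6': '\u04d4',  # Latin capital AE -> Cyrillic capital ligature A IE
})


def normalize_ae(token):
    """Replace Latin ae/AE with the Cyrillic letters; leave the rest as is."""
    return token.translate(_AE_TABLE)


# Two tokens written with the Latin characters
tokens = ['\u00c6\u0437', '\u0434\u00e6']
normalized = [normalize_ae(t) for t in tokens]
print(normalized)
```

The normalized tokens can then be passed to analyze_words() as usual.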

Word lists

Alternatively, you can use a preprocessed word list. The wordlists directory contains a list of words from the 12-million-word Ossetic National Corpus (wordlist.csv) with 438,000 unique tokens, a list of analyzed tokens (wordlist_analyzed.txt; each line contains all possible analyses for one word in XML format), and a list of tokens the parser could not analyze (wordlist_unanalyzed.txt). The recall of the analyzer on the corpus texts is about 90%.
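
To get a similar figure for your own data, you can compute the share of running tokens that receive at least one analysis. The sketch below makes two assumptions that are not confirmed by this README: that the frequency list holds one tab-separated word and count per line, and that the reported recall corresponds to this token-level measure. The function names and the sample data are made up for illustration.

```python
import csv
import io


def read_freq_list(lines, delimiter='\t'):
    """Read 'word<delimiter>count' rows into a dict (assumed layout)."""
    freqs = {}
    for row in csv.reader(lines, delimiter=delimiter):
        if len(row) < 2:
            continue
        try:
            freqs[row[0]] = freqs.get(row[0], 0) + int(row[1])
        except ValueError:
            continue  # skip a header or a malformed row
    return freqs


def token_coverage(freqs, unanalyzed):
    """Share of running tokens whose word form is not in `unanalyzed`."""
    total = sum(freqs.values())
    if total == 0:
        return 0.0
    missed = sum(n for word, n in freqs.items() if word in unanalyzed)
    return 1 - missed / total


# Made-up sample data standing in for the real files
sample = io.StringIO("aaa\t6\nbbb\t3\nccc\t1\n")
freqs = read_freq_list(sample)
coverage = token_coverage(freqs, {'ccc'})
print(f'{coverage:.2f}')
```

For real files, pass an open file object to read_freq_list() and build the set of unanalyzed words from wordlist_unanalyzed.txt, adjusting the delimiter if the actual layout differs.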

Description format

The description is written in the uniparser-morph format and comprises a description of the inflection (oss_paradigms.txt), a grammatical dictionary (oss_lexemes.txt), a list of productive lemma-changing derivations (derivations.txt), and a short list of analyses that should be avoided (bad_analyses.txt). The dictionary contains descriptions of individual lexemes, each of which is accompanied by information about its stem, its part-of-speech tag and some other grammatical information, its inflectional type (paradigm), and a Russian and/or English translation. See more about the format in the uniparser-morph documentation.

Download files

Source Distribution

uniparser-ossetic-2.0.0.tar.gz (1.3 MB, Source)

Built Distribution

uniparser_ossetic-2.0.0-py3-none-any.whl (1.4 MB, Python 3)

File details

Details for the file uniparser-ossetic-2.0.0.tar.gz.

File metadata

  • Download URL: uniparser-ossetic-2.0.0.tar.gz
  • Size: 1.3 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.28.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.7

File hashes

Hashes for uniparser-ossetic-2.0.0.tar.gz
Algorithm Hash digest
SHA256 ec19fdd2421f2215c0458ae99eeab38fd77bdac639e13a3ea5573fceaed2aeb6
MD5 ec7c6c81b2eed22057b2c929d3d20190
BLAKE2b-256 d0d1153ef4080e57ebf8f66eaf5992beb481fba1996eb389e2d26cba971d00f9

File details

Details for the file uniparser_ossetic-2.0.0-py3-none-any.whl.

File metadata

  • Download URL: uniparser_ossetic-2.0.0-py3-none-any.whl
  • Size: 1.4 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.28.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.7

File hashes

Hashes for uniparser_ossetic-2.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 b814f5f9ec100764d805c8b20dfb77b8fb723c3463ac3cdc2abb0c9bb026012d
MD5 1748e9a2d6c82e5063124ca34f061806
BLAKE2b-256 6c9ce2b8d829b7523fb2861303a96cc45c485c29ec9e43a33b2e2e12d10be64a
