Rule-based morphological analysis for Christian Urmi (North-Eastern Neo-Aramaic)

Urmi morphological analyzer

This is a rule-based morphological analyzer for Christian Urmi (Afro-Asiatic > North-Eastern Neo-Aramaic). It is based on a formalized description of Urmi morphology and uses uniparser-morph for parsing. It performs full morphological analysis of Urmi words (lemmatization, POS tagging, grammatical tagging). The text to be analyzed should be written in the Latin-based alphabet (the Assyrian New Alphabet).

How to use

Python package

The analyzer is available as a Python package. If you want to analyze Urmi texts in Python, install the module:

pip3 install uniparser-urmi

Import the module and create an instance of the UrmiAnalyzer class. Set mode='strict' if you are going to process text in the standard Assyrian New Alphabet, or mode='nodiacritics' if you expect some words to lack diacritics (e.g. a plain t where the standard spelling has a t with a diacritic). After that, you can either parse single tokens or lists of tokens with analyze_words(), or parse a frequency list with analyze_wordlist(). Here is a simple example:

from uniparser_urmi import UrmiAnalyzer
a = UrmiAnalyzer(mode='strict')

analyses = a.analyze_words('вajjannux')
# The parser is initialized before first use, so expect
# some delay here (usually several seconds)

# You will get a list of Wordform objects.
# The analysis attributes are stored in their properties
# as string values, e.g.:
for ana in analyses:
    print(ana.wf, ana.lemma, ana.gramm)

# You can also pass lists (even nested lists) and specify
# the output format ('xml', 'json' or 'conll').
# If you pass a list, you will get a list of analyses
# with the same structure.
analyses = a.analyze_words([['вajjannux'], ['ʕəbarwo', 'lab', 'bote', '.']],
                           format='xml')
analyses = a.analyze_words([['вajjannux'], ['ʕəbarwo', 'lab', 'bote', '.']],
                           format='conll')
analyses = a.analyze_words(['вajjannux', [['laḥmawo'], ['ʕəbarwo', 'lab', 'bote', '.']]],
                           format='json')

Refer to the uniparser-morph documentation for the full list of options.
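The two modes described above can be combined: try the strict analyzer first and fall back to the diacritic-tolerant one only when the strict pass finds nothing. The sketch below is one possible way to do that, not part of the package. The helper names analyze_with_fallback, as_rows and main are hypothetical, and the code assumes that an unanalyzed token comes back as a Wordform whose lemma is an empty string.

```python
def analyze_with_fallback(word, strict, relaxed):
    """Analyze `word` with `strict`; retry with `relaxed` if no lemma was found.

    `strict` and `relaxed` are any objects exposing analyze_words(str),
    e.g. UrmiAnalyzer(mode='strict') and UrmiAnalyzer(mode='nodiacritics').
    """
    analyses = strict.analyze_words(word)
    # Assumption: an unanalyzed token is returned as a Wordform
    # with an empty lemma.
    if any(ana.lemma for ana in analyses):
        return 'strict', analyses
    return 'nodiacritics', relaxed.analyze_words(word)


def as_rows(analyses):
    """Flatten Wordform-like objects into (wf, lemma, gramm) tuples."""
    return [(ana.wf, ana.lemma, ana.gramm) for ana in analyses]


def main():
    # Needs `pip3 install uniparser-urmi`; this function is not called
    # automatically, so the definitions above can be reused without it.
    from uniparser_urmi import UrmiAnalyzer

    strict = UrmiAnalyzer(mode='strict')
    relaxed = UrmiAnalyzer(mode='nodiacritics')
    mode, analyses = analyze_with_fallback('вajjannux', strict, relaxed)
    for row in as_rows(analyses):
        print(mode, *row)
```

Call main() to run it against the real analyzer. Keeping two analyzer instances costs two initializations, which may or may not be acceptable depending on how much text you process.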

If you want to quickly check an analysis for one particular word, you can also use the command-line interface. Here is an example for the word вajjannux:

python3 -m uniparser_urmi вajjannux

Word lists

Alternatively, you can use a preprocessed word list. The wordlists directory contains a list of words from a 622-thousand-word Christian Urmi corpus (wordlist.csv) with 63,000 unique tokens, a list of analyzed tokens (wordlist_analyzed.txt; each line contains all possible analyses for one word in XML format), and a list of tokens the parser could not analyze (wordlist_unanalyzed.txt). The recall of the analyzer on the corpus texts is about 76%.
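A coverage figure like the one above can be computed from a frequency list and a list of unanalyzed words. The following is a minimal sketch under stated assumptions: it treats the frequency list as tab-separated word and count columns (the actual layout of wordlist.csv may differ), it uses invented toy data rather than the real corpus, and the function names read_freq_list and coverage are hypothetical. It reports both type-level and token-level coverage, since the text does not say which of the two the 76% refers to.

```python
import csv
import io


def read_freq_list(handle, delimiter='\t'):
    """Read (word, frequency) rows into a dict.

    Assumption: two columns per row, the word and an integer count.
    """
    freqs = {}
    for row in csv.reader(handle, delimiter=delimiter):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        freqs[row[0]] = freqs.get(row[0], 0) + int(row[1])
    return freqs


def coverage(freqs, unanalyzed):
    """Return (type_coverage, token_coverage) as fractions in [0, 1]."""
    total_types = len(freqs)
    total_tokens = sum(freqs.values())
    if total_tokens == 0:
        return 0.0, 0.0
    missed_types = sum(1 for word in freqs if word in unanalyzed)
    missed_tokens = sum(n for word, n in freqs.items() if word in unanalyzed)
    return (1 - missed_types / total_types,
            1 - missed_tokens / total_tokens)


# Toy data, not taken from the real corpus.
sample = io.StringIO("aaa\t6\nbbb\t3\nccc\t1\n")
freqs = read_freq_list(sample)
type_cov, token_cov = coverage(freqs, unanalyzed={'ccc'})
print(f"types: {type_cov:.2%}, tokens: {token_cov:.2%}")
```

Token-level coverage is usually the higher of the two, because frequent words are more likely to be in the dictionary than rare ones.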

Description format

The description is written in the uniparser-morph format and consists of a description of the inflection (paradigms.txt) and a grammatical dictionary (lexemes.txt). The dictionary describes individual lexemes; each entry gives the lexeme's stem, its part-of-speech tag and other grammatical information, its consonant root, its inflectional type (paradigm), and English and/or Russian translations. See the uniparser-morph documentation for more about the format.
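To give a rough idea of what a dictionary entry looks like, here is a small sketch that reads entries written as a "-lexeme" header followed by indented "key: value" lines. This layout and the field names (lex, stem, gramm, paradigm, trans_en) reflect the general uniparser-morph convention as understood here and should be checked against its documentation. The sample entry consists of placeholders and is not a real Urmi lexeme, and parse_lexemes is a hypothetical helper, not part of the package.

```python
def parse_lexemes(text):
    """Parse '-lexeme' blocks of ' key: value' lines into dicts.

    Values are collected into lists, so a repeated key keeps all its values.
    """
    entries = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == '-lexeme':
            # Start a new entry.
            current = {}
            entries.append(current)
            continue
        if current is None or ':' not in line:
            continue  # ignore anything outside a lexeme block
        key, _, value = line.partition(':')
        current.setdefault(key.strip(), []).append(value.strip())
    return entries


# Placeholder entry: every value below is invented for illustration.
SAMPLE = """
-lexeme
 lex: LEMMA
 stem: STEM.
 gramm: N,m
 paradigm: PARADIGM_NAME
 trans_en: placeholder gloss
"""

for entry in parse_lexemes(SAMPLE):
    print(entry['lex'][0], entry['gramm'][0].split(','), entry['paradigm'])
```

A reader like this is only useful for inspecting the dictionary; the analyzer itself loads lexemes.txt and paradigms.txt through uniparser-morph.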

Download files

Source distribution: uniparser-urmi-1.1.0.tar.gz (260.8 kB)

Built distribution (Python 3): uniparser_urmi-1.1.0-py3-none-any.whl (263.9 kB)
