Rule-based morphological analysis for Turoyo

Turoyo morphological analyzer

This is a rule-based morphological analyzer for Ṭuroyo (tru, Afro-Asiatic > Central Neo-Aramaic). It is based on a formalized description of Turoyo morphology and uses uniparser-morph for parsing. It performs full morphological analysis of Turoyo words (lemmatization, POS tagging, grammatical tagging). The text to be analyzed should be written in a version of the Latin Turoyo alphabet that is somewhat closer to IPA: it uses ʔ instead of ', ʕ instead of c, ə instead of ë, etc.
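If your texts use the other spelling, they have to be converted before analysis. The following is a minimal sketch of such a conversion, not part of the package: the function name to_analyzer_orthography is hypothetical, and the table covers only the three substitutions named above, so it would need to be extended with whatever the "etc." stands for. It also assumes lowercase input.

```python
import unicodedata

# Only the three substitutions named in the description above.
# The full mapping is longer; extend this table as needed.
_SUBSTITUTIONS = str.maketrans({"'": "ʔ", "c": "ʕ", "ë": "ə"})


def to_analyzer_orthography(text: str) -> str:
    """Convert a string to the IPA-like spelling (hypothetical helper)."""
    # Compose "e" + combining diaeresis into a single "ë" first,
    # so that the character table below can match it.
    composed = unicodedata.normalize("NFC", text)
    return composed.translate(_SUBSTITUTIONS)


print(to_analyzer_orthography("cëbarwo"))
```

Because str.translate works character by character, other letters such as ç are left untouched.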

How to use

Python package

The analyzer is available as a Python package. If you want to analyze Turoyo texts in Python, install the module:

pip3 install uniparser-turoyo

Import the module and create an instance of the TuroyoAnalyzer class. Set mode='strict' if you are going to process text in the standard Latin Turoyo alphabet, or mode='nodiacritics' if you expect some words to lack diacritics (e.g. t instead of ṭ). After that, you can parse individual tokens or lists of tokens with analyze_words(), or parse a frequency list with analyze_wordlist(). Here is a simple example:

from uniparser_turoyo import TuroyoAnalyzer
a = TuroyoAnalyzer(mode='strict')

# The parser is initialized before first use, so expect
# some delay on the first call (usually several seconds)
analyses = a.analyze_words('koroḥamnux')

# You will get a list of Wordform objects.
# The analysis attributes are stored in their properties
# as string values, e.g.:
for ana in analyses:
    print(ana.wf, ana.lemma, ana.gramm)

# You can also pass lists (even nested lists) and specify
# output format ('xml', 'json' or 'conll')
# If you pass a list, you will get a list of analyses
# with the same structure
analyses = a.analyze_words([['koroḥamnux'], ['ʕəbarwo', 'lab', 'bote', '.']],
                           format='xml')
analyses = a.analyze_words([['koroḥamnux'], ['ʕəbarwo', 'lab', 'bote', '.']],
                           format='conll')
analyses = a.analyze_words(['koroḥamnux', [['laḥmawo'], ['ʕəbarwo', 'lab', 'bote', '.']]],
                           format='json')

Refer to the uniparser-morph documentation for the full list of options.
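Since analyze_words() takes tokens rather than running text, raw text has to be split into tokens first. Below is a minimal sketch of a tokenizer that produces a flat token list like the ones in the example above. The tokenize function is hypothetical and not part of the package, and the rule it applies (runs of word characters, with each punctuation mark as its own token) is an assumption that may be too simple for real corpus texts.

```python
import re
import unicodedata

# A token is either a run of word characters or a single
# non-space, non-word character (e.g. a punctuation mark).
_TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tokenize(text: str) -> list[str]:
    """Split running text into a flat list of tokens (hypothetical helper)."""
    # Compose letters with diacritics into single code points so that
    # a combining mark does not split a word in two.
    composed = unicodedata.normalize("NFC", text)
    return _TOKEN_RE.findall(composed)


tokens = tokenize("ʕəbarwo lab bote.")
print(tokens)
# The result can then be passed to the analyzer, e.g.:
# analyses = a.analyze_words(tokens)
```

Normalizing to NFC before matching matters for letters such as ḥ, which may be stored as a base letter followed by a combining dot.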

If you want to quickly check an analysis for one particular word, you can also use the command-line interface. Here is an example for the word koroḥamnux:

python3 -m uniparser_turoyo koroḥamnux

Word lists

Alternatively, you can use a preprocessed word list. The wordlists directory contains a list of words from a 600-thousand-word Ṭuroyo corpus (wordlist.csv) with 53,000 unique tokens, a list of analyzed tokens (wordlist_analyzed.txt; each line contains all possible analyses for one word in XML format), and a list of tokens the parser could not analyze (wordlist_unanalyzed.txt). The recall of the analyzer on the corpus texts is about 90%. (This number is somewhat low due to orthographic variability in the texts.)
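To make the recall figure concrete, here is a small sketch of how such a number could be computed from a frequency list and a set of unanalyzed tokens. It assumes that recall here means the share of corpus tokens (counted with their frequencies) that receive at least one analysis; the source does not define it more precisely. The function token_coverage is hypothetical, and the sketch works on in-memory data rather than on the files themselves, because their exact layout is not described here.

```python
def token_coverage(frequencies: dict[str, int], unanalyzed: set[str]) -> float:
    """Share of corpus tokens with at least one analysis (hypothetical helper).

    frequencies maps each unique word to its corpus frequency;
    unanalyzed is the set of words the parser could not analyze.
    """
    total = sum(frequencies.values())
    if total == 0:
        return 0.0
    analyzed = sum(
        freq for word, freq in frequencies.items() if word not in unanalyzed
    )
    return analyzed / total


# Toy numbers for illustration only, not taken from the corpus.
toy_frequencies = {"w1": 6, "w2": 3, "w3": 1}
print(token_coverage(toy_frequencies, {"w3"}))
```

Counting by frequency rather than by unique word means that a frequent unanalyzed word lowers the figure more than a rare one.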

Description format

The description is written in the uniparser-morph format and consists of a description of the inflection (paradigms/paradigms_XXX.txt) and a grammatical dictionary (lexemes/lexemes-XXX.txt files). The dictionary contains descriptions of individual lexemes. Each entry gives the lexeme's stem, its part-of-speech tag and other grammatical information, its consonantal root, its inflectional type (paradigm), and English and/or German translations. See more about the format in the uniparser-morph documentation.
