
Rule-based morphological analysis for Buryat

Project description

Buryat morphological analyzer

This is a rule-based morphological analyzer for Buryat (bua; Mongolic). It is based on a formalized description of literary Buryat morphology and uses uniparser-morph for parsing. It performs full morphological analysis of Buryat words: lemmatization, POS tagging, grammatical tagging and glossing.

NB: The analyzer is still under construction. Right now, a number of entries in the dictionary have wrong POS tags or paradigms. Use with caution.

How to use

Python package

The analyzer is available as a Python package. If you want to analyze Buryat texts in Python, install the module:

pip3 install uniparser-buryat

Import the module and create an instance of the BuryatAnalyzer class. Set mode='strict' if you are going to process text in standard orthography, or mode='nodiacritics' if you expect some words to lack diacritics (which often happens in social media). After that, you can either parse tokens or lists of tokens with analyze_words(), or parse a frequency list with analyze_wordlist(). Here is a simple example:

from uniparser_buryat import BuryatAnalyzer
a = BuryatAnalyzer(mode='strict')

analyses = a.analyze_words('Морфологи')
# The parser is initialized on first use, so expect
# some delay here (usually several seconds)

# You will get a list of Wordform objects.
# The analysis attributes are stored in their properties
# as string values, e.g.:
for ana in analyses:
    print(ana.wf, ana.lemma, ana.gramm, ana.gloss)

# You can also pass lists (even nested lists) and specify
# output format ('xml' or 'json')
# If you pass a list, you will get a list of analyses
# with the same structure
analyses = a.analyze_words([['А'], ['Би', 'шамдаа', 'дуратайб', '.']],
                           format='xml')
analyses = a.analyze_words(['Морфологи', [['А'], ['Би', 'шамдаа', 'дуратайб', '.']]],
                           format='json')
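Since a single token may receive several analyses, it is often useful to see at a glance which lemmas are involved. The helper below, group_by_lemma, is hypothetical and not part of uniparser-buryat. It relies on the lemma and gramm attributes being strings, as noted in the example above, and additionally assumes that gramm holds comma-separated tags.

```python
from collections import defaultdict


def group_by_lemma(analyses):
    """Group the analyses of a single token by lemma (hypothetical helper).

    Assumes each analysis exposes .lemma and .gramm as strings and that
    .gramm is a comma-separated list of tags (an assumption about the
    tag format). Returns {lemma: [tag_list, ...]}.
    """
    groups = defaultdict(list)
    for ana in analyses:
        # Drop empty pieces so an empty gramm string yields an empty tag list.
        tags = [t for t in ana.gramm.split(',') if t]
        groups[ana.lemma].append(tags)
    return dict(groups)
```

With the analyzer created above, group_by_lemma(a.analyze_words('Морфологи')) would show whether the word form is ambiguous between several lemmas.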

Refer to the uniparser-morph documentation for the full list of options.
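The choice between mode='strict' and mode='nodiacritics' depends on whether the text reliably uses the Buryat-specific letters ү, ө and һ. The guess_mode function below is a hypothetical rough heuristic, not part of the package: it returns 'strict' if any of these letters occur and 'nodiacritics' otherwise. It can misfire on short texts, which may lack these letters even in standard orthography.

```python
# Cyrillic letters used in Buryat but absent from the Russian alphabet
# (upper and lower case). Assumed here to be the letters that tend to be
# replaced by plain Russian letters in informal writing.
BURYAT_SPECIFIC_LETTERS = frozenset("ҮүӨөҺһ")


def guess_mode(text: str) -> str:
    """Hypothetical heuristic for choosing the BuryatAnalyzer mode.

    Returns 'strict' if the text contains at least one Buryat-specific
    letter, and 'nodiacritics' otherwise.
    """
    if any(ch in BURYAT_SPECIFIC_LETTERS for ch in text):
        return 'strict'
    return 'nodiacritics'
```

The result could then be passed as BuryatAnalyzer(mode=guess_mode(sample_text)), where sample_text is a representative chunk of the corpus; the larger the sample, the more meaningful the absence of these letters becomes.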

Word lists
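analyze_wordlist() works on a frequency list rather than on running text. The sketch below builds such a list from raw text using only the standard library. The tokenization (runs of word characters, lowercased) and the output layout (one word and its count per line, separated by a tab) are assumptions; check the uniparser-morph documentation for the exact format analyze_wordlist() expects before relying on it.

```python
import re
from collections import Counter

# Hypothetical tokenizer: runs of Unicode word characters (this covers Cyrillic).
TOKEN_RE = re.compile(r"\w+")


def build_freq_list(text: str) -> list[tuple[str, int]]:
    """Count lowercased tokens, most frequent first (ties keep first-seen order)."""
    counts = Counter(tok.lower() for tok in TOKEN_RE.findall(text))
    return counts.most_common()


def write_freq_list(pairs, path: str) -> None:
    """Write 'word<TAB>count' lines in UTF-8 (assumed input format)."""
    with open(path, "w", encoding="utf-8") as f:
        for word, count in pairs:
            f.write(f"{word}\t{count}\n")
```

Analyzing each distinct word once through a frequency list is usually much cheaper than analyzing every token of a large corpus.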

Description format

The description follows the uniparser-morph format and consists of a description of inflection (paradigms.txt), a grammatical dictionary (bua_lexemes_XXX.txt files), and a short list of analyses that should be avoided (bad_analyses.txt). The dictionary describes individual lexemes; each entry gives the stem, a part-of-speech tag with some other grammatical or borrowing information, the inflectional type (paradigm) and, for some lexemes, a Russian translation. See the uniparser-morph documentation for more about the format.
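Because some dictionary entries may still carry wrong POS tags or paradigms (see the NB above), it can help to inspect the bua_lexemes_XXX.txt files directly. The sketch below assumes the usual uniparser-morph lexeme layout: each entry begins with a line reading -lexeme, followed by indented key: value lines such as lex, stem, gramm and paradigm. It further assumes that the POS tag is the first comma-separated tag of gramm. Both assumptions should be checked against the actual files; parse_lexemes and count_pos are hypothetical helpers.

```python
from collections import Counter


def parse_lexemes(text: str) -> list[dict[str, list[str]]]:
    """Parse uniparser-morph-style lexeme entries (assumed layout).

    Each '-lexeme' line starts a new entry; each following 'key: value'
    line appends value to that entry's list for key. Values are kept in
    lists in case a field repeats within one entry.
    """
    lexemes = []
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line == '-lexeme':
            current = {}
            lexemes.append(current)
        elif current is not None and ':' in line:
            # Split at the first colon only, so values may contain colons.
            key, _, value = line.partition(':')
            current.setdefault(key.strip(), []).append(value.strip())
    return lexemes


def count_pos(lexemes) -> Counter:
    """Count entries by POS, taken (by assumption) as the first gramm tag."""
    return Counter(
        lex['gramm'][0].split(',')[0].strip()
        for lex in lexemes
        if lex.get('gramm')
    )
```

A tally from count_pos over all dictionary files gives a quick overview of the POS distribution, and filtering the parsed entries by gramm or paradigm narrows down candidates for manual review.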



Download files


Source Distribution

uniparser_buryat-1.1.2.tar.gz (4.6 kB)

Uploaded Source

Built Distribution

uniparser_buryat-1.1.2-py3-none-any.whl (5.2 kB)

Uploaded Python 3

File details

Details for the file uniparser_buryat-1.1.2.tar.gz.

File metadata

  • Download URL: uniparser_buryat-1.1.2.tar.gz
  • Size: 4.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.28.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.7

File hashes

Hashes for uniparser_buryat-1.1.2.tar.gz
Algorithm Hash digest
SHA256 97d2e831dc8ead57176ff7dd8a18160f9661c451ef9251b5b4ee228c2a2d09b5
MD5 04a5a49b574db4671469e8303f2e08d1
BLAKE2b-256 101328ac0812bf9e864e60d2553b0ad717ae2702cba6334302e8698d70ba220e


File details

Details for the file uniparser_buryat-1.1.2-py3-none-any.whl.

File metadata

  • Download URL: uniparser_buryat-1.1.2-py3-none-any.whl
  • Size: 5.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.28.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.7

File hashes

Hashes for uniparser_buryat-1.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 bfb033a99147cb7bb957a5d6b26d8d90f4358dc98b27d5f97e3b9d707b191d88
MD5 f21008eb2527b00412d9c8652ec9636c
BLAKE2b-256 348cef988a2efe873d13242c55266f319f48c4b949ae9bac9e935bcbf97a4484
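The SHA256 digests listed above can be used to check a downloaded archive before installing it. Below is a minimal sketch with Python's hashlib; the helper names sha256_of and verify are illustrative, and the expected digests are copied from the file details on this page.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the file details above (release 1.1.2).
EXPECTED_SHA256 = {
    "uniparser_buryat-1.1.2.tar.gz":
        "97d2e831dc8ead57176ff7dd8a18160f9661c451ef9251b5b4ee228c2a2d09b5",
    "uniparser_buryat-1.1.2-py3-none-any.whl":
        "bfb033a99147cb7bb957a5d6b26d8d90f4358dc98b27d5f97e3b9d707b191d88",
}


def sha256_of(path) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path) -> bool:
    """Compare a downloaded file against the digest listed for its name."""
    p = Path(path)
    expected = EXPECTED_SHA256.get(p.name)
    if expected is None:
        raise KeyError(f"no expected digest for {p.name}")
    return sha256_of(p) == expected
```

Alternatively, pip's hash-checking mode (a requirements file with --hash entries, used together with --require-hashes) performs the same check during installation.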

