
Annotated morphology in the world's languages


UniMorph: The Universal Morphology Initiative


The Universal Morphology (UniMorph) project is a collaborative effort to improve how NLP handles complex morphology in the world’s languages. The goal of UniMorph is to annotate morphological data in a universal schema that allows an inflected word from any language to be defined by its lexical meaning, typically carried by the lemma, and by a rendering of its inflectional form in terms of a bundle of morphological features from our schema. The specification of the schema is described in Sylak-Glassman (2016).


This tool provides turnkey command-line access to morphological annotations in over 100 languages.

The UniMorph Python tool is distributed on PyPI. Install it with pip:

pip3 install unimorph

The tool will then be available from the command line as unimorph. To see the available features, run unimorph --help.

Usage

List the ISO 639-3 codes of the available UniMorph languages.

unimorph list

Give the complete paradigm for a lemma.

unimorph inflect --word recken --lang deu

Get a particular form of the lemma. Quote the feature bundle so that the shell does not treat its semicolons as command separators.

unimorph inflect --word recken --features "V;IND;PRS;2;SG" --lang deu
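
A feature bundle is a semicolon-separated list of schema tags (here: verb, indicative, present, second person, singular). As a minimal sketch, the snippet below keeps the bundle in a shell variable and splits it into one tag per line. The variable name `features` is only illustrative and is not something the tool requires.

```shell
# Keep the bundle in a variable; the quotes stop the shell from
# splitting the command at each semicolon.
features="V;IND;PRS;2;SG"

# Print one tag per line: V, IND, PRS, 2, SG.
printf '%s\n' "$features" | tr ';' '\n'

# The same variable can then be passed to the tool (requires unimorph):
#   unimorph inflect --word recken --features "$features" --lang deu
```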

Analyze a word form: What are its lemma and features?

unimorph analyze --word gereckt --lang deu

(You can also use the short parameter names.)

unimorph analyze -w gereckt -l deu
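
To analyze several word forms in one go, the command can be wrapped in a small shell function. This is only a sketch: `analyze_words` is a hypothetical helper, not part of unimorph, and it assumes the tool is invoked once per word, as in the examples above.

```shell
# analyze_words LANG WORD...
# Hypothetical helper: runs "unimorph analyze" once for each word given.
analyze_words() {
  lang=$1
  shift
  for word in "$@"; do
    unimorph analyze --word "$word" --lang "$lang"
  done
}

# Example call (requires unimorph to be installed):
#   analyze_words deu gereckt reckst
```

Forms that are missing from the database may produce no analysis, as noted below.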

Records in UniMorph's inflectional databases cannot hope to exhaustively cover a language's lexicon, especially in light of novel words. If a word is missing, let us know.

Contribution

UniMorph is an open project! We want you!

Found a bug? Want to contribute source code? Submit an issue or pull request to the appropriate GitHub repository. Language-specific corrections or additions should be reported in the repository for that language; improvements to the unimorph command-line tool should be reported in the unimorph repository.

Citation

If you use the latest version of the UniMorph datasets (v2.0), please cite Kirov et al. (2018):

@inproceedings{kirov-etal-2018-unimorph,
    title = "{U}ni{M}orph 2.0: Universal Morphology",
    author = {Kirov, Christo  and
      Cotterell, Ryan  and
      Sylak-Glassman, John  and
      Walther, G{\'e}raldine  and
      Vylomova, Ekaterina  and
      Xia, Patrick  and
      Faruqui, Manaal  and
      Mielke, Sebastian  and
      McCarthy, Arya  and
      K{\"u}bler, Sandra  and
      Yarowsky, David  and
      Eisner, Jason  and
      Hulden, Mans},
    booktitle = "Proceedings of the Eleventh International Conference on Language Resources and Evaluation ({LREC} 2018)",
    month = may,
    year = "2018",
    address = "Miyazaki, Japan",
    publisher = "European Language Resources Association (ELRA)",
    url = "https://www.aclweb.org/anthology/L18-1293",
}

If you refer to the latest version of the universal annotation schema, please cite Sylak-Glassman et al. (2015):

@inproceedings{sylak-glassman-etal-2015-language,
    title = "A Language-Independent Feature Schema for Inflectional Morphology",
    author = "Sylak-Glassman, John  and
      Kirov, Christo  and
      Yarowsky, David  and
      Que, Roger",
    booktitle = "Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)",
    month = jul,
    year = "2015",
    address = "Beijing, China",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/P15-2111",
    doi = "10.3115/v1/P15-2111",
    pages = "674--680",
}

Advanced usage

unimorph stores language databases in a default location. This can be overridden by setting the shell environment variable UNIMORPH to the preferred folder.
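
For example, in a POSIX shell the override might look like the sketch below. The folder `$HOME/unimorph-data` is a hypothetical choice. Any writable directory should do, and creating it up front is a precaution rather than a documented requirement.

```shell
# Hypothetical location for the language databases.
export UNIMORPH="$HOME/unimorph-data"

# Create the folder in case the tool expects it to exist (an assumption).
mkdir -p "$UNIMORPH"

# Later unimorph commands in this shell session inherit the variable, e.g.:
#   unimorph list
```

To make the setting persistent, the export line can be added to your shell's startup file.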


