Annotated morphology in the world's languages

UniMorph: The Universal Morphology Initiative

The Universal Morphology (UniMorph) project is a collaborative effort to improve how NLP handles complex morphology in the world’s languages. The goal of UniMorph is to annotate morphological data in a universal schema that allows an inflected word from any language to be defined by its lexical meaning, typically carried by the lemma, and by a rendering of its inflectional form in terms of a bundle of morphological features from our schema. The specification of the schema is described in Sylak-Glassman (2016).
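
To make the lemma-plus-feature-bundle idea concrete, here is a minimal shell sketch that takes one record apart. It assumes a tab-separated layout of lemma, inflected form, and semicolon-delimited features; the record itself is a hand-written illustration, not output read from the UniMorph databases.

```shell
# Hypothetical record in an assumed lemma<TAB>form<TAB>features layout
# (written by hand for illustration, not read from a UniMorph database).
record=$(printf 'recken\treckst\tV;IND;PRS;2;SG')

# Pull out the three tab-separated fields.
lemma=$(printf '%s\n' "$record" | cut -f1)
form=$(printf '%s\n' "$record" | cut -f2)
features=$(printf '%s\n' "$record" | cut -f3)

# Split the feature bundle on semicolons, one feature per line.
feature_list=$(printf '%s\n' "$features" | tr ';' '\n')
feature_count=$(printf '%s\n' "$feature_list" | wc -l | tr -d ' ')

printf 'lemma=%s form=%s\n' "$lemma" "$form"
printf '%s\n' "$feature_list"
```

Each line of the split bundle is one feature from the schema, so the inflected form is identified by the lemma together with that set of features.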


This tool provides turnkey command-line access to morphological annotations in over 100 languages.

Install the UniMorph Python package from PyPI:

pip3 install unimorph

The tool is then available from the command line as unimorph. To see the available features, run unimorph --help.

Usage

List the ISO 639-3 codes of the available UniMorph languages.

unimorph list

Give the complete paradigm for a lemma.

unimorph inflect --word recken --lang deu

Get a particular form of the lemma. Quote the feature bundle so that the shell does not treat its semicolons as command separators.

unimorph inflect --word recken --features "V;IND;PRS;2;SG" --lang deu

Analyze a word form: What are its lemma and features?

unimorph analyze --word gereckt --lang deu

(You can also use short parameter names.)

unimorph analyze -w gereckt -l deu

Records in UniMorph's inflectional databases cannot hope to exhaustively cover a language's lexicon, especially in light of novel words. If a word is missing, let us know.

Contribution

UniMorph is an open project! We want you!

Found a bug? Want to contribute source code? Submit an issue or pull request to the appropriate GitHub repository. Language-specific corrections or additions should be marked in their corresponding repository; improvements to the unimorph command-line tool should be noted in the unimorph repository.

Citation

If you use the latest version of the UniMorph datasets (v2.0), please cite Kirov et al. (2018):

@inproceedings{kirov-etal-2018-unimorph,
    title = "{U}ni{M}orph 2.0: Universal Morphology",
    author = {Kirov, Christo  and
      Cotterell, Ryan  and
      Sylak-Glassman, John  and
      Walther, G{\'e}raldine  and
      Vylomova, Ekaterina  and
      Xia, Patrick  and
      Faruqui, Manaal  and
      Mielke, Sebastian  and
      McCarthy, Arya  and
      K{\"u}bler, Sandra  and
      Yarowsky, David  and
      Eisner, Jason  and
      Hulden, Mans},
    booktitle = "Proceedings of the Eleventh International Conference on Language Resources and Evaluation ({LREC} 2018)",
    month = may,
    year = "2018",
    address = "Miyazaki, Japan",
    publisher = "European Language Resources Association (ELRA)",
    url = "https://www.aclweb.org/anthology/L18-1293",
}

If you refer to the latest version of the universal annotation schema, please cite Sylak-Glassman et al. (2015):

@inproceedings{sylak-glassman-etal-2015-language,
    title = "A Language-Independent Feature Schema for Inflectional Morphology",
    author = "Sylak-Glassman, John  and
      Kirov, Christo  and
      Yarowsky, David  and
      Que, Roger",
    booktitle = "Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)",
    month = jul,
    year = "2015",
    address = "Beijing, China",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/P15-2111",
    doi = "10.3115/v1/P15-2111",
    pages = "674--680",
}

Advanced usage

unimorph stores language databases in a default location. This can be overridden by setting the shell environment variable UNIMORPH to the preferred folder.
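
As a minimal sketch of that override, the snippet below points UNIMORPH at a custom folder for the current shell session. The folder name unimorph-data is a hypothetical choice, and whether the tool creates a missing folder itself is not stated here, so the sketch creates it up front.

```shell
# Hypothetical data folder; any writable location works.
# Falls back to /tmp if HOME happens to be unset.
export UNIMORPH="${HOME:-/tmp}/unimorph-data"

# Create the folder in advance (an assumption: the tool may or may not do this itself).
mkdir -p "$UNIMORPH"

# Later unimorph commands in this session inherit the variable, e.g.:
# unimorph list
```

To make the setting permanent, the export line can be added to your shell's startup file.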
