Annotated morphology in the world's languages

Project description

UniMorph: The Universal Morphology Initiative

The Universal Morphology (UniMorph) project is a collaborative effort to improve how NLP handles complex morphology in the world’s languages. The goal of UniMorph is to annotate morphological data in a universal schema that allows an inflected word from any language to be defined by its lexical meaning, typically carried by the lemma, and by a rendering of its inflectional form in terms of a bundle of morphological features from our schema. The specification of the schema is described in Sylak-Glassman (2016).
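As a rough illustration of the lemma-plus-feature-bundle idea, the sketch below takes one record and splits it into its parts. The three-column, tab-separated layout (lemma, inflected form, features) is an assumption about how the data files are organized, and the record itself is an illustrative example rather than a line copied from the German database.

```shell
# One illustrative record, assuming a tab-separated layout:
# lemma <TAB> inflected form <TAB> semicolon-separated feature bundle.
record=$(printf 'recken\treckst\tV;IND;PRS;2;SG')

# Pull the three columns apart.
lemma=$(printf '%s\n' "$record" | cut -f1)
form=$(printf '%s\n' "$record" | cut -f2)
features=$(printf '%s\n' "$record" | cut -f3)

# Count the individual features by splitting the bundle on semicolons.
feature_count=$(printf '%s\n' "$features" | tr ';' '\n' | wc -l | tr -d ' ')

echo "lemma=$lemma form=$form features=$features count=$feature_count"
```

Each feature in the bundle (part of speech, mood, tense, person, number) is a separate label, which is what lets the same bundle describe forms in unrelated languages.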


This tool provides turnkey command-line access to morphological annotations in over 100 languages.

Install the UniMorph Python package from PyPI:

pip3 install unimorph

The tool is then available from the command line as unimorph. To see the available features, run unimorph --help.

Usage

List the ISO 639-3 codes of the available UniMorph languages.

unimorph list

Give the complete paradigm for a lemma.

unimorph inflect --word recken --lang deu

Get a particular form of the lemma. Quote the feature bundle so that the shell does not treat the semicolons as command separators.

unimorph inflect --word recken --features "V;IND;PRS;2;SG" --lang deu

Analyze a word form: What are its lemma and features?

unimorph analyze --word gereckt --lang deu

(You can also use the short parameter names.)

unimorph analyze -w gereckt -l deu
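To analyze several word forms in one go, the analyze command above can be wrapped in a small loop. This is only a sketch: analyze_all is a hypothetical helper name, the second word is an arbitrary example, and the guard simply skips the calls when unimorph is not on the PATH.

```shell
# Hypothetical helper: run `unimorph analyze` once per word.
# $1 = ISO 639-3 language code, remaining arguments = word forms.
analyze_all() {
    lang="$1"
    shift
    for word in "$@"; do
        unimorph analyze --word "$word" --lang "$lang"
    done
}

# Only call the tool if it is actually installed.
if command -v unimorph >/dev/null 2>&1; then
    analyze_all deu gereckt reckst
else
    echo "unimorph is not installed; run: pip3 install unimorph" >&2
fi
```

Quoting "$word" keeps forms that contain spaces or shell metacharacters intact when they are passed to the tool.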

Records in UniMorph's inflectional databases cannot hope to exhaustively cover a language's lexicon, especially in light of novel words. If a word is missing, let us know.

Contribution

UniMorph is an open project! We want you!

Found a bug? Want to contribute source code? Submit an issue or pull request to the appropriate GitHub repository. Language-specific corrections or additions should be marked in their corresponding repository; improvements to the unimorph command-line tool should be noted in the unimorph repository.

Citation

If you use the latest version of the UniMorph datasets (v2.0), please cite Kirov et al. (2018):

@inproceedings{kirov-etal-2018-unimorph,
    title = "{U}ni{M}orph 2.0: Universal Morphology",
    author = {Kirov, Christo  and
      Cotterell, Ryan  and
      Sylak-Glassman, John  and
      Walther, G{\'e}raldine  and
      Vylomova, Ekaterina  and
      Xia, Patrick  and
      Faruqui, Manaal  and
      Mielke, Sebastian  and
      McCarthy, Arya  and
      K{\"u}bler, Sandra  and
      Yarowsky, David  and
      Eisner, Jason  and
      Hulden, Mans},
    booktitle = "Proceedings of the Eleventh International Conference on Language Resources and Evaluation ({LREC} 2018)",
    month = may,
    year = "2018",
    address = "Miyazaki, Japan",
    publisher = "European Language Resources Association (ELRA)",
    url = "https://www.aclweb.org/anthology/L18-1293",
}

If you refer to the latest version of the universal annotation schema, please cite Sylak-Glassman et al. (2015):

@inproceedings{sylak-glassman-etal-2015-language,
    title = "A Language-Independent Feature Schema for Inflectional Morphology",
    author = "Sylak-Glassman, John  and
      Kirov, Christo  and
      Yarowsky, David  and
      Que, Roger",
    booktitle = "Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)",
    month = jul,
    year = "2015",
    address = "Beijing, China",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/P15-2111",
    doi = "10.3115/v1/P15-2111",
    pages = "674--680",
}

Advanced usage

unimorph stores language databases in a default location. This can be overridden by setting the shell environment variable UNIMORPH to the preferred folder.
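A minimal sketch of that override follows. The folder name unimorph-data is a hypothetical choice; any writable directory should do.

```shell
# Hypothetical data directory; pick any writable folder.
export UNIMORPH="$HOME/unimorph-data"
mkdir -p "$UNIMORPH"

# Later unimorph calls in this shell session inherit the variable.
if command -v unimorph >/dev/null 2>&1; then
    unimorph inflect --word recken --lang deu
fi
```

The export only lasts for the current shell session. To make it persistent, add the export line to your shell startup file, for example ~/.bashrc or ~/.zshrc.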

Download files

Download the file for your platform.

Source Distribution

unimorph-0.0.4.tar.gz (5.7 kB)

Uploaded Source

Built Distribution

unimorph-0.0.4-py3-none-any.whl (5.7 kB)

Uploaded Python 3

File details

Details for the file unimorph-0.0.4.tar.gz.

File metadata

  • Download URL: unimorph-0.0.4.tar.gz
  • Upload date:
  • Size: 5.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/41.6.0.post20191030 requests-toolbelt/0.9.1 tqdm/4.43.0 CPython/3.7.5

File hashes

Hashes for unimorph-0.0.4.tar.gz
Algorithm Hash digest
SHA256 307a80b99017febc782fab965cb219c4814472fa298850d54c73d4ed55c5dd57
MD5 8af42303321bf7992a8ac27ec3a11b8d
BLAKE2b-256 9f2f9e48c2635a86f7d2c29ec0d05dedde0fd2935f5a29264cf4235906f9a010
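
The SHA256 digest listed above can be checked against a downloaded copy of the archive. The sketch below assumes the file is already in the current directory; sha256_of and verify_sha256 are hypothetical helper names, and the script falls back to shasum on systems without sha256sum.

```shell
# Print the SHA256 hex digest of a file.
# Uses sha256sum when present, otherwise shasum -a 256.
sha256_of() {
    if command -v sha256sum >/dev/null 2>&1; then
        sha256sum "$1" | cut -d ' ' -f1
    else
        shasum -a 256 "$1" | cut -d ' ' -f1
    fi
}

# Succeed only if the file's digest equals the expected one.
# $1 = file path, $2 = expected hex digest.
verify_sha256() {
    [ "$(sha256_of "$1")" = "$2" ]
}

archive="unimorph-0.0.4.tar.gz"
expected="307a80b99017febc782fab965cb219c4814472fa298850d54c73d4ed55c5dd57"

if [ -f "$archive" ]; then
    if verify_sha256 "$archive" "$expected"; then
        echo "OK: digest matches"
    else
        echo "MISMATCH: do not install this file" >&2
    fi
else
    echo "$archive not found; download it first" >&2
fi
```

The same helpers work for the wheel by swapping in its file name and SHA256 digest.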

File details

Details for the file unimorph-0.0.4-py3-none-any.whl.

File metadata

  • Download URL: unimorph-0.0.4-py3-none-any.whl
  • Upload date:
  • Size: 5.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/41.6.0.post20191030 requests-toolbelt/0.9.1 tqdm/4.43.0 CPython/3.7.5

File hashes

Hashes for unimorph-0.0.4-py3-none-any.whl
Algorithm Hash digest
SHA256 5edba2fa8a617b1fe5f6463b60d1ce18c0a116e6699804dffa4164a5c514bb69
MD5 75b1d453eee1119dc5b5ace9f0591cdf
BLAKE2b-256 3c2a653f5cb0449b04019052f0f86fb9cdd89b55fa8b7e173ef91122641679ac
