Python wrapper for the Yandex MyStem 3.1 morphological analyzer of the Russian language.

Introduction

This module contains a wrapper for Yandex Mystem 3.1, an excellent morphological analyzer for the Russian language released in June 2014. A morphological analyzer can lemmatize text and derive a set of morphological attributes for each token. For more details about the algorithm, see I. Segalovich, «A fast morphological algorithm with unknown word guessing induced by a dictionary for a web search engine», MLMTA-2003, Las Vegas, Nevada, USA.

Python is the language of choice for many computational linguists, including those working with the Russian language. The main motivation for this development was the absence of any Python wrapper for Mystem, one of the most popular morphological analyzers for Russian, along with PyMorphy2, TreeTagger and AOT.

The third version of Mystem introduces several important improvements, most importantly part-of-speech disambiguation. Our wrapper runs Mystem in the mode that performs POS disambiguation.

This wrapper is open source under the MIT license. However, please note that Yandex Mystem itself is not open source and is licensed under the conditions of the Yandex License.

System Requirements

The wrapper works with CPython 2.6+/3.3+ and PyPy 1.9+.

The wrapper was tested on Ubuntu Linux 12.04+, Mac OS X 10.9+ and Windows 7+.

For 32-bit architectures and FreeBSD platform support, use version 0.1.10.

Installation

  1. Stable version: https://pypi.python.org/pypi/pymystem3. You can install it using pip:

    pip install pymystem3

  2. Latest version (recommended), from https://github.com/nlpub/pymystem3:

    pip install git+https://github.com/nlpub/pymystem3

A Quick Example

Lemmatization

>>> from pymystem3 import Mystem
>>> text = "Красивая мама красиво мыла раму"
>>> m = Mystem()
>>> lemmas = m.lemmatize(text)
>>> print(''.join(lemmas))
красивый мама красиво мыть рама
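
Because the example joins the result with an empty string and still gets spaces between words, the list returned by lemmatize appears to contain the separators (spaces and a trailing newline) as items of their own. The sketch below assumes that shape and keeps only the non-blank lemmas. The helper name content_lemmas and the hard-coded sample list are illustrative and not part of pymystem3.

```python
def content_lemmas(lemmas):
    """Drop items that consist only of whitespace (spaces, newlines)."""
    return [lemma for lemma in lemmas if lemma.strip()]


# Assumed shape of m.lemmatize("Красивая мама красиво мыла раму"),
# inferred from the joined output shown above; not captured from a real run.
raw = ["красивый", " ", "мама", " ", "красиво", " ", "мыть", " ", "рама", "\n"]

words = content_lemmas(raw)
print(words)
```

Punctuation items, if any, are not whitespace and would be kept by this filter.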

Getting grammatical information and lemmas.

>>> import json
>>> from pymystem3 import Mystem

>>> text = "Красивая мама красиво мыла раму"
>>> m = Mystem()
>>> lemmas = m.lemmatize(text)

>>> print("lemmas:", ''.join(lemmas))
>>> print("full info:", json.dumps(m.analyze(text), ensure_ascii=False))

lemmas: красивый мама красиво мыть рама

full info: [{"text": "Красивая", "analysis": [{"lex": "красивый", "gr": "A=им,ед,полн,жен"}]}, {"text": " "}, {"text": "мама", "analysis": [{"lex": "мама", "gr": "S,жен,од=им,ед"}]}, {"text": " "}, {"text": "красиво", "analysis": [{"lex": "красиво", "gr": "ADV="}]}, {"text": " "}, {"text": "мыла", "analysis": [{"lex": "мыть", "gr": "V,несов,пе=прош,ед,изъяв,жен"}]}, {"text": " "}, {"text": "раму", "analysis": [{"lex": "рама", "gr": "S,жен,неод=вин,ед"}]}, {"text": "\n"}]
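
The "gr" strings in the output above can be taken apart with plain string operations. The following is a minimal sketch that assumes only the shapes visible in this sample: tags before "=" describe the lexeme (part of speech first), and tags after "=" describe the word form. The names split_gr and tokens_with_pos are hypothetical helpers, not part of pymystem3, and richer "gr" values (for example, ones listing several alternative forms) are not handled.

```python
def split_gr(gr):
    """Split a "gr" string into (pos, lexeme_tags, form_tags)."""
    lexeme_part, _, form_part = gr.partition("=")
    lexeme_tags = [tag for tag in lexeme_part.split(",") if tag]
    pos = lexeme_tags[0] if lexeme_tags else None
    form_tags = [tag for tag in form_part.split(",") if tag]
    return pos, lexeme_tags[1:], form_tags


def tokens_with_pos(analysis):
    """Yield (surface, lemma, pos) for each item that carries an analysis."""
    for item in analysis:
        variants = item.get("analysis")
        if not variants:  # separators such as " " have no "analysis" key
            continue
        first = variants[0]  # use the first listed variant
        pos, _, _ = split_gr(first["gr"])
        yield item["text"], first["lex"], pos


# A few items copied from the sample output above.
sample = [
    {"text": "Красивая", "analysis": [{"lex": "красивый", "gr": "A=им,ед,полн,жен"}]},
    {"text": " "},
    {"text": "мама", "analysis": [{"lex": "мама", "gr": "S,жен,од=им,ед"}]},
    {"text": " "},
    {"text": "красиво", "analysis": [{"lex": "красиво", "gr": "ADV="}]},
]

rows = list(tokens_with_pos(sample))
for surface, lemma, pos in rows:
    print(surface, lemma, pos)
```

With a real analyzer, the result of m.analyze(text) could be passed in place of sample.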

Issues

Please report any bugs or requests using the GitHub issue tracker (https://github.com/nlpub/pymystem3/issues). We have only a very limited amount of resources to maintain this project, so please propose a pull request directly if you see an obvious way of fixing the issue. We are very open to accepting bug fixes, and your help is greatly appreciated.

Authors

The full list of contributors is available on GitHub. You can also contact the original contributors of the project via email:

  • Denis Sukhonin (d.sukhonin): development

  • Alexander Panchenko (panchenko.alexander): conception

@ gmail

If you are interested in further development or in becoming a maintainer of this project, please drop us an email: your help is greatly appreciated.
