Smart tool for morphological analysis

Project description

Morphy

Morphy is a Python library for morphological analysis. It provides a set of simple interfaces for segmentation, tokenization, lemmatization, and text filtering, and is built on nltk, spacy and pymorphy2.

Features

  • Multilanguage support (English, German, Spanish, Portuguese, French, Italian, Dutch, Russian)
  • Part-of-speech tagging
  • Sentence segmentation
  • Named entity recognition
  • Dependency parsing
  • Flexible customizability
  • Caching

Usage

Language detection

from morphy import Language
text = 'Lorem Ipsum is simply dummy text of the printing and typesetting industry'
lang = Language(text=text)
print(lang)

Sentence segmentation

from morphy import MultiLang
text = 'Lorem Ipsum is simply dummy text of the printing and typesetting industry'
english_proc = MultiLang(lang='en')
doc = english_proc(text)
for sent in doc.sentences:
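    # Convert each token to a string before joining; joining str(sent.tokens)
    # would split the list's repr into single characters.
    print('%s\n%s' % (sent, '\n'.join(str(token) for token in sent.tokens)))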

Lemmatization

from morphy import MultiLang
text = 'Lorem Ipsum is simply dummy text of the printing and typesetting industry'
english_proc = MultiLang(lang='en')
doc = english_proc(text)
for token in doc.tokens:
    print('%s --> %s' % (token.text, token.lemma))

Installation

Option 1: Via PyPI

morphy releases are available on PyPI as source packages and binary wheels (as of v0.1.0) and can be installed with pip.

pip install morphy

When using pip, it is generally recommended to install packages in a virtual environment to avoid modifying system state:

python -m virtualenv venv
source venv/bin/activate
pip install morphy
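To confirm that the interpreter you are about to install into really belongs to a virtual environment, a small check like the one below may help. It is a sketch that uses only the standard library; the helper name in_virtualenv is illustrative and not part of morphy.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the current interpreter runs inside a virtual environment."""
    # In a venv, sys.prefix points at the environment while sys.base_prefix
    # points at the base installation. Older virtualenv releases instead set
    # sys.real_prefix, so both cases are checked.
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    return hasattr(sys, "real_prefix") or sys.prefix != base_prefix


if __name__ == "__main__":
    if in_virtualenv():
        print("Virtual environment active: %s" % sys.prefix)
    else:
        print("No virtual environment detected; packages would go to %s" % sys.prefix)
```

Running this with the interpreter from venv/bin after activation should report the environment path rather than the system prefix.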

Option 2: Source Via Git

git clone git@bitbucket.org:igor_ezersky/morphy.git
cd morphy
python -m virtualenv venv
source venv/bin/activate
python setup.py install

Option 3: Source Zip

Download a zip of the code via GitHub or PyPI, unpack it, and then follow the same instructions as in option 2, starting from the cd step.

IMPORTANT

After the package is installed, you must download the nltk and spacy data.

python -c "import nltk; nltk.download('punkt')"
python -m spacy download en
python -m spacy download xx
# the line above should be repeated for each language that you need

You can choose which spacy model to install for each language; see the spacy documentation for the available models.
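If you need several languages, the download commands can be generated rather than typed one by one. The sketch below only builds and prints the commands shown above; the function names are hypothetical, and the language list is an assumption that you should adjust to the spacy models available for your version.

```python
import sys

# Assumption: these are the spacy model shortcuts you want; "en" and "xx"
# are the two used in the commands above. Extend the list as needed.
SPACY_LANGS = ["en", "xx"]


def nltk_download_command(resource="punkt"):
    """Build the command that downloads one nltk resource."""
    return [sys.executable, "-c", "import nltk; nltk.download(%r)" % resource]


def spacy_download_commands(langs):
    """Build one 'python -m spacy download <lang>' command per language."""
    return [[sys.executable, "-m", "spacy", "download", lang] for lang in langs]


if __name__ == "__main__":
    for cmd in [nltk_download_command()] + spacy_download_commands(SPACY_LANGS):
        # Printed only; pass each list to subprocess.check_call(cmd) to run it.
        print(" ".join(cmd))
```

Using sys.executable keeps the downloads tied to the same interpreter (and virtual environment) that morphy was installed into.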

Requirements

Notes

If you are using Windows, installing some of morphy's requirements (e.g. ujson, cytoolz) may fail with an error like this:

error: command 'cl.exe' failed: No such file or directory

Installing these two packages manually from precompiled binaries can be a solution. You can find them in this unofficial repository of Python binary distributions.
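Before installing, you can check whether the compiler named in the error is visible at all. The following is a minimal sketch using the standard library; missing_msvc_compiler is a hypothetical helper, and a missing cl.exe on PATH is only a hint, since build tools may locate the compiler by other means.

```python
import platform
import shutil


def missing_msvc_compiler() -> bool:
    """Return True on Windows when cl.exe cannot be found on PATH."""
    if platform.system() != "Windows":
        # The cl.exe error only concerns Windows builds.
        return False
    # shutil.which applies PATHEXT on Windows, so "cl" also matches cl.exe.
    return shutil.which("cl") is None


if __name__ == "__main__":
    if missing_msvc_compiler():
        print("cl.exe not found on PATH; consider precompiled binaries for ujson and cytoolz.")
    else:
        print("No missing-compiler problem detected.")
```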

Current limitations

  1. Installing spacy models for each language is required.
  2. Downloading nltk tokenizer data is required.

Download files

Source Distribution

morphy-0.2.tar.gz (9.3 kB)