Smart tool for morphological analysis

Morphy

Morphy is a Python library for morphological analysis. It provides a set of simple interfaces for segmentation, tokenization, lemmatization, and text filtering, and is based on nltk, spacy and pymorphy2.

Features

  • Multilanguage support (English, German, Spanish, Portuguese, French, Italian, Dutch, Russian)
  • Part-of-speech tagging
  • Sentence segmentation
  • Named entity recognition
  • Dependency parsing
  • Flexible customizability
  • Caching

Usage

Language detection

from morphy import Language
text = 'Lorem Ipsum is simply dummy text of the printing and typesetting industry'
lang = Language(text=text)
print(lang)

Sentence segmentation

from morphy import MultiLang
text = 'Lorem Ipsum is simply dummy text of the printing and typesetting industry'
english_proc = MultiLang(lang='en')
doc = english_proc(text)
for sent in doc.sentences:
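    # Convert each token to a string before joining; str(sent.tokens) would be joined character by character
    print('%s\n%s' % (sent, '\n'.join(str(token) for token in sent.tokens)))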

Lemmatization

from morphy import MultiLang
text = 'Lorem Ipsum is simply dummy text of the printing and typesetting industry'
english_proc = MultiLang(lang='en')
doc = english_proc(text)
for token in doc.tokens:
    print('%s --> %s' % (token.text, token.lemma))

Installation

Option 1: Via PyPI

morphy releases are available on PyPI as source packages and binary wheels (as of v0.1.0) and can be installed with pip.

pip install morphy

When using pip it is generally recommended to install packages in a virtual environment to avoid modifying system state:

python -m virtualenv venv
source venv/bin/activate
pip install morphy

Option 2: Source Via Git

git clone git@bitbucket.org:igor_ezersky/morphy.git
cd morphy
python -m virtualenv venv
source venv/bin/activate
python setup.py install

Option 3: Source Zip

Download a zip of the code from Bitbucket or PyPI, unpack it, and then follow the same instructions as in option 2, starting from the cd step.

IMPORTANT

After the package is installed, you must download the nltk and spacy data.

python -c "import nltk; nltk.download('punkt')"
python -m spacy download en
python -m spacy download xx
# the line above should be repeated for each language that you need

You can choose which spacy model to install; see the spacy documentation for the available models.
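Because the spacy download has to be repeated for each language, the step can be scripted. The following is only a minimal sketch: build_download_commands and download_models are hypothetical helpers, not part of morphy, and the language list is an example. Only en and xx appear in the commands above; whether other short names such as de are accepted may depend on your spacy version.

```python
import subprocess
import sys


def build_download_commands(languages):
    """Return one `python -m spacy download <lang>` command per language code."""
    commands = []
    for lang in languages:
        # sys.executable keeps the download inside the active virtual environment.
        commands.append([sys.executable, "-m", "spacy", "download", lang])
    return commands


def download_models(languages, dry_run=True):
    """Print each command and, unless dry_run is set, execute it."""
    for command in build_download_commands(languages):
        print(" ".join(command))
        if not dry_run:
            # check=True raises CalledProcessError if a download fails.
            subprocess.run(command, check=True)


if __name__ == "__main__":
    # Hypothetical selection of languages; adjust to the ones you need.
    download_models(["en", "de", "xx"], dry_run=True)
```

With dry_run=True the script only prints the commands, which makes it easy to review them before anything is downloaded.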

Requirements

Notes

On Windows, installing some of the morphy requirements (e.g. ujson, cytoolz) can fail with an error like this:

error: command 'cl.exe' failed: No such file or directory

Installing these two packages manually from precompiled binaries can solve the problem. You can find them at this unofficial repository of Python binaries.
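The error means that pip tried to build a C extension and could not start the Microsoft compiler. A rough way to see in advance whether that is likely is to check whether cl.exe is on PATH. This is a sketch under assumptions: compiler_hint is a hypothetical helper, and a compiler that is installed but only exposed inside a developer command prompt will still be reported as missing.

```python
import shutil
import sys


def compiler_hint(platform=sys.platform, which=shutil.which):
    """Return a short message about C compiler availability on Windows."""
    if not platform.startswith("win"):
        return "Not on Windows; the cl.exe error does not apply."
    if which("cl") is None:
        return "cl.exe not found on PATH; source builds of C extensions will likely fail."
    return "cl.exe found on PATH."


print(compiler_hint())
```

The platform and which parameters exist only so the function can be exercised without a Windows machine.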

Current limitations

  1. Installing spacy models for each language is required.
  2. Downloading nltk tokenizer data is required.
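Since spacy models are installed as regular Python packages, the first limitation can be checked before running morphy. The sketch below uses only the standard library. The helper missing_packages is hypothetical, and the model package name en_core_web_sm is an assumption used for illustration; the actual names depend on the models you installed.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names from `names` that are not installed."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Hypothetical checklist: the libraries morphy is based on plus one example model package.
required = ["nltk", "spacy", "pymorphy2", "en_core_web_sm"]
missing = missing_packages(required)
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All listed packages are installed.")
```

This does not cover the nltk tokenizer data, which is stored as data files rather than as a package.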
