Smart tool for morphological analysis
Morphy
Morphy is a Python library for morphological analysis. It provides a set of simple interfaces for segmentation, tokenization, lemmatization, and text filtering, and is built on nltk, spacy, and pymorphy2.
Features
- Multilanguage support (English, German, Spanish, Portuguese, French, Italian, Dutch, Russian)
- Part-of-speech tagging
- Sentence segmentation
- Named entity recognition
- Dependency parsing
- Flexible customization
- Caching
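Morphy lists caching as a feature and cached_property as a requirement. The sketch below shows the general pattern: an expensive analysis result is computed on first access and then reused. The Document class and its tokens property are hypothetical and do not reflect morphy's internals; the example uses the standard library's functools.cached_property (Python 3.8+), while the cached_property package provides an equivalent decorator for older Python versions.

```python
from functools import cached_property


class Document:
    """Hypothetical document wrapper; not morphy's actual class."""

    def __init__(self, text):
        self.text = text
        self.tokenize_calls = 0  # counts how often the expensive step runs

    @cached_property
    def tokens(self):
        # Stand-in for an expensive analysis step: a plain whitespace split.
        self.tokenize_calls += 1
        return self.text.split()


doc = Document('Lorem Ipsum is simply dummy text')
first = doc.tokens   # computed on first access
second = doc.tokens  # returned from the per-instance cache, no recomputation
print(first is second, doc.tokenize_calls)
```

Because the cached value is stored on the instance, each document pays the analysis cost at most once per property.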
Usage
Language detection
from morphy import Language
text = 'Lorem Ipsum is simply dummy text of the printing and typesetting industry'
lang = Language(text=text)
print(lang)
Sentence segmentation
from morphy import MultiLang
text = 'Lorem Ipsum is simply dummy text of the printing and typesetting industry'
english_proc = MultiLang(lang='en')
doc = english_proc(text)
for sent in doc.sentences:
    # print the sentence, then each of its tokens on its own line
    print('%s\n%s' % (sent, '\n'.join(str(token) for token in sent.tokens)))
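To make the shape of segmentation output concrete, here is a toy, standard-library-only segmenter. It is a hypothetical illustration, not how morphy, nltk, or spacy segment text; split_sentences and tokenize are names invented for this sketch.

```python
import re

# Split after ., ! or ? when followed by whitespace and an uppercase letter.
# This rule is deliberately naive: abbreviations such as "Dr. Smith" get split
# wrongly, which is the kind of case models such as nltk's punkt are built to handle.
_BOUNDARY = re.compile(r'(?<=[.!?])\s+(?=[A-Z])')


def split_sentences(text):
    """Return a list of sentence strings (toy rule-based segmenter)."""
    return [s for s in _BOUNDARY.split(text.strip()) if s]


def tokenize(sentence):
    """Split a sentence into word and punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", sentence)


text = 'Lorem Ipsum is dummy text. It has been used since the 1500s! Is it still used?'
for sent in split_sentences(text):
    print(sent)
    print('\n'.join(tokenize(sent)))
```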
Lemmatization
from morphy import MultiLang
text = 'Lorem Ipsum is simply dummy text of the printing and typesetting industry'
english_proc = MultiLang(lang='en')
doc = english_proc(text)
for token in doc.tokens:
    print('%s --> %s' % (token.text, token.lemma))
Installation
Option 1: Via PyPI
morphy releases are available via pip as source packages and binary wheels (as of v0.1.0).
pip install morphy
When using pip, it is generally recommended to install packages in a virtual environment to avoid modifying the system state:
python -m virtualenv venv
source venv/bin/activate
pip install morphy
Option 2: From source via Git
git clone git@bitbucket.org:igor_ezersky/morphy.git
cd morphy
python -m virtualenv venv
source venv/bin/activate
python setup.py install
Option 3: Source Zip
Download a zip of the code via GitHub or PyPI, then follow the same instructions as in Option 2.
IMPORTANT
After the package is installed, you need to download the nltk and spacy data:
python -c "import nltk; nltk.download('punkt')"
python -m spacy download en
python -m spacy download xx
# repeat the spacy download command for each language you need
You can choose which spacy model to install; see the spaCy documentation for the available models.
Requirements
- Python 3.3+
- spacy
- nltk
- cached_property
- langdetect
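A quick way to confirm that the requirements listed above are importable in the current environment is sketched below. It assumes each package's import name matches its listed name (true for spacy, nltk, cached_property, and langdetect); missing_packages is a hypothetical helper, not part of morphy.

```python
import importlib.util

# Import names of the requirements listed above.
REQUIREMENTS = ['spacy', 'nltk', 'cached_property', 'langdetect']


def missing_packages(names):
    """Return the top-level module names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIREMENTS)
if missing:
    print('Missing requirements: ' + ', '.join(missing))
else:
    print('All requirements are installed.')
```

find_spec only locates the module without importing it, so the check is cheap and has no side effects. Note that it does not verify the nltk and spacy data downloads described above.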
Notes
If you are using Windows, installing some of the morphy requirements (e.g. ujson, cytoolz) may fail with:
error: command 'cl.exe' failed: No such file or directory
This means the Microsoft Visual C++ compiler (cl.exe) is not available to build these packages. Manually installing precompiled binaries of these two packages can solve the problem; you can find them in an unofficial repository of Windows binaries for Python packages.
Current limitations
- Installing spacy models for each language is required.
- Downloading nltk tokenizer data is required.