Python morphological analyzer and lemmatizer for Turkish
Zeyrek is a partial port of the Zemberek library to Python for lemmatizing and morphologically analyzing Turkish words. It is in the alpha stage, and the API will probably change.
Free software: MIT license
Documentation: https://zeyrek.readthedocs.io.
Basic Usage
To use Zeyrek, first create an instance of the MorphAnalyzer class:
>>> import zeyrek
>>> analyzer = zeyrek.MorphAnalyzer()
Then, you can call its analyze method on words or texts to get all possible analyses:
>>> print(analyzer.analyze('benim'))
Parse(word='benim', lemma='ben', pos='Noun', morphemes=['Noun', 'A3sg', 'P1sg'], formatted='[ben:Noun] ben:Noun+A3sg+im:P1sg')
Parse(word='benim', lemma='ben', pos='Pron', morphemes=['Pron', 'A1sg', 'Gen'], formatted='[ben:Pron,Pers] ben:Pron+A1sg+im:Gen')
Parse(word='benim', lemma='ben', pos='Verb', morphemes=['Noun', 'A3sg', 'Zero', 'Verb', 'Pres', 'A1sg'], formatted='[ben:Noun] ben:Noun+A3sg|Zero→Verb+Pres+im:A1sg')
Parse(word='benim', lemma='ben', pos='Verb', morphemes=['Pron', 'A1sg', 'Zero', 'Verb', 'Pres', 'A1sg'], formatted='[ben:Pron,Pers] ben:Pron+A1sg|Zero→Verb+Pres+im:A1sg')
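Because one word can have several analyses, it is often handy to summarize them. The following is a minimal sketch that groups the candidate lemmas by part of speech. It does not import Zeyrek: `Parse` here is a stand-in namedtuple whose field names are copied from the printed output above, and `lemmas_by_pos` is a hypothetical helper, not part of the library. It also assumes the analyses arrive as a flat iterable of such records, which may differ from the container the analyzer actually returns.

```python
from collections import defaultdict, namedtuple

# Stand-in record (an assumption): mirrors the fields shown in the printed output.
Parse = namedtuple("Parse", ["word", "lemma", "pos", "morphemes", "formatted"])


def lemmas_by_pos(parses):
    """Map each part-of-speech tag to the sorted list of distinct lemmas proposed for it."""
    grouped = defaultdict(set)
    for parse in parses:
        grouped[parse.pos].add(parse.lemma)
    return {pos: sorted(lemmas) for pos, lemmas in grouped.items()}


# The four analyses of 'benim' from the example above, retyped as stand-in records.
sample = [
    Parse("benim", "ben", "Noun", ["Noun", "A3sg", "P1sg"],
          "[ben:Noun] ben:Noun+A3sg+im:P1sg"),
    Parse("benim", "ben", "Pron", ["Pron", "A1sg", "Gen"],
          "[ben:Pron,Pers] ben:Pron+A1sg+im:Gen"),
    Parse("benim", "ben", "Verb", ["Noun", "A3sg", "Zero", "Verb", "Pres", "A1sg"],
          "[ben:Noun] ben:Noun+A3sg|Zero→Verb+Pres+im:A1sg"),
    Parse("benim", "ben", "Verb", ["Pron", "A1sg", "Zero", "Verb", "Pres", "A1sg"],
          "[ben:Pron,Pers] ben:Pron+A1sg|Zero→Verb+Pres+im:A1sg"),
]

summary = lemmas_by_pos(sample)
print(summary)
```

With real analyzer output, the same helper should work on any objects that expose `pos` and `lemma` attributes.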
If you only need the base forms of words (their lemmas), you can call lemmatize. It returns a list of tuples, each containing the word itself and a list of its possible lemmas:
>>> print(analyzer.lemmatize('benim'))
[('benim', ['ben'])]
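The list-of-tuples shape described above converts easily into a lookup table. Below is a small sketch of that conversion. `lemma_lookup` is a hypothetical helper, not part of Zeyrek, and the input is the literal result from the example rather than a live call to the library.

```python
def lemma_lookup(lemmatized):
    """Build a word -> candidate-lemmas dict from (word, [lemmas]) tuples.

    Repeated words are merged, and duplicate lemmas are dropped while
    keeping the order in which they first appeared.
    """
    lookup = {}
    for word, lemmas in lemmatized:
        candidates = lookup.setdefault(word, [])
        for lemma in lemmas:
            if lemma not in candidates:
                candidates.append(lemma)
    return lookup


# Literal result copied from the lemmatize example above.
result = [('benim', ['ben'])]
table = lemma_lookup(result)
print(table)
```

A dict keyed by surface form is convenient when the same word occurs many times in a text and its lemmas should be resolved only once.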
Credits
This package is a Python port of part of the Zemberek package by Ahmet A. Akın.
This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.