Python morphological analyzer and lemmatizer for Turkish
Zeyrek is a partial port of the Zemberek library to Python for lemmatizing and morphologically analyzing Turkish words. It is in the alpha stage, and the API will probably change.
Free software: MIT license
Documentation: https://zeyrek.readthedocs.io.
Basic Usage
To use Zeyrek, first create an instance of the MorphAnalyzer class:
>>> import zeyrek
>>> analyzer = zeyrek.MorphAnalyzer()
Then, you can call its analyze method on words or texts to get all possible analyses:
>>> print(analyzer.analyze('benim'))
Parse(word='benim', lemma='ben', pos='Noun', morphemes=['Noun', 'A3sg', 'P1sg'], formatted='[ben:Noun] ben:Noun+A3sg+im:P1sg')
Parse(word='benim', lemma='ben', pos='Pron', morphemes=['Pron', 'A1sg', 'Gen'], formatted='[ben:Pron,Pers] ben:Pron+A1sg+im:Gen')
Parse(word='benim', lemma='ben', pos='Verb', morphemes=['Noun', 'A3sg', 'Zero', 'Verb', 'Pres', 'A1sg'], formatted='[ben:Noun] ben:Noun+A3sg|Zero→Verb+Pres+im:A1sg')
Parse(word='benim', lemma='ben', pos='Verb', morphemes=['Pron', 'A1sg', 'Zero', 'Verb', 'Pres', 'A1sg'], formatted='[ben:Pron,Pers] ben:Pron+A1sg|Zero→Verb+Pres+im:A1sg')
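Each analysis printed above carries the fields word, lemma, pos, morphemes and formatted. As a rough sketch of how those fields might be consumed, the snippet below groups analyses by part of speech. It does not call Zeyrek: it uses a stand-in `Parse` namedtuple filled by hand from the output above, on the assumption that the real objects expose the same attribute names. The helper `group_by_pos` is hypothetical and not part of the library.

```python
from collections import defaultdict, namedtuple

# Stand-in for the objects shown in the output above (assumption: the real
# results expose these same attribute names).
Parse = namedtuple("Parse", ["word", "lemma", "pos", "morphemes", "formatted"])

# The four analyses of 'benim', copied by hand from the printed output.
parses = [
    Parse("benim", "ben", "Noun", ["Noun", "A3sg", "P1sg"],
          "[ben:Noun] ben:Noun+A3sg+im:P1sg"),
    Parse("benim", "ben", "Pron", ["Pron", "A1sg", "Gen"],
          "[ben:Pron,Pers] ben:Pron+A1sg+im:Gen"),
    Parse("benim", "ben", "Verb", ["Noun", "A3sg", "Zero", "Verb", "Pres", "A1sg"],
          "[ben:Noun] ben:Noun+A3sg|Zero→Verb+Pres+im:A1sg"),
    Parse("benim", "ben", "Verb", ["Pron", "A1sg", "Zero", "Verb", "Pres", "A1sg"],
          "[ben:Pron,Pers] ben:Pron+A1sg|Zero→Verb+Pres+im:A1sg"),
]


def group_by_pos(items):
    """Hypothetical helper: map each part of speech to its morpheme sequences."""
    groups = defaultdict(list)
    for item in items:
        groups[item.pos].append("+".join(item.morphemes))
    return dict(groups)


grouped = group_by_pos(parses)
for pos, sequences in grouped.items():
    print(pos, sequences)
```

Grouping this way makes the ambiguity of 'benim' visible at a glance: one noun reading, one pronoun reading, and two verb readings derived from them.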
If you only need the base forms of words (their lemmas), you can call lemmatize. It returns a list of tuples, each containing the word itself and a list of its possible lemmas:
>>> print(analyzer.lemmatize('benim'))
[('benim', ['ben'])]
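Because the result is a list of (word, lemmas) tuples, it is often convenient to reduce it to a word-to-lemma lookup. The following is a minimal sketch that works on the literal output shown above rather than on a live analyzer. The helper `first_lemmas` is hypothetical, and the fallback to the word itself when the lemma list is empty is a defensive assumption; whether Zeyrek ever returns an empty list is not stated here.

```python
def first_lemmas(pairs):
    """Hypothetical helper: map each word to its first listed lemma.

    Falls back to the word itself if its lemma list is empty (a defensive
    assumption, not documented Zeyrek behavior).
    """
    lookup = {}
    for word, lemmas in pairs:
        lookup[word] = lemmas[0] if lemmas else word
    return lookup


# Literal copied from the lemmatize output above.
sample = [("benim", ["ben"])]
lookup = first_lemmas(sample)
print(lookup)
```

Taking the first lemma is only one possible policy. When a word has several candidate lemmas, keeping the full list may be the safer choice.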
Credits
This package is a Python port of part of the Zemberek package by Ahmet A. Akın.
This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.