nlpTurk - Turkish NLP library
nlpTurk is an open-source Turkish NLP library consisting of machine-learning-based sentence boundary detection, lemmatization and POS tagging models.
Installation & Usage
nlpTurk can be installed from PyPI.
```bash
pip install nlpturk
```
nlpTurk offers a simple API to extract sentences, lemmas and POS tags.
```python
import nlpturk

text = "Sosyal medya hayatımıza hızlı girdi.ama yazım kurallarına dikkat eden pek yok :)"
doc = nlpturk(text)

# iterate over tokens
for token in doc:
    print(f"token: {token.text}, lemma: {token.lemma}, pos: {token.pos}")
"""
Prints:
token: Sosyal, lemma: sosyal, pos: ADJ
token: medya, lemma: medya, pos: NOUN
...
"""

# or get tokens by token ids
token = doc[5]
print(f"token: {token.text}, sent_start: {token.is_sent_start}, sent_end: {token.is_sent_end}")
token = doc[6]
print(f"token: {token.text}, sent_start: {token.is_sent_start}, sent_end: {token.is_sent_end}")
"""
Prints:
token: ., sent_start: False, sent_end: True
token: ama, sent_start: True, sent_end: False
"""

# iterate over sentences
for i, sent in enumerate(doc.sents):
    print(f"sentence #{i+1}: {sent.text}")
    for token in sent:
        print(f" token: {token.text}, lemma: {token.lemma}, pos: {token.pos}")
"""
Prints:
sentence #1: Sosyal medya hayatımıza hızlı girdi.
 token: Sosyal, lemma: sosyal, pos: ADJ
...
sentence #2: ama yazım kurallarına dikkat eden pek yok :)
 token: ama, lemma: ama, pos: CCONJ
...
"""
```
Performance
The evaluation was performed on the test dataset. Detailed evaluation and benchmarking results can be found here.
| | accuracy | precision | recall | f1-score |
|---|---|---|---|---|
| Sentence Segmenter | - | 98.09 | 96.05 | 97.06 |
| POS Tagger | - | 95.75 | 96.26 | 96.01 |
| Lemmatizer | 96.87 | - | - | - |
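The f1-score column is consistent with the harmonic mean of precision and recall, F1 = 2PR / (P + R). The short check below recomputes it from the table's precision and recall values. The page does not say how the scores are averaged (for example, per class for the POS tagger), so treat this only as a consistency check: the recomputed values agree with the reported ones to within 0.01, which fits inputs that are themselves rounded to two decimals.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the table above
reported = {
    "Sentence Segmenter": (98.09, 96.05, 97.06),
    "POS Tagger": (95.75, 96.26, 96.01),
}

for name, (p, r, f1) in reported.items():
    print(f"{name}: recomputed F1 = {f1_score(p, r):.2f}, reported = {f1:.2f}")
```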
You can also run the benchmark on your own dataset:
```bash
git clone https://github.com/nlpturk/nlpturk.git
cd nlpturk
pip install -r requirements.txt
python -m nlpturk benchmark --data_path path/to/data --output_path path/to/output
```
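The page does not describe the data format or the exact scoring used by the `benchmark` command. As a hedged illustration of one common way to score a sentence segmenter (an assumption, not necessarily what nlpTurk does), the sketch below treats each sentence-final token index as a boundary and compares the predicted boundary set against the gold one. The function name `boundary_scores` and the example indices are hypothetical.

```python
from typing import Iterable, Tuple


def boundary_scores(gold: Iterable[int], predicted: Iterable[int]) -> Tuple[float, float, float]:
    """Precision, recall and F1 (in percent) over sentence-end token indices."""
    gold_set, pred_set = set(gold), set(predicted)
    true_pos = len(gold_set & pred_set)  # boundaries found in both sets
    precision = 100 * true_pos / len(pred_set) if pred_set else 0.0
    recall = 100 * true_pos / len(gold_set) if gold_set else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical example: gold sentences end after tokens 5 and 13; the segmenter
# finds the end at 13 but misses 5 and wrongly proposes a boundary at 3.
p, r, f = boundary_scores(gold=[5, 13], predicted=[3, 13])
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

Scoring boundaries rather than whole sentences means a single missed boundary costs one recall point instead of invalidating two sentences.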
File details
Details for the file nlpturk-0.0.2.tar.gz.
File metadata
- Download URL: nlpturk-0.0.2.tar.gz
- Upload date:
- Size: 20.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.8.10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 97cf9554a6aa4813dece6724a8818c4104e98e68a0d553b082420b7d408c5eda |
| MD5 | a13f745e266bbcd5ffdcbf116dec6e49 |
| BLAKE2b-256 | 4852df6cd425dfeb2ab31d1a75710fe7fb78c83b515fee5558def4e4ded74fd2 |
File details
Details for the file nlpturk-0.0.2-py3-none-any.whl.
File metadata
- Download URL: nlpturk-0.0.2-py3-none-any.whl
- Upload date:
- Size: 20.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.8.10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4ef2390add0b28c2319841d454b6040d814b1f85fe2e811707e3020a6df15f04 |
| MD5 | 5257093d5fa23abcc7c16692c28f1eda |
| BLAKE2b-256 | 55987e01eafbe2350581599676d67e332feacd8094e7daf4ccb7ce505bacf877 |
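To check a downloaded file against the SHA256 digests listed in the file details, a minimal sketch using Python's standard `hashlib` module is shown below. The `EXPECTED` mapping and the `sha256_of_file` / `verify` helpers are illustrative names, not part of nlpTurk; the digests are copied from the tables above.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: str, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# SHA256 digests copied from the file details above
EXPECTED = {
    "nlpturk-0.0.2.tar.gz": "97cf9554a6aa4813dece6724a8818c4104e98e68a0d553b082420b7d408c5eda",
    "nlpturk-0.0.2-py3-none-any.whl": "4ef2390add0b28c2319841d454b6040d814b1f85fe2e811707e3020a6df15f04",
}


def verify(path: str) -> bool:
    """True only if the file name is known and its SHA256 matches the published digest."""
    expected = EXPECTED.get(Path(path).name)
    return expected is not None and sha256_of_file(path) == expected
```

After downloading, `verify("nlpturk-0.0.2.tar.gz")` returns `True` only if the archive matches the published digest. Reading in fixed-size chunks keeps memory use constant regardless of file size.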