
Turkish NLP library

Project description

nlpTurk - Turkish NLP library

nlpTurk is an open-source Turkish NLP library that provides machine-learning-based models for sentence boundary detection, lemmatization, and part-of-speech (POS) tagging.

Installation & Usage

nlpTurk can be installed from PyPI.

```shell
pip install nlpturk
```

nlpTurk offers a simple API to extract sentences, lemmas and POS tags.

```python
import nlpturk

# Informal input ("Social media entered our lives quickly. but hardly anyone
# pays attention to spelling rules :)"); note the missing space after "girdi."
text = "Sosyal medya hayatımıza hızlı girdi.ama yazım kurallarına dikkat eden pek yok :)"
doc = nlpturk(text)

# iterate over tokens
for token in doc:
    print(f"token: {token.text}, lemma: {token.lemma}, pos: {token.pos}")

# Prints:
#   token: Sosyal, lemma: sosyal, pos: ADJ
#   token: medya, lemma: medya, pos: NOUN
#   ...

# or access tokens by index (0-based)
token = doc[5]
print(f"token: {token.text}, sent_start: {token.is_sent_start}, sent_end: {token.is_sent_end}")
token = doc[6]
print(f"token: {token.text}, sent_start: {token.is_sent_start}, sent_end: {token.is_sent_end}")

# Prints:
#   token: ., sent_start: False, sent_end: True
#   token: ama, sent_start: True, sent_end: False

# iterate over sentences
for i, sent in enumerate(doc.sents, start=1):
    print(f"sentence #{i}: {sent.text}")
    for token in sent:
        print(f"  token: {token.text}, lemma: {token.lemma}, pos: {token.pos}")

# Prints:
#   sentence #1: Sosyal medya hayatımıza hızlı girdi.
#     token: Sosyal, lemma: sosyal, pos: ADJ
#     ...
#   sentence #2: ama yazım kurallarına dikkat eden pek yok :)
#     token: ama, lemma: ama, pos: CCONJ
#     ...
```

Performance

The evaluation was performed on the test dataset. Detailed evaluation and benchmarking results can be found here.

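| Model              | Accuracy | Precision | Recall | F1-score |
|--------------------|----------|-----------|--------|----------|
| Sentence Segmenter | -        | 98.09     | 96.05  | 97.06    |
| POS Tagger         | -        | 95.75     | 96.26  | 96.01    |
| Lemmatizer         | 96.87    | -         | -      | -        |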
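The F1-score is the harmonic mean of precision and recall, F1 = 2PR / (P + R). As a quick sanity check, the snippet below recomputes it for the Sentence Segmenter row of the table; the `f1_score` helper is written here for illustration only and is not part of nlpTurk.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both on the same scale, e.g. percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Sentence Segmenter row: precision 98.09, recall 96.05.
print(round(f1_score(98.09, 96.05), 2))  # 97.06
```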


You can also run the benchmark on your own dataset:

```shell
git clone https://github.com/nlpturk/nlpturk.git
cd nlpturk
pip install -r requirements.txt
python -m nlpturk benchmark --data_path path/to/data --output_path path/to/output
```
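The input format expected by `--data_path` is not described here. As a rough, library-independent illustration of the kinds of metrics in the table, the sketch below scores sentence boundaries (represented as sets of sentence-end token indices, an assumed representation) with precision, recall, and F1, and lemmas with token-level accuracy. It is not nlpTurk's benchmark code, and the toy data is made up.

```python
from typing import Dict, Sequence, Set


def boundary_scores(gold: Set[int], predicted: Set[int]) -> Dict[str, float]:
    """Precision, recall and F1 over sentence-end token indices."""
    true_pos = len(gold & predicted)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def lemma_accuracy(gold: Sequence[str], predicted: Sequence[str]) -> float:
    """Share of tokens whose predicted lemma equals the gold lemma."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must be aligned token by token")
    if not gold:
        return 0.0
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


# Toy example: gold sentences end at tokens 5 and 13; the system
# found the end at 5 but put a spurious one at 9.
print(boundary_scores({5, 13}, {5, 9}))
print(lemma_accuracy(["sosyal", "medya", "hayat"], ["sosyal", "medya", "hayatımız"]))
```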


Download files


Source Distribution

nlpturk-0.0.2.tar.gz (20.7 kB)

Built Distribution


nlpturk-0.0.2-py3-none-any.whl (20.9 kB)

File details

Details for the file nlpturk-0.0.2.tar.gz.

File metadata

  • Download URL: nlpturk-0.0.2.tar.gz
  • Size: 20.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.8.10

File hashes

Hashes for nlpturk-0.0.2.tar.gz
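| Algorithm   | Hash digest                                                      |
|-------------|------------------------------------------------------------------|
| SHA256      | 97cf9554a6aa4813dece6724a8818c4104e98e68a0d553b082420b7d408c5eda |
| MD5         | a13f745e266bbcd5ffdcbf116dec6e49                                 |
| BLAKE2b-256 | 4852df6cd425dfeb2ab31d1a75710fe7fb78c83b515fee5558def4e4ded74fd2 |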


File details

Details for the file nlpturk-0.0.2-py3-none-any.whl.

File metadata

  • Download URL: nlpturk-0.0.2-py3-none-any.whl
  • Size: 20.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.8.10

File hashes

Hashes for nlpturk-0.0.2-py3-none-any.whl
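| Algorithm   | Hash digest                                                      |
|-------------|------------------------------------------------------------------|
| SHA256      | 4ef2390add0b28c2319841d454b6040d814b1f85fe2e811707e3020a6df15f04 |
| MD5         | 5257093d5fa23abcc7c16692c28f1eda                                 |
| BLAKE2b-256 | 55987e01eafbe2350581599676d67e332feacd8094e7daf4ccb7ce505bacf877 |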
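The SHA256 digests listed for the two files can be used to check that a download is intact. A minimal sketch using Python's standard `hashlib`; the file paths passed on the command line are assumptions about where you saved the downloads, and the script name is arbitrary.

```python
import hashlib
import sys
from pathlib import Path

# SHA256 digests published for nlpturk 0.0.2.
EXPECTED_SHA256 = {
    "nlpturk-0.0.2.tar.gz": "97cf9554a6aa4813dece6724a8818c4104e98e68a0d553b082420b7d408c5eda",
    "nlpturk-0.0.2-py3-none-any.whl": "4ef2390add0b28c2319841d454b6040d814b1f85fe2e811707e3020a6df15f04",
}


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Hash the file in chunks so it never has to fit in memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path) -> bool:
    """True only if the file name is known and its digest matches the published one."""
    expected = EXPECTED_SHA256.get(path.name)
    return expected is not None and sha256_of(path) == expected


if __name__ == "__main__":
    for arg in sys.argv[1:]:
        p = Path(arg)
        print(f"{p.name}: {'OK' if verify(p) else 'MISMATCH or unknown file'}")
```

For example, `python verify_nlpturk.py nlpturk-0.0.2.tar.gz`. pip can enforce the same check in its hash-checking mode, using a requirements line such as `nlpturk==0.0.2 --hash=sha256:<digest>` and `pip install --require-hashes -r requirements.txt`.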

