Durak: modular Turkish NLP preprocessing toolkit.

Durak

Durak is a Turkish natural language processing toolkit focused on reliable preprocessing building blocks. It offers configurable cleaning, tokenization, stopword management, lemmatization adapters, and frequency statistics, so projects can quickly bootstrap robust text pipelines.
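Frequency statistics over a tokenized corpus boil down to counting tokens across documents. The following is a minimal standard-library sketch of that idea, assuming documents have already been tokenized into lists of strings; `token_frequencies` and `relative_frequencies` are hypothetical helper names, not Durak's API.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable


def token_frequencies(token_lists: Iterable[list[str]]) -> Counter[str]:
    """Count how often each token occurs across a corpus of tokenized documents."""
    counts: Counter[str] = Counter()
    for tokens in token_lists:
        counts.update(tokens)
    return counts


def relative_frequencies(counts: Counter[str]) -> dict[str, float]:
    """Convert raw counts to proportions of the total token count."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {token: n / total for token, n in counts.items()}


docs = [
    ["durak", "nlp", "zor", "mu", "?"],
    ["durak", "kolaylaştırır", "."],
]
counts = token_frequencies(docs)
print(counts.most_common(1))  # [('durak', 2)]
```

Using `Counter` keeps the counts mergeable: per-document counters can be summed, which is convenient when a corpus is processed in batches.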

Quickstart

Install from PyPI:

pip install durak-nlp

Clean and tokenize Turkish text in seconds:

from durak import clean_text, tokenize

text = "Türkiye'de NLP zor mu? Durak kolaylaştırır."
tokens = tokenize(clean_text(text))
print(tokens)
# ["türkiye'de", 'nlp', 'zor', 'mu', '?', 'durak', 'kolaylaştırır', '.']
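The quickstart output relies on two Turkish-specific details: casing rules for dotted and dotless i, and keeping apostrophe-separated suffixes such as 'de attached to the word. The sketch below illustrates both ideas with the standard library only; `turkish_lower` and `simple_tokenize` are hypothetical helpers, not Durak's `clean_text` or `tokenize`, whose actual rules may differ.

```python
import re
import unicodedata

# Python's str.lower() maps "I" -> "i" and "İ" -> "i" + U+0307 (combining dot),
# which is wrong for Turkish, so these two letters are translated first.
_TURKISH_UPPER_TO_LOWER = str.maketrans({"I": "ı", "İ": "i"})

# A word, optionally followed by apostrophe-separated suffixes (e.g. "Türkiye'de",
# with a straight or typographic apostrophe), or any single character that is
# neither a word character nor whitespace (punctuation).
_TOKEN_RE = re.compile(r"\w+(?:['’]\w+)*|[^\w\s]")


def turkish_lower(text: str) -> str:
    """Lowercase text using Turkish casing rules for I and İ."""
    # NFC first, so "I" + combining dot above is composed into "İ" before translation.
    normalized = unicodedata.normalize("NFC", text)
    return normalized.translate(_TURKISH_UPPER_TO_LOWER).lower()


def simple_tokenize(text: str) -> list[str]:
    """Split text into words (keeping apostrophe suffixes attached) and punctuation."""
    return _TOKEN_RE.findall(text)


text = "Türkiye'de NLP zor mu? Durak kolaylaştırır."
print(simple_tokenize(turkish_lower(text)))
# ["türkiye'de", 'nlp', 'zor', 'mu', '?', 'durak', 'kolaylaştırır', '.']
```

For example, `turkish_lower("IŞIK")` yields "ışık", whereas plain `"IŞIK".lower()` yields "işik".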

Features

  • Unicode-aware cleaning utilities tuned for Turkish content (social media, news, and informal text).
  • Configurable stopword management with keep-lists, custom additions, and serialization.
  • Regex-based tokenizer and sentence splitter with clitic and diacritic preservation.
  • Lightweight corpus validator that guards Turkish-specific artifacts.
  • Ready for extension with future lemmatization and subword adapters.
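Stopword management with keep-lists, custom additions, and serialization can be modeled as simple set arithmetic. The following is a minimal sketch under that assumption; `StopwordSet` and its methods are hypothetical and do not describe Durak's actual stopword classes or file format.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field


@dataclass
class StopwordSet:
    """Hypothetical stopword container: (base + additions) minus the keep-list."""

    base: set[str]
    additions: set[str] = field(default_factory=set)
    keep: set[str] = field(default_factory=set)

    def words(self) -> set[str]:
        # Keep-list entries always win, so they are never treated as stopwords.
        return (self.base | self.additions) - self.keep

    def is_stopword(self, token: str) -> bool:
        return token in self.words()

    def remove_stopwords(self, tokens: list[str]) -> list[str]:
        stop = self.words()  # compute once, not per token
        return [t for t in tokens if t not in stop]

    def to_json(self) -> str:
        # Sorted lists give stable, diff-friendly output; keep Turkish letters readable.
        payload = {
            "base": sorted(self.base),
            "additions": sorted(self.additions),
            "keep": sorted(self.keep),
        }
        return json.dumps(payload, ensure_ascii=False)

    @classmethod
    def from_json(cls, payload: str) -> StopwordSet:
        data = json.loads(payload)
        return cls(set(data["base"]), set(data.get("additions", [])), set(data.get("keep", [])))


stops = StopwordSet(base={"ve", "bir", "bu", "mu"}, additions={"durak"}, keep={"bu"})
print(stops.remove_stopwords(["bu", "bir", "durak", "zor", "mu"]))  # ['bu', 'zor']
```

Keeping the three sets separate (rather than mutating one list) means a serialized configuration still records why a word is or is not filtered.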

Development Setup

python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
pytest

Before submitting changes, run:

ruff check .
mypy src
pytest
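The three pre-submit checks above can also be run in one go, stopping at the first failure. A minimal sketch, assuming `ruff`, `mypy`, and `pytest` are installed in the active virtual environment (for example via the `[dev]` extra); `run_checks` is a hypothetical helper, not a script shipped with the repository.

```python
from __future__ import annotations

import subprocess
import sys
from collections.abc import Sequence

# The pre-submit checks from the list above, in the order they are run.
DEFAULT_CHECKS: list[list[str]] = [
    ["ruff", "check", "."],
    ["mypy", "src"],
    ["pytest"],
]


def run_checks(checks: Sequence[Sequence[str]] = DEFAULT_CHECKS) -> int:
    """Run each command in order and return the first non-zero exit code (0 if all pass)."""
    for cmd in checks:
        print("$", " ".join(cmd), flush=True)
        try:
            result = subprocess.run(list(cmd), check=False)
        except FileNotFoundError:
            # The tool is not installed in the active environment.
            print(f"command not found: {cmd[0]}", file=sys.stderr)
            return 127
        if result.returncode != 0:
            return result.returncode
    return 0
```

Called from a script as `sys.exit(run_checks())`, the failing tool's exit code propagates, so the same helper works locally and in CI.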

Refer to CONTRIBUTING.md for the full workflow, coding standards, and release process. The project roadmap lives in ROADMAP.md, and notable changes are tracked in CHANGELOG.md.

Community & Support

License

Durak is distributed under the Durak License v1.1. Commercial or institutional use requires explicit written permission from the author.

Download files

Download the file for your platform.

Source Distribution

durak_nlp-0.1.1.tar.gz (14.8 kB)

Uploaded Source

Built Distribution

durak_nlp-0.1.1-py3-none-any.whl (11.4 kB)

Uploaded Python 3

File details

Details for the file durak_nlp-0.1.1.tar.gz.

File metadata

  • Download URL: durak_nlp-0.1.1.tar.gz
  • Upload date:
  • Size: 14.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for durak_nlp-0.1.1.tar.gz
Algorithm      Hash digest
SHA256         8c5fc04b6cb6a554b8768e290ce8c2eda984e157211bed281d72a97ea2aca40c
MD5            ad96269ab5331b253b52b192ece02b9c
BLAKE2b-256    9a85049474d1c61833fb64ef8826ab5a31862e59a1b9b74264d996091b699b1e

File details

Details for the file durak_nlp-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: durak_nlp-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 11.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for durak_nlp-0.1.1-py3-none-any.whl
Algorithm      Hash digest
SHA256         a832079dc04d70cd27826455dc7160dda7f1f514c26a6a64cb4f82fa5b6ac6ac
MD5            5830884ec16a16cc2b20359d776a8bb8
BLAKE2b-256    affdaa03f3e3d31ea9644e7ca251586f4733045898d5086b9c95c50037994dc4
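A downloaded file can be checked against the published SHA-256 digest before installing it. The sketch below uses only `hashlib` and reads the file in chunks; `sha256_of` and `verify_download` are hypothetical helper names, and the file path in the commented call assumes the wheel was saved under its original name in the current directory.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA-256 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path, expected_sha256: str) -> bool:
    """Compare a file's SHA-256 digest with a published hex digest (case-insensitive)."""
    return sha256_of(path) == expected_sha256.strip().lower()


# Published SHA256 digest for durak_nlp-0.1.1-py3-none-any.whl, copied from the table above.
WHEEL_SHA256 = "a832079dc04d70cd27826455dc7160dda7f1f514c26a6a64cb4f82fa5b6ac6ac"
# verify_download(Path("durak_nlp-0.1.1-py3-none-any.whl"), WHEEL_SHA256)
```

pip can perform the same check during installation through its hash-checking mode, where each requirement carries a `--hash=sha256:...` option.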
