Thai Natural Language Processing library

PyThaiNLP

PyThaiNLP is a Python library for natural language processing (NLP) of the Thai language.

PyThaiNLP includes Thai word tokenizers, transliterators, soundex converters, part-of-speech taggers, and spell checkers.

📫 Follow us on Facebook: PyThaiNLP

What's new in 2.1?

  • Improved word_tokenize ("newmm" and "mm" engines); a custom dictionary can now be provided through custom_dict
  • Added AttaCut as an engine option for word_tokenize
  • New Thai2rom (PyTorch)
  • New command-line interface
  • Added a word tokenization benchmark to PyThaiNLP
  • See more examples in Get Started notebook
  • Full change log
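As a minimal sketch of the engine option mentioned in the list above, the snippet below calls word_tokenize with a chosen engine. It assumes PyThaiNLP is installed (and, for the "attacut" engine, that the attacut extra is installed too). The helper name tokenize_thai is hypothetical and not part of the library; it returns None when PyThaiNLP is missing.

```python
def tokenize_thai(text, engine="newmm"):
    """Tokenize Thai text with the given word_tokenize engine.

    Hypothetical helper for illustration only: it returns None when
    PyThaiNLP is not installed instead of raising ImportError.
    """
    try:
        from pythainlp.tokenize import word_tokenize
    except ImportError:
        return None
    return word_tokenize(text, engine=engine)


if __name__ == "__main__":
    # "newmm" is the dictionary-based engine; "attacut" needs the attacut extra.
    print(tokenize_thai("ผมรักภาษาไทย", engine="newmm"))
```

Switching engine to "attacut" should be the only change needed to try AttaCut, provided the extra is installed.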

Install

For stable version:

pip install pythainlp

Some advanced features, such as word vectors, need extra packages. Install them by listing the extras during pip install:

pip install pythainlp[extra1,extra2,...]

where extras can be

  • artagger (to support artagger part-of-speech tagger)*
  • attacut (to support the AttaCut tokenizer wrapper, https://github.com/PyThaiNLP/attacut)
  • deepcut (to support deepcut machine-learnt tokenizer)
  • icu (for ICU support in transliteration and tokenization)
  • ipa (for International Phonetic Alphabet support in transliteration)
  • ml (to support fastai 1.0.22 ULMFiT models)
  • ner (for named-entity recognizer)
  • thai2fit (for Thai word vector)
  • thai2rom (for machine-learnt romanization)
  • full (install everything)
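The extras syntax above can be sketched as a small helper that builds the requirement string passed to pip. This is only an illustration: the function name pip_requirement and the KNOWN_EXTRAS set are hypothetical, and the set simply mirrors the list above.

```python
# Extras named in the list above (hypothetical constant, for illustration).
KNOWN_EXTRAS = {
    "artagger", "attacut", "deepcut", "icu", "ipa",
    "ml", "ner", "thai2fit", "thai2rom", "full",
}


def pip_requirement(extras=()):
    """Build the requirement string for `pip install`.

    Raises ValueError for names that are not in KNOWN_EXTRAS.
    """
    extras = list(extras)
    unknown = sorted(set(extras) - KNOWN_EXTRAS)
    if unknown:
        raise ValueError("unknown extras: " + ", ".join(unknown))
    if not extras:
        return "pythainlp"
    # No spaces inside the brackets, so the string is safe as one shell word.
    return "pythainlp[" + ",".join(extras) + "]"


if __name__ == "__main__":
    print("pip install " + pip_requirement(["icu", "ner"]))
```

In shells that treat square brackets specially, such as zsh, the requirement string may need to be quoted.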

Note for Windows: marisa-trie wheels can be obtained from https://www.lfd.uci.edu/~gohlke/pythonlibs/#marisa-trie and installed with pip, for example: pip install marisa_trie-0.7.5-cp36-cp36m-win32.whl
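Choosing the right wheel means matching the tags in its file name to your interpreter. The sketch below splits a wheel file name into its standard parts (name, version, Python tag, ABI tag, platform tag). The helper names are hypothetical, and the parsing handles only the common five-part and six-part forms.

```python
import sys
from collections import namedtuple

WheelTags = namedtuple("WheelTags", "name version python abi platform")


def parse_wheel_filename(filename):
    """Split a wheel file name into its tag components."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file: " + filename)
    parts = filename[:-len(".whl")].split("-")
    if len(parts) == 6:
        del parts[2]  # drop the optional build tag
    if len(parts) != 5:
        raise ValueError("unexpected wheel file name: " + filename)
    return WheelTags(*parts)


def current_cpython_tag():
    """Return the CPython tag of the running interpreter, e.g. 'cp36'."""
    return "cp{}{}".format(sys.version_info.major, sys.version_info.minor)


if __name__ == "__main__":
    tags = parse_wheel_filename("marisa_trie-0.7.5-cp36-cp36m-win32.whl")
    print(tags)
    print("matches this interpreter:", tags.python == current_cpython_tag())
```

In the example file name, cp36 means CPython 3.6 and win32 means 32-bit Windows, so a different Python version or a 64-bit interpreter needs a different file.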

Made with ❤️

We build Thai NLP.

PyThaiNLP Team.
