Thai Natural Language Processing library

Project description

PyThaiNLP 2.0.3

PyThaiNLP is a Python library for natural language processing (NLP) of the Thai language.

PyThaiNLP includes Thai word tokenizers, transliterators, soundex converters, part-of-speech taggers, and spell checkers.

📫 Follow us on Facebook: PyThaiNLP

What's new in version 2.0?

  • New NorvigSpellChecker spell checker class, which can be initialized with a custom dictionary.
  • Drop Python 2 support and remove all Python 2 compatibility code.
  • Remove old, obsolete, deprecated, and experimental code.
  • Thai2fit (upgrade ULMFiT-related code to fastai 1.0)
  • ThaiNER 1.0
  • Remove sentiment analysis
  • Improved word_tokenize (newmm, mm) and dict_word_tokenize
  • Improved POS-tagging
  • See examples in Get Started notebook
  • Full change log
  • Upgrading from 1.7
  • Upgrade ThaiNER from 1.7
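
As a quick illustration of the word_tokenize function and the newmm engine named in the list above, here is a minimal sketch. The wrapper tokenize_thai is a hypothetical helper written for this example, not part of PyThaiNLP. It assumes that pythainlp.tokenize.word_tokenize accepts an engine keyword argument, and it returns None when the library is not installed so the snippet can run anywhere.

```python
def tokenize_thai(text, engine="newmm"):
    """Tokenize Thai text with PyThaiNLP, or return None if it is not installed.

    Hypothetical helper for illustration only; the import is done lazily so
    that a missing installation does not raise at module load time.
    """
    try:
        from pythainlp.tokenize import word_tokenize
    except ImportError:
        return None
    # Assumption: word_tokenize takes the engine name as a keyword argument.
    return word_tokenize(text, engine=engine)


tokens = tokenize_thai("ผมรักภาษาไทย")
print(tokens)  # a list of tokens, or None when pythainlp is unavailable
```

The exact token boundaries depend on the engine and on the dictionary shipped with the installed version, so no expected output is shown here.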

Install

For stable version:

pip install pythainlp

Some advanced features, such as word vectors, need extra packages. Install them by passing extras to pip install:

pip install pythainlp[extra1,extra2,...]

where extras can be

  • artagger (to support artagger part-of-speech tagger)*
  • deepcut (to support deepcut machine-learnt tokenizer)
  • icu (for ICU support in transliteration and tokenization)
  • ipa (for International Phonetic Alphabet support in transliteration)
  • ml (to support fastai 1.0.22 ULMFiT models)
  • ner (for named-entity recognizer)
  • thai2fit (for Thai word vector)
  • thai2rom (for machine-learnt romanization)
  • full (install everything)
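
The extras syntax above can also be assembled programmatically, for example in a setup script. The following is a minimal sketch using only the standard library. The names KNOWN_EXTRAS, requirement_with_extras, and is_installed are hypothetical helpers made up for this example, and the set of extras is copied from the list above rather than read from the package itself.

```python
import importlib.util

# Extras copied from the list above (assumption: this list is complete
# for the version being installed).
KNOWN_EXTRAS = {
    "artagger", "deepcut", "icu", "ipa", "ml",
    "ner", "thai2fit", "thai2rom", "full",
}


def requirement_with_extras(extras):
    """Build a pip requirement string such as 'pythainlp[icu,ner]'."""
    unknown = sorted(set(extras) - KNOWN_EXTRAS)
    if unknown:
        raise ValueError("unknown extras: " + ", ".join(unknown))
    if not extras:
        return "pythainlp"
    # Sort and de-duplicate so the result is stable.
    return "pythainlp[" + ",".join(sorted(set(extras))) + "]"


def is_installed(module_name):
    """Return True if a top-level module can be found in this environment."""
    return importlib.util.find_spec(module_name) is not None


print("pip install " + requirement_with_extras(["ner", "icu"]))
print("pythainlp installed:", is_installed("pythainlp"))
```

When running the resulting command in a shell, quoting the requirement (for example "pythainlp[icu,ner]") avoids problems in shells that treat square brackets as glob patterns.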

Note for Windows: marisa-trie wheels can be obtained from https://www.lfd.uci.edu/~gohlke/pythonlibs/#marisa-trie and installed with pip, for example:

pip install marisa_trie-0.7.5-cp36-cp36m-win32.whl

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Filename (size)                               File type   Python version
pythainlp-2.0.3-py3-none-any.whl (11.2 MB)    Wheel       py3
pythainlp-2.0.3.tar.gz (53.9 kB)              Source      None
