Thai Natural Language Processing library
PyThaiNLP is a Python library for natural language processing (NLP) of the Thai language.
PyThaiNLP includes Thai word tokenizers, transliterators, soundex converters, part-of-speech taggers, and spell checkers.
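Dictionary-based Thai word tokenizers, such as the `newmm` engine of `word_tokenize`, build on maximal matching. Among all the ways to split a string into dictionary words, this approach prefers the split with the fewest words. The pure-Python sketch below illustrates only that general idea and is not PyThaiNLP's implementation. The function name `max_match_tokenize`, the toy dictionary, and the rule for handling unknown characters (emit them one at a time and minimize their count first) are all hypothetical choices made for illustration.

```python
from __future__ import annotations


def max_match_tokenize(text: str, dictionary: set[str]) -> list[str]:
    """Hypothetical maximal-matching sketch (not PyThaiNLP's code).

    Picks the segmentation with the fewest unknown characters, then the
    fewest tokens. Characters not covered by any dictionary word are
    emitted as single-character tokens.
    """
    max_len = max((len(word) for word in dictionary), default=1)
    n = len(text)
    # best[i] = (unknown_chars, token_count, tokens) for the suffix text[i:].
    best: list[tuple[int, int, list[str]]] = [(0, 0, [])] * (n + 1)
    for i in range(n - 1, -1, -1):
        # Fallback: treat the single character at i as an unknown token.
        unknown, count, tokens = best[i + 1]
        candidates = [(unknown + 1, count + 1, [text[i]] + tokens)]
        # Try every dictionary word that starts at position i.
        for j in range(i + 1, min(n, i + max_len) + 1):
            piece = text[i:j]
            if piece in dictionary:
                unknown, count, tokens = best[j]
                candidates.append((unknown, count + 1, [piece] + tokens))
        # Tuples compare unknown count first, then token count.
        best[i] = min(candidates)
    return best[0][2]


print(max_match_tokenize("ผมรักคุณ", {"ผม", "รัก", "คุณ"}))
```

The dynamic program runs right to left, so each position reuses the best split of the remaining suffix instead of re-exploring it. The real engines add further constraints that this sketch omits.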
📫 Follow us on Facebook: PyThaiNLP
What's new in version 2.0?
- New NorvigSpellChecker spell checker class, which can be initialized with a custom dictionary.
- End Python 2 support. Remove all Python 2 compatibility code.
- Remove old, obsolete, deprecated, and experimental code.
- Thai2fit (upgrade ULMFiT-related code to fastai 1.0)
- ThaiNER 1.0
- Remove sentiment analysis
- Improved word_tokenize (newmm, mm) and dict_word_tokenize
- Improved POS-tagging
- See examples in the Get Started notebook
- Full change log
- Upgrading from 1.7
- Upgrading ThaiNER from 1.7
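The NorvigSpellChecker entry refers to Peter Norvig's spelling-correction approach. In that approach, the checker generates every string within one or two edits (deletion, transposition, replacement, insertion) of the input. It keeps the candidates found in the dictionary and returns the most frequent one. The sketch below shows that algorithm in pure Python with a custom word-frequency dictionary. It is not PyThaiNLP's class: the name `ToyNorvigChecker`, its constructor, and the assumption that the edit alphabet is simply the set of characters occurring in the dictionary are all hypothetical.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable


class ToyNorvigChecker:
    """Hypothetical sketch of Norvig-style spelling correction."""

    def __init__(self, word_freqs: dict[str, int]) -> None:
        self.freqs = Counter(word_freqs)
        self.total = sum(self.freqs.values())
        # Assumption: edits use only characters seen in the dictionary.
        self.alphabet = sorted({ch for word in self.freqs for ch in word})

    def prob(self, word: str) -> float:
        # Relative frequency; Counter returns 0 for unseen words.
        return self.freqs[word] / self.total if self.total else 0.0

    def known(self, words: Iterable[str]) -> set[str]:
        return {w for w in words if w in self.freqs}

    def edits1(self, word: str) -> set[str]:
        # All strings one deletion, transposition, replacement, or insertion away.
        splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
        deletes = [a + b[1:] for a, b in splits if b]
        transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
        replaces = [a + c + b[1:] for a, b in splits if b for c in self.alphabet]
        inserts = [a + c + b for a, b in splits for c in self.alphabet]
        return set(deletes + transposes + replaces + inserts)

    def edits2(self, word: str) -> set[str]:
        return {e2 for e1 in self.edits1(word) for e2 in self.edits1(e1)}

    def correct(self, word: str) -> str:
        # Prefer the word itself, then 1-edit, then 2-edit dictionary words.
        candidates = (
            self.known([word])
            or self.known(self.edits1(word))
            or self.known(self.edits2(word))
            or {word}
        )
        return max(candidates, key=self.prob)


checker = ToyNorvigChecker({"สวัสดี": 5, "ครับ": 10, "คน": 3})
print(checker.correct("คัรบ"))
```

Swapping in a different frequency dictionary changes both which words count as known and which candidate wins. This is the role a custom dictionary plays in this approach.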
To install the stable version:
pip install pythainlp
Some advanced functionality, such as word vectors, requires extra packages. Install them by adding these options to pip install:
pip install pythainlp[extra1,extra2,...]
where the extras can be:
- artagger (to support the artagger part-of-speech tagger)*
- deepcut (to support the deepcut machine-learnt tokenizer)
- icu (for ICU support in transliteration and tokenization)
- ipa (for International Phonetic Alphabet support in transliteration)
- ml (to support fastai 1.0.22 ULMFiT models)
- ner (for the named-entity recognizer)
- thai2fit (for Thai word vectors)
- thai2rom (for machine-learnt romanization)
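When installing from a script, you can build the `pythainlp[extra1,extra2,...]` requirement from the extras listed above and pass it to pip for the running interpreter. The helper below is a hypothetical sketch. The name `pip_install_command` is illustrative, and its check only covers the extras named in this README.

```python
import sys

# Extras named in this README for PyThaiNLP 2.0.
KNOWN_EXTRAS = {"artagger", "deepcut", "icu", "ipa", "ml", "ner", "thai2fit", "thai2rom"}


def pip_install_command(extras=()):
    """Hypothetical helper: argv list that installs pythainlp with extras."""
    unknown = set(extras) - KNOWN_EXTRAS
    if unknown:
        raise ValueError(f"unknown extras: {sorted(unknown)}")
    requirement = "pythainlp"
    if extras:
        requirement += "[" + ",".join(extras) + "]"
    # "python -m pip" ensures pip installs into this same interpreter.
    return [sys.executable, "-m", "pip", "install", requirement]


print(pip_install_command(["icu", "ner"])[-1])
```

Run the result with `subprocess.check_call(pip_install_command(["icu", "ner"]))`. Because the argument list goes straight to the process, the square brackets never reach a shell. At an interactive prompt, shells such as zsh would try to glob-expand them unless the requirement is quoted.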
Note for Windows:
marisa-trie wheels can be obtained from https://www.lfd.uci.edu/~gohlke/pythonlibs/#marisa-trie
Install the downloaded wheel with pip, for example:
pip install marisa_trie-0.7.5-cp36-cp36m-win32.whl
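A wheel filename encodes its compatibility tags (PEP 427: name, version, optional build tag, Python tag, ABI tag, platform tag). For example, `cp36-cp36m-win32` means CPython 3.6 on 32-bit Windows. The stdlib-only sketch below helps you pick the right marisa-trie wheel for your interpreter. The helper names are hypothetical, and the sketch makes some simplifying assumptions: x86 Windows only (`win32` or `win_amd64`), exact tag matching (no compressed tag sets like `py2.py3`), and the `m` ABI suffix that CPython used up to 3.7 and dropped in 3.8.

```python
import struct
import sys


def parse_wheel_filename(filename):
    """Split a wheel filename into its PEP 427 components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename}")
    parts = filename[:-4].split("-")
    if len(parts) == 6:  # optional build tag present
        name, version, _build, python_tag, abi_tag, platform_tag = parts
    elif len(parts) == 5:
        name, version, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError(f"unexpected wheel filename: {filename}")
    return {"name": name, "version": version, "python": python_tag,
            "abi": abi_tag, "platform": platform_tag}


def windows_cpython_tags(major=None, minor=None, pointer_bits=None):
    """Hypothetical helper: tags a CPython wheel on x86 Windows should carry."""
    major = sys.version_info.major if major is None else major
    minor = sys.version_info.minor if minor is None else minor
    # Pointer size tells 32-bit from 64-bit interpreters.
    bits = struct.calcsize("P") * 8 if pointer_bits is None else pointer_bits
    python_tag = f"cp{major}{minor}"
    # CPython 3.7 and earlier used the "m" (pymalloc) ABI flag; 3.8 dropped it.
    abi_tag = python_tag + ("m" if (major, minor) < (3, 8) else "")
    platform_tag = "win32" if bits == 32 else "win_amd64"
    return python_tag, abi_tag, platform_tag


def wheel_matches(filename, **interpreter):
    """True if the wheel's tags equal the (given or current) interpreter's tags."""
    info = parse_wheel_filename(filename)
    return (info["python"], info["abi"], info["platform"]) == windows_cpython_tags(**interpreter)


print(windows_cpython_tags())
```

Calling `windows_cpython_tags()` with no arguments reports the tags for the running interpreter, so you know which file to download.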
Download the file for your platform:
|Filename|Size|File type|Python version|
|pythainlp-2.0.3-py3-none-any.whl|11.2 MB|Wheel|py3|
|pythainlp-2.0.3.tar.gz|53.9 kB|Source|None|