Thai Natural Language Processing library
PyThaiNLP
PyThaiNLP is a Python library for natural language processing (NLP) of the Thai language.
PyThaiNLP includes Thai word tokenizers, transliterators, soundex converters, part-of-speech taggers, and spell checkers.
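To give a feel for what a dictionary-based word tokenizer has to do with Thai text, which is written without spaces between words, here is a minimal sketch of greedy longest-match segmentation. It is an illustration only, not PyThaiNLP's implementation: the function name `longest_match_tokenize` and the tiny word list are assumptions made for this example.

```python
def longest_match_tokenize(text, dictionary):
    """Greedy left-to-right longest-match segmentation (illustrative sketch)."""
    # Longest dictionary entry bounds how far ahead we need to look.
    max_len = max((len(word) for word in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a dictionary hit.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in dictionary:
                break
        else:
            # No dictionary word starts here: fall back to a single character.
            candidate = text[i]
        tokens.append(candidate)
        i += len(candidate)
    return tokens


# Hypothetical mini-dictionary, for demonstration only.
words = {"ผม", "รัก", "คุณ", "นะ", "ครับ"}
print(longest_match_tokenize("ผมรักคุณนะครับ", words))
```

Greedy matching is the simplest strategy and can pick a wrong split when words overlap, which is one reason production tokenizers use more careful algorithms.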
📫 Follow us on Facebook: PyThaiNLP
What's new in 2.0?
- Dropped Python 2 support and removed all Python 2 compatibility code.
- Improved word_tokenize ("newmm" and "mm" engines) and dict_word_tokenize
- Improved part-of-speech tagging
- New NorvigSpellChecker spell checker class, which can be initialized with a custom dictionary.
- New thai2fit (replacing thai2vec; ULMFiT-related code upgraded to fastai 1.0)
- Updated ThaiNER to 1.0
  - You may need to update your existing ThaiNER models from PyThaiNLP 1.7
- Removed old, obsolete, deprecated, duplicated, and experimental code.
  - Sentiment analysis is no longer part of the library, but rather a text classification example.
- See more examples in the Get Started notebook
- Full change log
- Upgrading from 1.7
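The idea of a Norvig-style spell checker built from a custom dictionary can be sketched in a few lines. The class below is a hypothetical illustration of the general technique (generate all strings one edit away, keep the ones found in the dictionary, pick the most frequent); the name `TinySpellChecker` and its methods are assumptions for this example and are not the NorvigSpellChecker API.

```python
from collections import Counter


class TinySpellChecker:
    """Norvig-style corrector over a user-supplied word-frequency mapping."""

    def __init__(self, word_freqs, alphabet=None):
        self.freqs = Counter(word_freqs)
        # Assumption: if no alphabet is given, use the characters seen in the dictionary.
        self.alphabet = alphabet or "".join(sorted({ch for w in self.freqs for ch in w}))

    def _edits1(self, word):
        """All strings one delete, transpose, replace, or insert away from `word`."""
        splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
        deletes = [a + b[1:] for a, b in splits if b]
        transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
        replaces = [a + c + b[1:] for a, b in splits if b for c in self.alphabet]
        inserts = [a + c + b for a, b in splits for c in self.alphabet]
        return set(deletes + transposes + replaces + inserts)

    def _known(self, words):
        return {w for w in words if w in self.freqs}

    def correct(self, word):
        """Return the most frequent known candidate, or the word itself if none is found."""
        candidates = self._known([word]) or self._known(self._edits1(word)) or {word}
        return max(candidates, key=lambda w: self.freqs[w])


# Hypothetical custom dictionary with made-up frequencies.
checker = TinySpellChecker({"สวัสดี": 10, "ครับ": 8})
print(checker.correct("สวัสด"))  # the final vowel is missing; one insert restores สวัสดี
```

Because the dictionary is passed to the constructor, the same class works for any domain vocabulary; only the frequency mapping changes.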
Install
For stable version:
pip install pythainlp
Some advanced features, such as word vectors, need extra packages. Install them by passing these options to pip install:
pip install pythainlp[extra1,extra2,...]
where extras can be:
- artagger (to support the artagger part-of-speech tagger)*
- deepcut (to support the deepcut machine-learnt tokenizer)
- icu (for ICU support in transliteration and tokenization)
- ipa (for International Phonetic Alphabet support in transliteration)
- ml (to support fastai 1.0.22 ULMFiT models)
- ner (for the named-entity recognizer)
- thai2fit (for Thai word vectors)
- thai2rom (for machine-learnt romanization)
- full (install everything)
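When installation is scripted, the extras syntax can be assembled and checked before pip is invoked. The sketch below builds the requirement string from the extras named in this section; the helper names `requirement_with_extras` and `pip_command` are hypothetical, and the code only constructs the command without running it.

```python
import sys

# Extras named in this section of the README.
KNOWN_EXTRAS = {
    "artagger", "deepcut", "icu", "ipa", "ml",
    "ner", "thai2fit", "thai2rom", "full",
}


def requirement_with_extras(extras):
    """Return e.g. 'pythainlp[icu,ner]', rejecting names not listed above."""
    unknown = sorted(set(extras) - KNOWN_EXTRAS)
    if unknown:
        raise ValueError("unknown extras: " + ", ".join(unknown))
    if not extras:
        return "pythainlp"
    return "pythainlp[" + ",".join(extras) + "]"


def pip_command(extras):
    """Build (but do not run) a pip invocation for the current interpreter."""
    return [sys.executable, "-m", "pip", "install", requirement_with_extras(extras)]


print(requirement_with_extras(["icu", "ner"]))  # pythainlp[icu,ner]
```

The resulting list can be passed to `subprocess.run` if the install should actually be performed. In an interactive shell, quote the requirement (for example `"pythainlp[icu,ner]"`) because some shells treat square brackets as glob patterns.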
Note for Windows: marisa-trie wheels can be obtained from https://www.lfd.uci.edu/~gohlke/pythonlibs/#marisa-trie
Install the downloaded wheel with pip, for example: pip install marisa_trie-0.7.5-cp36-cp36m-win32.whl
Links
- User guide: English, ภาษาไทย
- Docs: https://thainlp.org/pythainlp/docs/2.0/
- GitHub: https://github.com/PyThaiNLP/pythainlp
- Issues: https://github.com/PyThaiNLP/pythainlp/issues
- Facebook: PyThaiNLP
Hashes for pythainlp-2.0.4-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 28d70236a67975c1ce6c259762bca038a6fb4d86d9b96c42718bd5fbebc39bc1
MD5 | 1e04f2cd1dc2f53f8961e23691c30a6c
BLAKE2b-256 | 345b8e6981607ee4a24d3ba178e988feba2f361299778da201e2fe916a6ca31e
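A downloaded wheel can be checked against the SHA256 digest listed above using only the standard library. This is a minimal sketch: the function names are hypothetical, and the file path is whatever location the wheel was saved to.

```python
import hashlib

# SHA256 digest published above for pythainlp-2.0.4-py3-none-any.whl.
EXPECTED_SHA256 = "28d70236a67975c1ce6c259762bca038a6fb4d86d9b96c42718bd5fbebc39bc1"


def sha256_of_file(path, chunk_size=65536):
    """Hash the file in chunks so large downloads are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """True if the file's SHA256 digest equals the expected hex string."""
    return sha256_of_file(path) == expected.lower()
```

For example, `matches_expected("pythainlp-2.0.4-py3-none-any.whl")` should return True for an intact download saved in the current directory. pip can also enforce digests itself through hash-pinned requirements files.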