Thai Natural Language Processing library
PyThaiNLP
PyThaiNLP is a Python library for natural language processing (NLP) of the Thai language.
PyThaiNLP includes Thai word tokenizers, transliterators, soundex converters, part-of-speech taggers, and spell checkers.
📫 Follow us on Facebook: PyThaiNLP
What's new in 2.0?
- Dropped Python 2 support; all Python 2 compatibility code has been removed.
- Improved word_tokenize ("newmm" and "mm" engines); a custom_dict dictionary can now be provided.
- Improved pos_tag part-of-speech tagging.
- New NorvigSpellChecker spell checker class, which can be initialized with a custom dictionary.
- New thai2fit (replacing thai2vec; ULMFiT-related code upgraded to fastai 1.0).
- Updated ThaiNER to 1.0.
  - You may need to update your existing ThaiNER models from PyThaiNLP 1.7.
- Removed old, obsolete, deprecated, duplicated, and experimental code.
- Sentiment analysis is no longer part of the library; it is now a text classification example.
- See more examples in the Get Started notebook.
- Full change log
- Upgrading from 1.7
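The list above mentions that word_tokenize's "newmm" and "mm" engines accept a custom_dict. Both are dictionary-based maximal-matching tokenizers, so the words in the dictionary directly decide where the text is split. The pure-Python sketch below illustrates that general technique only; it is not PyThaiNLP's implementation, and the function name maximal_matching is hypothetical. It assumes "maximal matching" means choosing the segmentation with the fewest dictionary words, and it treats characters not covered by the dictionary as single-character tokens.

```python
from typing import Iterable, List, Tuple


def maximal_matching(text: str, dictionary: Iterable[str]) -> List[str]:
    """Hypothetical sketch: split `text` into the fewest dictionary words.

    Characters not covered by any dictionary word become single-character
    "unknown" tokens; segmentations with fewer unknown tokens are preferred,
    then segmentations with fewer tokens overall.
    """
    words = set(dictionary)
    max_len = max((len(w) for w in words), default=1)
    n = len(text)
    inf = (float("inf"), float("inf"))
    # best[i] = (unknown_count, token_count) of the best segmentation of text[:i]
    best: List[Tuple[float, float]] = [inf] * (n + 1)
    back = [0] * (n + 1)  # back[i] = start index of the last token ending at i
    best[0] = (0, 0)
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            piece = text[start:end]
            if piece in words:
                cand = (best[start][0], best[start][1] + 1)
            elif end - start == 1:
                # Unknown single character: allowed, but penalized.
                cand = (best[start][0] + 1, best[start][1] + 1)
            else:
                continue
            if cand < best[end]:
                best[end] = cand
                back[end] = start
    # Walk the back-pointers from the end of the text to recover the tokens.
    tokens = []
    end = n
    while end > 0:
        start = back[end]
        tokens.append(text[start:end])
        end = start
    return tokens[::-1]


# Adding a longer word to the (custom) dictionary changes the segmentation.
print(maximal_matching("ไปโรงเรียน", ["ไป", "โรง", "เรียน"]))              # → ['ไป', 'โรง', 'เรียน']
print(maximal_matching("ไปโรงเรียน", ["ไป", "โรง", "เรียน", "โรงเรียน"]))  # → ['ไป', 'โรงเรียน']
```

This is why supplying a domain-specific custom dictionary can keep compound terms together: with "โรงเรียน" ("school") in the dictionary, the fewest-words segmentation keeps it as one token.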
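NorvigSpellChecker, listed above, is named after Peter Norvig's spelling corrector, which picks the most frequent known word within a small edit distance of the input. The stdlib-only sketch below shows that technique driven by a custom word list. It is not PyThaiNLP's class: the name SimpleNorvigChecker, the (word, frequency) input format, and deriving the edit alphabet from the dictionary's own characters are all assumptions made for illustration.

```python
from typing import Dict, Iterable, Set, Tuple


def edits1(word: str, alphabet: str) -> Set[str]:
    """All strings one deletion, transposition, replacement, or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in alphabet]
    inserts = [left + c + right for left, right in splits for c in alphabet]
    return set(deletes + transposes + replaces + inserts)


class SimpleNorvigChecker:
    """Hypothetical Norvig-style checker built from a custom (word, frequency) list."""

    def __init__(self, custom_dict: Iterable[Tuple[str, int]]):
        self.freq: Dict[str, int] = dict(custom_dict)
        # Assumption: candidate edits only use characters seen in the dictionary.
        self.alphabet = "".join(sorted({c for w in self.freq for c in w}))

    def known(self, words: Iterable[str]) -> Set[str]:
        return {w for w in words if w in self.freq}

    def candidates(self, word: str) -> Set[str]:
        # Norvig's ordering: the word itself, then edit distance 1, then 2,
        # falling back to the unchanged word when nothing known is reachable.
        e1 = edits1(word, self.alphabet)
        return (self.known([word])
                or self.known(e1)
                or self.known(e2 for w in e1 for e2 in edits1(w, self.alphabet))
                or {word})

    def correct(self, word: str) -> str:
        # Most frequent candidate wins; ties are broken by string comparison
        # so the result does not depend on set iteration order.
        return max(self.candidates(word), key=lambda w: (self.freq.get(w, 0), w))


checker = SimpleNorvigChecker([("แมว", 50), ("หมา", 40), ("แมง", 5)])
print(checker.correct("แมม"))   # → แมว (both แมว and แมง are one edit away; แมว is more frequent)
print(checker.correct("หมาา"))  # → หมา
```

The custom dictionary controls both which words count as correct and how candidates are ranked, which is the practical effect of initializing a spell checker with your own word list.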
Install
For the stable version:
pip install pythainlp
Some advanced functionality, such as word vectors, needs extra packages. Install them by listing the extras during pip install:
pip install pythainlp[extra1,extra2,...]
where the extras can be:
- artagger (to support the artagger part-of-speech tagger)
- deepcut (to support the deepcut machine-learnt tokenizer)
- icu (for ICU support in transliteration and tokenization)
- ipa (for International Phonetic Alphabet support in transliteration)
- ml (to support fastai 1.0.22 ULMFiT models)
- ner (for the named-entity recognizer)
- thai2fit (for Thai word vectors)
- thai2rom (for machine-learnt romanization)
- full (to install everything)
Note for Windows: marisa-trie wheels can be obtained from https://www.lfd.uci.edu/~gohlke/pythonlibs/#marisa-trie
Install the downloaded wheel with pip, for example: pip install marisa_trie-0.7.5-cp36-cp36m-win32.whl
Links
- User guide: English, Thai (ภาษาไทย)
- Docs: https://thainlp.org/pythainlp/docs/2.0/
- GitHub: https://github.com/PyThaiNLP/pythainlp
- Issues: https://github.com/PyThaiNLP/pythainlp/issues
- Facebook: PyThaiNLP
Hashes for pythainlp-2.0.6-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | e7d44958a12cc91031f19832d7feaa9dba7c0c1d97c193eb8fd461b1ee969411
MD5 | c43d6cb7967fbc0e97aad42611aea0b3
BLAKE2b-256 | a675d9a623ddcadf6b729bd6cc425667ca8d3554baaf6c71ad31649d4454edb9
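The digests listed above let you confirm that a downloaded wheel is intact before installing it. A minimal standard-library sketch follows; the helper names sha256_of and verify are hypothetical, and the local filename in the usage line is an assumption about where you saved the download.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for pythainlp-2.0.6-py3-none-any.whl
EXPECTED_SHA256 = "e7d44958a12cc91031f19832d7feaa9dba7c0c1d97c193eb8fd461b1ee969411"


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash a file in fixed-size chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """Return True if the file's SHA256 matches the expected hex digest."""
    return sha256_of(path) == expected.lower()


# Usage (assumes the wheel was saved to the current directory):
# print(verify(Path("pythainlp-2.0.6-py3-none-any.whl")))
```

The same pattern works for the other listed algorithms by swapping hashlib.sha256 for hashlib.md5 or hashlib.blake2b (BLAKE2b-256 corresponds to hashlib.blake2b(digest_size=32)).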