Thai Natural Language Processing library
PyThaiNLP
PyThaiNLP is a Python library for natural language processing (NLP) of the Thai language.
PyThaiNLP includes Thai word tokenizers, transliterators, soundex converters, part-of-speech taggers, and spell checkers.
📫 Follow us on Facebook: PyThaiNLP
What's new in 2.0?
- Terminate Python 2 support. Remove all Python 2 compatibility code.
- Improved word_tokenize ("newmm" and "mm" engines); a custom_dict dictionary can be provided
- Improved pos_tag part-of-speech tagging
- New NorvigSpellChecker spell checker class, which can be initialized with a custom dictionary.
- New thai2fit (replacing thai2vec, upgrade ULMFiT-related code to fastai 1.0)
- Updated ThaiNER to 1.0
- You may need to update your existing ThaiNER models from PyThaiNLP 1.7
- Remove old, obsolete, deprecated, duplicated, and experimental code.
- Sentiment analysis is no longer part of the library, but rather a text classification example.
- See more examples in Get Started notebook
- Full change log
- Upgrading from 1.7
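To make the role of a custom dictionary in dictionary-based tokenization more concrete, here is a minimal sketch. It is not PyThaiNLP's code and does not call the library: it assumes a simple "fewest words" dynamic-programming formulation, one common way to describe dictionary-based maximum matching, and the function name segment_min_words is hypothetical. Characters not covered by the dictionary fall back to single-character tokens.

```python
from typing import Iterable, List


def segment_min_words(text: str, dictionary: Iterable[str]) -> List[str]:
    """Split text into the fewest tokens, preferring dictionary words.

    Illustrative only: unknown characters become one-character tokens.
    """
    words = set(dictionary)
    # Longest dictionary entry bounds how far back we need to look.
    max_len = max([len(w) for w in words] + [1])
    n = len(text)
    # best[i] = (token_count, start_of_last_token) for the prefix text[:i]
    best = [(0, 0)] + [(n + 1, 0)] * n
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            piece = text[start:end]
            if piece in words or end - start == 1:
                cost = best[start][0] + 1
                if cost < best[end][0]:
                    best[end] = (cost, start)
    # Walk the back-pointers from the end of the text to recover tokens.
    tokens = []
    i = n
    while i > 0:
        start = best[i][1]
        tokens.append(text[start:i])
        i = start
    return tokens[::-1]


custom_words = {"แมว", "กิน", "ปลา"}
print(segment_min_words("แมวกินปลา", custom_words))
# → ['แมว', 'กิน', 'ปลา']
```

Removing "ปลา" from the dictionary makes the last three characters come out as separate tokens, which is the kind of gap a custom dictionary is meant to close.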
Install
For the stable version:
pip install pythainlp
For some advanced functionality, such as word vectors, extra packages may be needed. Install them by passing these options to pip install:
pip install pythainlp[extra1,extra2,...]
where extras can be
- artagger (to support artagger part-of-speech tagger)
- deepcut (to support deepcut machine-learnt tokenizer)
- icu (for ICU support in transliteration and tokenization)
- ipa (for International Phonetic Alphabet support in transliteration)
- ml (to support fastai 1.0.22 ULMFiT models)
- ner (for named-entity recognizer)
- thai2fit (for Thai word vector)
- thai2rom (for machine-learnt romanization)
- full (install everything)
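As a small illustration of the extras syntax, the sketch below builds the requirement string for a chosen set of extras and rejects names that are not in the list above. The helper name requirement_with_extras is hypothetical and is not part of PyThaiNLP or pip.

```python
# Extras named in the list above; anything else is rejected early.
KNOWN_EXTRAS = (
    "artagger", "deepcut", "icu", "ipa", "ml",
    "ner", "thai2fit", "thai2rom", "full",
)


def requirement_with_extras(extras):
    """Return a pip requirement string such as 'pythainlp[icu,ner]'."""
    unknown = [name for name in extras if name not in KNOWN_EXTRAS]
    if unknown:
        raise ValueError("unknown extras: " + ", ".join(unknown))
    # Drop duplicates while keeping the caller's order.
    unique = list(dict.fromkeys(extras))
    if not unique:
        return "pythainlp"
    return "pythainlp[" + ",".join(unique) + "]"


print(requirement_with_extras(["icu", "ner"]))
# → pythainlp[icu,ner]
```

Some shells, such as zsh, treat square brackets specially, so quoting the requirement on the command line (pip install "pythainlp[icu,ner]") avoids surprises.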
Note for Windows: marisa-trie wheels can be obtained from https://www.lfd.uci.edu/~gohlke/pythonlibs/#marisa-trie
Install it with pip, for example:
pip install marisa_trie-0.7.5-cp36-cp36m-win32.whl
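When picking a wheel by hand, the file name itself says which interpreter and platform it targets. The sketch below splits a wheel file name into its tags, following the standard wheel naming scheme (distribution, version, optional build tag, Python tag, ABI tag, platform tag). The helper parse_wheel_filename is hypothetical and written only for illustration.

```python
from typing import NamedTuple, Optional


class WheelTags(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename: str) -> WheelTags:
    """Split a wheel file name into its dash-separated components."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file name: " + filename)
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        name, version, py, abi, plat = parts
        build = None
    elif len(parts) == 6:
        # The optional build tag sits between version and Python tag.
        name, version, build, py, abi, plat = parts
    else:
        raise ValueError("unexpected wheel file name: " + filename)
    return WheelTags(name, version, build, py, abi, plat)


tags = parse_wheel_filename("marisa_trie-0.7.5-cp36-cp36m-win32.whl")
print(tags.python_tag, tags.platform_tag)
# → cp36 win32
```

Here cp36 means CPython 3.6 and win32 means 32-bit Windows, so this particular file only suits that combination; choose the file whose tags match your own interpreter.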
Links
- User guide: English, ภาษาไทย
- Docs: https://thainlp.org/pythainlp/docs/2.0/
- GitHub: https://github.com/PyThaiNLP/pythainlp
- Issues: https://github.com/PyThaiNLP/pythainlp/issues
- Facebook: PyThaiNLP
File details
Details for the file pythainlp-2.0.5.tar.gz.
File metadata
- Download URL: pythainlp-2.0.5.tar.gz
- Upload date:
- Size: 55.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.19.1 setuptools/41.0.1 requests-toolbelt/0.8.0 tqdm/4.26.0 CPython/3.7.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | 05fe7be2c307ff253878fd90dc1838a2a57278eea288b1688a997983c50cee38
MD5 | f63760d3ef810f25b8dcfa0e5bdb2666
BLAKE2b-256 | efac2f992e140cab1eaed6c70afe03a09c08d86876d7287d1813b0b2a2d0e74c
File details
Details for the file pythainlp-2.0.5-py3-none-any.whl.
File metadata
- Download URL: pythainlp-2.0.5-py3-none-any.whl
- Upload date:
- Size: 11.0 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.19.1 setuptools/41.0.1 requests-toolbelt/0.8.0 tqdm/4.26.0 CPython/3.7.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | a980c75fce7ca3adc5505112a153392d5ee1976512ba223295ac49a23dd841bd
MD5 | 09634977e2b8517d0a35a9001d6f9885
BLAKE2b-256 | 91bbd4d1711d331c080ca29bce13f04590d06add379471fb9a1147a153073871
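The SHA256 digests listed for the two files can be checked against a downloaded copy with Python's standard hashlib module. This is a minimal sketch: the helper names sha256_of and matches_expected are hypothetical, and it assumes the files were saved locally under the names shown.

```python
import hashlib

# SHA256 digests copied from the file details above.
EXPECTED_SHA256 = {
    "pythainlp-2.0.5.tar.gz":
        "05fe7be2c307ff253878fd90dc1838a2a57278eea288b1688a997983c50cee38",
    "pythainlp-2.0.5-py3-none-any.whl":
        "a980c75fce7ca3adc5505112a153392d5ee1976512ba223295ac49a23dd841bd",
}


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex):
    """Compare a file's SHA256 digest with an expected hex string."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, matches_expected("pythainlp-2.0.5.tar.gz", EXPECTED_SHA256["pythainlp-2.0.5.tar.gz"]) should return True for an intact download of the source distribution.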