Thai Natural Language Processing library
PyThaiNLP
PyThaiNLP is a Python library for natural language processing (NLP) of the Thai language.
PyThaiNLP includes Thai word tokenizers, transliterators, soundex converters, part-of-speech taggers, and spell checkers.
📫 Follow us on Facebook: PyThaiNLP
What's new in 2.1?
- Improved word_tokenize ("newmm" and "mm" engines); a custom_dict dictionary can be provided
- Add AttaCut as an option for the word_tokenize engine
- New Thai2rom (PyTorch)
- New command line
- Add word tokenization benchmark to PyThaiNLP
- See more examples in Get Started notebook
- Full change log
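To illustrate why supplying a dictionary can change tokenization, here is a minimal sketch of greedy longest-match segmentation over a word set. It is not PyThaiNLP's implementation of any engine, and it does not call the library; the function name `longest_match_tokenize` and the tiny word sets are hypothetical and exist only for this example.

```python
def longest_match_tokenize(text, dictionary):
    """Greedy longest-match segmentation against a set of known words.

    Illustrative only: a simplified stand-in, not PyThaiNLP code.
    """
    words = set(dictionary)
    max_len = max((len(w) for w in words), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a dictionary hit.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in words:
                break
        else:
            # No dictionary word starts here: emit a single character.
            candidate = text[i]
        tokens.append(candidate)
        i += len(candidate)
    return tokens


# Hypothetical word sets; the second adds one compound word.
base_dict = {"ฉัน", "รัก", "ภาษา", "ไทย"}
extended_dict = base_dict | {"ภาษาไทย"}
text = "ฉันรักภาษาไทย"

print(longest_match_tokenize(text, base_dict))
print(longest_match_tokenize(text, extended_dict))
```

With the base set the phrase splits into four words; once the compound is added, the last two words are kept together as one token. A dictionary-driven tokenizer can be steered toward domain-specific terms in the same way.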
Install
For the stable version:
pip install pythainlp
For some advanced functionality, such as word vectors, extra packages may be needed. Install them by listing the extras during pip install:
pip install pythainlp[extra1,extra2,...]
where extras can be
- artagger (to support artagger part-of-speech tagger)
- attacut (wrapper for AttaCut, https://github.com/PyThaiNLP/attacut)
- deepcut (to support deepcut machine-learnt tokenizer)
- icu (for ICU support in transliteration and tokenization)
- ipa (for International Phonetic Alphabet support in transliteration)
- ml (to support fastai 1.0.22 ULMFiT models)
- ner (for named-entity recognizer)
- thai2fit (for Thai word vector)
- thai2rom (for machine-learnt romanization)
- full (install everything)
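When installs are scripted, the extras syntax above can be assembled programmatically. The following is a small sketch, assuming the extras names listed in this section; the helper name `pip_install_args` is hypothetical and not part of PyThaiNLP or pip.

```python
import sys

# Extras names as listed in this README for the 2.1 series.
KNOWN_EXTRAS = {
    "artagger", "attacut", "deepcut", "icu", "ipa",
    "ml", "ner", "thai2fit", "thai2rom", "full",
}


def pip_install_args(extras=()):
    """Build an argument list for `python -m pip install pythainlp[...]`."""
    extras = list(extras)
    unknown = sorted(set(extras) - KNOWN_EXTRAS)
    if unknown:
        raise ValueError("unknown extras: " + ", ".join(unknown))
    requirement = "pythainlp"
    if extras:
        # Extras are comma-separated inside square brackets, without spaces.
        requirement += "[" + ",".join(extras) + "]"
    return [sys.executable, "-m", "pip", "install", requirement]


print(pip_install_args(["icu", "ner"])[-1])
```

The returned list can be passed to `subprocess.run`. Passing a list rather than a single string also avoids shells that treat square brackets as glob characters.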
Note for Windows: marisa-trie wheels can be obtained from https://www.lfd.uci.edu/~gohlke/pythonlibs/#marisa-trie
Install the downloaded wheel with pip, for example: pip install marisa_trie-0.7.5-cp36-cp36m-win32.whl
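Picking the right wheel means matching its filename tags to the local interpreter. The sketch below splits a wheel filename into its parts following the standard wheel naming convention (name, version, optional build tag, Python tag, ABI tag, platform tag). The function name `parse_wheel_filename` is hypothetical. It also replaces non-breaking hyphens (U+2011), which can appear when a filename is copied from a web page.

```python
def parse_wheel_filename(filename):
    """Split a wheel filename into its naming-convention components."""
    # Normalise non-breaking hyphens that may come from copied text.
    filename = filename.replace("\u2011", "-")
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel filename: " + filename)
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        name, version, python_tag, abi_tag, platform_tag = parts
        build = None
    elif len(parts) == 6:
        name, version, build, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError("unexpected number of fields in: " + filename)
    return {
        "name": name,
        "version": version,
        "build": build,
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


info = parse_wheel_filename("marisa_trie-0.7.5-cp36-cp36m-win32.whl")
print(info["python"], info["abi"], info["platform"])
```

For the example filename, the tags indicate a build for CPython 3.6 on 32-bit Windows; a 64-bit interpreter would need a wheel with a different platform tag.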
Links
- User guide: English
- Docs: https://thainlp.org/pythainlp/docs/2.1/
- GitHub: https://github.com/PyThaiNLP/pythainlp
- Issues: https://github.com/PyThaiNLP/pythainlp/issues
- Facebook: PyThaiNLP
Made with ❤️
We build Thai NLP.
PyThaiNLP Team.
Hashes for pythainlp-2.1.dev4-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | a863c87cc4b150343f461e6b3076b6e7193f5dea474934e194b9656dfeb2bec1
MD5 | d983e5f07ed79eaebd777897880cfa3f
BLAKE2b-256 | 4a5ef8af4fc5d3f927782d3566127a225ba85da1314c36045719939139b5796c
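A downloaded wheel can be checked against the digests listed above. This is a minimal sketch using only the standard library's hashlib; the function name `file_digests` and the local path in the usage note are hypothetical.

```python
import hashlib


def file_digests(path, chunk_size=65536):
    """Return hex digests of a file, keyed by the algorithm labels used above."""
    hashers = {
        "SHA256": hashlib.sha256(),
        "MD5": hashlib.md5(),
        # BLAKE2b-256 is BLAKE2b with a 32-byte (256-bit) digest.
        "BLAKE2b-256": hashlib.blake2b(digest_size=32),
    }
    with open(path, "rb") as handle:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {label: hasher.hexdigest() for label, hasher in hashers.items()}
```

For example, `file_digests("pythainlp-2.1.dev4-py3-none-any.whl")["SHA256"]` would be compared with the SHA256 value listed above, assuming the wheel was saved under that name in the current directory.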