Thai Natural Language Processing library

PyThaiNLP

PyThaiNLP is a Python library for natural language processing (NLP) of the Thai language.

PyThaiNLP includes Thai word tokenizers, transliterators, soundex converters, part-of-speech taggers, and spell checkers.
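
Thai is written without spaces between words, so a word tokenizer has to decide where each word ends. The sketch below illustrates the general idea with greedy longest matching against a word set. It is a simplified illustration only, not PyThaiNLP's tokenizer; the function name `longest_match_tokenize` and the tiny word set are hypothetical.

```python
def longest_match_tokenize(text, dictionary):
    """Greedy longest-match segmentation against a set of known words.

    Illustrative only: a hypothetical helper, not part of PyThaiNLP.
    """
    max_len = max((len(word) for word in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a known word fits.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in dictionary:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # No known word starts here: emit one character so the loop advances.
            tokens.append(text[i])
            i += 1
    return tokens


words = {"ฉัน", "รัก", "ภาษา", "ไทย", "ภาษาไทย"}
print(longest_match_tokenize("ฉันรักภาษาไทย", words))
# ['ฉัน', 'รัก', 'ภาษาไทย']
```

Because the longest candidate wins, adding a compound such as "ภาษาไทย" to the word set changes the segmentation, which is why a user-supplied dictionary matters for this family of tokenizers.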

📫 Follow us on Facebook: PyThaiNLP

What's new in 2.0?

  • Dropped Python 2 support and removed all Python 2 compatibility code.
  • Improved word_tokenize ("newmm" and "mm" engines); a custom dictionary can be provided via custom_dict.
  • Improved pos_tag part-of-speech tagging.
  • New NorvigSpellChecker spell checker class, which can be initialized with a custom dictionary.
  • New thai2fit (replaces thai2vec; ULMFiT-related code upgraded to fastai 1.0).
  • Updated ThaiNER to 1.0.
  • Removed old, obsolete, deprecated, duplicated, and experimental code.
  • See more examples in Get Started notebook
  • Full change log
  • Upgrading from 1.7
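
To make the idea of a Norvig-style spell checker with a custom dictionary concrete, here is a minimal, self-contained sketch. It is not the NorvigSpellChecker class itself: the class name `TinyNorvigChecker`, its constructor, and the sample word frequencies are hypothetical, and it only considers candidates one edit away, whereas Norvig's original also tries two edits.

```python
from collections import Counter


class TinyNorvigChecker:
    """Hypothetical, simplified Norvig-style corrector (edit distance 1 only)."""

    def __init__(self, word_freqs):
        # word_freqs maps each known word to its frequency (the custom dictionary).
        self.freqs = Counter(word_freqs)
        # Derive the alphabet from the dictionary so any script works.
        self.alphabet = sorted(set("".join(self.freqs)))

    def _edits1(self, word):
        """All strings one delete, transpose, replace, or insert away from word."""
        splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
        deletes = [a + b[1:] for a, b in splits if b]
        transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
        replaces = [a + c + b[1:] for a, b in splits if b for c in self.alphabet]
        inserts = [a + c + b for a, b in splits for c in self.alphabet]
        return set(deletes + transposes + replaces + inserts)

    def _known(self, words):
        return {w for w in words if w in self.freqs}

    def correct(self, word):
        """Return the most frequent known candidate, or the word unchanged."""
        candidates = self._known([word]) or self._known(self._edits1(word)) or {word}
        return max(candidates, key=lambda w: self.freqs[w])


checker = TinyNorvigChecker({"ภาษา": 5, "ไทย": 3})
print(checker.correct("ภาษ"))  # ภาษา
```

Swapping in a different frequency table changes which corrections are possible, which is the practical effect of initializing a checker with a custom dictionary.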

Install

For the stable version:

pip install pythainlp

Some advanced functionality, such as word vectors, may need extra packages. Install them by passing these options to pip install:

pip install pythainlp[extra1,extra2,...]

where each extra can be one of:

  • artagger (to support artagger part-of-speech tagger)*
  • deepcut (to support deepcut machine-learnt tokenizer)
  • icu (for ICU support in transliteration and tokenization)
  • ipa (for International Phonetic Alphabet support in transliteration)
  • ml (to support fastai 1.0.22 ULMFiT models)
  • ner (for named-entity recognizer)
  • thai2fit (for Thai word vector)
  • thai2rom (for machine-learnt romanization)
  • full (install everything)
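
As a small illustration of how the extras combine into one requirement string, the sketch below builds the argument for pip install from a list of extras and rejects names that are not in the list above. The helper name `pip_requirement` is hypothetical and not part of PyThaiNLP or pip.

```python
# Extras named in the list above.
KNOWN_EXTRAS = {
    "artagger", "deepcut", "icu", "ipa", "ml",
    "ner", "thai2fit", "thai2rom", "full",
}


def pip_requirement(extras):
    """Build a requirement string such as 'pythainlp[icu,ner]' (hypothetical helper)."""
    unknown = sorted(set(extras) - KNOWN_EXTRAS)
    if unknown:
        raise ValueError("unknown extras: " + ", ".join(unknown))
    if not extras:
        return "pythainlp"
    return "pythainlp[" + ",".join(extras) + "]"


print(pip_requirement(["icu", "ner"]))  # pythainlp[icu,ner]
```

In shells that treat square brackets as glob patterns, such as zsh, quote the requirement: pip install "pythainlp[icu,ner]".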

Note for Windows: marisa-trie wheels can be obtained from https://www.lfd.uci.edu/~gohlke/pythonlibs/#marisa-trie. Install the downloaded wheel with pip, for example: pip install marisa_trie-0.7.5-cp36-cp36m-win32.whl
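
A wheel only installs if its tags match the running interpreter, so it helps to read the filename before downloading. The sketch below splits a wheel filename into the standard name-version-python-abi-platform fields and compares the Python tag with the current interpreter. The helper names are hypothetical, and the check assumes CPython and ignores ABI and platform compatibility.

```python
import sys


def parse_wheel_filename(filename):
    """Split 'name-version[-build]-python-abi-platform.whl' into its fields."""
    stem = filename[:-len(".whl")] if filename.endswith(".whl") else filename
    parts = stem.split("-")
    if len(parts) not in (5, 6):
        raise ValueError("not a wheel filename: " + filename)
    python_tag, abi_tag, platform_tag = parts[-3:]
    return {
        "name": parts[0],
        "version": parts[1],
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


def current_cpython_tag():
    """Python tag of the running interpreter, assuming CPython (e.g. 'cp36')."""
    return "cp{}{}".format(sys.version_info.major, sys.version_info.minor)


info = parse_wheel_filename("marisa_trie-0.7.5-cp36-cp36m-win32.whl")
print(info["python"], info["platform"])  # cp36 win32
print(info["python"] == current_cpython_tag())
```

In the example filename, cp36 means CPython 3.6 and win32 means 32-bit Windows; pick the file whose tags match your own interpreter.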

This version

2.0.5

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pythainlp-2.0.5.tar.gz (55.7 kB)

Uploaded Source

Built Distribution

pythainlp-2.0.5-py3-none-any.whl (11.0 MB)

Uploaded Python 3

File details

Details for the file pythainlp-2.0.5.tar.gz.

File metadata

  • Download URL: pythainlp-2.0.5.tar.gz
  • Upload date:
  • Size: 55.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.19.1 setuptools/41.0.1 requests-toolbelt/0.8.0 tqdm/4.26.0 CPython/3.7.3

File hashes

Hashes for pythainlp-2.0.5.tar.gz
Algorithm Hash digest
SHA256 05fe7be2c307ff253878fd90dc1838a2a57278eea288b1688a997983c50cee38
MD5 f63760d3ef810f25b8dcfa0e5bdb2666
BLAKE2b-256 efac2f992e140cab1eaed6c70afe03a09c08d86876d7287d1813b0b2a2d0e74c
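
One way to use these digests is to hash the downloaded file locally and compare the result with the published value. The sketch below does this with Python's standard hashlib module; the helper names are hypothetical, the expected SHA256 is copied from the table above, and MD5 is left out because SHA256 is sufficient for an integrity check.

```python
import hashlib

# SHA256 published above for pythainlp-2.0.5.tar.gz.
EXPECTED_SHA256 = "05fe7be2c307ff253878fd90dc1838a2a57278eea288b1688a997983c50cee38"


def file_digests(path, chunk_size=1 << 20):
    """Return SHA256 and BLAKE2b-256 hex digests of a file, read in chunks."""
    hashers = {
        "SHA256": hashlib.sha256(),
        "BLAKE2b-256": hashlib.blake2b(digest_size=32),
    }
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


def matches_sha256(path, expected=EXPECTED_SHA256):
    """True if the file's SHA256 equals the expected hex digest."""
    return file_digests(path)["SHA256"] == expected.lower()


# Usage, assuming the archive was downloaded to the current directory:
# print(matches_sha256("pythainlp-2.0.5.tar.gz"))
```

The same check applies to the wheel by passing its published SHA256 as the expected value.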

See more details on using hashes here.

File details

Details for the file pythainlp-2.0.5-py3-none-any.whl.

File metadata

  • Download URL: pythainlp-2.0.5-py3-none-any.whl
  • Upload date:
  • Size: 11.0 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.19.1 setuptools/41.0.1 requests-toolbelt/0.8.0 tqdm/4.26.0 CPython/3.7.3

File hashes

Hashes for pythainlp-2.0.5-py3-none-any.whl
Algorithm Hash digest
SHA256 a980c75fce7ca3adc5505112a153392d5ee1976512ba223295ac49a23dd841bd
MD5 09634977e2b8517d0a35a9001d6f9885
BLAKE2b-256 91bbd4d1711d331c080ca29bce13f04590d06add379471fb9a1147a153073871
