Thai Natural Language Processing library

PyThaiNLP 2.0

PyThaiNLP is a Python library for natural language processing (NLP) of the Thai language.

PyThaiNLP includes Thai word tokenizers, transliterators, soundex converters, part-of-speech taggers, and spell checkers.
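As a quick illustration of the tokenizer, the sketch below wraps `word_tokenize` with the `newmm` engine, both of which are named in the release notes. The helper name `tokenize_thai` and the sample sentence are assumptions made for this example, and the import sits inside the function so that the snippet can be loaded even where PyThaiNLP is not installed.

```python
def tokenize_thai(text, engine="newmm"):
    """Split Thai text into a list of word tokens.

    `tokenize_thai` is a hypothetical helper for this example; the actual
    segmentation is done by pythainlp.tokenize.word_tokenize.
    """
    # Imported lazily so that defining this helper does not by itself
    # require PyThaiNLP to be installed.
    from pythainlp.tokenize import word_tokenize

    return word_tokenize(text, engine=engine)
```

Calling `tokenize_thai("ผมรักภาษาไทย")` should return a list of strings. The exact segmentation depends on the dictionary shipped with the installed version.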

📖 For details on upgrading from PyThaiNLP 1.7 to PyThaiNLP 2.0, see From PyThaiNLP 1.7 to PyThaiNLP 2.0

📖 ThaiNER users upgrading from PyThaiNLP 1.7 to PyThaiNLP 2.0 should see Upgrade ThaiNER from PyThaiNLP 1.7 to PyThaiNLP 2.0

📫 Follow us on Facebook: Pythainlp

What's new in version 2.0?

  • New NorvigSpellChecker spell checker class, which can be initialized with a custom dictionary.
  • Dropped Python 2 support and removed all Python 2 compatibility code.
  • Removed old, obsolete, deprecated, and experimental code.
  • Thai2fit (upgraded ULMFiT-related code to fastai 1.0)
  • ThaiNER 1.0
  • Removed sentiment analysis
  • Improved word_tokenize (newmm, mm) and dict_word_tokenize
  • Improved POS-tagging
  • More and improved examples
  • See the PyThaiNLP 2.0 change log
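The custom-dictionary spell checker from the first item could be set up roughly as follows. This is a minimal sketch, assuming that `NorvigSpellChecker` is importable from `pythainlp.spell` and accepts a `custom_dict` argument holding (word, frequency) tuples; check the API documentation of your installed version. The helper `build_spell_checker` and the toy word list are hypothetical.

```python
# Toy (word, frequency) pairs used only for illustration.
SAMPLE_WORD_FREQS = [("แมว", 10), ("หมา", 8), ("นก", 5)]


def build_spell_checker(word_freqs):
    """Create a NorvigSpellChecker backed by a custom dictionary.

    Assumption: `custom_dict` takes a list of (word, frequency) tuples.
    """
    # Imported lazily so this helper can be defined without PyThaiNLP.
    from pythainlp.spell import NorvigSpellChecker

    return NorvigSpellChecker(custom_dict=word_freqs)
```

With such a checker, `checker.spell(word)` is expected to list candidate corrections and `checker.correct(word)` to return the most likely one, drawing only on the words supplied.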

Install

For the stable version:

pip install pythainlp

Some advanced features, such as word vectors, need extra packages. Install them by passing one or more of these options to pip install:

pip install pythainlp[extra1,extra2,...]

where each extra can be one of:

  • artagger (to support artagger part-of-speech tagger)*
  • deepcut (to support deepcut machine-learnt tokenizer)
  • icu (for ICU support in transliteration and tokenization)
  • ipa (for International Phonetic Alphabet support in transliteration)
  • ml (to support fastai 1.0.22 ULMFiT models)
  • ner (for named-entity recognizer)
  • thai2fit (for Thai word vector)
  • thai2rom (for machine-learnt romanization)
  • full (install everything)
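When installs are scripted, the requirement string can be assembled and validated before it is handed to pip. The sketch below uses only the standard library. The names `KNOWN_EXTRAS` and `pip_requirement` are hypothetical, and the extras list is copied from the options above.

```python
# Extras listed in this README for PyThaiNLP 2.0.
KNOWN_EXTRAS = (
    "artagger", "deepcut", "icu", "ipa", "ml",
    "ner", "thai2fit", "thai2rom", "full",
)


def pip_requirement(extras=()):
    """Build the requirement string to pass to `pip install`."""
    extras = list(extras)
    # Reject names that are not in the documented list.
    unknown = sorted(set(extras) - set(KNOWN_EXTRAS))
    if unknown:
        raise ValueError("unknown extras: " + ", ".join(unknown))
    if not extras:
        return "pythainlp"
    return "pythainlp[{}]".format(",".join(extras))
```

For example, `pip_requirement(["icu", "ner"])` returns `"pythainlp[icu,ner]"`, which corresponds to `pip install pythainlp[icu,ner]` on the command line. Some shells require the brackets to be quoted.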

Note for Windows: marisa-trie wheels can be obtained from https://www.lfd.uci.edu/~gohlke/pythonlibs/#marisa-trie. Install the downloaded wheel with pip, for example: pip install marisa_trie-0.7.5-cp36-cp36m-win32.whl
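The wheel file name encodes the Python version and platform it was built for (cp36 and win32 in the example above), so the download has to match your interpreter. The following sketch, assuming a CPython interpreter on x86 or x64 hardware, prints the tags to look for. The function name `wheel_tags` is hypothetical.

```python
import struct
import sys


def wheel_tags():
    """Return (python_tag, windows_platform_tag) for this interpreter.

    The platform tag is None when not running on Windows.
    """
    # For example "cp36" for CPython 3.6.
    python_tag = "cp{}{}".format(sys.version_info.major, sys.version_info.minor)
    # Pointer size tells a 32-bit interpreter from a 64-bit one.
    bits = struct.calcsize("P") * 8
    if sys.platform == "win32":
        platform_tag = "win_amd64" if bits == 64 else "win32"
    else:
        platform_tag = None
    return python_tag, platform_tag


print(wheel_tags())
```

A 32-bit interpreter on 64-bit Windows still needs the win32 wheel, which is why the check is based on the interpreter rather than the operating system.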
