Simple Thai Preprocess Functions
th-preprocessor
Objectives
This repository provides simple preprocessing techniques for Thai sentences and phrases.
Supports
The module supports Python 3.6 and later.
Installation
pip install th-simple-preprocessor
How to Use
from th_preprocessor.preprocess import preprocess
text = '"::::: อย่างไรก็ตามนูร์ ฮิชัม อับดุลเลาะห์ 21-09-2018 https://www.malaysiakini.com/news/444015"'
words = preprocess(text)
print(words)
# อย่างไรก็ตามนูร์ ฮิชัม อับดุลเลาะห์ WSNUMBER WSNUMBER WSNUMBER WSLINK
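To give a rough idea of the kind of transformation shown above, here is a minimal standalone sketch that reproduces the same output with plain regular expressions. It does not use or reflect the package's actual code: the function name `sketch_preprocess`, the patterns, and the order of the steps are all assumptions made for illustration. Only the placeholder tokens `WSLINK` and `WSNUMBER` are taken from the example output.

```python
import re

# Hypothetical patterns for illustration only; the package's real rules may differ.
LINK_RE = re.compile(r"https?://\S+")   # a URL up to the next whitespace
NUMBER_RE = re.compile(r"\d+")          # any run of digits
# Anything that is not Thai (U+0E00-U+0E7F), an ASCII letter or digit, or whitespace.
NON_TEXT_RE = re.compile(r"[^\u0E00-\u0E7FA-Za-z0-9\s]")
SPACES_RE = re.compile(r"\s+")


def sketch_preprocess(text: str) -> str:
    """Illustrative stand-in, NOT th_preprocessor.preprocess.preprocess."""
    # Replace links first so digits inside a URL are not turned into number tokens.
    text = LINK_RE.sub(" WSLINK ", text)
    text = NUMBER_RE.sub(" WSNUMBER ", text)
    # Drop punctuation such as quotes, colons, and hyphens.
    text = NON_TEXT_RE.sub(" ", text)
    # Collapse the extra spaces introduced by the substitutions above.
    return SPACES_RE.sub(" ", text).strip()


text = '"::::: อย่างไรก็ตามนูร์ ฮิชัม อับดุลเลาะห์ 21-09-2018 https://www.malaysiakini.com/news/444015"'
print(sketch_preprocess(text))
# อย่างไรก็ตามนูร์ ฮิชัม อับดุลเลาะห์ WSNUMBER WSNUMBER WSNUMBER WSLINK
```

The date `21-09-2018` becomes three number tokens in this sketch because the hyphens are removed as punctuation after each digit run has been replaced.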
Package reference:
th_preprocessor.preprocess.normalize_link
th_preprocessor.preprocess.normalize_at_mention
th_preprocessor.preprocess.normalize_email
th_preprocessor.preprocess.normalize_haha
th_preprocessor.preprocess.normalize_num
th_preprocessor.preprocess.normalize_phone
th_preprocessor.preprocess.normalize_accented_chars
th_preprocessor.preprocess.normalize_special_chars
th_preprocessor.preprocess.remove_hashtags
th_preprocessor.preprocess.remove_tag
th_preprocessor.preprocess.remove_dup_spaces
th_preprocessor.preprocess.remove_emoji
th_preprocessor.preprocess.replace_dup_chars
th_preprocessor.preprocess.replace_dup_emojis
th_preprocessor.preprocess.insert_spaces
th_preprocessor.preprocess.normalize_emoji
th_preprocessor.preprocess.remove_others_char
th_preprocessor.preprocess.remove_stopwords
th_preprocessor.preprocess.preprocess
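The function names listed above suggest small, single-purpose steps that can be chained. The sketch below shows one way such steps could be composed into a custom pipeline. The three step functions are stand-ins that only borrow names from the list: their bodies, their `str -> str` signatures, the `max_repeat` parameter, and the `run_pipeline` helper are assumptions for illustration, not the package's implementation.

```python
import re
from functools import reduce
from typing import Callable, Iterable

Step = Callable[[str], str]


# Stand-in steps named after entries in the list above.
# Their behavior here is assumed, not copied from th_preprocessor.
def remove_hashtags(text: str) -> str:
    # Drop a '#' and everything up to the next whitespace.
    return re.sub(r"#\S+", "", text)


def replace_dup_chars(text: str, max_repeat: int = 2) -> str:
    # Shorten any run of more than max_repeat identical characters to max_repeat.
    pattern = r"(.)\1{%d,}" % max_repeat
    return re.sub(pattern, lambda m: m.group(1) * max_repeat, text)


def remove_dup_spaces(text: str) -> str:
    # Collapse whitespace runs to a single space and trim both ends.
    return re.sub(r"\s+", " ", text).strip()


def run_pipeline(text: str, steps: Iterable[Step]) -> str:
    """Hypothetical helper: apply each step to the result of the previous one."""
    return reduce(lambda acc, step: step(acc), steps, text)


steps = [remove_hashtags, replace_dup_chars, remove_dup_spaces]
print(run_pipeline("soooo   good #promo  ", steps))
# soo good
```

Order matters in a chain like this: whitespace cleanup runs last so that gaps left by earlier removals are also collapsed.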
Copyright
All licenses in this repository are copyrighted by their respective authors. Everything else is released under CC0. See LICENSE for details.