Simple Thai Preprocess Functions
th-preprocessor
Objectives
This repository provides simple preprocessing techniques for Thai sentences and phrases.
Supports
The module supports Python 3.6 and later.
Installation
pip install th-simple-preprocessor
How to Use
from th_preprocessor.preprocess import preprocess
text = '"::::: อย่างไรก็ตามนูร์ ฮิชัม อับดุลเลาะห์ 21-09-2018 https://www.malaysiakini.com/news/444015"'
words = preprocess(text)
print(words)
# อย่างไรก็ตามนูร์ ฮิชัม อับดุลเลาะห์ WSNUMBER WSNUMBER WSNUMBER WSLINK
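The output above suggests that links and digit runs are swapped for placeholder tokens (WSLINK, WSNUMBER), punctuation is dropped, and whitespace is collapsed. The following is a minimal standard-library sketch of that idea, not the package's actual implementation: the function name sketch_preprocess, the regular expressions, and the order of the steps are all assumptions made for illustration.

```python
import re

# Hypothetical stand-ins; the real package may use different rules.
LINK_RE = re.compile(r"https?://\S+")               # http/https URLs
NUMBER_RE = re.compile(r"\d+")                      # runs of digits
OTHER_RE = re.compile(r"[^\u0E00-\u0E7FA-Za-z\s]")  # not Thai, ASCII letter, or space
SPACES_RE = re.compile(r"\s+")                      # runs of whitespace


def sketch_preprocess(text: str) -> str:
    """Illustrative pipeline: links, then numbers, then cleanup."""
    # Replace links first so digits inside a URL do not become WSNUMBER.
    text = LINK_RE.sub(" WSLINK ", text)
    text = NUMBER_RE.sub(" WSNUMBER ", text)
    # Drop punctuation and other symbols, then collapse whitespace.
    text = OTHER_RE.sub(" ", text)
    return SPACES_RE.sub(" ", text).strip()


print(sketch_preprocess("ดี 12-3 http://a.b/c"))
# ดี WSNUMBER WSNUMBER WSLINK
```

The ordering matters in this sketch: running the number step before the link step would break URLs that contain digits into several tokens.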
Package reference:
th_preprocessor.preprocess.normalize_link
th_preprocessor.preprocess.normalize_at_mention
th_preprocessor.preprocess.normalize_email
th_preprocessor.preprocess.normalize_haha
th_preprocessor.preprocess.normalize_num
th_preprocessor.preprocess.normalize_phone
th_preprocessor.preprocess.normalize_accented_chars
th_preprocessor.preprocess.normalize_special_chars
th_preprocessor.preprocess.remove_hashtags
th_preprocessor.preprocess.remove_tag
th_preprocessor.preprocess.remove_dup_spaces
th_preprocessor.preprocess.remove_emoji
th_preprocessor.preprocess.replace_dup_chars
th_preprocessor.preprocess.replace_dup_emojis
th_preprocessor.preprocess.insert_spaces
th_preprocessor.preprocess.normalize_emoji
th_preprocessor.preprocess.remove_others_char
th_preprocessor.preprocess.remove_stopwords
th_preprocessor.preprocess.preprocess
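The function names listed above suggest small text-to-text steps that can be chained. The sketch below shows that composition pattern with two self-written stand-ins. It assumes each step takes a string and returns a string, which the source does not state, and the names run_pipeline, sketch_replace_dup_chars and sketch_remove_dup_spaces are hypothetical. The rule "collapse a run of three or more identical characters to one" is only one plausible reading of replace_dup_chars.

```python
import re
from typing import Callable, Iterable


def sketch_replace_dup_chars(text: str) -> str:
    """Collapse a run of 3+ identical characters to one (assumed rule)."""
    return re.sub(r"(.)\1{2,}", r"\1", text)


def sketch_remove_dup_spaces(text: str) -> str:
    """Collapse runs of two or more spaces into a single space."""
    return re.sub(r" {2,}", " ", text)


def run_pipeline(text: str, steps: Iterable[Callable[[str], str]]) -> str:
    """Apply each text-to-text step in order."""
    for step in steps:
        text = step(text)
    return text


cleaned = run_pipeline(
    "มากกกกก  เลย",
    [sketch_replace_dup_chars, sketch_remove_dup_spaces],
)
print(cleaned)
# มาก เลย
```

If the package's own functions do follow the same string-in, string-out shape, they could be passed to a helper like run_pipeline in place of the stand-ins; check the package source before relying on that.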
Copyright
All licenses in this repository are copyrighted by their respective authors. Everything else is released under CC0. See LICENSE for details.