Simple Thai Preprocess Functions
th-preprocessor
Objectives
This repository provides simple preprocessing functions for Thai sentences and phrases.
Supports
The module supports Python 3.6+
Installation
```shell
pip install th-simple-preprocessor
```
How to Use
```python
from th_preprocessor.preprocess import preprocess

text = '"::::: อย่างไรก็ตามนูร์ ฮิชัม อับดุลเลาะห์ 21-09-2018 https://www.malaysiakini.com/news/444015"'
words = preprocess(text)  # returns the cleaned text
print(words)
# Numbers become WSNUMBER, links become WSLINK, and punctuation is stripped:
# อย่างไรก็ตามนูร์ ฮิชัม อับดุลเลาะห์ WSNUMBER WSNUMBER WSNUMBER WSLINK
```
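The example output suggests that links are replaced with the token WSLINK, each run of digits with WSNUMBER, and punctuation such as `:`, `"` and `-` is dropped. Below is a minimal sketch of that behaviour using only the standard `re` module. It is not the package's implementation: the regular expressions and the name `sketch_preprocess` are assumptions chosen to reproduce the documented output.

```python
import re

# Placeholder tokens taken from the documented output; everything else is assumed.
LINK_TOKEN = "WSLINK"
NUMBER_TOKEN = "WSNUMBER"

URL_RE = re.compile(r"https?://\S+")
NUMBER_RE = re.compile(r"\d+")
# Keep Thai characters (U+0E00-U+0E7F), ASCII letters, digits and whitespace.
OTHER_CHARS_RE = re.compile(r"[^\u0E00-\u0E7Fa-zA-Z0-9\s]")
SPACES_RE = re.compile(r"\s+")


def sketch_preprocess(text: str) -> str:
    """Hypothetical re-creation of the behaviour shown in the example above."""
    # Replace links first so the digits inside a URL do not become WSNUMBER.
    text = URL_RE.sub(f" {LINK_TOKEN} ", text)
    # Turn remaining punctuation (':', '"', '-', ...) into spaces.
    text = OTHER_CHARS_RE.sub(" ", text)
    # Each run of digits becomes one number token.
    text = NUMBER_RE.sub(f" {NUMBER_TOKEN} ", text)
    # Collapse whitespace and trim the ends.
    return SPACES_RE.sub(" ", text).strip()


print(sketch_preprocess("ดูที่ https://example.com เวลา 10:30"))
# ดูที่ WSLINK เวลา WSNUMBER WSNUMBER
```

Because the date `21-09-2018` is split on its hyphens before numbers are replaced, it yields three WSNUMBER tokens, matching the output shown for `preprocess`.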
Package reference:
- th_preprocessor.preprocess.normalize_link
- th_preprocessor.preprocess.normalize_at_mention
- th_preprocessor.preprocess.normalize_email
- th_preprocessor.preprocess.normalize_haha
- th_preprocessor.preprocess.normalize_num
- th_preprocessor.preprocess.normalize_phone
- th_preprocessor.preprocess.normalize_accented_chars
- th_preprocessor.preprocess.normalize_special_chars
- th_preprocessor.preprocess.remove_hashtags
- th_preprocessor.preprocess.remove_tag
- th_preprocessor.preprocess.remove_dup_spaces
- th_preprocessor.preprocess.remove_emoji
- th_preprocessor.preprocess.normalize_emoji
- th_preprocessor.preprocess.remove_others_char
- th_preprocessor.preprocess.remove_stopwords
- th_preprocessor.preprocess.preprocess
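Each entry in the reference list appears to target one kind of noise, and `preprocess` presumably applies several of them in sequence. The sketch below illustrates that composition pattern with three hypothetical stand-ins (`sketch_remove_hashtags`, `sketch_normalize_haha`, `sketch_remove_dup_spaces`) and a small `compose` helper. Their regular expressions and the `WSHAHA` token are assumptions, not the package's actual behaviour.

```python
import re
from functools import reduce
from typing import Callable, List

# Hypothetical stand-ins for three entries in the reference list.
# The real th_preprocessor functions may use different patterns and tokens.


def sketch_remove_hashtags(text: str) -> str:
    # Drop "#tag" tokens entirely.
    return re.sub(r"#\S+", " ", text)


def sketch_normalize_haha(text: str) -> str:
    # Thai laughter is often typed as a run of 5s (ห้า, "five", sounds like "ha").
    # Only runs of three or more 5s not touching other digits are replaced;
    # "WSHAHA" is an assumed token name.
    return re.sub(r"(?<!\d)5{3,}(?!\d)", " WSHAHA ", text)


def sketch_remove_dup_spaces(text: str) -> str:
    # Collapse whitespace runs to a single space and trim the ends.
    return re.sub(r"\s+", " ", text).strip()


Step = Callable[[str], str]


def compose(steps: List[Step]) -> Step:
    # Apply each step to the output of the previous one, left to right.
    return lambda text: reduce(lambda acc, step: step(acc), steps, text)


pipeline = compose([sketch_remove_hashtags, sketch_normalize_haha, sketch_remove_dup_spaces])
print(pipeline("ตลกมาก5555 #funny"))
# ตลกมาก WSHAHA
```

In a pipeline like this the order of steps matters: if a number normalizer ran before `sketch_normalize_haha`, a laughter run such as `5555` would already have been turned into a number token.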
Copyright
All licenses in this repository are copyrighted by their respective authors. Everything else is released under CC0. See LICENSE for details.
Download files
Hashes for th-simple-preprocessor-0.7.0.tar.gz
Algorithm | Hash digest
---|---
SHA256 | f493c1d931cad4156596615ca34ee741b769fa25f8ae8dd5e83a3dfa8b3d9b0f
MD5 | aec0f3dd15b4be49dbe0f49ab73ec1fc
BLAKE2b-256 | 07bff17bccf1bbe14a471d17df02c5f93c9586d3cd395d839f373ab91cb616de
Hashes for th_simple_preprocessor-0.7.0-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | b5ae705a23ffbb508acb2b015fcb3f2294a8f7a10cae67a1f8e0244445866f7d
MD5 | d7b91564a75ad87c52d61547a2c29343
BLAKE2b-256 | 5bbb8812591cbd75970904b766ead49dbce0058fd1106994e5ceb5c9935ed44c
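To check a downloaded file against the SHA256 digests listed above, a short standard-library sketch follows. The helper names `sha256_of` and `matches` are illustrative, not part of the package.

```python
import hashlib

# SHA256 digest of the wheel, copied from the table above.
EXPECTED_WHL_SHA256 = "b5ae705a23ffbb508acb2b015fcb3f2294a8f7a10cae67a1f8e0244445866f7d"


def sha256_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: str, expected: str) -> bool:
    """Compare a file's digest with an expected hex digest, ignoring case."""
    return sha256_of(path) == expected.lower()


# Usage (assumes the wheel was downloaded to the current directory):
# matches("th_simple_preprocessor-0.7.0-py3-none-any.whl", EXPECTED_WHL_SHA256)
```

pip can also compute a file's SHA256 digest directly with `pip hash <file>`.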