Simple Thai Preprocess Functions
th-preprocessor
Objectives
This repository provides simple preprocessing techniques for Thai sentences and phrases.
Supports
The module supports Python 3.6+
Installation
pip install th-simple-preprocessor
How to Use
from th_preprocessor.preprocess import preprocess
text = '"::::: อย่างไรก็ตามนูร์ ฮิชัม อับดุลเลาะห์ 21-09-2018 https://www.malaysiakini.com/news/444015"'
words = preprocess(text)
print(words)
# อย่างไรก็ตามนูร์ ฮิชัม อับดุลเลาะห์ WSNUMBER WSNUMBER WSNUMBER WSLINK
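The sample output above suggests that links and digit runs are replaced with placeholder tokens (WSLINK, WSNUMBER) and that punctuation is dropped. The following is a minimal standalone sketch of that idea using only the standard `re` module. It is not the package's implementation: the function name `sketch_normalize` and all regular expressions are assumptions made for illustration, and only the two token names come from the sample output.

```python
import re

# Assumed patterns for illustration; the package's real rules may differ.
LINK_RE = re.compile(r"https?://\S+")
NUM_RE = re.compile(r"\d+")
# Anything that is not Thai (U+0E00-U+0E7F), an ASCII letter, or whitespace.
OTHER_RE = re.compile(r"[^\u0E00-\u0E7FA-Za-z\s]")
SPACES_RE = re.compile(r"\s+")


def sketch_normalize(text: str) -> str:
    """Hypothetical helper: replace links and numbers with tokens, drop the rest."""
    # Links go first so that digits inside a URL are not tokenized separately.
    text = LINK_RE.sub(" WSLINK ", text)
    text = NUM_RE.sub(" WSNUMBER ", text)
    # Punctuation and other symbols become spaces.
    text = OTHER_RE.sub(" ", text)
    # Collapse repeated whitespace and trim the ends.
    return SPACES_RE.sub(" ", text).strip()


print(sketch_normalize("ฮิชัม 21-09-2018 https://www.malaysiakini.com/news/444015"))
# ฮิชัม WSNUMBER WSNUMBER WSNUMBER WSLINK
```

The order of the substitutions matters in this sketch: replacing numbers before links would break a URL such as the one above into several tokens.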
Package reference:
th_preprocessor.preprocess.normalize_link
th_preprocessor.preprocess.normalize_at_mention
th_preprocessor.preprocess.normalize_email
th_preprocessor.preprocess.normalize_haha
th_preprocessor.preprocess.normalize_num
th_preprocessor.preprocess.normalize_phone
th_preprocessor.preprocess.normalize_accented_chars
th_preprocessor.preprocess.normalize_special_chars
th_preprocessor.preprocess.remove_hashtags
th_preprocessor.preprocess.remove_tag
th_preprocessor.preprocess.remove_dup_spaces
th_preprocessor.preprocess.remove_emoji
th_preprocessor.preprocess.replace_dup_chars
th_preprocessor.preprocess.replace_dup_emojis
th_preprocessor.preprocess.insert_spaces
th_preprocessor.preprocess.normalize_emoji
th_preprocessor.preprocess.remove_others_char
th_preprocessor.preprocess.remove_stopwords
th_preprocessor.preprocess.preprocess
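The reference above lists function names only, without signatures. One way to look them up locally is to introspect the installed module with the standard `inspect` module. The sketch below is a generic helper, not part of the package: the name `describe_functions` is hypothetical, and `textwrap` is used as a stand-in when `th_preprocessor` is not installed.

```python
import inspect
import textwrap  # standard-library stand-in when the package is not installed


def describe_functions(module):
    """Hypothetical helper: map each public function on a module to its signature."""
    result = {}
    for name, obj in inspect.getmembers(module, inspect.isfunction):
        if name.startswith("_"):
            continue  # skip private helpers
        try:
            result[name] = str(inspect.signature(obj))
        except (TypeError, ValueError):
            result[name] = "(signature unavailable)"
    return result


try:
    import th_preprocessor.preprocess as target
except ImportError:
    target = textwrap  # fall back so the example still runs

for name, signature in describe_functions(target).items():
    print(name + signature)
```

With the package installed, this prints one line per function available on `th_preprocessor.preprocess`, which shows the parameters each function accepts.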
Copyright
All licenses in this repository are copyrighted by their respective authors. Everything else is released under CC0. See LICENSE for details.