Wrapper library for data cleansing and preprocessing of text
Maleo
Wrapper library for text cleansing and preprocessing in NLP
Overview of features
- Scanner: get insights into your text dataset (e.g., number of characters, words, emojis, etc.)
- Remove hyperlinks, punctuation, stopwords, emoticons, etc.
- Extract hashtags and prices from text
- Convert emails, phone numbers, and dates to <TAG>
- Convert Indonesian slang to formal words
- Convert emojis to words
- Convert words to numbers
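To make two of the features above concrete, here is a minimal sketch of how email tagging and hashtag extraction could work with regular expressions. It is not taken from maleo's source: the function names, the patterns, and the `<EMAIL>` placeholder are all assumptions chosen for illustration.

```python
import re

# Hypothetical patterns for illustration; not taken from maleo's source.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
HASHTAG_RE = re.compile(r"#(\w+)")


def email_to_tag_sketch(text, tag="<EMAIL>"):
    """Replace every email-like substring with a placeholder tag."""
    return EMAIL_RE.sub(tag, text)


def get_hashtag_sketch(text):
    """Return the hashtags found in text, without the leading '#'."""
    return HASHTAG_RE.findall(text)


sample = "Contact me at budi@example.com #promo #diskon50"
tagged = email_to_tag_sketch(sample)
hashtags = get_hashtag_sketch(sample)
print(tagged)
print(hashtags)
```

Replacing sensitive or highly variable tokens with a fixed tag keeps the sentence structure intact while reducing vocabulary noise for downstream NLP models.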
Installation
pip install maleo
Getting Started
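import pandas as pd
from maleo.wizard import Wizard
# Illustrative data; df is assumed to be a pandas DataFrame with a 'text' column
df = pd.DataFrame({'text': ['gue lagi makan 😀', 'udah cek https://example.com']})
wiz = Wizard()
wiz.scanner(df, 'text')  # dataset insights for the 'text' column
wiz.emoji_to_word(df.text)  # convert emojis to words
wiz.slang_to_formal(df.text)  # convert Indonesian slang to formal words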
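For readers curious about what a slang-to-formal conversion involves, the sketch below shows one simple approach: a token-level dictionary lookup. The three-entry lexicon and the function name are hypothetical; maleo's actual dictionary and implementation are not shown in this document and may differ.

```python
# Tiny illustrative lexicon; maleo's actual slang dictionary is not shown here.
SLANG_LEXICON = {"gue": "saya", "nggak": "tidak", "udah": "sudah"}


def slang_to_formal_sketch(text, lexicon=SLANG_LEXICON):
    """Replace each whitespace-separated token that appears in the lexicon."""
    return " ".join(lexicon.get(tok.lower(), tok) for tok in text.split())


texts = ["Gue udah makan", "dia nggak datang"]
formal = [slang_to_formal_sketch(t) for t in texts]
print(formal)
```

A plain lookup like this ignores punctuation attached to tokens and context-dependent slang, so a production lexicon would likely need tokenization and a much larger word list.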
Instance Attributes
['scanner',
'rm_multiple_space',
'rm_link',
'rm_punc',
'rm_char',
'rm_html',
'rm_non_ascii',
'rm_stopword',
'rm_emoticon',
'word_to_number',
'get_hashtag',
'get_price',
'email_to_tag',
'date_to_tag',
'phone_num_to_tag',
'slang_to_formal',
'emoji_to_word']
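Several of the names above describe single-purpose cleaning steps that are naturally chained. The sketch below illustrates that idea with standalone regex functions. These are stand-ins written for this example, not maleo's implementations, and the `clean` helper is hypothetical.

```python
import re


def rm_html_sketch(text):
    """Strip anything that looks like an HTML tag."""
    return re.sub(r"<[^>]+>", "", text)


def rm_link_sketch(text):
    """Strip http(s) URLs and bare www. links."""
    return re.sub(r"https?://\S+|www\.\S+", "", text)


def rm_multiple_space_sketch(text):
    """Collapse runs of whitespace into one space and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


def clean(text, steps):
    """Apply each cleaning step in order (hypothetical helper)."""
    for step in steps:
        text = step(text)
    return text


raw = "<p>Promo   hari ini: https://example.com/sale</p>"
cleaned = clean(raw, [rm_html_sketch, rm_link_sketch, rm_multiple_space_sketch])
print(cleaned)
```

Running the whitespace step last tidies up the gaps left behind by the earlier removals.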
Contributor:
- Ruben Stefanus
Source Distribution
maleo-0.0.2.tar.gz (6.2 kB)
File details
Details for the file maleo-0.0.2.tar.gz.
File metadata
- Download URL: maleo-0.0.2.tar.gz
- Upload date:
- Size: 6.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.23.0 setuptools/46.4.0.post20200518 requests-toolbelt/0.9.1 tqdm/4.46.0 CPython/3.8.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | b93ae6ac42b4d51fec463faff457da3ead92b8d8e478e801f87526e7108efa4d
MD5 | cc61a206a7f673fd22bbdc953524795e
BLAKE2b-256 | 52d8e0d03d901d24b336ed7764765fcf54005f50d9ce9b7de2becde6add8c5c0