Wrapper library for text cleansing and preprocessing in NLP
Maleo
Wrapper library for text cleansing, preprocessing, and POS tagging in NLP
Docs
https://jakartaresearch.github.io/maleo/
Overview of features
- Scanner: get insights into your text dataset (e.g., number of characters, words, emojis, etc.)
- Remove hyperlinks, punctuation, stopwords, emoticons, etc.
- Extract hashtags and prices from text
- Convert emails, phone numbers, and dates to <TAG>
- Convert Indonesian slang to formal words
- Convert emojis to words or <TAG>
- Convert words to numbers
- Predict Part-of-Speech (POS) tags
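Several of these features are pattern-based text transformations. The sketch below illustrates the general idea in plain Python only. It is not maleo's implementation, and the regular expressions, the `<EMAIL>`/`<PHONE>` tag names, the tiny slang dictionary, and the helper names are all hypothetical:

```python
import re

# Hypothetical patterns; real-world email and phone formats need more care.
HASHTAG_RE = re.compile(r"#(\w+)")
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")
PHONE_RE = re.compile(r"\+?\d[\d\s-]{7,}\d")

# Hypothetical mini dictionary mapping Indonesian slang to formal words.
SLANG = {"gak": "tidak", "udah": "sudah", "yg": "yang"}


def extract_hashtags(text: str) -> list[str]:
    """Return hashtag bodies without the leading '#'."""
    return HASHTAG_RE.findall(text)


def mask_entities(text: str) -> str:
    """Replace emails first, then phone numbers, with placeholder tags."""
    text = EMAIL_RE.sub("<EMAIL>", text)
    return PHONE_RE.sub("<PHONE>", text)


def slang_to_formal(text: str) -> str:
    """Replace whitespace-separated slang tokens via dictionary lookup."""
    return " ".join(SLANG.get(tok.lower(), tok) for tok in text.split())
```

For example, `mask_entities("Hubungi budi@example.com atau 0812-3456-7890")` gives `"Hubungi <EMAIL> atau <PHONE>"`. A dictionary lookup like this is the simplest form of slang normalization; it ignores context and only matches whole tokens.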
Installation
pip install maleo
Getting Started
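import pandas as pd
from maleo.wizard import Wizard
from maleo.pos_tag import POS
# Hypothetical sample data: a DataFrame with a 'text' column
df = pd.DataFrame({'text': ['mantap bgt 😍', 'saya mau pergi beli makan siang dulu']})
wiz = Wizard()
pos = POS()
wiz.scanner(df, 'text')  # insights about the dataset (chars, words, emojis, ...)
wiz.emoji_to_word(df.text)  # convert emojis to words
wiz.slang_to_formal(df.text)  # convert Indonesian slang to formal words
pos.predict('saya mau pergi beli makan siang dulu', output_pair=False)  # predict POS tags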
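`wiz.scanner(df, 'text')` summarizes a text column. The exact statistics and output format are defined by maleo. The pure-Python sketch below only approximates the idea by counting texts, characters, words, and emojis. The helper names are hypothetical, and the emoji test (a few Unicode code-point ranges) is a simplifying assumption that misses some emojis:

```python
# Rough emoji ranges (assumption): Misc Symbols/Dingbats and the main emoji blocks.
EMOJI_RANGES = [(0x2600, 0x27BF), (0x1F300, 0x1FAFF)]


def is_emoji(ch: str) -> bool:
    """Approximate emoji check based on the code point of a single character."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in EMOJI_RANGES)


def scan_texts(texts: list[str]) -> dict:
    """Aggregate simple counts over a list of texts."""
    return {
        "n_texts": len(texts),
        "n_chars": sum(len(t) for t in texts),
        "n_words": sum(len(t.split()) for t in texts),
        "n_emojis": sum(is_emoji(ch) for t in texts for ch in t),
    }
```

Words are split on whitespace, so punctuation attached to a word is counted as part of it.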
Universal POS tags
https://universaldependencies.org/u/pos/index.html
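The Universal Dependencies POS tag set linked above defines 17 tags. The README links this set but does not specify the exact output format of `pos.predict`. Assuming the tagger emits labels from this set, a small hypothetical helper can flag any predicted tag that falls outside it:

```python
# The 17 Universal POS (UPOS) tags defined by Universal Dependencies.
UPOS_TAGS = frozenset({
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X",
})


def unknown_tags(tags: list[str]) -> list[str]:
    """Return the tags that are not valid UPOS labels, in input order."""
    return [t for t in tags if t not in UPOS_TAGS]
```

For instance, `unknown_tags(["PRON", "VERB", "FOO"])` returns `["FOO"]`.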
Contributor:
- Ruben Stefanus
Download files
Source Distribution
maleo-0.0.7.0.tar.gz (96.2 kB)
File details
Details for the file maleo-0.0.7.0.tar.gz.
File metadata
- Download URL: maleo-0.0.7.0.tar.gz
- Size: 96.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.22.0 setuptools/45.2.0 requests-toolbelt/0.9.1 tqdm/4.56.0 CPython/3.8.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | eb6f37c873ff8c6da8b0147c0f7e8c5a82854fb3028d5b8a2bc8015b9f9c0284
MD5 | e395eb810eb70848786834119e4599f6
BLAKE2b-256 | 87d68eeb8dde4d387bec488fe7f14482d42c7baad97966a3ef05cba64cf71e24
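To check that a downloaded copy of maleo-0.0.7.0.tar.gz matches the published SHA256 digest, a short sketch using Python's standard `hashlib` module might look like this. The helper name `sha256_of` and the assumption that the archive sits in the current directory are illustrative:

```python
import hashlib
from pathlib import Path

# SHA256 digest published for maleo-0.0.7.0.tar.gz
EXPECTED_SHA256 = "eb6f37c873ff8c6da8b0147c0f7e8c5a82854fb3028d5b8a2bc8015b9f9c0284"


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Compute the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Assumes the archive was downloaded into the current directory.
    archive = Path("maleo-0.0.7.0.tar.gz")
    if archive.exists():
        print("OK" if sha256_of(str(archive)) == EXPECTED_SHA256 else "MISMATCH")
```

Reading in fixed-size chunks keeps memory use constant regardless of file size.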