A simple Python library with zero additional dependencies to make your Indonesian NLP projects easier.
indoNLP
indoNLP is a simple Python library with no additional dependencies that aims to make your NLP projects easier.
Installation
indoNLP can be installed easily using pip:
$ pip install indoNLP
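To confirm that the installation succeeded, one option is to query the installed version through the standard library. This is a minimal sketch, assuming Python 3.8+ and that the distribution name is `indoNLP` as in the pip command above; the helper `installed_version` is hypothetical and is not part of indoNLP.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # The distribution is not installed in the current environment.
        return None


if __name__ == "__main__":
    found = installed_version("indoNLP")
    print(found if found else "indoNLP is not installed")
```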
Quick Start
Accessing the Indonesian NLP Open Dataset
Access the Indonesian NLP Open Dataset quickly and easily.
from indoNLP.dataset import Dataset
handler = Dataset("twitter-puisi")
data = handler.read()
# out: Data(name='main', part_of='twitter-puisi')
Check whether the data is symmetric; if it is, it can be tabulated with pandas.DataFrame.
import pandas as pd
assert data.is_table(), "Data is not symmetric and cannot be tabulated!"
df = pd.DataFrame(data.data)
df.head()
# out:
# text
# 0 Hanya karena sapa itu.\nKau tikam rasamu.\nSis...
# 1 Sedang di antrian panjang\nPada sebuah penanti...
# 2 Jika kau bukan tempat awal untuk berlabuh, mak...
# 3 Setiap waktu,\nAku masih mendengar getar dawai...
# 4 Sebait rindu yang kau bacakan\nMasih terdengar...
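The idea behind the symmetry check can be pictured as follows. This is only a sketch, assuming that "symmetric" means every column holds the same number of values and that the data is shaped as a dict of lists; `is_symmetric` is a hypothetical helper and does not claim to reproduce how `data.is_table()` is implemented.

```python
from typing import Dict, List


def is_symmetric(columns: Dict[str, List[object]]) -> bool:
    """Return True when every column has the same number of values."""
    lengths = {len(values) for values in columns.values()}
    # Zero or one distinct length means the columns line up row by row.
    return len(lengths) <= 1


# Made-up sample data, for illustration only.
symmetric = {"text": ["a", "b"], "label": [0, 1]}
ragged = {"text": ["a", "b"], "label": [0]}

print(is_symmetric(symmetric))  # True
print(is_symmetric(ragged))     # False
```

A check like this matters because pandas raises a ValueError when a dict of lists with unequal lengths is passed to pandas.DataFrame.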
Preprocessing Text Data
Translate emoji into words and replace slang words.
from indoNLP.preprocessing import emoji_to_words, replace_slang, pipeline
pipe = pipeline([emoji_to_words, replace_slang])
pipe("library yg membara 🔥")
# out: "library yang membara !api!"
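Conceptually, a pipeline like the one above chains text functions so that the output of each step feeds the next. The following is a minimal, self-contained sketch of that idea, not indoNLP's implementation: the `toy_*` functions and the tiny `SLANG` and `EMOJI` tables are hypothetical stand-ins, with entries taken from the example above.

```python
from functools import reduce
from typing import Callable, Sequence

TextFn = Callable[[str], str]

# Toy lookup tables for illustration only (assumed, not indoNLP's data).
SLANG = {"yg": "yang"}
EMOJI = {"🔥": "!api!"}


def toy_emoji_to_words(text: str) -> str:
    """Replace each known emoji with its word form."""
    for emoji, word in EMOJI.items():
        text = text.replace(emoji, word)
    return text


def toy_replace_slang(text: str) -> str:
    """Replace whitespace-separated slang tokens using the toy SLANG table."""
    return " ".join(SLANG.get(token, token) for token in text.split())


def toy_pipeline(steps: Sequence[TextFn]) -> TextFn:
    """Compose the steps left to right into a single callable."""
    def run(text: str) -> str:
        return reduce(lambda acc, step: step(acc), steps, text)
    return run


pipe = toy_pipeline([toy_emoji_to_words, toy_replace_slang])
print(pipe("library yg membara 🔥"))  # library yang membara !api!
```

Because the steps run in list order, reordering them can change the result whenever one step produces tokens that another step rewrites.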
Development
Set up the local dev environment. indoNLP uses python-poetry for packaging and dependency management.
$ make setup-dev
File details
Details for the file indoNLP-0.3.4.tar.gz.
File metadata
- Download URL: indoNLP-0.3.4.tar.gz
- Upload date:
- Size: 117.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.1.13 CPython/3.8.10 Linux/5.10.16.3-microsoft-standard-WSL2
File hashes
Algorithm | Hash digest
---|---
SHA256 | c09d48a9e2d0320c88b95ed285955f30fe2e5aee6719bf7110cab9e6491465d5
MD5 | 11ed5fd0b8cd8bf300d686eb834cc7a3
BLAKE2b-256 | 1d684d1799809d3fbf7539f998af2ff6a5a43481ee7c4380aff87ecc7e43c38d
File details
Details for the file indoNLP-0.3.4-py3-none-any.whl.
File metadata
- Download URL: indoNLP-0.3.4-py3-none-any.whl
- Upload date:
- Size: 121.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.1.13 CPython/3.8.10 Linux/5.10.16.3-microsoft-standard-WSL2
File hashes
Algorithm | Hash digest
---|---
SHA256 | 26050978c6bfcade606545e208278fa2ba98c50d7dc3db7b686df9b98a0e730f
MD5 | bfda815c9531845479901976ec9d0211
BLAKE2b-256 | 3b2eb314d4cfc8e3736e5c0ad0a5d8b876c1a1a254865e170848c0048a776880