A simple Python library with zero additional dependencies that makes Indonesian NLP projects easier.
indoNLP
indoNLP is a simple Python library with no additional dependencies that aims to make your NLP projects easier.
Installation
indoNLP can be installed with pip:
$ pip install indoNLP
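To confirm the installation from Python, one option is to look up the installed version through the standard library. This is a minimal sketch, assuming Python 3.8 or newer and that the distribution name matches the `pip` command above; `installed_version` is a hypothetical helper and not part of indoNLP.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints a version string if indoNLP is installed, otherwise None.
found = installed_version("indoNLP")
print(found)
```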
Quick Start
Accessing the Indonesian NLP Open Dataset
Access the Indonesian NLP Open Dataset quickly and easily.
from indoNLP.dataset import Dataset
handler = Dataset("twitter-puisi")
data = handler.read()
# out: Data(name='main', part_of='twitter-puisi')
Check whether the data is symmetric; if it is, the data can be tabulated with pandas.DataFrame.
import pandas as pd
assert data.is_table(), "Data is not symmetric and cannot be tabulated!"
df = pd.DataFrame(data.data)
df.head()
# out:
# text
# 0 Hanya karena sapa itu.\nKau tikam rasamu.\nSis...
# 1 Sedang di antrian panjang\nPada sebuah penanti...
# 2 Jika kau bukan tempat awal untuk berlabuh, mak...
# 3 Setiap waktu,\nAku masih mendengar getar dawai...
# 4 Sebait rindu yang kau bacakan\nMasih terdengar...
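The README does not spell out what "symmetric" means. One plausible reading is that every field holds the same number of records, so the fields line up as table columns. The sketch below illustrates that idea on plain dictionaries; `is_symmetric` and the sample data are hypothetical, and this is not necessarily how `is_table()` is implemented.

```python
from typing import Dict, List


def is_symmetric(columns: Dict[str, List[str]]) -> bool:
    """Return True when every column has the same number of entries."""
    lengths = {len(values) for values in columns.values()}
    return len(lengths) <= 1


# Hypothetical sample data: one balanced set of columns, one ragged.
balanced = {"text": ["a", "b"], "label": ["x", "y"]}
ragged = {"text": ["a", "b"], "label": ["x"]}

print(is_symmetric(balanced))  # True
print(is_symmetric(ragged))    # False
```

Only data that passes a check like this can be handed to `pd.DataFrame` without columns of unequal length raising an error.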
Text Data Preprocessing
Translate emoji into words and replace slang words.
from indoNLP.preprocessing import emoji_to_words, replace_slang, pipeline
pipe = pipeline([emoji_to_words, replace_slang])
pipe("library yg membara 🔥")
# out: "library yang membara !api!"
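Judging from the example above, `pipeline` takes a list of functions and returns a callable that applies them in order. A minimal sketch of that composition pattern, using only the standard library, might look like the following. `compose`, `expand_yg`, and `strip_spaces` are hypothetical stand-ins rather than indoNLP functions, and the real `pipeline` may behave differently.

```python
from functools import reduce
from typing import Callable, Sequence

TextFn = Callable[[str], str]


def compose(steps: Sequence[TextFn]) -> TextFn:
    """Return a function that applies each step to the text, left to right."""
    def run(text: str) -> str:
        return reduce(lambda acc, step: step(acc), steps, text)
    return run


def expand_yg(text: str) -> str:
    # Toy slang replacement: only handles the single token "yg".
    return " ".join("yang" if tok == "yg" else tok for tok in text.split(" "))


def strip_spaces(text: str) -> str:
    # Remove leading and trailing whitespace.
    return text.strip()


pipe = compose([strip_spaces, expand_yg])
print(pipe("  library yg membara  "))  # library yang membara
```

Keeping each step as a plain `str -> str` function means steps can be reordered or reused without changing the composing code.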
Development
Set up a local dev environment. indoNLP uses python-poetry
for packaging and dependency management.
$ make setup-dev