
A simple Python library with zero additional dependencies to make your Indonesian NLP projects easier.


indoNLP


indoNLP is a simple Python library with no additional dependencies that aims to make your NLP projects easier.

Installation

indoNLP can be installed easily with pip:

$ pip install indoNLP
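
To confirm that the install succeeded, one option is to ask the standard library which version is registered. This is a minimal sketch: `installed_version` is a hypothetical helper, not part of indoNLP, and it assumes the distribution name is `indoNLP` as used in the pip command above.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Prints a version string if the package is installed, otherwise None.
print(installed_version("indoNLP"))
```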

Quick Start

Accessing the Indonesian NLP Open Dataset

Access the Indonesian NLP Open Dataset quickly and easily.

from indoNLP.dataset import Dataset

handler = Dataset("twitter-puisi")
data = handler.read()
# out: Data(name='main', part_of='twitter-puisi')

Check whether the data is symmetric; if it is, the data can be tabulated with pandas.DataFrame.

import pandas as pd

assert data.is_table(), "Data is not symmetric and cannot be tabulated!"
df = pd.DataFrame(data.data)
df.head()
# out:
#                                                 text
# 0  Hanya karena sapa itu.\nKau tikam rasamu.\nSis...
# 1  Sedang di antrian panjang\nPada sebuah penanti...
# 2  Jika kau bukan tempat awal untuk berlabuh, mak...
# 3  Setiap waktu,\nAku masih mendengar getar dawai...
# 4  Sebait rindu yang kau bacakan\nMasih terdengar...
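
The README does not say how `is_table()` decides that data is symmetric. One plausible reading is that every record exposes the same fields, so the records line up as columns. The sketch below illustrates that idea only; `rows_are_symmetric` is a hypothetical helper and is not claimed to match the library's implementation.

```python
from typing import Any, Dict, List


def rows_are_symmetric(rows: List[Dict[str, Any]]) -> bool:
    """Return True when every record has exactly the same set of keys."""
    if not rows:
        return True  # an empty collection has nothing that can disagree
    expected = set(rows[0])
    return all(set(row) == expected for row in rows)


symmetric = [{"text": "a"}, {"text": "b"}]
ragged = [{"text": "a"}, {"text": "b", "label": 1}]

print(rows_are_symmetric(symmetric))  # True
print(rows_are_symmetric(ragged))     # False
```

Under this assumption, ragged records would need missing fields filled in (or dropped) before being passed to pandas.DataFrame.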

Preprocessing Text Data

Translate emoji into words and replace slang words.

from indoNLP.preprocessing import emoji_to_words, replace_slang, pipeline

pipe = pipeline([emoji_to_words, replace_slang])
pipe("library yg membara 🔥")
# out: "library yang membara !api!"
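
Conceptually, a pipeline like the one above applies each function to the output of the previous one. The sketch below shows that composition pattern with the standard library only. The names `compose`, `collapse_spaces`, and `replace_slang_toy`, as well as the two-entry slang dictionary, are illustrative assumptions and do not reflect how indoNLP implements `pipeline` or `replace_slang`.

```python
import re
from functools import reduce
from typing import Callable, Sequence

# Toy slang dictionary for illustration only; not indoNLP's data.
SLANG = {"yg": "yang", "tdk": "tidak"}


def collapse_spaces(text: str) -> str:
    """Squash runs of whitespace into single spaces."""
    return re.sub(r"\s+", " ", text).strip()


def replace_slang_toy(text: str) -> str:
    """Swap each whitespace-separated token for its formal form, if known."""
    return " ".join(SLANG.get(word, word) for word in text.split())


def compose(steps: Sequence[Callable[[str], str]]) -> Callable[[str], str]:
    """Build a function that runs the steps left to right on a string."""
    def run(text: str) -> str:
        return reduce(lambda acc, step: step(acc), steps, text)
    return run


pipe = compose([collapse_spaces, replace_slang_toy])
print(pipe("library   yg  membara"))  # library yang membara
```

Because the steps run in order, placing normalization steps before dictionary lookups keeps the lookups simple.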

Development

Set up a local development environment. indoNLP uses python-poetry for packaging and dependency management.

$ make setup-dev

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

indoNLP-0.3.4.tar.gz (117.3 kB)

Uploaded Source

Built Distribution

indoNLP-0.3.4-py3-none-any.whl (121.9 kB)

Uploaded Python 3

File details

Details for the file indoNLP-0.3.4.tar.gz.

File metadata

  • Download URL: indoNLP-0.3.4.tar.gz
  • Upload date:
  • Size: 117.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.13 CPython/3.8.10 Linux/5.10.16.3-microsoft-standard-WSL2

File hashes

Hashes for indoNLP-0.3.4.tar.gz
Algorithm Hash digest
SHA256 c09d48a9e2d0320c88b95ed285955f30fe2e5aee6719bf7110cab9e6491465d5
MD5 11ed5fd0b8cd8bf300d686eb834cc7a3
BLAKE2b-256 1d684d1799809d3fbf7539f998af2ff6a5a43481ee7c4380aff87ecc7e43c38d

See more details on using hashes here.
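
A downloaded file can be checked against the SHA256 digest listed above with Python's hashlib. This is a minimal sketch: `sha256_of_file` and `matches_expected` are hypothetical helper names, and the file path is assumed to point at a local copy of the source distribution.

```python
import hashlib

# SHA256 digest listed above for indoNLP-0.3.4.tar.gz.
EXPECTED_SHA256 = "c09d48a9e2d0320c88b95ed285955f30fe2e5aee6719bf7110cab9e6491465d5"


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """Compare the file's digest with the expected hex string."""
    return sha256_of_file(path) == expected.lower()
```

After downloading the archive, `matches_expected("indoNLP-0.3.4.tar.gz")` should return True for an intact file. pip can perform the same check itself when a requirements file pins hashes and is installed with `--require-hashes`.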

File details

Details for the file indoNLP-0.3.4-py3-none-any.whl.

File metadata

  • Download URL: indoNLP-0.3.4-py3-none-any.whl
  • Upload date:
  • Size: 121.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.13 CPython/3.8.10 Linux/5.10.16.3-microsoft-standard-WSL2

File hashes

Hashes for indoNLP-0.3.4-py3-none-any.whl
Algorithm Hash digest
SHA256 26050978c6bfcade606545e208278fa2ba98c50d7dc3db7b686df9b98a0e730f
MD5 bfda815c9531845479901976ec9d0211
BLAKE2b-256 3b2eb314d4cfc8e3736e5c0ad0a5d8b876c1a1a254865e170848c0048a776880

