A simple Python library with zero additional dependencies to make your Indonesian NLP projects easier.

Project description

indoNLP


indoNLP is a simple Python library with no additional dependencies that aims to make your Indonesian NLP projects easier.

Installation

indoNLP can be installed with pip:

$ pip install indoNLP
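
To confirm that the install succeeded, one option is to query the installed distribution's version through the standard library. This is only a sketch: the helper name `installed_version` is hypothetical and is not part of indoNLP, and the distribution name "indoNLP" is taken from the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent.

    Hypothetical helper for illustration; not part of indoNLP.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints the version string if indoNLP is installed, otherwise None.
print(installed_version("indoNLP"))
```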

Quick Start

Accessing the Indonesian NLP Open Dataset

Access the Indonesian NLP Open Dataset quickly and easily.

from indoNLP.dataset import Dataset

handler = Dataset("id-multi-label-hate-speech-and-abusive-language-detection")
data = handler.read()

If the data is symmetric (every record has the same fields), it can be tabulated with pandas.DataFrame:

import pandas as pd

df = pd.DataFrame(data)
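
Before building a table, it can help to check whether the records really are symmetric. The sketch below assumes the records arrive as a list of dicts; the source does not state what `handler.read()` returns, so treat that shape, the helper name `is_symmetric`, and the sample records as assumptions made for illustration.

```python
from typing import Any, Mapping, Sequence


def is_symmetric(records: Sequence[Mapping[str, Any]]) -> bool:
    """Return True when every record has exactly the same set of keys.

    Hypothetical helper; assumes the data is a sequence of dict-like records.
    An empty sequence counts as symmetric.
    """
    if not records:
        return True
    first_keys = set(records[0].keys())
    return all(set(record.keys()) == first_keys for record in records)


# Made-up sample records, not taken from any real dataset.
sample_ok = [{"text": "halo", "label": 0}, {"text": "apa kabar", "label": 1}]
sample_ragged = [{"text": "halo", "label": 0}, {"text": "apa kabar"}]

print(is_symmetric(sample_ok))      # True
print(is_symmetric(sample_ragged))  # False
```

If the check fails, the records may need to be normalized (for example by filling in missing fields) before they are passed to pandas.DataFrame.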

Preprocessing Text Data

Translate emoji into words and replace slang words (kata gaul):

from indoNLP.preprocessing import emoji_to_words, replace_slang, pipeline

pipe = pipeline([emoji_to_words, replace_slang])
pipe("library yg membara 🔥")
# "library yang membara !api!"
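
To illustrate the idea of chaining preprocessing functions, here is a minimal pure-Python sketch of function composition. It is not indoNLP's implementation: `make_pipeline`, `toy_emoji_to_words`, `toy_replace_slang`, and the two tiny lookup tables are hypothetical stand-ins chosen to mirror the example above.

```python
from functools import reduce
from typing import Callable, Iterable

TextStep = Callable[[str], str]


def make_pipeline(steps: Iterable[TextStep]) -> TextStep:
    """Compose text-processing steps so they run left to right."""
    ordered = tuple(steps)

    def run(text: str) -> str:
        # Feed the output of each step into the next one.
        return reduce(lambda acc, step: step(acc), ordered, text)

    return run


# Toy lookup tables (assumptions for illustration, not indoNLP's data).
TOY_EMOJI = {"🔥": "!api!"}
TOY_SLANG = {"yg": "yang"}


def toy_emoji_to_words(text: str) -> str:
    """Replace each known emoji with its word form."""
    for emoji, words in TOY_EMOJI.items():
        text = text.replace(emoji, words)
    return text


def toy_replace_slang(text: str) -> str:
    """Replace each known slang token with its formal form."""
    return " ".join(TOY_SLANG.get(word, word) for word in text.split())


toy_pipe = make_pipeline([toy_emoji_to_words, toy_replace_slang])
result = toy_pipe("library yg membara 🔥")
print(result)  # library yang membara !api!
```

Because each step takes a string and returns a string, steps can be reordered or extended without changing the pipeline itself.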

Development

Set up the local development environment. indoNLP uses python-poetry for packaging and dependency management.

$ make setup-dev

Download files

Source Distribution

indoNLP-0.3.1.tar.gz (116.1 kB)

Built Distribution

indoNLP-0.3.1-py3-none-any.whl (120.9 kB)
