A Python package for Balinese Text Preprocessing

Package for Balinese Text Preprocessing

This is the first package for preprocessing raw Balinese text. It provides several functions that you can use to prepare raw text and convert it into a clean version.

Installation

pip install balinese_textpreprocessor

Usage

from balinese_textpreprocessor import TextPreprocessor

# Raw sentence containing mixed case, symbols, and digits
sentence = "I Budi ngalahin **& I Lutunge 12354!!"
preprocessor = TextPreprocessor()

# Apply the steps one after another, feeding each result into the next step
preprocessed_sentence = preprocessor.case_folding(sentence)
preprocessed_sentence = preprocessor.remove_number(preprocessed_sentence)
preprocessed_sentence = preprocessor.remove_punctuation(preprocessed_sentence)
preprocessed_sentence = preprocessor.normalize_words(preprocessed_sentence)
preprocessed_sentence = preprocessor.lemmatize_text(preprocessed_sentence)
print(preprocessed_sentence)
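
The usage example calls each step by hand and reassigns the result every time. A minimal sketch of the same chaining pattern as a reusable helper is shown below. The names run_pipeline, strip_digits, and strip_punctuation are hypothetical and not part of the package; the stand-in steps use only the standard library so the sketch runs without the package installed, and they do not reproduce the package's actual behavior (in particular, there is no word normalization or lemmatization here).

```python
import re
from functools import reduce
from typing import Callable, Iterable

# A step is any function that takes a string and returns a string.
Step = Callable[[str], str]


def run_pipeline(text: str, steps: Iterable[Step]) -> str:
    """Apply each step to the text in order and return the final result."""
    return reduce(lambda acc, step: step(acc), steps, text)


# Hypothetical stand-in steps (standard library only). They only mimic the
# order of the first three steps in the usage example.
def strip_digits(text: str) -> str:
    return re.sub(r"\d+", "", text)


def strip_punctuation(text: str) -> str:
    # Drop everything that is not a word character or whitespace,
    # then collapse runs of whitespace into single spaces.
    without_symbols = re.sub(r"[^\w\s]", "", text)
    return re.sub(r"\s+", " ", without_symbols).strip()


demo_steps = [str.lower, strip_digits, strip_punctuation]
result = run_pipeline("I Budi ngalahin **& I Lutunge 12354!!", demo_steps)
print(result)  # i budi ngalahin i lutunge
```

With the package installed, the same helper could presumably be given the bound methods from the usage example, for instance [preprocessor.case_folding, preprocessor.remove_number, preprocessor.remove_punctuation, preprocessor.normalize_words, preprocessor.lemmatize_text]. This assumes each method accepts a string and returns a string, which is what the usage example suggests.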

Acknowledgement

Please cite these papers if you find this package useful:

[1] Arimbawa, I. G. A. P., & ER, N. A. S. (2017). Lemmatization in Balinese language. Jurnal Elektronik Ilmu Komputer Udayana, p-ISSN 2301-5373.

[2] Pradiptha, I. G. M. H., & ER, N. A. S. (2020). Building Balinese part-of-speech tagger using hidden Markov model (HMM). Jurnal Elektronik Ilmu Komputer Udayana, p-ISSN 2301-5373.

Download files

Source Distribution

balinese_textpreprocessor-1.1.0.tar.gz (55.3 kB)

Uploaded Source

Built Distribution

balinese_textpreprocessor-1.1.0-py3-none-any.whl (53.5 kB)

Uploaded Python 3

File details

Details for the file balinese_textpreprocessor-1.1.0.tar.gz.

File hashes

Hashes for balinese_textpreprocessor-1.1.0.tar.gz
Algorithm Hash digest
SHA256 aca4e970f2117962d8f34cb0be3b5c1e8c4dae2e1ddbc0ab4427a08179d29c67
MD5 77812c059f32f3c314e40756b19b1f00
BLAKE2b-256 b04d46af3ec39393693ff3bec28978d9ede2fdc24cb121533bcbbd137130146f
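
The published digests can be used to check a downloaded file before installing it. The following is a minimal sketch, assuming the source distribution has been downloaded into the current directory under its published file name. The helper names sha256_of_file and matches_published_digest are hypothetical; the expected digest is the SHA256 value listed above.

```python
import hashlib
from pathlib import Path

# SHA256 digest published for balinese_textpreprocessor-1.1.0.tar.gz
EXPECTED_SHA256 = "aca4e970f2117962d8f34cb0be3b5c1e8c4dae2e1ddbc0ab4427a08179d29c67"


def sha256_of_file(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path: Path, expected_hex: str) -> bool:
    """Compare the file's digest with the expected one, ignoring letter case."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Assumed local path; adjust it to wherever the file was saved.
archive = Path("balinese_textpreprocessor-1.1.0.tar.gz")
if archive.exists():
    ok = matches_published_digest(archive, EXPECTED_SHA256)
    print("digest matches" if ok else "digest MISMATCH, do not install")
else:
    print(f"{archive} not found, nothing to check")
```

The same approach applies to the wheel file by swapping in its file name and its own SHA256 digest.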

Provenance

The following attestation bundles were made for balinese_textpreprocessor-1.1.0.tar.gz:

Publisher: publish.yml on satriabimantara/balinese_textpreprocessor

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file balinese_textpreprocessor-1.1.0-py3-none-any.whl.

File hashes

Hashes for balinese_textpreprocessor-1.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 d97ef9ae90d347d8c469ea7a8127d7212295e3e9c7b4a3cd134898f0585fd063
MD5 33a49546a39a8fd1a4affc477b9d2520
BLAKE2b-256 41eea41d0cfaf29d64532846ccf1b9a3ea37ee5056ffb7ef7da8bb4835941e31

Provenance

The following attestation bundles were made for balinese_textpreprocessor-1.1.0-py3-none-any.whl:

Publisher: publish.yml on satriabimantara/balinese_textpreprocessor

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
