A Python package for Balinese Text Preprocessing

Package for Balinese Text Preprocessing

This is the first package for preprocessing raw Balinese text. It provides several functions that you can use to prepare your raw text and convert it into a clean version.

Installation

pip install balinese_textpreprocessor

Usage

from balinese_textpreprocessor import TextPreprocessor

# Raw input containing mixed case, symbols, and digits
sentence = "I Budi ngalahin **& I Lutunge 12354!!"
preprocessor = TextPreprocessor()

# Apply the preprocessing steps in order; each step takes the previous result
preprocessed_sentence = preprocessor.case_folding(sentence)
preprocessed_sentence = preprocessor.remove_number(preprocessed_sentence)
preprocessed_sentence = preprocessor.remove_punctuation(preprocessed_sentence)
preprocessed_sentence = preprocessor.normalize_words(preprocessed_sentence)
preprocessed_sentence = preprocessor.lemmatize_text(preprocessed_sentence)
print(preprocessed_sentence)
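
The usage example threads one string through several steps by hand. A minimal sketch of the same idea as a reusable helper is shown here. The name run_pipeline is hypothetical and is not part of this package, and the demo uses standard-library string methods as stand-ins so that it runs without the package installed.

```python
from typing import Callable, Iterable


def run_pipeline(text: str, steps: Iterable[Callable[[str], str]]) -> str:
    """Apply each step to the text in order and return the final result."""
    for step in steps:
        text = step(text)
    return text


# Stand-in steps from the standard library (assumption: any callable that
# takes a string and returns a string can serve as a step).
demo_steps = [str.lower, str.strip]
print(run_pipeline("  I Budi ngalahin  ", demo_steps))  # prints "i budi ngalahin"

# With the package installed, the list could instead hold the bound methods
# from the usage example, e.g. preprocessor.case_folding and
# preprocessor.remove_number, in whatever order you need.
```

Keeping the steps in a list makes it easy to reorder or drop a step without rewriting the chain of assignments.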

Acknowledgement

Please cite these papers if you find this package useful:

[1] Arimbawaa, I. G. A. P., & ERa, N. A. S. (2017). Lemmatization in Balinese language. Jurnal Elektronik Ilmu Komputer Udayana, p-ISSN 2301-5373.

[2] Pradipthaa, I. G. M. H., & ERa, N. A. S. (2020). Building Balinese part-of-speech tagger using hidden Markov model (HMM). Jurnal Elektronik Ilmu Komputer Udayana, p-ISSN 2301-5373.

Download files

Source Distribution

balinese_textpreprocessor-2.1.1.tar.gz (58.5 kB view details)

Uploaded Source

Built Distribution

balinese_textpreprocessor-2.1.1-py3-none-any.whl (56.8 kB view details)

Uploaded Python 3

File details

Details for the file balinese_textpreprocessor-2.1.1.tar.gz.

File hashes

Hashes for balinese_textpreprocessor-2.1.1.tar.gz

Algorithm    Hash digest
SHA256       203228a8a792401c16bb2adb6ecc63ec8534174c90f4ba411678d5620ee2258d
MD5          c5bea014d8c3d18ee552f5c0d64b1292
BLAKE2b-256  f97aeae9f5198b69501a94fd4919e47c38865c106b47016e00df84b240682433
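
The digests above can be checked against a downloaded file before installing it. The following is a minimal sketch using Python's standard hashlib module. The helper names sha256_of_file and matches_expected are hypothetical, and the local file path is an assumption about where you saved the download.

```python
import hashlib

# SHA256 digest listed above for balinese_textpreprocessor-2.1.1.tar.gz
EXPECTED_SHA256 = "203228a8a792401c16bb2adb6ecc63ec8534174c90f4ba411678d5620ee2258d"


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b""
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """Compare a file's SHA256 digest with the expected hex string."""
    return sha256_of_file(path) == expected.lower()


# Example call (assumes the archive sits in the current directory):
# matches_expected("balinese_textpreprocessor-2.1.1.tar.gz")
```

A mismatch means the file differs from the one that was uploaded and should not be installed.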

Provenance

The following attestation bundles were made for balinese_textpreprocessor-2.1.1.tar.gz:

Publisher: publish.yml on satriabimantara/balinese_textpreprocessor

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file balinese_textpreprocessor-2.1.1-py3-none-any.whl.

File hashes

Hashes for balinese_textpreprocessor-2.1.1-py3-none-any.whl

Algorithm    Hash digest
SHA256       165d1fb7077274ee97ba22f24a9e20acab35c21ecf459aa0172797874f93e2a1
MD5          94701653256abab888479ce44fcb5789
BLAKE2b-256  059d92be208441f926748919f150ea974dac5c29642c327db779f148b42a20bf

Provenance

The following attestation bundles were made for balinese_textpreprocessor-2.1.1-py3-none-any.whl:

Publisher: publish.yml on satriabimantara/balinese_textpreprocessor

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
