A Python package for Balinese Text Preprocessing

Package for Balinese Text Preprocessing

This is the first package for preprocessing raw Balinese text. It provides several functions that you can use to clean your raw text and convert it into a clean version.

Installation

pip install balinese_textpreprocessor

Usage

from balinese_textpreprocessor import TextPreprocessor

sentence = "I Budi ngalahin **& I Lutunge 12354!!"
preprocessor = TextPreprocessor()

# Each step takes the output of the previous one.
# 1. Case folding
preprocessed_sentence = preprocessor.case_folding(sentence)
# 2. Number removal
preprocessed_sentence = preprocessor.remove_number(preprocessed_sentence)
# 3. Punctuation removal
preprocessed_sentence = preprocessor.remove_punctuation(
    preprocessed_sentence)
# 4. Word normalization
preprocessed_sentence = preprocessor.normalize_words(
    preprocessed_sentence)
# 5. Lemmatization
preprocessed_sentence = preprocessor.lemmatize_text(
    preprocessed_sentence)
print(preprocessed_sentence)
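
The usage above chains several text-to-text steps by hand. As a minimal sketch of the same idea, the steps can be kept in a list and applied in order. The helper `run_pipeline` and the three stand-in functions below are hypothetical and written only for illustration; they use the standard library and are not the package's own implementations, whose exact behavior may differ.

```python
import re
import string
from functools import reduce
from typing import Callable, Iterable

# A step is any function that maps a string to a string.
Step = Callable[[str], str]


def run_pipeline(text: str, steps: Iterable[Step]) -> str:
    """Apply each step to the text in order and return the final result."""
    return reduce(lambda acc, step: step(acc), steps, text)


# Hypothetical stand-ins (NOT the package's implementations), shown only
# to make the sketch runnable without installing anything.
def fold_case(text: str) -> str:
    return text.lower()


def strip_digits(text: str) -> str:
    return re.sub(r"\d+", "", text)


def strip_punctuation(text: str) -> str:
    # Remove ASCII punctuation, then collapse the leftover whitespace.
    no_punct = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(no_punct.split())


steps = [fold_case, strip_digits, strip_punctuation]
cleaned = run_pipeline("I Budi ngalahin **& I Lutunge 12354!!", steps)
print(cleaned)  # i budi ngalahin i lutunge
```

With the package installed, the list could presumably hold the bound methods shown in the usage above instead, for example preprocessor.case_folding, preprocessor.remove_number, and so on, since each one takes a string and returns a string in that example.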

Acknowledgement

Please cite these papers if you find this package useful:

[1] Arimbawa, I. G. A. P., & ER, N. A. S. (2017). Lemmatization in Balinese language. Jurnal Elektronik Ilmu Komputer Udayana, p-ISSN 2301-5373.

[2] Pradiptha, I. G. M. H., & ER, N. A. S. (2020). Building Balinese part-of-speech tagger using hidden Markov model (HMM). Jurnal Elektronik Ilmu Komputer Udayana, p-ISSN 2301-5373.
