
A Python package for Balinese Text Preprocessing

Project description

Package for Balinese Text Preprocessing

This is the first package for preprocessing raw Balinese text. It provides several functions that you can use to prepare raw text and convert it into a clean version.

Installation

pip install balinese_textpreprocessor

Usage

from balinese_textpreprocessor import TextPreprocessor

sentence = "I Budi ngalahin **& I Lutunge 12354!!"
preprocessor = TextPreprocessor()

# Lowercase the whole sentence
preprocessed_sentence = preprocessor.case_folding(sentence)
# Remove numbers
preprocessed_sentence = preprocessor.remove_number(preprocessed_sentence)
# Remove punctuation
preprocessed_sentence = preprocessor.remove_punctuation(preprocessed_sentence)
# Normalize word forms
preprocessed_sentence = preprocessor.normalize_words(preprocessed_sentence)
# Reduce each word to its lemma (base form)
preprocessed_sentence = preprocessor.lemmatize_text(preprocessed_sentence)

print(preprocessed_sentence)
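The first three steps are simple string operations. As a rough illustration of what they do to the example sentence, here is a standard-library sketch. The helper names (naive_case_folding, naive_remove_number, naive_remove_punctuation, naive_pipeline) are hypothetical and this is not the package's own implementation, so the package's exact output (for instance, how it handles leftover whitespace) may differ. normalize_words and lemmatize_text are left out because they depend on Balinese word lists and lemmatization rules that are not described here.

```python
import re
import string

# Hypothetical stand-ins for illustration only; NOT the package's code.


def naive_case_folding(text: str) -> str:
    """Lowercase every character."""
    return text.lower()


def naive_remove_number(text: str) -> str:
    """Delete every run of ASCII digits."""
    return re.sub(r"[0-9]+", "", text)


def naive_remove_punctuation(text: str) -> str:
    """Delete ASCII punctuation, then collapse leftover whitespace."""
    stripped = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(stripped.split())


def naive_pipeline(text: str) -> str:
    """Apply the three steps in the same order as the usage example."""
    return naive_remove_punctuation(naive_remove_number(naive_case_folding(text)))


sentence = "I Budi ngalahin **& I Lutunge 12354!!"
print(naive_pipeline(sentence))  # i budi ngalahin i lutunge
```

Collapsing whitespace in the punctuation step is a design choice of this sketch: removing "**&" and "12354!!" would otherwise leave double and trailing spaces behind.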

Acknowledgement

Please cite these papers if you find this package useful:

[1] Arimbawa, I. G. A. P., & ER, N. A. S. (2017). Lemmatization in Balinese language. Jurnal Elektronik Ilmu Komputer Udayana (p-ISSN 2301-5373).

[2] Pradiptha, I. G. M. H., & ER, N. A. S. (2020). Building Balinese part-of-speech tagger using hidden Markov model (HMM). Jurnal Elektronik Ilmu Komputer Udayana (p-ISSN 2301-5373).

Download files


Source Distribution

balinese_textpreprocessor-2.2.1.tar.gz (60.2 kB)

Built Distribution


balinese_textpreprocessor-2.2.1-py3-none-any.whl (58.6 kB)

File details

Details for the file balinese_textpreprocessor-2.2.1.tar.gz.

File hashes

Hashes for balinese_textpreprocessor-2.2.1.tar.gz:

Algorithm     Hash digest
SHA256        96b61c5f24b5e6af1f47749b7e6371dd75cc9d89b932331a96ace4c2758daef2
MD5           734b2d4bd8164816a05f90fdac8f9d7a
BLAKE2b-256   109e69d8d265ad73e22c33de8021cc8b0b6ec0e7f89e306848184dda2975cfd5
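The SHA256 digest listed for the source distribution can be used to check that a downloaded copy is intact. The following is a minimal sketch using only Python's standard hashlib module; the helper names file_sha256 and matches_sha256 are hypothetical and are not part of this package.

```python
import hashlib

# SHA256 digest published above for balinese_textpreprocessor-2.2.1.tar.gz
EXPECTED_SDIST_SHA256 = "96b61c5f24b5e6af1f47749b7e6371dd75cc9d89b932331a96ace4c2758daef2"


def file_sha256(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks until f.read() returns b"" at end of file
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_sha256(path: str, expected: str = EXPECTED_SDIST_SHA256) -> bool:
    """Compare a file's digest with an expected hex digest (case-insensitive)."""
    return file_sha256(path) == expected.strip().lower()
```

For example, matches_sha256("balinese_textpreprocessor-2.2.1.tar.gz") should return True for an unmodified copy of the source distribution. To check the wheel instead, pass its SHA256 digest as the expected argument.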


Provenance

The following attestation bundles were made for balinese_textpreprocessor-2.2.1.tar.gz:

Publisher: publish.yml on satriabimantara/balinese_textpreprocessor

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file balinese_textpreprocessor-2.2.1-py3-none-any.whl.

File hashes

Hashes for balinese_textpreprocessor-2.2.1-py3-none-any.whl:

Algorithm     Hash digest
SHA256        b3e07ea5b4db5461baa9986f82f30755d3aabdf2a7adb5a2f4aa5381b5ba867d
MD5           35864edeade36ec4a2bfb5f4098e7516
BLAKE2b-256   07ebe4f8d4730a72ae8f9185143726dca2b85eda65153a6d354a5bdc81b9bd15


Provenance

The following attestation bundles were made for balinese_textpreprocessor-2.2.1-py3-none-any.whl:

Publisher: publish.yml on satriabimantara/balinese_textpreprocessor

