
Advanced Indonesian Natural Language Processing Library

Project description

How to Use nahiarhdNLP

Installation

pip install nahiarhdNLP

Import Library

# Import the main modules
import preprocessing
import mydatasets

# Or import specific functions
from preprocessing import preprocess, remove_html, replace_slang

Usage Examples

1. All-in-One preprocess Function

import preprocessing

# Full preprocessing in a single function call
teks = "Halooo emg siapa yg nanya? 😀"
hasil = preprocessing.preprocess(teks)
print(hasil)
# Output: "halo wajah_gembira"

2. TextCleaner - Cleaning Text

from preprocessing import TextCleaner

cleaner = TextCleaner()

# Remove URLs
url_text = "kunjungi https://google.com sekarang!"
clean_result = cleaner.clean_urls(url_text)
print(clean_result)
# Output: "kunjungi  sekarang!"

# Remove mentions
mention_text = "Halo @user123 apa kabar?"
clean_result = cleaner.clean_mentions(mention_text)
print(clean_result)
# Output: "Halo  apa kabar?"
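Cleaners like these are typically simple regular-expression substitutions. The sketch below reproduces the two outputs above with illustrative patterns; the regexes are assumptions, not the library's actual implementation.

```python
import re

# Illustrative regex-based cleaners (patterns are assumptions,
# not the library's actual implementation).
def clean_urls(text):
    return re.sub(r"https?://\S+", "", text)

def clean_mentions(text):
    return re.sub(r"@\w+", "", text)

print(clean_urls("kunjungi https://google.com sekarang!"))  # -> "kunjungi  sekarang!"
print(clean_mentions("Halo @user123 apa kabar?"))           # -> "Halo  apa kabar?"
```

Note that the removed substring leaves its surrounding whitespace behind, which is why the outputs contain a double space.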

3. SpellCorrector - Spell Correction

from preprocessing import SpellCorrector

spell = SpellCorrector()

# Correct a single word
word = "mencri"
corrected = spell.correct(word)
print(corrected)
# Output: "mencuri"

# Correct a sentence
sentence = "saya mencri informsi"
corrected = spell.correct_sentence(sentence)
print(corrected)
# Output: "saya mencuri informasi"

4. StopwordRemover - Removing Stopwords

from preprocessing import StopwordRemover

stopword = StopwordRemover()

# Remove stopwords
text = "saya suka makan nasi goreng"
result = stopword.remove_stopwords(text)
print(result)
# Output: "suka makan nasi goreng"
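Stopword removal amounts to filtering tokens against a word list. A toy sketch with a hand-picked list follows; the library loads its full list from a bundled dataset, so this tiny set is illustrative only.

```python
# Toy stopword list for illustration; the library's real list is much larger.
STOPWORDS = {"saya", "yang", "di", "dan"}

def remove_stopwords(text):
    # Keep only tokens that are not in the stopword set.
    return " ".join(t for t in text.split() if t not in STOPWORDS)

print(remove_stopwords("saya suka makan nasi goreng"))  # -> "suka makan nasi goreng"
```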

5. SlangNormalizer - Slang Normalization

from preprocessing import SlangNormalizer

slang = SlangNormalizer()

# Normalize slang words
text = "gw lg di rmh"
result = slang.normalize(text)
print(result)
# Output: "saya lagi di rumah"
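Slang normalization is essentially a per-token dictionary lookup. A minimal sketch with a three-entry dictionary is shown below; the library ships a far larger slang dataset.

```python
# Tiny illustrative slang dictionary; the library loads its own dataset.
SLANG = {"gw": "saya", "lg": "lagi", "rmh": "rumah"}

def normalize_slang(text):
    # Replace each token by its dictionary entry, or keep it unchanged.
    return " ".join(SLANG.get(token, token) for token in text.split())

print(normalize_slang("gw lg di rmh"))  # -> "saya lagi di rumah"
```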

6. EmojiConverter - Emoji Conversion

from preprocessing import EmojiConverter

emoji = EmojiConverter()

# Emoji to text
emoji_text = "😀 😂 😍"
text_result = emoji.emoji_to_text_convert(emoji_text)
print(text_result)
# Output: "wajah_gembira wajah_tertawa wajah_bercinta"

# Text to emoji
text = "wajah_gembira"
emoji_result = emoji.text_to_emoji_convert(text)
print(emoji_result)
# Output: "😀"

7. Tokenizer - Tokenization

from preprocessing import Tokenizer

tokenizer = Tokenizer()

# Tokenize text
text = "ini contoh tokenisasi"
tokens = tokenizer.tokenize(text)
print(tokens)
# Output: ['ini', 'contoh', 'tokenisasi']
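For simple inputs like this, a one-line regex tokenizer reproduces the same result; the library's tokenizer may treat punctuation, emoji, and edge cases differently, so the pattern below is only a sketch.

```python
import re

# Naive regex tokenizer: runs of word characters, or single
# punctuation marks, become separate tokens.
def tokenize(text):
    return re.findall(r"\w+|[^\w\s]", text)

print(tokenize("ini contoh tokenisasi"))  # -> ['ini', 'contoh', 'tokenisasi']
```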

8. Individual Functions

from preprocessing import (
    remove_html, remove_url, remove_mentions,
    replace_slang, emoji_to_words, correct_spelling
)

# Remove HTML
html_text = "website <a href='https://google.com'>google</a>"
clean_text = remove_html(html_text)
print(clean_text)
# Output: "website google"

# Remove URLs
url_text = "kunjungi https://google.com sekarang!"
clean_text = remove_url(url_text)
print(clean_text)
# Output: "kunjungi  sekarang!"

# Remove mentions
mention_text = "Halo @user123 apa kabar?"
clean_text = remove_mentions(mention_text)
print(clean_text)
# Output: "Halo  apa kabar?"

# Normalize slang
slang_text = "emg siapa yg nanya?"
normal_text = replace_slang(slang_text)
print(normal_text)
# Output: "memang siapa yang bertanya?"

# Convert emoji
emoji_text = "😀 😂 😍"
text_result = emoji_to_words(emoji_text)
print(text_result)
# Output: "wajah_gembira wajah_tertawa wajah_bercinta"

# Correct spelling
spell_text = "saya mencri informsi"
corrected = correct_spelling(spell_text)
print(corrected)
# Output: "saya mencuri informasi"

9. Dataset Loader

from mydatasets import DatasetLoader

loader = DatasetLoader()

# Load stopwords
stopwords = loader.load_stopwords_dataset()
print(f"Number of stopwords: {len(stopwords)}")

# Load slang dictionary
slang_dict = loader.load_slang_dataset()
print(f"Number of slang entries: {len(slang_dict)}")

# Load emoji dictionary
emoji_dict = loader.load_emoji_dataset()
print(f"Number of emoji entries: {len(emoji_dict)}")

10. Custom Pipeline

from preprocessing import pipeline, replace_word_elongation, replace_slang

# Build a custom pipeline
custom_pipeline = pipeline([
    replace_word_elongation,
    replace_slang
])

# Run the pipeline
text = "Knp emg gk mw makan kenapaaa???"
result = custom_pipeline(text)
print(result)
# Output: "mengapa memang tidak mau makan mengapa???"
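A pipeline like this is just left-to-right function composition over str -> str steps. A minimal stand-in could look like the following; compose and the toy steps are hypothetical, not part of the library.

```python
# Hypothetical stand-in for pipeline(): apply str -> str steps in order.
def compose(funcs):
    def run(text):
        for f in funcs:
            text = f(text)
        return text
    return run

# Toy steps in place of the library's preprocessing functions.
steps = [str.lower, lambda s: s.replace("gk", "tidak")]
print(compose(steps)("GK mau"))  # -> "tidak mau"
```

Because every step takes and returns a plain string, any custom function with that signature can be dropped into the chain.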

Preprocess Parameters

The preprocess() function accepts optional parameters:

result = preprocessing.preprocess(
    text="Halooo emg siapa yg nanya? 😀",
    remove_html_tags=True,      # Remove HTML tags
    remove_urls=True,           # Remove URLs
    remove_stopwords_flag=True, # Remove stopwords
    replace_slang_flag=True,    # Normalize slang
    replace_elongation=True,    # Collapse word elongation
    convert_emoji=True,         # Convert emoji
    correct_spelling_flag=False,# Spell correction (slow)
    stem_text_flag=False,       # Stemming
    to_lowercase=True           # Lowercase
)

Error Handling

try:
    import preprocessing
    result = preprocessing.preprocess("test")
except ImportError:
    print("The nahiarhdNLP package is not installed")
    print("Install it with: pip install nahiarhdNLP")
except Exception as e:
    print(f"Error: {e}")

Usage Tips

  1. For quick preprocessing: use preprocess() with its default parameters
  2. For full control: use the individual classes (TextCleaner, SpellCorrector, etc.)
  3. For customization: build a pipeline() from just the functions you need
  4. For spell correction: enable correct_spelling_flag=True (noticeably slower)
  5. For stemming: enable stem_text_flag=True (requires installing Sastrawi)

Dependencies

This package requires:

  • datasets - to load datasets from HuggingFace
  • sastrawi - for stemming (optional)
  • pandas - for data processing
  • rich - for output formatting
  • fsspec - for file system operations
  • huggingface_hub - for HuggingFace access

Download files

Download the file for your platform.

Source Distribution

nahiarhdnlp-1.0.2.tar.gz (556.0 kB)

Built Distribution

nahiarhdnlp-1.0.2-py3-none-any.whl (564.2 kB)

File details

Details for the file nahiarhdnlp-1.0.2.tar.gz.

File metadata

  • Download URL: nahiarhdnlp-1.0.2.tar.gz
  • Size: 556.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.6

File hashes

Hashes for nahiarhdnlp-1.0.2.tar.gz

  Algorithm    Hash digest
  SHA256       32b4d5ce035ced372a7f93755c313b56f87eab82d60ac232a39cac092376d13c
  MD5          e697a48910f13370c4da377ad159464a
  BLAKE2b-256  55ea92a78345298423c4172626af1941456499543c9160e0088772465ae686cc

File details

Details for the file nahiarhdnlp-1.0.2-py3-none-any.whl.

File metadata

  • Download URL: nahiarhdnlp-1.0.2-py3-none-any.whl
  • Size: 564.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.6

File hashes

Hashes for nahiarhdnlp-1.0.2-py3-none-any.whl

  Algorithm    Hash digest
  SHA256       ef6505a6fc1a48b5ae0e550c0f878c21298111c463ec2a71b9761c9392704a67
  MD5          57f1ae1a4721be62ecd4dacf867f9972
  BLAKE2b-256  0ddb226a8194306d8ec807c80a2a238ae6cab038ba03fb5c1bbc74823004cd5f
