Advanced Indonesian Natural Language Processing Library
How to Use nahiarhdNLP
Installation
pip install nahiarhdNLP
Importing the Library
# Import the main modules
import preprocessing
import mydatasets
# Or import specific functions
from preprocessing import preprocess, remove_html, replace_slang
Usage Examples
1. All-in-One preprocess Function
import preprocessing
# Full preprocessing with a single function
teks = "Halooo emg siapa yg nanya? 😀"
hasil = preprocessing.preprocess(teks)
print(hasil)
# Output: "halo wajah_gembira"
2. TextCleaner - Cleaning Text
from preprocessing import TextCleaner
cleaner = TextCleaner()
# Remove URLs
url_text = "kunjungi https://google.com sekarang!"
clean_result = cleaner.clean_urls(url_text)
print(clean_result)
# Output: "kunjungi sekarang!"
# Remove mentions
mention_text = "Halo @user123 apa kabar?"
clean_result = cleaner.clean_mentions(mention_text)
print(clean_result)
# Output: "Halo apa kabar?"
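Cleaning like this is typically regex-based. A minimal self-contained sketch of the idea (an illustration, not the library's actual implementation):

```python
import re

def clean_urls_sketch(text: str) -> str:
    # Drop http(s) URLs, then collapse the double space left behind
    return re.sub(r"\s+", " ", re.sub(r"https?://\S+", "", text)).strip()

def clean_mentions_sketch(text: str) -> str:
    # Drop @username tokens
    return re.sub(r"\s+", " ", re.sub(r"@\w+", "", text)).strip()

print(clean_urls_sketch("kunjungi https://google.com sekarang!"))  # kunjungi sekarang!
print(clean_mentions_sketch("Halo @user123 apa kabar?"))           # Halo apa kabar?
```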
3. SpellCorrector - Spelling Correction
from preprocessing import SpellCorrector
spell = SpellCorrector()
# Correct a single word
word = "mencri"
corrected = spell.correct(word)
print(corrected)
# Output: "mencuri"
# Correct a sentence
sentence = "saya mencri informsi"
corrected = spell.correct_sentence(sentence)
print(corrected)
# Output: "saya mencuri informasi"
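Dictionary lookup plus string similarity is the usual mechanism behind correctors like this. A toy sketch using a small hypothetical vocabulary and the standard library's difflib (the real corrector uses its own dictionary and scoring, so its output can differ):

```python
from difflib import get_close_matches

# Hypothetical mini-vocabulary for illustration only
VOCAB = ["saya", "mencari", "informasi"]

def correct_sketch(word: str) -> str:
    # Pick the closest known word; fall back to the input if nothing is close
    match = get_close_matches(word, VOCAB, n=1, cutoff=0.7)
    return match[0] if match else word

def correct_sentence_sketch(sentence: str) -> str:
    return " ".join(correct_sketch(w) for w in sentence.split())

print(correct_sentence_sketch("saya mencri informsi"))  # saya mencari informasi
```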
4. StopwordRemover - Removing Stopwords
from preprocessing import StopwordRemover
stopword = StopwordRemover()
# Remove stopwords
text = "saya suka makan nasi goreng"
result = stopword.remove_stopwords(text)
print(result)
# Output: "suka makan nasi goreng"
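Stopword removal is essentially a set-membership filter over tokens. A minimal sketch with a toy stopword set (the library loads a full Indonesian list from its dataset):

```python
# Toy stopword set for illustration; not the library's actual list
STOPWORDS = {"saya", "yang", "di", "ke", "dan"}

def remove_stopwords_sketch(text: str) -> str:
    # Keep only tokens that are not in the stopword set
    return " ".join(w for w in text.split() if w.lower() not in STOPWORDS)

print(remove_stopwords_sketch("saya suka makan nasi goreng"))  # suka makan nasi goreng
```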
5. SlangNormalizer - Slang Normalization
from preprocessing import SlangNormalizer
slang = SlangNormalizer()
# Normalize slang words
text = "gw lg di rmh"
result = slang.normalize(text)
print(result)
# Output: "saya lagi di rumah"
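Slang normalization is a token-by-token dictionary substitution. A sketch with a hypothetical mini-dictionary (the library ships a much larger one loaded from its dataset):

```python
# Hypothetical mini slang dictionary for illustration
SLANG = {"gw": "saya", "lg": "lagi", "rmh": "rumah"}

def normalize_sketch(text: str) -> str:
    # Replace each known slang token; leave unknown tokens untouched
    return " ".join(SLANG.get(w, w) for w in text.split())

print(normalize_sketch("gw lg di rmh"))  # saya lagi di rumah
```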
6. EmojiConverter - Emoji Conversion
from preprocessing import EmojiConverter
emoji = EmojiConverter()
# Emoji to text
emoji_text = "😀 😂 😍"
text_result = emoji.emoji_to_text_convert(emoji_text)
print(text_result)
# Output: "wajah_gembira wajah_tertawa wajah_bercinta"
# Text to emoji
text = "wajah_gembira"
emoji_result = emoji.text_to_emoji_convert(text)
print(emoji_result)
# Output: "😀"
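Two-way emoji conversion comes down to a mapping and its inverse. A sketch with a hypothetical two-entry mapping (the library loads its full emoji dataset):

```python
# Hypothetical mapping for illustration; the library uses a full dataset
EMOJI_TO_TEXT = {"😀": "wajah_gembira", "😂": "wajah_tertawa"}
TEXT_TO_EMOJI = {v: k for k, v in EMOJI_TO_TEXT.items()}  # inverted mapping

def emoji_to_text_sketch(text: str) -> str:
    return " ".join(EMOJI_TO_TEXT.get(t, t) for t in text.split())

def text_to_emoji_sketch(text: str) -> str:
    return " ".join(TEXT_TO_EMOJI.get(t, t) for t in text.split())

print(emoji_to_text_sketch("😀 😂"))          # wajah_gembira wajah_tertawa
print(text_to_emoji_sketch("wajah_gembira"))  # 😀
```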
7. Tokenizer - Tokenization
from preprocessing import Tokenizer
tokenizer = Tokenizer()
# Tokenize text
text = "ini contoh tokenisasi"
tokens = tokenizer.tokenize(text)
print(tokens)
# Output: ['ini', 'contoh', 'tokenisasi']
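Whitespace-and-punctuation tokenization can be approximated with a single regular expression. A sketch of the idea (the library's tokenizer may handle punctuation differently):

```python
import re

def tokenize_sketch(text: str) -> list[str]:
    # Extract runs of word characters; punctuation is dropped
    return re.findall(r"\w+", text)

print(tokenize_sketch("ini contoh tokenisasi"))  # ['ini', 'contoh', 'tokenisasi']
```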
8. Individual Functions
from preprocessing import (
    remove_html, remove_url, remove_mentions,
    replace_slang, emoji_to_words, correct_spelling
)
# Remove HTML
html_text = "website <a href='https://google.com'>google</a>"
clean_text = remove_html(html_text)
print(clean_text)
# Output: "website google"
# Remove URLs
url_text = "kunjungi https://google.com sekarang!"
clean_text = remove_url(url_text)
print(clean_text)
# Output: "kunjungi sekarang!"
# Remove mentions
mention_text = "Halo @user123 apa kabar?"
clean_text = remove_mentions(mention_text)
print(clean_text)
# Output: "Halo apa kabar?"
# Normalize slang
slang_text = "emg siapa yg nanya?"
normal_text = replace_slang(slang_text)
print(normal_text)
# Output: "memang siapa yang bertanya?"
# Convert emoji
emoji_text = "😀 😂 😍"
text_result = emoji_to_words(emoji_text)
print(text_result)
# Output: "wajah_gembira wajah_tertawa wajah_bercinta"
# Correct spelling
spell_text = "saya mencri informsi"
corrected = correct_spelling(spell_text)
print(corrected)
# Output: "saya mencuri informasi"
9. Dataset Loader
from mydatasets import DatasetLoader
loader = DatasetLoader()
# Load stopwords
stopwords = loader.load_stopwords_dataset()
print(f"Number of stopwords: {len(stopwords)}")
# Load the slang dictionary
slang_dict = loader.load_slang_dataset()
print(f"Number of slang entries: {len(slang_dict)}")
# Load the emoji dictionary
emoji_dict = loader.load_emoji_dataset()
print(f"Number of emoji: {len(emoji_dict)}")
10. Custom Pipeline
from preprocessing import pipeline, replace_word_elongation, replace_slang
# Build a custom pipeline
custom_pipeline = pipeline([
    replace_word_elongation,
    replace_slang
])
# Run the pipeline
text = "Knp emg gk mw makan kenapaaa???"
result = custom_pipeline(text)
print(result)
# Output: "mengapa memang tidak mau makan mengapa???"
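A pipeline combinator like this amounts to left-to-right function composition. A self-contained sketch of how it could work, with toy stand-ins for the elongation and slang steps (hypothetical implementations, not the library's own):

```python
import re

def pipeline_sketch(steps):
    """Compose text-processing steps left to right (a plausible sketch)."""
    def run(text):
        for step in steps:
            text = step(text)
        return text
    return run

def collapse_elongation(text):
    # Collapse 3+ repeated letters: "kenapaaa" -> "kenapa"
    return re.sub(r"(\w)\1{2,}", r"\1", text)

SLANG = {"knp": "kenapa", "gk": "tidak", "mw": "mau"}  # toy dictionary

def replace_slang_toy(text):
    return " ".join(SLANG.get(w.lower(), w) for w in text.split())

custom = pipeline_sketch([collapse_elongation, replace_slang_toy])
print(custom("Knp gk mw makan kenapaaa???"))  # kenapa tidak mau makan kenapa???
```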
preprocess Parameters
The preprocess() function accepts optional parameters:
result = preprocessing.preprocess(
    text="Halooo emg siapa yg nanya? 😀",
    remove_html_tags=True,       # Strip HTML tags
    remove_urls=True,            # Strip URLs
    remove_stopwords_flag=True,  # Remove stopwords
    replace_slang_flag=True,     # Normalize slang
    replace_elongation=True,     # Collapse word elongation
    convert_emoji=True,          # Convert emoji
    correct_spelling_flag=False, # Spelling correction (slow)
    stem_text_flag=False,        # Stemming
    to_lowercase=True            # Lowercase
)
Error Handling
try:
    import preprocessing
    result = preprocessing.preprocess("test")
except ImportError:
    print("The nahiarhdNLP package is not installed")
    print("Install it with: pip install nahiarhdNLP")
except Exception as e:
    print(f"Error: {e}")
Usage Tips
- For quick preprocessing: use preprocess() with the default parameters
- For full control: use the individual classes (TextCleaner, SpellCorrector, etc.)
- For customization: use pipeline() with the functions you need
- For spelling correction: enable correct_spelling_flag=True (slower)
- For stemming: enable stem_text_flag=True (requires installing Sastrawi)
Dependencies
This package requires:
- datasets - loading datasets from HuggingFace
- sastrawi - stemming (optional)
- pandas - data processing
- rich - output formatting
- fsspec - file system operations
- huggingface_hub - HuggingFace access