Advanced Indonesian Natural Language Processing Library
nahiarhdNLP
nahiarhdNLP is an advanced Python library for Indonesian Natural Language Processing (NLP), providing easy-to-use tools for text preprocessing, normalization, tokenization, stemming, spell correction, and customizable pipelines.
Installation
```bash
pip install nahiarhdNLP
```
Features
- Preprocessing: Clean text from HTML, URLs, stopwords, slang, emoji, mentions, hashtags, numbers, punctuation, extra spaces, and special characters.
- Tokenization: Split sentences into tokens/words.
- Stemming: Convert words to their root form (using Sastrawi).
- Spell Correction: Automatic spelling correction.
- Pipeline: Chain multiple preprocessing functions easily.
- Normalization: Replace slang, emoji, and informal words with formal equivalents.
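The tokenization feature has no example below, so here is a rough illustration of what splitting a sentence into tokens involves. This is a plain regex sketch, not the library's tokenizer:

```python
import re

def tokenize_sketch(sentence):
    # Lowercase, then pull out runs of word characters; a real Indonesian
    # tokenizer would also handle clitics, hyphenation, and punctuation rules.
    return re.findall(r"\w+", sentence.lower())

print(tokenize_sketch("Saya suka makan nasi!"))  # ['saya', 'suka', 'makan', 'nasi']
```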
Quick Usage Example
Basic Preprocessing
```python
from nahiarhdNLP import preprocessing

text = "Halooo emg siapa yg nanya? 😀 <a href='#'>link</a> @user #trending 123"
cleaned = preprocessing.cleaning.text_cleaner.clean_text(text)
print(cleaned)
```
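The exact behavior of `clean_text` depends on the library's configuration; as a rough sketch of the kind of cleanup involved (the regexes below are illustrative, not the library's actual rules):

```python
import re

def clean_text_sketch(text):
    """Illustrative cleanup only; not the library's implementation."""
    text = re.sub(r"<[^>]+>", " ", text)       # strip HTML tags
    text = re.sub(r"https?://\S+", " ", text)  # strip URLs
    text = re.sub(r"[@#]\w+", " ", text)       # strip mentions and hashtags
    text = re.sub(r"\d+", " ", text)           # strip numbers
    return " ".join(text.split())              # collapse extra spaces

print(clean_text_sketch("Halo <a href='#'>link</a> @user #trending 123"))  # Halo link
```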
Custom Preprocessing Pipeline
```python
from nahiarhdNLP.preprocessing import (
    pipeline, remove_html, remove_url, remove_mentions, remove_hashtags,
    remove_numbers, replace_word_elongation, emoji_to_words, replace_slang,
    remove_stopwords, remove_punctuation, remove_extra_spaces, to_lowercase
)

custom_pipe = pipeline([
    remove_html, remove_url, remove_mentions, remove_hashtags, remove_numbers,
    replace_word_elongation, emoji_to_words, replace_slang, remove_stopwords,
    remove_punctuation, remove_extra_spaces, to_lowercase
])

result = custom_pipe("Halooo emg siapa yg nanya? 😀 <a href='#'>link</a> @user #trending 123")
print(result)
```
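Conceptually, a pipeline like this is left-to-right function composition: each step takes a string and returns a string. A minimal sketch of the idea (not the library's actual `pipeline` implementation):

```python
def make_pipeline(*steps):
    """Compose text-processing functions, applied left to right."""
    def run(text):
        for step in steps:
            text = step(text)
        return text
    return run

# Toy steps standing in for the library's preprocessing functions
lowercase = str.lower
collapse_spaces = lambda s: " ".join(s.split())

pipe = make_pipeline(lowercase, collapse_spaces)
print(pipe("  Halooo   DUNIA  "))  # "halooo dunia"
```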
Spell Correction
```python
from nahiarhdNLP.preprocessing import correct_spelling

print(correct_spelling("sya suka mkn nasi"))  # "saya suka makan nasi"
```
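The library's correction algorithm isn't documented here; a common approach is to look up edit-distance-1 candidates in a known vocabulary. A toy sketch with a hypothetical four-word vocabulary (transpositions and frequency ranking omitted for brevity):

```python
def edits1(word):
    """All strings one delete, insert, or replace away from `word`."""
    letters = "abcdefghijklmnopqrstuvwxyz"
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    inserts = [a + c + b for a, b in splits for c in letters]
    replaces = [a + c + b[1:] for a, b in splits if b for c in letters]
    return set(deletes + inserts + replaces)

VOCAB = {"saya", "suka", "makan", "nasi"}  # toy vocabulary, not the library's data

def correct(word):
    if word in VOCAB:
        return word
    matches = VOCAB & edits1(word)
    return min(matches) if matches else word  # deterministic tie-break

print(" ".join(correct(w) for w in "sya suka makan nsi".split()))  # saya suka makan nasi
```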
Stemming
```python
from nahiarhdNLP.preprocessing import stem_text

print(stem_text("bermain-main dengan senang"))  # "main dengan senang"
```
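Under the hood the library delegates to Sastrawi's affix-stripping rules. As a very rough toy sketch of the idea (reduplication collapse plus a tiny prefix list, nowhere near Sastrawi's full rule set):

```python
def toy_stem(word):
    """Toy root-form reduction; Sastrawi's real rules are far richer."""
    # Collapse reduplicated forms: "bermain-main" -> "bermain"
    if "-" in word:
        head, tail = word.split("-", 1)
        if head.endswith(tail):
            word = head
    # Strip a few common Indonesian prefixes (illustrative list only)
    for prefix in ("ber", "me", "di", "ter"):
        if word.startswith(prefix) and len(word) - len(prefix) >= 3:
            return word[len(prefix):]
    return word

print(" ".join(toy_stem(w) for w in "bermain-main dengan senang".split()))  # main dengan senang
```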
Requirements
- Python 3.7+
- pandas, fsspec, huggingface_hub, sastrawi, datasets, rich
Testing
```bash
pytest tests/
```
Directory Structure
```
nahiarhdNLP/
├── main.py
├── requirements.txt
├── README.md
├── src/
│   ├── preprocessing/
│   └── mydatasets/
└── tests/
```
Contribution
Contributions are welcome! Please fork the repository, create a new branch, and submit a pull request.
License
MIT License
Acknowledgments
- Stopwords dataset from Hugging Face
- Emoji dataset from Hugging Face
- Slang dataset from Hugging Face
- Sastrawi for Indonesian stemming
Project details
File details
Details for the file nahiarhdnlp-1.0.1.tar.gz.
File metadata
- Download URL: nahiarhdnlp-1.0.1.tar.gz
- Upload date:
- Size: 555.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1e637ab001ac5c5ce7c865734fe4484d7da3761f768950f441efbf2d79ff8de9 |
| MD5 | 46954a4ad82670ce516487db426249db |
| BLAKE2b-256 | d986afddc908ca1396880c3086e8860d4ebeaaab5a69599ecfdfdfef7149be4b |
File details
Details for the file nahiarhdnlp-1.0.1-py3-none-any.whl.
File metadata
- Download URL: nahiarhdnlp-1.0.1-py3-none-any.whl
- Upload date:
- Size: 563.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e1f4e9e31d5ddb86ffd00049e82e1f29c2d23f1f07fc6fe04a2e65a04bd377e5 |
| MD5 | f9bf48684609d378b75cf68dd5c77cbf |
| BLAKE2b-256 | f4538bb90233900b5937286f1ae2b9d3e9604253fb20645976106e2bca288328 |