Lightweight NLP preprocessing package for the Arabic language
fathah
Installation
pip install fathah
Usage
from fathah import TextClean
Methods
Clean the text
The clean_text function applies all of the following functions:
1. remove_emails
2. remove_URLs
3. remove_mentions
4. hashtags_to_words
5. remove_punctuations
6. normalize_arabic
7. remove_diacritics
8. remove_repeating_char
9. remove_stop_words
10. remove_emojis
In other words, clean_text applies every function except remove_hashtags.
text_cleaned1 = TextClean.clean_text(text)
print(text_cleaned1)
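The package's internals are not shown here, so the following is only a sketch of how a combined cleaner such as clean_text could chain individual steps in order. The three step functions and run_pipeline are hypothetical stand-ins written for this illustration, not part of the fathah API.

```python
import re

# Hypothetical stand-ins for three of the listed steps (not the fathah code).
def strip_urls(text: str) -> str:
    return re.sub(r"https?://\S+", " ", text)

def strip_mentions(text: str) -> str:
    return re.sub(r"@\w+", " ", text)

def collapse_spaces(text: str) -> str:
    return re.sub(r"\s+", " ", text).strip()

def run_pipeline(text: str, steps) -> str:
    # Apply each step to the output of the previous one.
    for step in steps:
        text = step(text)
    return text

sample = "مرحبا @user شاهد https://example.com الآن"
print(run_pipeline(sample, [strip_urls, strip_mentions, collapse_spaces]))
```

Keeping the steps in a plain list makes it easy to drop or reorder one of them when the full pipeline is too aggressive for a given dataset.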
Remove repeating characters
The remove_repeating_char function:
text_cleaned2 = TextClean.remove_repeating_char(text)
print(text_cleaned2)
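For intuition, a common way to collapse repeated characters is a backreference regex. The sketch below assumes every run of the same character is reduced to a single copy; whether fathah does exactly this is not stated, and collapse_repeats is a hypothetical name.

```python
import re

def collapse_repeats(text: str) -> str:
    # Replace any run of two or more identical characters with one copy.
    return re.sub(r"(.)\1+", r"\1", text)

print(collapse_repeats("جمييييل"))
```

Note that this simple rule also collapses letters that are legitimately doubled in a word, so a threshold such as `(.)\1{2,}` may be preferable in practice.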
Remove punctuations
The remove_punctuations function:
text_cleaned3 = TextClean.remove_punctuations(text)
print(text_cleaned3)
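As a rough illustration of punctuation removal for mixed Arabic and Latin text, the sketch below deletes ASCII punctuation plus a few Arabic marks using a translation table. The exact character set fathah strips is an assumption here, and strip_punctuation is a hypothetical name.

```python
import string

# ASCII punctuation plus Arabic comma, semicolon, question mark and guillemets.
ARABIC_PUNCT = "\u060C\u061B\u061F\u00AB\u00BB"
PUNCT_TABLE = str.maketrans("", "", string.punctuation + ARABIC_PUNCT)

def strip_punctuation(text: str) -> str:
    # Characters mapped to None in the table are deleted.
    return text.translate(PUNCT_TABLE)

print(strip_punctuation("مرحبا، كيف الحال؟!"))
```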
Normalize Arabic
The normalize_arabic function:
text_cleaned4 = TextClean.normalize_arabic(text)
print(text_cleaned4)
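Arabic normalization usually means mapping letter variants to one canonical form. The mapping below (alef variants to bare alef, alef maqsura to ya, ta marbuta to ha) is a widely used convention and only an assumption about what normalize_arabic does; normalize_letters is a hypothetical name.

```python
NORMALIZE_TABLE = str.maketrans({
    "\u0622": "\u0627",  # alef with madda       -> bare alef
    "\u0623": "\u0627",  # alef with hamza above -> bare alef
    "\u0625": "\u0627",  # alef with hamza below -> bare alef
    "\u0649": "\u064A",  # alef maqsura          -> ya
    "\u0629": "\u0647",  # ta marbuta            -> ha
})

def normalize_letters(text: str) -> str:
    # Map each variant letter to its canonical form; other characters pass through.
    return text.translate(NORMALIZE_TABLE)

print(normalize_letters("أحمد في المدرسة"))
```

Normalization of this kind reduces spelling variation before tokenization, at the cost of merging a few words that differ only in these letters.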
Remove diacritics
The remove_diacritics function:
text_cleaned5 = TextClean.remove_diacritics(text)
print(text_cleaned5)
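Arabic diacritics (tashkeel) occupy a small Unicode range, so stripping them can be sketched with one character class. The range below covers U+064B through U+0652 plus the superscript alef U+0670; it is an assumption about the marks removed, and strip_diacritics is a hypothetical name rather than the package's code.

```python
import re

# Fathatan .. sukun (U+064B-U+0652) and superscript alef (U+0670).
DIACRITICS = re.compile(r"[\u064B-\u0652\u0670]")

def strip_diacritics(text: str) -> str:
    return DIACRITICS.sub("", text)

print(strip_diacritics("السَّلَامُ عَلَيْكُمْ"))
```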
Remove stop words
The remove_stop_words function:
text_cleaned6 = TextClean.remove_stop_words(text)
print(text_cleaned6)
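Stop-word removal is typically a membership test against a word list. The four-word list below is a tiny hypothetical sample chosen for illustration (the list fathah ships is not shown here), and drop_stop_words is likewise a hypothetical name.

```python
# A tiny illustrative list; a real stop-word list is much longer.
STOP_WORDS = {"في", "من", "على", "عن"}

def drop_stop_words(text: str, stop_words=STOP_WORDS) -> str:
    # Split on whitespace and keep only tokens that are not stop words.
    return " ".join(word for word in text.split() if word not in stop_words)

print(drop_stop_words("ذهبت من البيت في الصباح"))
```

Because matching is exact, running normalization and diacritic removal first helps tokens line up with the entries in the list.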
Remove emojis
The remove_emojis function:
text_cleaned7 = TextClean.remove_emojis(text)
print(text_cleaned7)
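Emoji removal is often approximated with Unicode ranges. The ranges in this sketch cover the main pictograph blocks, regional-indicator flags, dingbats, and the joiner characters; they are an approximation chosen for illustration and an assumption about the package's behavior. strip_emojis is a hypothetical name.

```python
import re

EMOJI = re.compile(
    "["
    "\U0001F300-\U0001FAFF"  # pictographs, emoticons, transport, extended symbols
    "\U0001F1E6-\U0001F1FF"  # regional indicators (flags)
    "\u2600-\u27BF"          # miscellaneous symbols and dingbats
    "\uFE0F\u200D"           # variation selector and zero-width joiner
    "]+"
)

def strip_emojis(text: str) -> str:
    return EMOJI.sub("", text)

print(strip_emojis("مرحبا 😀👍"))
```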
Remove mentions
The remove_mentions function:
text_cleaned8 = TextClean.remove_mentions(text)
print(text_cleaned8)
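A mention is usually matched as an at sign followed by word characters. The sketch below assumes that pattern; strip_mentions is a hypothetical name, not the package's implementation.

```python
import re

def strip_mentions(text: str) -> str:
    # Remove "@" followed by letters, digits or underscores.
    return re.sub(r"@\w+", "", text)

print(strip_mentions("شكرا @ahmed_1 على المساعدة"))
```

With a pattern this simple, an email address such as user@example.com would be cut down to user.com, which is one plausible reason to remove emails before mentions when the steps are combined.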
Convert any hashtags to words
The hashtags_to_words function:
text_cleaned9 = TextClean.hashtags_to_words(text)
print(text_cleaned9)
Remove hashtags
The remove_hashtags function:
text_cleaned10 = TextClean.remove_hashtags(text)
print(text_cleaned10)
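The difference between converting and removing hashtags can be sketched side by side. Both functions below are hypothetical illustrations: the first assumes conversion means dropping the hash sign and turning underscores into spaces, the second deletes the whole tag.

```python
import re

def hashtag_to_words(text: str) -> str:
    # "#machine_learning" -> "machine learning"
    return re.sub(r"#(\w+)", lambda m: m.group(1).replace("_", " "), text)

def strip_hashtags(text: str) -> str:
    # "#machine_learning" -> ""
    return re.sub(r"#\w+", "", text)

tweet = "أحب #اليوم_الوطني"
print(hashtag_to_words(tweet))
print(strip_hashtags(tweet))
```

Conversion keeps the topical words for downstream models, which may be why the combined cleaner uses it rather than outright removal.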
Remove emails
The remove_emails function:
text_cleaned11 = TextClean.remove_emails(text)
print(text_cleaned11)
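Email removal can be approximated with a pragmatic regex. The pattern below is deliberately simple, does not implement the full address grammar, and is an assumption rather than the package's own pattern; strip_emails is a hypothetical name.

```python
import re

# local part, "@", then a domain with at least one dot-separated label
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")

def strip_emails(text: str) -> str:
    return EMAIL.sub("", text)

print(strip_emails("راسلني على user.name@example.com من فضلك"))
```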
Remove URLs
The remove_URLs function:
text_cleaned12 = TextClean.remove_URLs(text)
print(text_cleaned12)
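A minimal URL matcher looks for a scheme or a leading www. and consumes everything up to the next whitespace. This is a sketch under that assumption, and strip_urls is a hypothetical name, not the fathah implementation.

```python
import re

URL = re.compile(r"(?:https?://|www\.)\S+")

def strip_urls(text: str) -> str:
    return URL.sub("", text)

print(strip_urls("زوروا https://example.com/page?x=1 الآن"))
```

Since the match stops only at whitespace, punctuation attached to the end of a link is removed along with it.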
Example
from fathah import TextClean
text = "السَّلَامُ عَلَيْكُمْ وَرَحْمَةُ اللهِ وَبَرَكَاتُهُ"
print(TextClean.remove_diacritics(text))
# Outputs: السلام عليكم ورحمة الله وبركاته
This package is under development. Contributions are highly welcome.
File details
Details for the file fathah-0.0.2.tar.gz.
File metadata
- Download URL: fathah-0.0.2.tar.gz
- Upload date:
- Size: 10.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | c0a4e56cb44d0b6456e0885eaad3990913900b4bbc09fd509b67655a9e4397c2
MD5 | d79384d81725f3b47a0761f75d37960d
BLAKE2b-256 | 6783ae299c84346b5bf62a2f285bf098d07a077b85f193b3218c05c73c51f3b8
File details
Details for the file fathah-0.0.2-py3-none-any.whl.
File metadata
- Download URL: fathah-0.0.2-py3-none-any.whl
- Upload date:
- Size: 9.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | 9ef6c0e02f13396e510b707c8d8da36769b30dda87bb35d207e4d06da21fa96f
MD5 | 2f4083f3ac2b549b6c5b9176176c35c2
BLAKE2b-256 | 5e41f553b8d235813779c47c9d5b2267990ab186cafab55ceed36651c21eeef5