Arabic NLP
Ruqia lib
This library is used in Arabic NLP to process, prepare, and clean Arabic text. It provides a number of functions for text cleaning and related tasks.
Install
pip install ruqia
Usage
from ruqiya import ruqiya
All Functions
Clean the text
The clean_text function includes all of these functions:
1. remove_emails
2. remove_URLs
3. remove_mentions
4. hashtags_to_words
5. remove_punctuations
6. normalize_arabic
7. remove_diacritics
8. remove_repeating_char
9. remove_stop_words
10. remove_emojis
In other words, clean_text includes every function except remove_hashtags.
text_cleaned1=ruqiya.clean_text(text)
print(text_cleaned1)
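Neither the order nor the internals of clean_text are documented here. As a minimal sketch of the general idea, assuming each cleaning step is a function that takes and returns a string, a combined cleaner can be built by applying the steps one after another. The helper make_pipeline and the two example steps are hypothetical and are not part of ruqiya:

```python
import re
from typing import Callable, List

# Hypothetical sketch: a "step" is any function that maps str -> str.
Step = Callable[[str], str]


def make_pipeline(steps: List[Step]) -> Step:
    """Return a function that applies each step in order (hypothetical helper)."""
    def run(text: str) -> str:
        for step in steps:
            text = step(text)  # each step receives the previous step's output
        return text
    return run


# Two illustrative steps; these are NOT ruqiya's implementations.
def strip_urls(text: str) -> str:
    return re.sub(r"https?://\S+", "", text)


def squeeze_spaces(text: str) -> str:
    # Collapse any run of whitespace into one space and trim the ends.
    return re.sub(r"\s+", " ", text).strip()


clean = make_pipeline([strip_urls, squeeze_spaces])
print(clean("الموقع https://pypi.org/project/ruqia/ "))
```

Because each step receives the previous step's output, order matters: for example, stripping email addresses before mentions stops the @ inside an address from being read as a mention.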
Remove repeating characters
remove_repeating_char function
text_cleaned2=ruqiya.remove_repeating_char(text)
print(text_cleaned2)
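One plausible reading of remove_repeating_char, shown as a sketch rather than the library's actual behavior: collapse any character repeated three or more times in a row into a single character, so an elongated word such as "رااااائع" becomes "رائع". The threshold of three and the name collapse_repeats are assumptions; a threshold of two would also collapse legitimate double letters, as in "الله".

```python
import re

# Hypothetical sketch: a character repeated three or more times in a row
# is reduced to a single occurrence. Runs of exactly two are left alone
# so that words with a genuine double letter, such as "الله", survive.
_REPEAT_RE = re.compile(r"(.)\1{2,}")


def collapse_repeats(text: str) -> str:
    return _REPEAT_RE.sub(r"\1", text)


print(collapse_repeats("رااااائع"))  # elongated form of "رائع"
print(collapse_repeats("الله"))
```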
Remove punctuation
remove_punctuations function
text_cleaned3=ruqiya.remove_punctuations(text)
print(text_cleaned3)
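A minimal sketch of punctuation removal, assuming it should cover both ASCII punctuation and common Arabic marks. The name strip_punctuation and the exact character set are hypothetical, not taken from ruqiya:

```python
import string

# Hypothetical sketch: ASCII punctuation plus a few common Arabic marks
# (Arabic comma U+060C, semicolon U+061B, question mark U+061F,
# percent sign U+066A) and the guillemets « ».
ARABIC_PUNCTUATION = "\u060c\u061b\u061f\u066a\u00ab\u00bb"

# A translation table that maps every listed character to None (delete it).
_DELETE_PUNCT = str.maketrans("", "", string.punctuation + ARABIC_PUNCTUATION)


def strip_punctuation(text: str) -> str:
    return text.translate(_DELETE_PUNCT)


print(strip_punctuation("مرحبا، كيف حالك؟!"))
```

Note that string.punctuation includes "#" and "_", so running a step like this before hashtag handling would turn #معالجة_العربية into معالجةالعربية.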
Normalize Arabic
normalize_arabic function
text_cleaned4=ruqiya.normalize_arabic(text)
print(text_cleaned4)
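The document does not list which rules normalize_arabic applies. The sketch below uses rules that are common in Arabic preprocessing (unifying alef variants, mapping alef maksura to yeh and teh marbuta to heh, dropping tatweel); treat both the rules and the name normalize as assumptions:

```python
# Hypothetical sketch of common Arabic orthographic normalization rules.
# Which rules ruqiya's normalize_arabic actually applies is not stated.
_NORMALIZE_MAP = str.maketrans({
    "\u0622": "\u0627",  # alef with madda above  -> bare alef
    "\u0623": "\u0627",  # alef with hamza above  -> bare alef
    "\u0625": "\u0627",  # alef with hamza below  -> bare alef
    "\u0649": "\u064a",  # alef maksura           -> yeh
    "\u0629": "\u0647",  # teh marbuta            -> heh
    "\u0640": None,      # tatweel (elongation)   -> removed
})


def normalize(text: str) -> str:
    return text.translate(_NORMALIZE_MAP)


print(normalize("أهلا في مكتبة رقيا"))
```

These mappings are lossy by design: they trade orthographic precision for fewer surface variants of the same word.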
Remove diacritics
remove_diacritics function
text_cleaned5=ruqiya.remove_diacritics(text)
print(text_cleaned5)
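Arabic diacritics (harakat) are combining marks in a contiguous Unicode range, so a character-class regex is enough for a basic version. A sketch under that assumption, with the hypothetical name strip_diacritics (not ruqiya's code):

```python
import re

# Hypothetical sketch: harakat occupy U+064B (fathatan) through U+0652
# (sukun), which includes shadda U+0651; U+0670 is superscript alef.
_DIACRITICS_RE = re.compile("[\u064b-\u0652\u0670]")


def strip_diacritics(text: str) -> str:
    return _DIACRITICS_RE.sub("", text)


print(strip_diacritics("الإصدارِ الأولِ"))
```

Letters such as "إ" (U+0625) are separate code points rather than a letter plus a mark, so they are left alone; unifying them is the job of normalization.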
Remove stop words
remove_stop_words function
text_cleaned6=ruqiya.remove_stop_words(text)
print(text_cleaned6)
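A minimal whitespace-token sketch of stop word removal. The tiny ARABIC_STOP_WORDS set and the name drop_stop_words are hypothetical and are not ruqiya's list; a fuller list, such as the Arabic list in NLTK's stopwords corpus, could be substituted:

```python
# Hypothetical sketch: ARABIC_STOP_WORDS is a tiny illustrative list,
# not the list ruqiya ships with.
ARABIC_STOP_WORDS = {"في", "من", "على", "إلى", "هل", "هذه", "هي", "التي"}


def drop_stop_words(text: str) -> str:
    # Splitting on whitespace also collapses newlines into single spaces.
    return " ".join(w for w in text.split() if w not in ARABIC_STOP_WORDS)


print(drop_stop_words("هل هذه هي المرة الأولى"))
```

Tokens still attached to punctuation (for example "المكتبة؟") will not match a list entry, so this kind of step usually runs after punctuation removal.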
Remove emojis
remove_emojis function
text_cleaned7=ruqiya.remove_emojis(text)
print(text_cleaned7)
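Emoji detection by Unicode range is approximate, but it shows the idea. A sketch with the hypothetical name strip_emojis; the ranges chosen here are an assumption, not ruqiya's pattern:

```python
import re

# Hypothetical sketch: an approximate emoji pattern covering the main
# pictographic blocks. It is not exhaustive and not ruqiya's own pattern.
_EMOJI_RE = re.compile(
    "["
    "\U0001F1E6-\U0001F1FF"  # regional indicators (flags)
    "\U0001F300-\U0001FAFF"  # pictographs, emoticons, transport, supplemental
    "\u2600-\u27BF"          # miscellaneous symbols and dingbats
    "\uFE0F\u200D"           # variation selector-16 and zero-width joiner
    "]+"
)


def strip_emojis(text: str) -> str:
    return _EMOJI_RE.sub("", text)


print(strip_emojis("أهلًا وسهلًا بك 👋"))
```

Removing U+200D also removes zero-width joiners used outside emoji, which is usually harmless for cleaning but worth knowing.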
Remove mentions
remove_mentions function
text_cleaned8=ruqiya.remove_mentions(text)
print(text_cleaned8)
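A sketch of mention removal, assuming a mention is "@" followed by word characters at the start of the text or after whitespace. The name strip_mentions is hypothetical:

```python
import re

# Hypothetical sketch: a mention is "@" plus word characters, and only
# counts when it starts the text or follows whitespace. The lookbehind
# keeps addresses such as example@email.com intact.
_MENTION_RE = re.compile(r"(?<!\S)@\w+")


def strip_mentions(text: str) -> str:
    return _MENTION_RE.sub("", text)


print(strip_mentions("تويتر @Ru0Sa"))
print(strip_mentions("example@email.com"))
```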
Convert hashtags to words
hashtags_to_words function
text_cleaned9=ruqiya.hashtags_to_words(text)
print(text_cleaned9)
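The document does not say exactly what "convert to words" means. Assuming it drops the "#" and turns underscores into spaces (as in #معالجة_العربية), a sketch with the hypothetical name expand_hashtags looks like this:

```python
import re

# Hypothetical sketch: drop the leading "#" and turn underscores, which
# stand in for spaces inside hashtags, back into spaces. A hashtag must
# start the text or follow whitespace.
_HASHTAG_RE = re.compile(r"(?<!\S)#(\w+)")


def expand_hashtags(text: str) -> str:
    return _HASHTAG_RE.sub(lambda m: m.group(1).replace("_", " "), text)


print(expand_hashtags("وسم #معالجة_العربية"))
```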
Remove hashtags
remove_hashtags function
text_cleaned10=ruqiya.remove_hashtags(text)
print(text_cleaned10)
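Unlike hashtags_to_words, removing a hashtag deletes its text as well, which is presumably why clean_text uses the former. A sketch with the hypothetical name strip_hashtags, assuming a hashtag starts the text or follows whitespace:

```python
import re

# Hypothetical sketch: delete "#" plus the word characters after it,
# but only at the start of the text or after whitespace, so that
# something like "C#" is left untouched.
_HASHTAG_RE = re.compile(r"(?<!\S)#\w+")


def strip_hashtags(text: str) -> str:
    return _HASHTAG_RE.sub("", text)


print(strip_hashtags("وسم #معالجة_العربية"))
```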
Remove emails
remove_emails function
text_cleaned11=ruqiya.remove_emails(text)
print(text_cleaned11)
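A sketch of email removal using a deliberately simple pattern; it does not implement the full RFC 5322 address grammar. The name strip_emails is hypothetical:

```python
import re

# Hypothetical sketch: local part, "@", then a domain with at least one
# dot. Simple on purpose; unusual but valid addresses may be missed.
_EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def strip_emails(text: str) -> str:
    return _EMAIL_RE.sub("", text)


print(strip_emails("ايميل example@email.com"))
```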
Remove URLs
remove_URLs function
text_cleaned12=ruqiya.remove_URLs(text)
print(text_cleaned12)
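A sketch of URL removal, assuming a URL is anything that starts with http://, https://, or www. and runs to the next whitespace. The name strip_urls is hypothetical:

```python
import re

# Hypothetical sketch: a URL begins with http://, https://, or www.
# and extends up to the next whitespace character.
_URL_RE = re.compile(r"(?:https?://|www\.)\S+")


def strip_urls(text: str) -> str:
    return _URL_RE.sub("", text)


print(strip_urls("الموقع https://pypi.org/project/ruqia/"))
```

Punctuation glued to the end of a link (for example a trailing "؟") is removed along with it.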
Example
from ruqiya import ruqiya
text="""
!!أهلًا وسهلًا بك 👋 في الإصدارِ الأولِ من مكتبة رقيا
هل هذه هي المرة الأولى التي تستخدم فيها المكتبة😀؟!!
معلومات التواصل
ايميل
example@email.com
الموقع
https://pypi.org/project/ruqia/
تويتر
@Ru0Sa
وسم
#معالجة_العربية
"""
print('===========clean_text===========')
text_cleaned1=ruqiya.clean_text(text)
print(text_cleaned1)
print('===========remove_repeating_char===========')
text_cleaned2=ruqiya.remove_repeating_char(text)
print(text_cleaned2)
print('===========remove_punctuations===========')
text_cleaned3=ruqiya.remove_punctuations(text)
print(text_cleaned3)
print('===========normalize_arabic===========')
text_cleaned4=ruqiya.normalize_arabic(text)
print(text_cleaned4)
print('===========remove_diacritics===========')
text_cleaned5=ruqiya.remove_diacritics(text)
print(text_cleaned5)
print('===========remove_stop_words===========')
text_cleaned6=ruqiya.remove_stop_words(text)
print(text_cleaned6)
print('===========remove_emojis===========')
text_cleaned7=ruqiya.remove_emojis(text)
print(text_cleaned7)
print('===========remove_mentions===========')
text_cleaned8=ruqiya.remove_mentions(text)
print(text_cleaned8)
print('===========hashtags_to_words===========')
text_cleaned9=ruqiya.hashtags_to_words(text)
print(text_cleaned9)
print('===========remove_hashtags===========')
text_cleaned10=ruqiya.remove_hashtags(text)
print(text_cleaned10)
print('===========remove_emails===========')
text_cleaned11=ruqiya.remove_emails(text)
print(text_cleaned11)
print('===========remove_URLs===========')
text_cleaned12=ruqiya.remove_URLs(text)
print(text_cleaned12)
Download files
Source distribution: ruqia-0.0.16.tar.gz (13.2 kB)
Built distribution: ruqia-0.0.16-py3-none-any.whl (9.8 kB)