Arabic NLP
Ruqia Library
This library is used for Arabic NLP to process, prepare, and clean Arabic text.
It is dedicated to Arabic language processing and includes a number of functions for cleaning text, among other tasks.
Install
pip install ruqia
Use
from ruqiya import ruqiya
Example: Apply a Function to a Single Pandas Column
from ruqiya.ruqiya import clean_text
# df is assumed to be an existing pandas DataFrame with a 'text' column.
# The column often has dtype object rather than string, so convert each value with str first
df['text'] = df['text'].apply(str)
# Now apply the cleaning function
df['cleaned_text'] = df['text'].apply(clean_text)
# Show the result
df['cleaned_text']
All Functions
Clean the text
The clean_text function applies all of these functions:
1. remove_emails
2. remove_URLs
3. remove_mentions
4. hashtags_to_words
5. remove_punctuations
6. normalize_arabic
7. remove_diacritics
8. remove_repeating_char
9. remove_stop_words
10. remove_emojis
In other words, clean_text includes every function except remove_hashtags.
text_cleaned1=ruqiya.clean_text(text)
print(text_cleaned1)
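The idea of one function that chains several smaller cleaning steps can be sketched as follows. This is a minimal illustration, not the library's code: the helper names (strip_emails, strip_urls, strip_mentions, collapse_spaces, sketch_clean_text), the regular expressions, and the order of the steps are all assumptions, and only three of the ten listed steps are imitated.

```python
import re
from functools import reduce

# Hypothetical stand-ins for three of the listed steps; the real library's
# patterns and step order may differ.
def strip_emails(text):
    return re.sub(r"\S+@\S+\.\S+", " ", text)

def strip_urls(text):
    return re.sub(r"https?://\S+", " ", text)

def strip_mentions(text):
    return re.sub(r"@\w+", " ", text)

def collapse_spaces(text):
    return re.sub(r"\s+", " ", text).strip()

# Emails are removed before mentions so that the "@email" part of an
# address is not mistaken for a mention.
STEPS = [strip_emails, strip_urls, strip_mentions, collapse_spaces]

def sketch_clean_text(text):
    """Apply each step in order, feeding the output of one into the next."""
    return reduce(lambda acc, step: step(acc), STEPS, text)

sample = "contact example@email.com or visit https://pypi.org/project/ruqia/ via @Ru0Sa now"
print(sketch_clean_text(sample))
```

Keeping the steps in a list makes the order explicit, which matters whenever one pattern could match part of what another step is meant to handle.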
Remove repeating character
The remove_repeating_char function:
text_cleaned2=ruqiya.remove_repeating_char(text)
print(text_cleaned2)
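Collapsing repeated characters is usually done with a back-reference in a regular expression. The sketch below shows that general technique; collapse_repeats and its keep parameter are hypothetical, and the library's own rule (for example, how many copies it keeps) may differ.

```python
import re

def collapse_repeats(text, keep=1):
    """Collapse each run of an identical character down to `keep` copies."""
    # (.)\1+ matches any character followed by one or more copies of itself.
    return re.sub(r"(.)\1+", lambda m: m.group(1) * keep, text)

print(collapse_repeats("sooooo goooood"))
print(collapse_repeats("sooooo goooood", keep=2))
```

With keep=1, words that legitimately contain a doubled letter are shortened as well, so keep=2 is a common compromise for elongated social-media text.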
Remove punctuations
The remove_punctuations function:
text_cleaned3=ruqiya.remove_punctuations(text)
print(text_cleaned3)
Normalize Arabic
The normalize_arabic function:
text_cleaned4=ruqiya.normalize_arabic(text)
print(text_cleaned4)
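Arabic normalization typically maps spelling variants of a letter to one canonical form. The sketch below follows a widely used recipe and is only an assumption about what such a step does; the mapping, the name sketch_normalize_arabic, and the choice of letters are not taken from the library.

```python
# Assumed mapping; the library's own normalization rules may differ.
NORMALIZATION_MAP = str.maketrans({
    "\u0623": "\u0627",  # alef with hamza above -> bare alef
    "\u0625": "\u0627",  # alef with hamza below -> bare alef
    "\u0622": "\u0627",  # alef with madda -> bare alef
    "\u0649": "\u064A",  # alef maqsura -> ya
    "\u0629": "\u0647",  # ta marbuta -> ha
})

def sketch_normalize_arabic(text):
    """Replace each mapped letter with its canonical form."""
    return text.translate(NORMALIZATION_MAP)

print(sketch_normalize_arabic("أهلا إلى مكتبة"))
```

A translation table handles every single-character substitution in one pass over the string, which keeps the mapping easy to read and extend.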
Remove diacritics
The remove_diacritics function:
text_cleaned5=ruqiya.remove_diacritics(text)
print(text_cleaned5)
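Arabic diacritics (tashkeel) occupy a contiguous Unicode range, so they can be stripped with a single character class. This is a sketch of that approach under the assumption that only the marks U+064B through U+0652 are targeted; sketch_remove_diacritics is a hypothetical name and the library may cover more or fewer marks.

```python
import re

# Tashkeel marks U+064B..U+0652: fathatan, dammatan, kasratan, fatha,
# damma, kasra, shadda, sukun.
DIACRITICS = re.compile("[\u064B-\u0652]")

def sketch_remove_diacritics(text):
    """Delete every tashkeel mark and leave the base letters untouched."""
    return DIACRITICS.sub("", text)

print(sketch_remove_diacritics("أهلًا وسهلًا"))
```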
Remove stop words
The remove_stop_words function:
text_cleaned6=ruqiya.remove_stop_words(text)
print(text_cleaned6)
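Stop-word removal generally means splitting the text into tokens and dropping those found in a fixed set. The sketch below assumes whitespace tokenization and uses a tiny illustrative word set; both the set and the name sketch_remove_stop_words are hypothetical, and the library's list is not reproduced here.

```python
# Tiny illustrative stop-word set; a real list would be far larger.
STOP_WORDS = {"في", "من", "هل", "هذه", "هي", "التي"}

def sketch_remove_stop_words(text):
    """Keep only the whitespace-separated tokens that are not stop words."""
    return " ".join(word for word in text.split() if word not in STOP_WORDS)

print(sketch_remove_stop_words("هل هذه هي المرة الأولى التي تستخدم فيها المكتبة"))
```

Because the text is split and rejoined, runs of spaces and line breaks collapse to single spaces in this sketch.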
Remove emojis
The remove_emojis function:
text_cleaned7=ruqiya.remove_emojis(text)
print(text_cleaned7)
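Emoji removal is commonly implemented as a character class over the Unicode blocks where emoji live. The pattern below is a deliberately partial sketch: the chosen ranges and the name sketch_remove_emojis are assumptions, flag sequences and some symbols are not covered, and the library's pattern may be broader.

```python
import re

# Partial emoji pattern: the main pictograph and emoticon blocks
# (U+1F300..U+1FAFF), miscellaneous symbols and dingbats (U+2600..U+27BF),
# and the variation selector U+FE0F.
EMOJI_PATTERN = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF\uFE0F]")

def sketch_remove_emojis(text):
    """Delete every character matched by the partial emoji pattern."""
    return EMOJI_PATTERN.sub("", text)

print(sketch_remove_emojis("hello \U0001F44B world \U0001F600"))
```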
Remove mentions
The remove_mentions function:
text_cleaned8=ruqiya.remove_mentions(text)
print(text_cleaned8)
Convert any hashtags to words
The hashtags_to_words function:
text_cleaned9=ruqiya.hashtags_to_words(text)
print(text_cleaned9)
Remove hashtags
The remove_hashtags function:
text_cleaned10=ruqiya.remove_hashtags(text)
print(text_cleaned10)
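The difference between converting hashtags to words and removing them can be illustrated with two small regular-expression helpers. These are sketches under the assumption that a hashtag is '#' followed by word characters and that underscores separate the words; the names sketch_hashtags_to_words and sketch_remove_hashtags are hypothetical and the library's behavior may differ in detail.

```python
import re

def sketch_hashtags_to_words(text):
    # Drop the leading '#' and turn underscores inside the tag into spaces.
    return re.sub(r"#(\w+)", lambda m: m.group(1).replace("_", " "), text)

def sketch_remove_hashtags(text):
    # Delete the whole tag, then tidy the leftover whitespace.
    without_tags = re.sub(r"#\w+", "", text)
    return re.sub(r"\s+", " ", without_tags).strip()

tagged = "tag: #arabic_nlp here"
print(sketch_hashtags_to_words(tagged))
print(sketch_remove_hashtags(tagged))
```

Converting keeps the words of the tag available to later steps, while removing discards them, which is presumably why only the former is part of clean_text.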
Remove emails
The remove_emails function:
text_cleaned11=ruqiya.remove_emails(text)
print(text_cleaned11)
Remove URLs
The remove_URLs function:
text_cleaned12=ruqiya.remove_URLs(text)
print(text_cleaned12)
Example
from ruqiya import ruqiya
text="""
!!أهلًا وسهلًا بك 👋 في الإصدارِ الأولِ من مكتبة رقيا
هل هذه هي المرة الأولى التي تستخدم فيها المكتبة😀؟!!
معلومات التواصل
ايميل
example@email.com
الموقع
https://pypi.org/project/ruqia/
تويتر
@Ru0Sa
وسم
#معالجة_العربية
"""
print('===========clean_text===========')
text_cleaned1=ruqiya.clean_text(text)
print(text_cleaned1)
print('===========remove_repeating_char===========')
text_cleaned2=ruqiya.remove_repeating_char(text)
print(text_cleaned2)
print('===========remove_punctuations===========')
text_cleaned3=ruqiya.remove_punctuations(text)
print(text_cleaned3)
print('===========normalize_arabic===========')
text_cleaned4=ruqiya.normalize_arabic(text)
print(text_cleaned4)
print('===========remove_diacritics===========')
text_cleaned5=ruqiya.remove_diacritics(text)
print(text_cleaned5)
print('===========remove_stop_words===========')
text_cleaned6=ruqiya.remove_stop_words(text)
print(text_cleaned6)
print('===========remove_emojis===========')
text_cleaned7=ruqiya.remove_emojis(text)
print(text_cleaned7)
print('===========remove_mentions===========')
text_cleaned8=ruqiya.remove_mentions(text)
print(text_cleaned8)
print('===========hashtags_to_words===========')
text_cleaned9=ruqiya.hashtags_to_words(text)
print(text_cleaned9)
print('===========remove_hashtags===========')
text_cleaned10=ruqiya.remove_hashtags(text)
print(text_cleaned10)
print('===========remove_emails===========')
text_cleaned11=ruqiya.remove_emails(text)
print(text_cleaned11)
print('===========remove_URLs===========')
text_cleaned12=ruqiya.remove_URLs(text)
print(text_cleaned12)
Example 2: Apply a Function to Pandas DataFrame (Single Column)
from ruqiya.ruqiya import clean_text
import pandas as pd
data="https://raw.githubusercontent.com/Ruqyai/data4test/main/test_with_lables.csv"
df=pd.read_csv(data)
df['text']=df['poem_text']
#--------------------
# df['text'] often has dtype object rather than string, so convert each value with str first
df['text']=df['text'].apply(str)
# Now apply our function
df['cleaned_text']=df['text'].apply(clean_text)
#--------------------
# Show the result
df['cleaned_text']
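The str conversion in this example matters because an object column can mix strings with non-string values such as missing entries. The effect can be seen without pandas by using a plain list as a stand-in for the column; stand_in_clean is a hypothetical cleaner, not the library's clean_text.

```python
# Stand-in for an object-dtype column: a string mixed with a missing
# value (NaN is a float) and an integer.
column = ["  first text  ", float("nan"), 42]

def stand_in_clean(text):
    # Hypothetical cleaner that, like most text functions, needs a str.
    return text.strip()

as_text = [str(value) for value in column]  # mirrors .apply(str)
cleaned = [stand_in_clean(value) for value in as_text]
print(cleaned)
```

Note that a missing value becomes the literal string "nan" after conversion, so it may be worth dropping or filling missing rows before cleaning.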
Citing Ruqia
If Ruqia helps your research, we would appreciate a citation. Here is the BibTeX entry:
@misc{Ruqia2022,
title={Ruqia-Library},
author={Ruqiya Bin Safi},
year={2022},
howpublished={\url{https://github.com/Ruqyai/Ruqia-Library}},
}
File details
Details for the file ruqia-0.0.23.tar.gz.
File metadata
- Download URL: ruqia-0.0.23.tar.gz
- Upload date:
- Size: 19.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.9.19
File hashes
Algorithm | Hash digest
---|---
SHA256 | 55492200f54ff35cbefa2a39af4e3f8b8ea042dbad06c7a3416aa0d24d37fb97
MD5 | 5b5f7c6b526a097b733c90a9c1a433f7
BLAKE2b-256 | 82a98204f503cdf8a8314d4194f293dc119506c374597123a9fe456ce6437417
File details
Details for the file ruqia-0.0.23-py3-none-any.whl.
File metadata
- Download URL: ruqia-0.0.23-py3-none-any.whl
- Upload date:
- Size: 18.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.9.19
File hashes
Algorithm | Hash digest
---|---
SHA256 | f34c0a4ee9fd130ab065f98aa91ebb71a47919882f079c8570dee7954314dbb4
MD5 | 3191503eb23a71813c06b11a79843701
BLAKE2b-256 | f408ac61ea565b427cd7b9fe0d713629825c030327dd67dcffa558c0bdb6249b