Arabic NLP

Ruqia Library

This library provides Arabic NLP utilities for processing, preparing, and cleaning Arabic text.

A library dedicated to Arabic language processing; it includes a number of functions for cleaning text, among others.

Install

pip install ruqia

Use

# Note: the package is installed as "ruqia" but imported as "ruqiya"
from ruqiya import ruqiya

Example: Apply a Function to a Single Pandas Column

from ruqiya.ruqiya import clean_text

# df is assumed to be an existing pandas DataFrame with a 'text' column.
# The column often has dtype object rather than str, so convert each value with str first
df['text']=df['text'].apply(str)
# Now apply our function
df['cleaned_text']=df['text'].apply(clean_text)
# Show the result
df['cleaned_text']

All Functions

Clean the text

The clean_text function applies all of the following functions:

 1. remove_emails
 2. remove_URLs
 3. remove_mentions
 4. hashtags_to_words
 5. remove_punctuations
 6. normalize_arabic
 7. remove_diacritics
 8. remove_repeating_char
 9. remove_stop_words
 10. remove_emojis

In other words, clean_text applies every function documented here except remove_hashtags.

text_cleaned1=ruqiya.clean_text(text)
print(text_cleaned1)
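
To illustrate the idea of one function chaining several smaller cleaning steps, here is a minimal sketch in plain Python. It is not the library's implementation: the step functions (strip_urls, strip_mentions, collapse_spaces) and the make_pipeline helper are hypothetical stand-ins, and their regular expressions are simplified assumptions.

```python
import re
from functools import reduce


def strip_urls(text):
    # Hypothetical stand-in: drop anything that starts with http:// or https://
    return re.sub(r"https?://\S+", "", text)


def strip_mentions(text):
    # Hypothetical stand-in: drop @username tokens
    return re.sub(r"@\w+", "", text)


def collapse_spaces(text):
    # Squeeze runs of whitespace left behind by the previous steps
    return re.sub(r"\s+", " ", text).strip()


def make_pipeline(*steps):
    # Build a function that feeds the text through each step in order
    def run(text):
        return reduce(lambda acc, step: step(acc), steps, text)
    return run


clean = make_pipeline(strip_urls, strip_mentions, collapse_spaces)
print(clean("see https://pypi.org/project/ruqia/ via @Ru0Sa now"))
```

Keeping each step as its own function means a step can be used alone or left out, which mirrors how the library exposes both clean_text and the individual functions.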

Remove repeating character

remove_repeating_char function

text_cleaned2=ruqiya.remove_repeating_char(text)
print(text_cleaned2)
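
For intuition about what collapsing repeated characters can look like, here is a small regex sketch. It is an assumption about the general technique, not the library's code: the collapse_repeats name and the min_run threshold are hypothetical, and remove_repeating_char may use a different rule.

```python
import re


def collapse_repeats(text, min_run=3):
    # Match a non-space character followed by at least (min_run - 1) copies
    # of itself, and keep a single copy. The threshold is an assumption:
    # with min_run=3, legitimately doubled letters are left untouched.
    pattern = re.compile(r"(\S)\1{%d,}" % (min_run - 1))
    return pattern.sub(r"\1", text)


print(collapse_repeats("جمييييل جداااا"))
print(collapse_repeats("sooo goood"))
```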

Remove punctuations

remove_punctuations function

text_cleaned3=ruqiya.remove_punctuations(text)
print(text_cleaned3)
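
As a rough picture of punctuation removal, the sketch below deletes ASCII punctuation plus a few Arabic marks with str.translate. The set of Arabic marks and the strip_punctuation name are assumptions for illustration; the characters that remove_punctuations actually removes are not listed here.

```python
import string

# Assumed set of Arabic punctuation: comma, semicolon, question mark, guillemets
ARABIC_PUNCT = "\u060C\u061B\u061F\u00AB\u00BB"

# Translation table that deletes every listed character
PUNCT_TABLE = str.maketrans("", "", string.punctuation + ARABIC_PUNCT)


def strip_punctuation(text):
    return text.translate(PUNCT_TABLE)


print(strip_punctuation("hello, world!! مرحبا؟"))
```

Note that string.punctuation contains "#", "@" and "_", so in this sketch hashtags and mentions would need to be handled before punctuation is stripped.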

Normalize Arabic

normalize_arabic function

text_cleaned4=ruqiya.normalize_arabic(text)
print(text_cleaned4)
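
Arabic normalization usually means mapping letter variants to one canonical form. The sketch below shows a common choice of mappings; it is an assumption, not the mapping used by normalize_arabic, and the normalize_letters name is hypothetical.

```python
# Assumed mapping, written with escapes so the code points are unambiguous
NORMALIZATION_TABLE = str.maketrans({
    "\u0623": "\u0627",  # alef with hamza above -> bare alef
    "\u0625": "\u0627",  # alef with hamza below -> bare alef
    "\u0622": "\u0627",  # alef with madda       -> bare alef
    "\u0649": "\u064A",  # alef maksura          -> yeh
    "\u0629": "\u0647",  # teh marbuta           -> heh
})


def normalize_letters(text):
    # Replace each variant letter with its canonical form
    return text.translate(NORMALIZATION_TABLE)


print(normalize_letters("أهلا إلى المكتبة"))
```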

Remove diacritics

remove_diacritics function

text_cleaned5=ruqiya.remove_diacritics(text)
print(text_cleaned5)
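
Diacritics (tashkeel) occupy a contiguous Unicode range, so removing them can be sketched with one regular expression. This is an illustrative assumption rather than the library's code; strip_diacritics is a hypothetical name, and remove_diacritics may cover additional marks.

```python
import re

# U+064B (fathatan) through U+0652 (sukun) covers the common tashkeel marks
DIACRITICS = re.compile(r"[\u064B-\u0652]")


def strip_diacritics(text):
    return DIACRITICS.sub("", text)


print(strip_diacritics("أهلًا وسهلًا"))
```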

Remove stop words

remove_stop_words function

text_cleaned6=ruqiya.remove_stop_words(text)
print(text_cleaned6)
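
Stop-word removal typically filters tokens against a fixed word list. The sketch below uses a tiny illustrative set; the words in STOP_WORDS and the drop_stop_words name are assumptions, and the list used by remove_stop_words is not shown here.

```python
# Tiny illustrative stop-word set (assumed, not the library's list)
STOP_WORDS = {"في", "من", "هل", "هذه", "هي", "التي"}


def drop_stop_words(text, stop_words=STOP_WORDS):
    # Split on whitespace and keep only tokens that are not stop words
    return " ".join(word for word in text.split() if word not in stop_words)


print(drop_stop_words("كتاب من المكتبة"))
```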

Remove emojis

remove_emojis function

text_cleaned7=ruqiya.remove_emojis(text)
print(text_cleaned7)
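
Emoji removal is commonly done by matching Unicode ranges. The ranges below are an assumed, incomplete selection chosen for illustration; strip_emojis is a hypothetical name and remove_emojis may match a different set.

```python
import re

# Assumed ranges: pictographs and emoticons, miscellaneous symbols and
# dingbats, and regional indicator (flag) letters
EMOJI = re.compile(
    "[\U0001F300-\U0001FAFF\U00002600-\U000027BF\U0001F1E6-\U0001F1FF]+"
)


def strip_emojis(text):
    return EMOJI.sub("", text)


print(strip_emojis("أهلًا وسهلًا بك 👋"))
```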

Remove mentions

remove_mentions function

text_cleaned8=ruqiya.remove_mentions(text)
print(text_cleaned8)

Convert any hashtags to words

hashtags_to_words function

text_cleaned9=ruqiya.hashtags_to_words(text)
print(text_cleaned9)

Remove hashtags

remove_hashtags function

text_cleaned10=ruqiya.remove_hashtags(text)
print(text_cleaned10)
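
The difference between hashtags_to_words and remove_hashtags is easiest to see side by side. The sketch below is an assumption about the behaviour the names suggest, not the library's code; hashtag_to_words and drop_hashtags are hypothetical names.

```python
import re

# "#" followed by word characters; \w also matches Arabic letters and "_"
HASHTAG = re.compile(r"#(\w+)")


def hashtag_to_words(text):
    # Keep the tag's words: drop "#" and turn underscores into spaces
    return HASHTAG.sub(lambda m: m.group(1).replace("_", " "), text)


def drop_hashtags(text):
    # Remove the whole hashtag, words included
    return HASHTAG.sub("", text)


sample = "وسم #معالجة_العربية"
print(hashtag_to_words(sample))
print(drop_hashtags(sample))
```

Converting keeps the words inside the tag available to later steps, while dropping discards them.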

Remove emails

remove_emails function

text_cleaned11=ruqiya.remove_emails(text)
print(text_cleaned11)

Remove URLs

remove_URLs function

text_cleaned12=ruqiya.remove_URLs(text)
print(text_cleaned12)
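
Emails, URLs and mentions can all be matched with regular expressions, and the order of removal matters because an email address contains "@". The sketch below is a simplified assumption, not the library's patterns; strip_contacts is a hypothetical helper.

```python
import re

# Deliberately loose patterns, for illustration only
EMAIL = re.compile(r"\S+@\S+\.\S+")
URL = re.compile(r"https?://\S+|www\.\S+")
MENTION = re.compile(r"@\w+")


def strip_contacts(text):
    # Emails go first: removing mentions first would cut "@email" out of
    # "example@email.com" and leave "example.com" behind
    text = EMAIL.sub("", text)
    text = URL.sub("", text)
    text = MENTION.sub("", text)
    # Tidy the whitespace left by the removals
    return " ".join(text.split())


print(strip_contacts(
    "mail example@email.com site https://pypi.org/project/ruqia/ user @Ru0Sa end"
))
```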

Example

from ruqiya import ruqiya

text="""
!!أهلًا وسهلًا بك 👋 في الإصدارِ الأولِ من مكتبة رقيا
هل هذه هي المرة الأولى التي تستخدم فيها المكتبة😀؟!!
معلومات التواصل 
ايميل
example@email.com
الموقع
https://pypi.org/project/ruqia/
تويتر
@Ru0Sa
وسم
#معالجة_العربية
"""

print('===========clean_text===========')
text_cleaned1=ruqiya.clean_text(text)
print(text_cleaned1)
print('===========remove_repeating_char===========')
text_cleaned2=ruqiya.remove_repeating_char(text)
print(text_cleaned2)
print('===========remove_punctuations===========')
text_cleaned3=ruqiya.remove_punctuations(text)
print(text_cleaned3)
print('===========normalize_arabic===========')
text_cleaned4=ruqiya.normalize_arabic(text)
print(text_cleaned4)
print('===========remove_diacritics===========')
text_cleaned5=ruqiya.remove_diacritics(text)
print(text_cleaned5)
print('===========remove_stop_words===========')
text_cleaned6=ruqiya.remove_stop_words(text)
print(text_cleaned6)
print('===========remove_emojis===========')
text_cleaned7=ruqiya.remove_emojis(text)
print(text_cleaned7)
print('===========remove_mentions===========')
text_cleaned8=ruqiya.remove_mentions(text)
print(text_cleaned8)
print('===========hashtags_to_words===========')
text_cleaned9=ruqiya.hashtags_to_words(text)
print(text_cleaned9)
print('===========remove_hashtags===========')
text_cleaned10=ruqiya.remove_hashtags(text)
print(text_cleaned10)
print('===========remove_emails===========')
text_cleaned11=ruqiya.remove_emails(text)
print(text_cleaned11)
print('===========remove_URLs===========')
text_cleaned12=ruqiya.remove_URLs(text)
print(text_cleaned12)

Example 2: Apply a Function to Pandas DataFrame (Single Column)

from ruqiya.ruqiya import clean_text
import pandas as pd

data="https://raw.githubusercontent.com/Ruqyai/data4test/main/test_with_lables.csv"
df=pd.read_csv(data)
df['text']=df['poem_text']

#--------------------
# The column often has dtype object rather than str, so convert each value with str first
df['text']=df['text'].apply(str)
# Now apply our function
df['cleaned_text']=df['text'].apply(clean_text)
#--------------------

# Show the result
df['cleaned_text']

Citing Ruqia

If Ruqia helps your research, please cite it. Here is the BibTeX entry:

@misc{Ruqia2022,
  title={Ruqia-Library},
  author={Ruqiya Bin Safi},
  year={2022},
  howpublished={\url{https://github.com/Ruqyai/Ruqia-Library}},
}
