fathah

Lightweight NLP preprocessing package for Arabic language

Installation

pip install fathah

Usage

from fathah import TextClean

Methods

Clean the text

The clean_text function applies all of the following functions:

 1. remove_emails  
 2. remove_URLs  
 3. remove_mentions   
 4. hashtags_to_words     
 5. remove_punctuations  
 6. normalize_arabic   
 7. remove_diacritics   
 8. remove_repeating_char   
 9. remove_stop_words   
 10. remove_emojis

In other words, clean_text applies every function in the package except remove_hashtags.


text_cleaned1 = TextClean.clean_text(text)

print(text_cleaned1)
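The idea of chaining several cleaning steps can be sketched with the standard library alone. This is a minimal illustration, not fathah's implementation: `strip_emails`, `strip_mentions`, and `run_pipeline` are hypothetical names, and the sketch assumes the numbered list above reflects the order in which the steps run.

```python
import re

# Hypothetical stand-ins for two of the listed steps (not fathah's code).
def strip_emails(text: str) -> str:
    return re.sub(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+", "", text)

def strip_mentions(text: str) -> str:
    return re.sub(r"@\w+", "", text)

def run_pipeline(text: str, steps) -> str:
    """Apply each step in order, then collapse leftover whitespace."""
    for step in steps:
        text = step(text)
    return " ".join(text.split())

sample = "راسلني user@example.com او @ahmed"
result = run_pipeline(sample, [strip_emails, strip_mentions])
print(result)
```

Order matters in a pipeline like this one: running the naive mention pattern before the email pattern would cut the address in half and leave fragments of it behind.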

Remove repeating characters

The remove_repeating_char function removes repeated characters from the text.


text_cleaned2 = TextClean.remove_repeating_char(text)

print(text_cleaned2)
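One common way to handle elongated words is a backreference regex. The sketch below is an assumption about the general technique, not the package's actual rule: `collapse_repeats` is a hypothetical name, and the threshold of three repetitions is chosen here so that legitimate doubled letters survive.

```python
import re

def collapse_repeats(text: str) -> str:
    """Collapse any run of three or more identical characters to one."""
    return re.sub(r"(.)\1{2,}", r"\1", text)

print(collapse_repeats("جمييييل"))
print(collapse_repeats("hellooooo"))
```

A stricter pattern such as `(.)\1+` would also collapse valid double letters (for example the two lams in "الله"), which is why this sketch only touches runs of three or more.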

Remove punctuation

The remove_punctuations function removes punctuation marks from the text.


text_cleaned3 = TextClean.remove_punctuations(text)

print(text_cleaned3)
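Punctuation removal for Arabic text usually has to cover both ASCII marks and Arabic-specific ones. The following is a rough sketch using `str.translate`; the name `strip_punctuation` and the exact set of characters are assumptions, and the package may remove a different set.

```python
import string

# ASCII punctuation plus the Arabic comma, semicolon, and question mark.
ARABIC_MARKS = "\u060C\u061B\u061F"
PUNCT_TABLE = str.maketrans("", "", string.punctuation + ARABIC_MARKS)

def strip_punctuation(text: str) -> str:
    """Delete every character listed in the translation table."""
    return text.translate(PUNCT_TABLE)

print(strip_punctuation("مرحبا، كيف الحال؟"))
```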

Normalize Arabic

The normalize_arabic function normalizes Arabic text.


text_cleaned4 = TextClean.normalize_arabic(text)

print(text_cleaned4)
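Arabic normalization typically maps letter variants onto a single canonical form. The mapping below follows a widely used convention (alef variants to bare alef, alef maqsura to yeh, teh marbuta to heh); whether fathah uses exactly these rules is an assumption, and `normalize_letters` is a hypothetical name.

```python
# Code points are written as escapes so the mapping is unambiguous.
NORMALIZATION_TABLE = str.maketrans({
    "\u0623": "\u0627",  # alef with hamza above -> alef
    "\u0625": "\u0627",  # alef with hamza below -> alef
    "\u0622": "\u0627",  # alef with madda -> alef
    "\u0649": "\u064A",  # alef maqsura -> yeh
    "\u0629": "\u0647",  # teh marbuta -> heh
})

def normalize_letters(text: str) -> str:
    """Replace each variant letter with its canonical counterpart."""
    return text.translate(NORMALIZATION_TABLE)

print(normalize_letters("\u0623\u062D\u0645\u062F"))
```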

Remove diacritics

The remove_diacritics function removes diacritics from the text.


text_cleaned5 = TextClean.remove_diacritics(text)

print(text_cleaned5)
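Arabic diacritics (harakat) occupy a small contiguous Unicode range, so they can be stripped with a single character class. This is a sketch of the technique rather than the package's code; `strip_diacritics` is a hypothetical name, and the range chosen here (U+064B to U+0652 plus the superscript alef U+0670) is an assumption about what counts as a diacritic.

```python
import re

# Fathatan through sukun, plus the superscript (dagger) alef.
DIACRITICS = re.compile(r"[\u064B-\u0652\u0670]")

def strip_diacritics(text: str) -> str:
    """Remove Arabic combining vowel marks, leaving base letters intact."""
    return DIACRITICS.sub("", text)

print(strip_diacritics("\u0643\u064E\u062A\u064E\u0628\u064E"))
```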

Remove stop words

The remove_stop_words function removes stop words from the text.


text_cleaned6 = TextClean.remove_stop_words(text)

print(text_cleaned6)
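Stop-word removal amounts to filtering tokens against a fixed word list. The sketch below uses a tiny illustrative list of four prepositions; the real list shipped with the package is not shown in this document, so both `STOP_WORDS` and `drop_stop_words` are assumptions.

```python
# A deliberately small, illustrative stop-word list.
STOP_WORDS = frozenset({"في", "من", "على", "عن"})

def drop_stop_words(text: str, stop_words=STOP_WORDS) -> str:
    """Split on whitespace and keep only tokens not in the stop list."""
    return " ".join(tok for tok in text.split() if tok not in stop_words)

print(drop_stop_words("ذهبت من البيت في الصباح"))
```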

Remove emojis

The remove_emojis function removes emojis from the text.


text_cleaned7 = TextClean.remove_emojis(text)

print(text_cleaned7)
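Emoji removal is often done with a character class over the main emoji code-point ranges. The ranges below are a reasonable but non-exhaustive selection made for this sketch; they are an assumption, not the pattern fathah uses, and `strip_emojis` is a hypothetical name.

```python
import re

EMOJI = re.compile(
    "["
    "\U0001F1E6-\U0001F1FF"  # regional indicator symbols (flags)
    "\U0001F300-\U0001FAFF"  # pictographs, emoticons, transport, extended symbols
    "\u2600-\u27BF"          # miscellaneous symbols and dingbats
    "\uFE0F\u200D"           # variation selector and zero-width joiner
    "]"
)

def strip_emojis(text: str) -> str:
    """Delete characters that fall in the listed emoji ranges."""
    return EMOJI.sub("", text)

print(strip_emojis("\u0645\u0631\u062D\u0628\u0627 \U0001F600"))
```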

Remove mentions

The remove_mentions function removes @mentions from the text.


text_cleaned8 = TextClean.remove_mentions(text)

print(text_cleaned8)
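A mention is usually recognized as an `@` followed by word characters. The sketch below adds a lookbehind so that the `@` inside an email address is left alone; that refinement, and the name `strip_mentions`, are assumptions made for this illustration rather than documented behavior of the package.

```python
import re

# "@" not preceded by a word character, followed by a handle.
MENTION = re.compile(r"(?<!\w)@\w+")

def strip_mentions(text: str) -> str:
    """Remove @handles and tidy up the whitespace they leave behind."""
    return " ".join(MENTION.sub("", text).split())

print(strip_mentions("شكرا @ahmed_1 على المساعدة"))
```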

Convert any hashtags to words

The hashtags_to_words function converts hashtags in the text to plain words.


text_cleaned9 = TextClean.hashtags_to_words(text)

print(text_cleaned9)

Remove hashtags

The remove_hashtags function removes hashtags from the text.


text_cleaned10 = TextClean.remove_hashtags(text)

print(text_cleaned10)
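The difference between converting a hashtag to words and removing it entirely can be sketched as follows. Both helper names are hypothetical, and treating underscores as word separators during conversion is an assumption; the package may handle them differently.

```python
import re

def hashtags_as_words(text: str) -> str:
    """Drop the '#' and turn underscores inside the tag into spaces."""
    return re.sub(r"#(\w+)", lambda m: m.group(1).replace("_", " "), text)

def drop_hashtags(text: str) -> str:
    """Remove each hashtag together with its text."""
    return " ".join(re.sub(r"#\w+", "", text).split())

sample = "تعلم #اللغة_العربية اليوم"
print(hashtags_as_words(sample))
print(drop_hashtags(sample))
```

Conversion keeps the words a hashtag carries, which is useful when tags hold real content; removal suits cases where tags are only metadata.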

Remove emails

The remove_emails function removes email addresses from the text.


text_cleaned11 = TextClean.remove_emails(text)

print(text_cleaned11)
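Email addresses can be matched with a simplified pattern of the form local-part, `@`, and a dotted domain. The pattern below is a pragmatic approximation for this sketch, not a full address validator and not necessarily what the package uses; `strip_emails` is a hypothetical name.

```python
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

def strip_emails(text: str) -> str:
    """Remove email-like tokens and collapse the remaining whitespace."""
    return " ".join(EMAIL.sub("", text).split())

print(strip_emails("راسلني على test.user@example.com اليوم"))
```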

Remove URLs

The remove_URLs function removes URLs from the text.


text_cleaned12 = TextClean.remove_URLs(text)

print(text_cleaned12)
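URL removal is commonly done by matching a scheme or `www.` prefix and consuming everything up to the next whitespace. This is a sketch of that approach under the assumption that URLs never contain spaces; `strip_urls` is a hypothetical name and the package's pattern may differ.

```python
import re

URL = re.compile(r"(?:https?://|www\.)\S+")

def strip_urls(text: str) -> str:
    """Remove http(s) and www links, then collapse extra whitespace."""
    return " ".join(URL.sub("", text).split())

print(strip_urls("زوروا https://example.com/page?x=1 اليوم"))
```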

Example

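from fathah import TextClean

# Sample input with diacritics (assumed for illustration)
text = "السَّلَامُ عَلَيْكُمْ وَرَحْمَةُ اللهِ وَبَرَكَاتُهُ"

cleaner = TextClean(text)
print(cleaner.remove_diacritics())

# Outputs: السلام عليكم ورحمة الله وبركاته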

This package is under development. Contributions are highly welcome.

Github | IG
