QuranicTools: A Python NLP Library for Quranic NLP

Part of Speech Tagging | Dependency Parsing | Lemmatizer | Multilingual Search | Quranic Extractions | Revelation Order | Embeddings (coming soon) | Translations

Quranic NLP

Quranic NLP is a computational toolbox to conduct various syntactic and semantic analyses of Quranic verses. The aim is to put together all available resources contributing to a better understanding/analysis of the Quran for everyone.

Installation

Step 1 — Install the package

pip install quranic-nlp

Step 2 — Download the data

The library requires data files (about 97 MB) that are downloaded separately from GitHub Releases. From the command line:

quranic_data

Or from Python:

from quranic_nlp.data_requirements import download_data
download_data()

Data is downloaded once and stored inside the package directory automatically.

Development Setup

To set up a local development environment:

git clone https://github.com/language-ml/hadith-quranic_nlp.git
cd hadith-quranic_nlp
python -m venv .venv
source .venv/bin/activate   # On Windows: .venv\Scripts\activate
pip install -e .
quranic_data

Pipeline

The NLP pipeline provides morphological information (e.g., lemmas) as well as a POS tagger and a dependency parser, in a spaCy-like pipeline.

from quranic_nlp import language

translation_translator = 'fa#1'
pips = 'dep,pos,root,lem'
nlp = language.Pipeline(pips, translation_translator)

The resulting Doc object has several extensions:

  • sentences: the verses in the doc.
  • ayah: the number of the ayah within the surah.
  • surah: the name of the surah.
  • revelation_order: the position of the ayah in the order of revelation.

The doc is a list of Token objects, each of which also has its own extensions. pips is a comma-separated list of the analyses to run. translation_translator selects the Quran translation: either a language code alone (fa), or a language code followed by # and the number of a translation (fa#1). To list all available translations, run the code below:

from quranic_nlp import utils
utils.print_all_translations()
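
The two configuration strings described above can be taken apart with a few lines of plain Python. This is only an illustrative sketch: parse_translation_spec and parse_pips are hypothetical helpers, not part of quranic_nlp, and the library may validate these strings differently.

```python
from typing import List, NamedTuple, Optional


class TranslationSpec(NamedTuple):
    language: str         # language code, e.g. 'fa'
    index: Optional[int]  # translation number, or None when no '#index' is given


def parse_translation_spec(spec: str) -> TranslationSpec:
    """Hypothetical helper: split 'fa#1' into ('fa', 1) and 'fa' into ('fa', None)."""
    language, sep, index = spec.strip().partition('#')
    if not language:
        raise ValueError(f'missing language code in {spec!r}')
    if sep and not index.isdigit():
        raise ValueError(f'translation index must be a number in {spec!r}')
    return TranslationSpec(language, int(index) if sep else None)


def parse_pips(pips: str) -> List[str]:
    """Hypothetical helper: split 'dep,pos,root,lem' into a list of component names."""
    return [name.strip() for name in pips.split(',') if name.strip()]


print(parse_translation_spec('fa#1'))
print(parse_translation_spec('fa'))
print(parse_pips('dep,pos,root,lem'))
```

Checking the strings this way before building a Pipeline can catch typos such as a missing index after the # sign.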

Quranic NLP has its own spaCy extensions. An extension can be used only if the corresponding pipeline component was requested.

Format Inputs

There are three ways to format the input:

  1. A surah number, then #, then an ayah number (for example, 1#1).
  2. A surah name, then #, then an ayah number (for example, حمد#1).
  3. Free text to search for in the Quran (for example, رب العالمین).

Note: The last two forms require internet access because they make an API call.

from quranic_nlp import language

translation_translator = 'fa#1'
pips = 'dep,pos,root,lem'
nlp = language.Pipeline(pips, translation_translator)

doc = nlp('1#1')
doc = nlp('حمد#1')
doc = nlp('رب العالمین')
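
As a rough illustration of the three input forms, the sketch below classifies a query string before it is passed to the pipeline, which makes it possible to tell in advance whether a call will need internet access. classify_query and needs_internet are hypothetical helpers written for this example; the library's own dispatch logic is not shown in this document and may differ.

```python
def classify_query(query: str) -> str:
    """Hypothetical helper: return 'number', 'name', or 'search' for an input string."""
    surah, sep, ayah = query.strip().partition('#')
    surah, ayah = surah.strip(), ayah.strip()
    if sep and surah and ayah.isdigit():
        # 'surah#ayah' form: the surah part is either a number or a name.
        return 'number' if surah.isdigit() else 'name'
    # Anything else is treated as free-text search.
    return 'search'


def needs_internet(query: str) -> bool:
    """Per the note above, only the surah-number form works without an API call."""
    return classify_query(query) != 'number'


for q in ['1#1', 'حمد#1', 'رب العالمین']:
    print(q, classify_query(q), needs_internet(q))
```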

Example

Two examples are provided below to demonstrate the usage of the library:

First, data from Surah Al-Fatiha, verse 1:

first_doc = nlp('1#1')

Second, data from Surah Aal-i-Imran, verse 200:

second_doc = nlp('3#200')

Two kinds of information are available for any input:

  1. Verse Information: detailed information about a specific verse of the Quran, structured as follows:

    • Verse Text and Meaning: The text of the verse along with its meaning or translation.
    • Similar Verses: Verses that are similar in content or theme, given in Surah#Verse format. In the sample output below, each reference is paired with a numeric score.
    • Verse Order: The position of the verse within the surah.
    • Revelation Order: The chronological position of the verse in the order of revelation.

First, data from Surah Al-Fatiha, verse 1:

print(first_doc)
بِسْمِ اللَّهِ الرَّحْمَنِ الرَّحِیمِ 
print(first_doc._.text)
بِسْمِ اللَّهِ الرَّحْمَـٰنِ الرَّحِيمِ
print(first_doc._.surah)
فاتحه
print(first_doc._.ayah)
1
print(first_doc._.revelation_order)
63
print(first_doc._.translations)
ستايش خدا را كه پروردگار جهانيان است.

To get all translators for a language at once, omit the #index — translations are returned as a dict keyed by translator name:

nlp_fa_all = language.Pipeline(pips, 'fa')
doc = nlp_fa_all('1#2')
print(doc._.translations)
{'ansarian': 'همه ستایش ها، ویژه خدا، مالک و مربّی جهانیان است.',
 'ayati': 'ستايش خدا را كه پروردگار جهانيان است.',
 'bahrampour': 'ستايش خداى را كه پروردگار جهانيان است',
 ...}
print(first_doc._.sim_ayahs[:5])
[('27#30', 0.7509), ('1#3', 0.5335), ('55#1', 0.3165), ('41#2', 0.2968), ('2#163', 0.2780)]
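
Because the translations extension holds a single string when a translator index is given and a dict when it is omitted, downstream code may want to handle both shapes. The sketch below is one way to do that, using part of the sample output above as stand-in data; pick_translation is a hypothetical helper, not a library function.

```python
from typing import Mapping, Optional, Union


def pick_translation(translations: Union[str, Mapping[str, str]],
                     preferred: str) -> Optional[str]:
    """Hypothetical helper: return one translation from either result shape."""
    if isinstance(translations, str):
        # A single translator was selected (e.g. 'fa#1').
        return translations
    if preferred in translations:
        return translations[preferred]
    # Fall back to the first available translator, or None if the dict is empty.
    return next(iter(translations.values()), None)


# Stand-in data copied from the sample output above.
sample = {
    'ansarian': 'همه ستایش ها، ویژه خدا، مالک و مربّی جهانیان است.',
    'ayati': 'ستايش خدا را كه پروردگار جهانيان است.',
}

print(pick_translation(sample, 'ayati'))
print(pick_translation(sample, 'unknown-translator'))
```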

Second, data from Surah Aal-i-Imran, verse 200:

print(second_doc)
يَا أَیُّهَا الَّذِينَ  آمَن اِصْبِر وَ صَابِر وَ رَابِط وَ اِتَّق اللَّهَ لَعَلَّکُمْ تُفْلِحُو اُو اُو اُو اُو اُو نَ 
print(second_doc._.text)
يَا أَيُّهَا الَّذِينَ آمَنُوا اصْبِرُوا وَصَابِرُوا وَرَابِطُوا وَاتَّقُوا اللَّهَ لَعَلَّكُمْ تُفْلِحُونَ
print(second_doc._.surah)
آل عمران
print(second_doc._.ayah)
200
print(second_doc._.revelation_order)
89
print(second_doc._.translations)
اى كسانى كه ايمان آوردهايد، شكيبا باشيد و ديگران را به شكيبايى فراخوانيد و در جنگها پايدارى كنيد و از خدا بترسيد، باشد كه رستگار شويد.
print(second_doc._.sim_ayahs[:5])
[('3#130', 0.8921), ('5#35', 0.8134), ('9#119', 0.7645), ('33#70', 0.7201), ('2#189', 0.6983)]
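
The sim_ayahs output shown above is a list of (reference, score) pairs, so it can be post-processed with ordinary Python. The following sketch, which assumes that shape and uses the Al-Fatiha sample as stand-in data, splits each Surah#Verse reference into integers and keeps only entries at or above a chosen score. parse_ref and similar_above are hypothetical names introduced for this example.

```python
from typing import List, Tuple


def parse_ref(ref: str) -> Tuple[int, int]:
    """Split a 'surah#ayah' reference such as '27#30' into (27, 30)."""
    surah, ayah = ref.split('#')
    return int(surah), int(ayah)


def similar_above(sim_ayahs: List[Tuple[str, float]],
                  threshold: float) -> List[Tuple[int, int, float]]:
    """Keep entries whose score is at least `threshold`, as (surah, ayah, score)."""
    return [(*parse_ref(ref), score)
            for ref, score in sim_ayahs
            if score >= threshold]


# Stand-in data copied from the sample output for verse 1#1.
sample_sim = [('27#30', 0.7509), ('1#3', 0.5335), ('55#1', 0.3165),
              ('41#2', 0.2968), ('2#163', 0.2780)]

print(similar_above(sample_sim, 0.5))
```
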

  2. Word Information: information about the individual words of a verse, structured as follows:

    • Word Text: The actual text of the word.
    • Tag: The part-of-speech tag that describes the word's grammatical category.
    • Dependency: The dependency relationship of the word within the sentence.
    • Lemma: The base or dictionary form of the word.
    • Root: The root form of the word, which captures its core meaning.
    • Head: The head word on which the current word depends.
    • Arc Dep: The arc dependency label that represents the grammatical relationship between the head word and the current word.
    • Rel: The semantic or syntactic relationship between the head word and the current word.

First, data from Surah Al-Fatiha, verse 1. The third word of the verse is shown:

word=first_doc[2]
print(word)
اللَّهِ
print(word.dep_)
نعت
print(word.head)
رَّحِیمِ
print(word.lemma_)
ٱللَّه
print(word.pos_)
from quranic_nlp import constant
print(constant.POS_UNI_FA[word.pos_])
NOUN
اسم
print(word._.dep_arc)
LTR
print(word._.root)
اله

Second, data from Surah Aal-i-Imran, verse 200. The sixth word of the verse is shown:

word=second_doc[5]
print(word)
اِصْبِر
print(word.dep_)
الف زینت
print(word.head)
ا
print(word.lemma_)
صَبَرَ
print(word.pos_)
from quranic_nlp import constant
print(constant.POS_UNI_FA[word.pos_])
VERB
فعل
print(word._.dep_arc)
LTR

LTR: Left to Right; RTL: Right to Left

print(word._.root)
صبر
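
The per-word attributes listed above can be collected into one dictionary for easier inspection. The sketch below is an assumption-laden illustration: describe_token is a hypothetical helper, and a SimpleNamespace stands in for a real token so the example runs without the library or its data. It reads every attribute with a default, on the assumption that an extension such as root is missing when its pipeline component was not requested.

```python
from types import SimpleNamespace


def describe_token(token) -> dict:
    """Hypothetical helper: gather the word-level fields into a plain dict."""
    ext = getattr(token, '_', None)  # container for the custom extensions
    return {
        'text': str(getattr(token, 'text', token)),
        'pos': getattr(token, 'pos_', ''),
        'dep': getattr(token, 'dep_', ''),
        'lemma': getattr(token, 'lemma_', ''),
        'head': str(getattr(token, 'head', '')),
        # Fall back to defaults when an extension is not available.
        'root': getattr(ext, 'root', ''),
        'dep_arc': getattr(ext, 'dep_arc', None),
    }


# Stand-in for the sixth word of verse 3#200, using the values printed above.
stand_in = SimpleNamespace(
    text='اِصْبِر', pos_='VERB', dep_='الف زینت', lemma_='صَبَرَ', head='ا',
    _=SimpleNamespace(root='صبر', dep_arc='LTR'),
)

print(describe_token(stand_in))
```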

Multiple Matches

When a free-text query matches multiple verses, use search_all to get all of them:

docs = language.search_all(nlp, 'رب العالمین', max_results=5)
for doc in docs:
    print(doc._.surah, doc._.ayah, doc._.text)
فاتحه 2 الْحَمْدُ لِلَّهِ رَبِّ الْعَالَمِينَ
مائده 28 لَئِن بَسَطتَ إِلَيَّ يَدَكَ...
انعام 45 فَقُطِعَ دَابِرُ الْقَوْمِ...

Finally, to convert the results to a JSON-like list of dictionaries, use the following:

dictionary = language.to_json(pips, doc)
print(dictionary)
[{'id': 1, 'text': بِ, 'root': '', 'lemma': '', 'pos': 'INTJ', 'rel': 'مجرور', 'arc': 'LTR', 'head': سْمِ}, {'id': 2, 'text': سْمِ, 'root': 'سمو', 'lemma': 'ٱسْم', 'pos': 'NOUN', 'rel': 'مضاف الیه ', 'arc': 'LTR', 'head': اللَّهِ}, {'id': 3, 'text': اللَّهِ, 'root': 'اله', 'lemma': 'ٱللَّه', 'pos': 'NOUN', 'rel': 'نعت', 'arc': 'LTR', 'head': رَّحِیمِ}, {'id': 4, 'text': ال, 'root': '', 'lemma': '', 'pos': 'INTJ', 'rel': 'تعریف', 'arc': 'RTL', 'head': رَّحْمَنِ}, {'id': 5, 'text': رَّحْمَنِ, 'root': 'رحم', 'lemma': 'رَّحْمَٰن', 'pos': 'NOUN', 'rel': '', 'arc': None, 'head': رَّحْمَنِ}, {'id': 6, 'text': ال, 'root': '', 'lemma': '', 'pos': 'INTJ', 'rel': 'تعریف', 'arc': 'RTL', 'head': رَّحِیمِ}, {'id': 7, 'text': رَّحِیمِ, 'root': 'رحم', 'lemma': 'رَّحِيم', 'pos': 'NOUN', 'rel': '', 'arc': None, 'head': رَّحِیمِ}]
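
In the printed output above, the text and head values appear without quotes, which suggests they may be objects rather than plain strings. If so, writing the result to a JSON file needs a fallback conversion. The sketch below shows one way to do this with the standard json module; StandInToken and rows_to_json are hypothetical names, and the rows are stand-in data modeled on the first two entries above.

```python
import json


class StandInToken:
    """Stand-in for a token-like object whose str() is the word text."""

    def __init__(self, text: str) -> None:
        self.text = text

    def __str__(self) -> str:
        return self.text


# Stand-in rows modeled on the first two entries of the output above.
rows = [
    {'id': 1, 'text': StandInToken('بِ'), 'root': '', 'lemma': '',
     'pos': 'INTJ', 'rel': 'مجرور', 'arc': 'LTR', 'head': StandInToken('سْمِ')},
    {'id': 2, 'text': StandInToken('سْمِ'), 'root': 'سمو', 'lemma': 'ٱسْم',
     'pos': 'NOUN', 'rel': 'مضاف الیه', 'arc': 'LTR', 'head': StandInToken('اللَّهِ')},
]


def rows_to_json(rows: list) -> str:
    """Serialize rows, keeping Arabic text readable and stringifying unknown objects."""
    return json.dumps(rows, ensure_ascii=False, default=str, indent=2)


encoded = rows_to_json(rows)
decoded = json.loads(encoded)
print(decoded[1]['root'], decoded[1]['pos'])
```

Setting ensure_ascii=False keeps the Arabic text human-readable in the output instead of escaping it as \u sequences.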

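To visualize the dependency parse, you can use spaCy's displacy:

from spacy import displacy

# Serve the parse with the default settings.
# Note: serve() blocks until the server is stopped.
displacy.serve(doc, style="dep")

# Alternatively, serve it with custom rendering options.
options = {"compact": True, "bg": "#09a3d5",
           "color": "white", "font": "xb-niloofar"}
displacy.serve(doc, style="dep", options=options)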

Contributors

  • Seyyed Mohammad Aref Jahanmir
  • Alireza Sahebi
  • Doratossadat Dastgheyb
  • Erfan Mohammadi
  • Mahdi Ahmadi
  • Ehsaneddin Asgari

📧 Contact: asgari [dot] berkeley [dot] edu

Contributing

We warmly welcome contributions from the community! Whether you are a researcher, developer, linguist, or simply passionate about the Quran and NLP, there are many ways to get involved:

Area            How to Help
New features    New pipeline components, morphological analyses, or language support
Data quality    Corrections to POS tags, dependency parses, lemmas, or roots
Translations    Add or improve Quranic translations for underrepresented languages
Testing         Help increase test coverage
Bug reports     Open an issue if something doesn't work as expected
Documentation   Clearer examples, tutorials, or API docs

To contribute, fork the repository, make your changes, and open a pull request. For larger changes, please open an issue first to discuss your idea.

We believe open collaboration leads to better tools for everyone. Every contribution, big or small, is valued and appreciated.
