Skip to main content

quarnic nlp

Project description

QuranicTools: A Python NLP Library for Quranic NLP

Open In Colab
Part of Speech Tagging | Dependency Parsing | Lemmatizer | Multilingual Search
| Quranic Extractions | Revelation Order |
Embeddings (coming soon) | Translations

Quranic NLP

Quranic NLP is a computational toolbox to conduct various syntactic and semantic analyses of Quranic verses. The aim is to put together all available resources contributing to a better understanding/analysis of the Quran for everyone.

Contents:

Installation

Step 1 — Install the package

pip install quranic-nlp

Step 2 — Download the data

The library requires data files (~97MB) that are downloaded separately from GitHub Releases:

quranic_data

Or from Python:

from quranic_nlp.data_requirements import download_data
download_data()

Data is downloaded once and stored inside the package directory automatically.

Development Setup

To set up a local development environment:

git clone https://github.com/language-ml/hadith-quranic_nlp.git
cd hadith-quranic_nlp
python -m venv .venv
source .venv/bin/activate   # On Windows: .venv\Scripts\activate
pip install -e .
quranic_data

Pipeline

The NLP pipeline contains morphological information e.g., Lemmatizer as well as POS Tagger and Dependancy Parser in a Spacy-like pipeline.

from quranic_nlp import language

translation_translator = 'fa#1'
pips = 'dep,pos,root,lemma'
nlp = language.Pipeline(pips, translation_translator)

Doc object has different extensions. First, there are sentences in doc referring to the verses. Second, there are ayah in doc which is indicate number ayeh in soure. Third, there are surah in doc which is indicate name of soure. Fourth, there are revelation_order in doc which is indicate order of revelation of the ayeh. doc which is the list of Token also has its own extensions. The pips is information to use from quranic_nlp. The translation_translator is language for translate quran such that language (fa) or language along with # along with number books. For see all translate run below code

from quranic_nlp import utils
utils.print_all_translations()

Quranic NLP has its own spacy extensions. If related pipeline is not called, that extension cannot be used.

Format Inputs

There are three ways to format the input:

  1. number surah along with # along with number ayah.
  2. name surah along with # along with number ayah.
  3. search text in quran.

Note The last two calls require access to the internet for an API call.

from quranic_nlp import language

translation_translator = 'fa#1'
pips = 'dep,pos,root,lemma'
nlp = language.Pipeline(pips, translation_translator)

doc = nlp('1#1')
doc = nlp('حمد#1')
doc = nlp('رب العالمین')

Example

Two examples are provided below to demonstrate the usage of the library:

First, Displaying data from Surah Al-Fatiha, Verse 1:

first_doc = nlp('1#1')

Second, Displaying data from Surah Aal-i-Imran, Verse 200:

second_doc = nlp('3#200')

We have two functional sections that can be used with any input:

  1. Verse Information: This section provides detailed information about a specific verse in the Quran. The information related to a verse is structured as follows:

    • Verse Text and Meaning: The text of the verse is provided along with its meaning or translation.
    • Similar Verses: Similar verses are included, following the format Surah#Verse, along with the name of the verse. These verses share similarities in content or theme.
    • Verse Order: The order of the verse within the Surah is mentioned. Revelation Order: The chronological order of the revelation of the verse is specified.

First, Displaying data from Surah Al-Fatiha, Verse 1:

print(first_doc)
بِسْمِ اللَّهِ الرَّحْمَنِ الرَّحِیمِ 
print(first_doc._.text)
بِسْمِ اللَّهِ الرَّحْمَـٰنِ الرَّحِيمِ
print(first_doc._.surah)
فاتحه
print(first_doc._.ayah)
1
print(first_doc._.revelation_order)
63
print(first_doc._.translations)
ستايش خدا را كه پروردگار جهانيان است.
print(first_doc._.sim_ayahs)
['27#30', '1#3', '55#1', '41#2', '2#163', '59#22', '11#41', '12#92', '7#151', '24#20', '44#42', '6#118', '36#5', '26#191', '26#175', '26#159', '26#140', '26#122', '26#104', '26#68', '26#9', '26#217', '20#5', '19#88', '21#83', '24#10', '25#60', '15#49', '12#64', '19#93', '2#192', '4#96', '4#106', '30#5', '32#6', '9#99', '6#121', '78#37', '19#91', '22#34', '2#218', '19#85', '41#32', '22#28', '12#98', '19#96', '20#8', '19#18', '5#4', '2#64', '39#53', '22#40', '5#74', '3#157', '10#58', '52#28', '22#36', '19#78', '43#84', '50#33', '6#119', '5#98', '17#110', '19#87', '21#26', '9#27', '36#58', '49#10', '26#5', '16#18', '9#104', '7#180', '6#138', '3#129', '23#118', '7#49', '67#29', '20#109', '27#46', '19#92', '43#36', '67#28', '25#59', '19#69', '2#37', '21#112', '43#20', '24#14', '69#52', '56#96', '56#74', '11#73', '3#132', '24#5', '3#89', '42#5', '43#45', '36#15', '57#28', '48#14']

Second, Displaying data from Surah Aal-i-Imran, Verse 200:

print(second_doc)
يَا أَیُّهَا الَّذِينَ  آمَن اِصْبِر وَ صَابِر وَ رَابِط وَ اِتَّق اللَّهَ لَعَلَّکُمْ تُفْلِحُو اُو اُو اُو اُو اُو نَ 
print(second_doc._.text)
يَا أَيُّهَا الَّذِينَ آمَنُوا اصْبِرُوا وَصَابِرُوا وَرَابِطُوا وَاتَّقُوا اللَّهَ لَعَلَّكُمْ تُفْلِحُونَ
print(second_doc._.surah)
آل عمران
print(second_doc._.ayah)
200
print(second_doc._.revelation_order)
89
print(second_doc._.translations)
اى كسانى كه ايمان آوردهايد، شكيبا باشيد و ديگران را به شكيبايى فراخوانيد و در جنگها پايدارى كنيد و از خدا بترسيد، باشد كه رستگار شويد.
print(second_doc._.sim_ayahs)
['3#130', '5#35', '9#119', '33#70', '2#189', '8#45', '59#18', '22#77', '3#102', '3#123', '26#179', '26#163', '26#150', '26#144', '26#131', '26#126', '26#110', '26#108', '2#153', '49#1', '49#10', '2#278', '70#5', '5#90', '33#41', '61#2', '15#69', '57#28', '5#57', '7#69', '5#100', '2#183', '8#29', '5#11', '58#9', '52#16', '47#7', '8#20', '74#7', '5#8', '47#33', '8#27', '8#15', '7#87', '10#63', '61#10', '3#149', '3#100', '9#123', '33#69', '2#104', '2#172', '4#71', '16#127', '23#1', '5#87', '33#56', '62#10', '65#10', '49#12', '4#144', '63#9', '24#27', '2#208', '58#11', '64#14', '5#93', '5#105', '33#9', '27#53', '41#18', '5#88', '9#23', '61#14', '29#59', '16#42', '8#46', '60#13', '8#24', '4#59', '62#9', '49#6', '4#136', '5#51', '4#1', '5#2', '58#12', '4#29', '66#6', '5#94', '33#1', '2#254', '8#69', '87#14', '91#9', '33#49', '5#1', '2#21', '64#16', '9#34']
  1. Word Information: In this section, you will find information specifically related to the words within a particular verse. The information related to a verse is structured as follows:
    • Word Text: The actual text of the word.
    • Tag: The part-of-speech tag that describes the word's grammatical category.
    • Dependency: The dependency relationship of the word within the sentence.
    • Lemma: The base or dictionary form of the word.
    • Root: The root form of the word, which captures its core meaning.
    • Head: The head word to which the current word is dependent.
    • Arc Dep: The arc dependency label that represents the grammatical relationship between the head word and the current word.
    • Rel: The semantic or syntactic relationship between the head word and the current word. First, Displaying data from Surah Al-Fatiha, Verse 1: I will show third word in verse.
word=first_doc[2]
print(word)
اللَّهِ
print(word.dep_)
نعت
print(word.head)
رَّحِیمِ
print(word.lemma_)
ٱللَّه
print(word.pos_)
from quranic_nlp import utils
print(utils.POS_UNI_FA[word.pos_])
NOUN
اسم
print(word._.dep_arc)
LTR
print(word._.root)
اله

Second, Displaying data from Surah Aal-i-Imran, Verse 200: I will show sixth word in verse.

word=second_doc[5]
print(word)
اِصْبِر
print(word.dep_)
الف زینت
print(word.head)
ا
print(word.lemma_)
صَبَرَ
print(word.pos_)
from quranic_nlp import utils
print(utils.POS_UNI_FA[word.pos_])
VERB
فعل
print(word._.dep_arc)
LTR

LTR : Left to Right RTL : Right to Left

print(word._.root)
صبر

At the end, to jsonify the results you can use the following:

dictionary = language.to_json(pips, doc)
print(dictionary)
[{'id': 1, 'text': بِ, 'root': '', 'lemma': '', 'pos': 'INTJ', 'rel': 'مجرور', 'arc': 'LTR', 'head': سْمِ}, {'id': 2, 'text': سْمِ, 'root': 'سمو', 'lemma': 'ٱسْم', 'pos': 'NOUN', 'rel': 'مضاف الیه ', 'arc': 'LTR', 'head': اللَّهِ}, {'id': 3, 'text': اللَّهِ, 'root': 'اله', 'lemma': 'ٱللَّه', 'pos': 'NOUN', 'rel': 'نعت', 'arc': 'LTR', 'head': رَّحِیمِ}, {'id': 4, 'text': ال, 'root': '', 'lemma': '', 'pos': 'INTJ', 'rel': 'تعریف', 'arc': 'RTL', 'head': رَّحْمَنِ}, {'id': 5, 'text': رَّحْمَنِ, 'root': 'رحم', 'lemma': 'رَّحْمَٰن', 'pos': 'NOUN', 'rel': '', 'arc': None, 'head': رَّحْمَنِ}, {'id': 6, 'text': ال, 'root': '', 'lemma': '', 'pos': 'INTJ', 'rel': 'تعریف', 'arc': 'RTL', 'head': رَّحِیمِ}, {'id': 7, 'text': رَّحِیمِ, 'root': 'رحم', 'lemma': 'رَّحِيم', 'pos': 'NOUN', 'rel': '', 'arc': None, 'head': رَّحِیمِ}]

To show the results you can use the following:

from spacy import displacy
displacy.serve(doc, style="dep")
options = {"compact": True, "bg": "#09a3d5",
           "color": "white", "font": "xb-niloofar"}
displacy.serve(doc, style="dep", options=options)

Contributors

  • Seyyed Mohammad Aref Jahanmir
  • Alireza Sahebi
  • Doratossadat Dastgheyb
  • Erfan Mohammadi
  • Mahdi Ahmadi
  • Ehsaneddin Asgari

📧 Contact: asgari [dot] berkeley [dot] edu

Contributing

We warmly welcome contributions from the community! Whether you are a researcher, developer, linguist, or simply passionate about the Quran and NLP, there are many ways to get involved:

Area How to Help
New features New pipeline components, morphological analyses, or language support
Data quality Corrections to POS tags, dependency parses, lemmas, or roots
Translations Add or improve Quranic translations for underrepresented languages
Testing Help increase test coverage
Bug reports Open an issue if something doesn't work as expected
Documentation Clearer examples, tutorials, or API docs

To contribute, fork the repository, make your changes, and open a pull request. For larger changes, please open an issue first to discuss your idea.

We believe open collaboration leads to better tools for everyone. Every contribution, big or small, is valued and appreciated.

Bibles

import pickle

with open("./data_utils/bibles.pickle", "rb") as handle:
    bibles_dict = pickle.load(handle)
print(len(bibles_dict))
print(list(bibles_dict.keys())[0:10])
print(bibles_dict['01001009'])
38555
['01001001', '01001002', '01001003', '01001004', '01001005', '01001006', '01001007', '01001008', '01001009', '01001010']
{'arb-x-bible-1993': 'وقال الله لتجتمع المياه تحت السماء الى مكان واحد ولتظهر اليابسة . وكان كذلك .', 'eng-x-bible-amplified': 'Then God said : “ Let the waters under the heavens be collected together into one place , and let the dry land appear . ” And it was so .', 'grc-x-bible-accented': 'και ειπεν ο θεος συναχθητω το υδωρ το υποκατω του ουρανου εις συναγωγην μιαν και οφθητω η ξηρα και εγενετο ουτως και συνηχθη το υδωρ το υποκατω του ουρανου εις τας συναγωγας αυτων και ωφθη η ξηρα'}

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

quranic_nlp-1.2.6.tar.gz (55.9 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

quranic_nlp-1.2.6-py3-none-any.whl (22.3 kB view details)

Uploaded Python 3

File details

Details for the file quranic_nlp-1.2.6.tar.gz.

File metadata

  • Download URL: quranic_nlp-1.2.6.tar.gz
  • Upload date:
  • Size: 55.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.19

File hashes

Hashes for quranic_nlp-1.2.6.tar.gz
Algorithm Hash digest
SHA256 d1bbd491e50e4711532dfe4ea5ce9913fa72daeacd2a7cbfb0a43d7c4a51cdca
MD5 eaf9fae030a8e24c6dd8a165bcdada39
BLAKE2b-256 44557f9e7f77bb63e9dbf50b322067e926ec81deea3755cf141952716e9cf40e

See more details on using hashes here.

File details

Details for the file quranic_nlp-1.2.6-py3-none-any.whl.

File metadata

  • Download URL: quranic_nlp-1.2.6-py3-none-any.whl
  • Upload date:
  • Size: 22.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.19

File hashes

Hashes for quranic_nlp-1.2.6-py3-none-any.whl
Algorithm Hash digest
SHA256 c2967e41ef9d34a01ac6575edafab30770fdcd5ec5bacaa82d34d90febf5c5ac
MD5 6a3fc81cd27d3a934885e39ce68c5203
BLAKE2b-256 54f4faafca1c1362183ce5eb5c6ebf72881397a0fbb877c20664bec4d2380b4c

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page