QuranicTools: A Python NLP Library for Quranic NLP

Part of Speech Tagging | Dependency Parsing | Lemmatizer | Multilingual Search | Quranic Extractions | Revelation Order | Embeddings (coming soon) | Translations

Quranic NLP

Quranic NLP is a computational toolbox to conduct various syntactic and semantic analyses of Quranic verses. The aim is to put together all available resources contributing to a better understanding/analysis of the Quran for everyone.

Installation

Step 1 — Install the package

pip install quranic-nlp

Step 2 — Download the data

The library requires data files (~97MB) that are downloaded separately from GitHub Releases. To fetch them, run the following command:

quranic_data

Or from Python:

from quranic_nlp.data_requirements import download_data
download_data()

Data is downloaded once and stored inside the package directory automatically.

Development Setup

To set up a local development environment:

git clone https://github.com/language-ml/hadith-quranic_nlp.git
cd hadith-quranic_nlp
python -m venv .venv
source .venv/bin/activate   # On Windows: .venv\Scripts\activate
pip install -e .
quranic_data

Pipeline

The NLP pipeline provides morphological information (e.g., a lemmatizer) as well as a POS tagger and a dependency parser in a spaCy-like pipeline.

from quranic_nlp import language

translation_translator = 'fa#1'
pips = 'dep,pos,root,lemma'
nlp = language.Pipeline(pips, translation_translator)

The Doc object has several extensions. First, sentences refers to the verses. Second, ayah gives the number of the ayah within the surah. Third, surah gives the name of the surah. Fourth, revelation_order gives the position of the ayah in the order of revelation. Each Token in the Doc also has its own extensions.

The pips argument selects which annotations from quranic_nlp to use. The translation_translator argument selects the translation: either a language code alone (for example, fa) or a language code followed by # and the number of a translation (for example, fa#1). To list all available translations, run the following code:

from quranic_nlp import utils
utils.print_all_translations()

Quranic NLP defines its own spaCy extensions. An extension can be used only if the related pipeline component was requested.
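As a rough illustration of the two configuration strings described above, the sketch below splits a pips string into component names and a translation_translator value into a language code plus an optional translation number. The helper names parse_pips and parse_translation_spec are hypothetical and are not part of quranic_nlp; the library's own parsing may differ.

```python
from typing import List, Optional, Tuple


def parse_pips(pips: str) -> List[str]:
    """Split a comma-separated component string such as 'dep,pos,root,lemma'."""
    return [name.strip() for name in pips.split(",") if name.strip()]


def parse_translation_spec(spec: str) -> Tuple[str, Optional[int]]:
    """Split 'fa#1' into ('fa', 1); a bare language code gives ('fa', None)."""
    language, separator, number = spec.partition("#")
    return language, (int(number) if separator else None)


# Hypothetical helpers applied to the values used in this README.
components = parse_pips("dep,pos,root,lemma")
language_code, translation_number = parse_translation_spec("fa#1")
print(components)
print(language_code, translation_number)
```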

Format Inputs

There are three ways to format the input:

  1. Surah number, followed by #, followed by the ayah number.
  2. Surah name, followed by #, followed by the ayah number.
  3. Text to search for in the Quran.

Note: The last two forms require internet access for an API call.

from quranic_nlp import language

translation_translator = 'fa#1'
pips = 'dep,pos,root,lemma'
nlp = language.Pipeline(pips, translation_translator)

doc = nlp('1#1')
doc = nlp('حمد#1')
doc = nlp('رب العالمین')
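To make the distinction between the three input forms concrete, here is a minimal sketch that classifies a query string by shape. The function classify_query is hypothetical and only mirrors the rules listed above; it is not the parser quranic_nlp uses internally.

```python
import re

# Hypothetical pattern: anything without '#', then '#', then a number.
_REFERENCE = re.compile(r"^(?P<surah>[^#]+)#(?P<ayah>\d+)$")


def classify_query(query: str):
    """Return a tuple describing which of the three input forms a query uses."""
    text = query.strip()
    match = _REFERENCE.match(text)
    if match is None:
        # No 'surah#ayah' shape: treat the input as free-text search.
        return ("search", text)
    surah = match.group("surah").strip()
    ayah = int(match.group("ayah"))
    if surah.isdecimal():
        return ("surah_number", int(surah), ayah)
    return ("surah_name", surah, ayah)


for query in ["1#1", "حمد#1", "رب العالمین"]:
    print(classify_query(query))
```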

Example

Two examples are provided below to demonstrate the usage of the library:

First, Displaying data from Surah Al-Fatiha, Verse 1:

first_doc = nlp('1#1')

Second, Displaying data from Surah Aal-i-Imran, Verse 200:

second_doc = nlp('3#200')

We have two functional sections that can be used with any input:

  1. Verse Information: This section provides detailed information about a specific verse in the Quran. The information related to a verse is structured as follows:

    • Verse Text and Meaning: The text of the verse is provided along with its meaning or translation.
    • Similar Verses: Similar verses are included, following the format Surah#Verse, along with the name of the verse. These verses share similarities in content or theme.
    • Verse Order: The order of the verse within the Surah is mentioned.
    • Revelation Order: The chronological order of the revelation of the verse is specified.

First, Displaying data from Surah Al-Fatiha, Verse 1:

print(first_doc)
بِسْمِ اللَّهِ الرَّحْمَنِ الرَّحِیمِ 
print(first_doc._.text)
بِسْمِ اللَّهِ الرَّحْمَـٰنِ الرَّحِيمِ
print(first_doc._.surah)
فاتحه
print(first_doc._.ayah)
1
print(first_doc._.revelation_order)
63
print(first_doc._.translations)
ستايش خدا را كه پروردگار جهانيان است.
print(first_doc._.sim_ayahs)
['27#30', '1#3', '55#1', '41#2', '2#163', '59#22', '11#41', '12#92', '7#151', '24#20', '44#42', '6#118', '36#5', '26#191', '26#175', '26#159', '26#140', '26#122', '26#104', '26#68', '26#9', '26#217', '20#5', '19#88', '21#83', '24#10', '25#60', '15#49', '12#64', '19#93', '2#192', '4#96', '4#106', '30#5', '32#6', '9#99', '6#121', '78#37', '19#91', '22#34', '2#218', '19#85', '41#32', '22#28', '12#98', '19#96', '20#8', '19#18', '5#4', '2#64', '39#53', '22#40', '5#74', '3#157', '10#58', '52#28', '22#36', '19#78', '43#84', '50#33', '6#119', '5#98', '17#110', '19#87', '21#26', '9#27', '36#58', '49#10', '26#5', '16#18', '9#104', '7#180', '6#138', '3#129', '23#118', '7#49', '67#29', '20#109', '27#46', '19#92', '43#36', '67#28', '25#59', '19#69', '2#37', '21#112', '43#20', '24#14', '69#52', '56#96', '56#74', '11#73', '3#132', '24#5', '3#89', '42#5', '43#45', '36#15', '57#28', '48#14']

Second, Displaying data from Surah Aal-i-Imran, Verse 200:

print(second_doc)
يَا أَیُّهَا الَّذِينَ  آمَن اِصْبِر وَ صَابِر وَ رَابِط وَ اِتَّق اللَّهَ لَعَلَّکُمْ تُفْلِحُو اُو اُو اُو اُو اُو نَ 
print(second_doc._.text)
يَا أَيُّهَا الَّذِينَ آمَنُوا اصْبِرُوا وَصَابِرُوا وَرَابِطُوا وَاتَّقُوا اللَّهَ لَعَلَّكُمْ تُفْلِحُونَ
print(second_doc._.surah)
آل عمران
print(second_doc._.ayah)
200
print(second_doc._.revelation_order)
89
print(second_doc._.translations)
اى كسانى كه ايمان آوردهايد، شكيبا باشيد و ديگران را به شكيبايى فراخوانيد و در جنگها پايدارى كنيد و از خدا بترسيد، باشد كه رستگار شويد.
print(second_doc._.sim_ayahs)
['3#130', '5#35', '9#119', '33#70', '2#189', '8#45', '59#18', '22#77', '3#102', '3#123', '26#179', '26#163', '26#150', '26#144', '26#131', '26#126', '26#110', '26#108', '2#153', '49#1', '49#10', '2#278', '70#5', '5#90', '33#41', '61#2', '15#69', '57#28', '5#57', '7#69', '5#100', '2#183', '8#29', '5#11', '58#9', '52#16', '47#7', '8#20', '74#7', '5#8', '47#33', '8#27', '8#15', '7#87', '10#63', '61#10', '3#149', '3#100', '9#123', '33#69', '2#104', '2#172', '4#71', '16#127', '23#1', '5#87', '33#56', '62#10', '65#10', '49#12', '4#144', '63#9', '24#27', '2#208', '58#11', '64#14', '5#93', '5#105', '33#9', '27#53', '41#18', '5#88', '9#23', '61#14', '29#59', '16#42', '8#46', '60#13', '8#24', '4#59', '62#9', '49#6', '4#136', '5#51', '4#1', '5#2', '58#12', '4#29', '66#6', '5#94', '33#1', '2#254', '8#69', '87#14', '91#9', '33#49', '5#1', '2#21', '64#16', '9#34']
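Because the similar verses come back as plain Surah#Verse strings, they are easy to post-process. The sketch below, which uses only the first ten entries of the list printed above and a hypothetical helper split_reference, counts how many similar verses fall in each surah.

```python
from collections import Counter

# First ten entries of the sim_ayahs list printed above for verse 3#200.
sim_ayahs = ['3#130', '5#35', '9#119', '33#70', '2#189',
             '8#45', '59#18', '22#77', '3#102', '3#123']


def split_reference(reference: str):
    """Turn a 'surah#ayah' string into a pair of integers."""
    surah, ayah = reference.split("#")
    return int(surah), int(ayah)


# Count similar verses per surah number.
surah_counts = Counter(split_reference(ref)[0] for ref in sim_ayahs)
print(surah_counts.most_common(1))
```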
  2. Word Information: This section provides information about the individual words within a verse. The information related to a word is structured as follows:
    • Word Text: The actual text of the word.
    • Tag: The part-of-speech tag that describes the word's grammatical category.
    • Dependency: The dependency relationship of the word within the sentence.
    • Lemma: The base or dictionary form of the word.
    • Root: The root form of the word, which captures its core meaning.
    • Head: The head word on which the current word depends.
    • Arc Dep: The arc dependency label that represents the grammatical relationship between the head word and the current word.
    • Rel: The semantic or syntactic relationship between the head word and the current word.

First, displaying data from Surah Al-Fatiha, Verse 1. The third token of the verse (index 2) is shown:

word=first_doc[2]
print(word)
اللَّهِ
print(word.dep_)
نعت
print(word.head)
رَّحِیمِ
print(word.lemma_)
ٱللَّه
print(word.pos_)
from quranic_nlp import utils
print(utils.POS_UNI_FA[word.pos_])
NOUN
اسم
print(word._.dep_arc)
LTR
print(word._.root)
اله

Second, displaying data from Surah Aal-i-Imran, Verse 200. The sixth token of the verse (index 5) is shown:

word=second_doc[5]
print(word)
اِصْبِر
print(word.dep_)
الف زینت
print(word.head)
ا
print(word.lemma_)
صَبَرَ
print(word.pos_)
from quranic_nlp import utils
print(utils.POS_UNI_FA[word.pos_])
VERB
فعل
print(word._.dep_arc)
LTR

LTR: Left to Right; RTL: Right to Left

print(word._.root)
صبر

Finally, to convert the results into a JSON-style list of dictionaries, use the following:

dictionary = language.to_json(pips, doc)
print(dictionary)
[{'id': 1, 'text': بِ, 'root': '', 'lemma': '', 'pos': 'INTJ', 'rel': 'مجرور', 'arc': 'LTR', 'head': سْمِ}, {'id': 2, 'text': سْمِ, 'root': 'سمو', 'lemma': 'ٱسْم', 'pos': 'NOUN', 'rel': 'مضاف الیه ', 'arc': 'LTR', 'head': اللَّهِ}, {'id': 3, 'text': اللَّهِ, 'root': 'اله', 'lemma': 'ٱللَّه', 'pos': 'NOUN', 'rel': 'نعت', 'arc': 'LTR', 'head': رَّحِیمِ}, {'id': 4, 'text': ال, 'root': '', 'lemma': '', 'pos': 'INTJ', 'rel': 'تعریف', 'arc': 'RTL', 'head': رَّحْمَنِ}, {'id': 5, 'text': رَّحْمَنِ, 'root': 'رحم', 'lemma': 'رَّحْمَٰن', 'pos': 'NOUN', 'rel': '', 'arc': None, 'head': رَّحْمَنِ}, {'id': 6, 'text': ال, 'root': '', 'lemma': '', 'pos': 'INTJ', 'rel': 'تعریف', 'arc': 'RTL', 'head': رَّحِیمِ}, {'id': 7, 'text': رَّحِیمِ, 'root': 'رحم', 'lemma': 'رَّحِيم', 'pos': 'NOUN', 'rel': '', 'arc': None, 'head': رَّحِیمِ}]
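The returned structure is a list of per-token dictionaries, so it can be filtered and serialized with the standard library. The sketch below works on three entries copied from the output above, trimmed to a few fields and with the string values quoted; it makes no call into quranic_nlp.

```python
import json

# Three entries copied from the output above (fields trimmed, strings quoted).
tokens = [
    {'id': 1, 'text': 'بِ', 'root': '', 'lemma': '', 'pos': 'INTJ'},
    {'id': 2, 'text': 'سْمِ', 'root': 'سمو', 'lemma': 'ٱسْم', 'pos': 'NOUN'},
    {'id': 3, 'text': 'اللَّهِ', 'root': 'اله', 'lemma': 'ٱللَّه', 'pos': 'NOUN'},
]

# Keep only the tokens whose root field is not empty.
with_root = [token for token in tokens if token['root']]

# ensure_ascii=False keeps the Arabic text readable instead of \u escapes.
serialized = json.dumps(with_root, ensure_ascii=False, indent=2)
print(serialized)
```

Passing ensure_ascii=False matters mostly for human inspection; the default escaped form decodes to the same data.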

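To visualize the dependency parse, you can use displaCy:

from spacy import displacy

# Serve the dependency visualization with default settings (blocks until stopped).
displacy.serve(doc, style="dep")

# Or serve it with custom display options.
options = {"compact": True, "bg": "#09a3d5",
           "color": "white", "font": "xb-niloofar"}
displacy.serve(doc, style="dep", options=options)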

Contributors

  • Seyyed Mohammad Aref Jahanmir
  • Alireza Sahebi
  • Doratossadat Dastgheyb
  • Erfan Mohammadi
  • Mahdi Ahmadi
  • Ehsaneddin Asgari

📧 Contact: asgari [dot] berkeley [dot] edu

Contributing

We warmly welcome contributions from the community! Whether you are a researcher, developer, linguist, or simply passionate about the Quran and NLP, there are many ways to get involved:

Area | How to Help
New features | New pipeline components, morphological analyses, or language support
Data quality | Corrections to POS tags, dependency parses, lemmas, or roots
Translations | Add or improve Quranic translations for underrepresented languages
Testing | Help increase test coverage
Bug reports | Open an issue if something doesn't work as expected
Documentation | Clearer examples, tutorials, or API docs

To contribute, fork the repository, make your changes, and open a pull request. For larger changes, please open an issue first to discuss your idea.

We believe open collaboration leads to better tools for everyone. Every contribution, big or small, is valued and appreciated.

Bibles

import pickle

with open("./data_utils/bibles.pickle", "rb") as handle:
    bibles_dict = pickle.load(handle)
print(len(bibles_dict))
print(list(bibles_dict.keys())[0:10])
print(bibles_dict['01001009'])
38555
['01001001', '01001002', '01001003', '01001004', '01001005', '01001006', '01001007', '01001008', '01001009', '01001010']
{'arb-x-bible-1993': 'وقال الله لتجتمع المياه تحت السماء الى مكان واحد ولتظهر اليابسة . وكان كذلك .', 'eng-x-bible-amplified': 'Then God said : “ Let the waters under the heavens be collected together into one place , and let the dry land appear . ” And it was so .', 'grc-x-bible-accented': 'και ειπεν ο θεος συναχθητω το υδωρ το υποκατω του ουρανου εις συναγωγην μιαν και οφθητω η ξηρα και εγενετο ουτως και συνηχθη το υδωρ το υποκατω του ουρανου εις τας συναγωγας αυτων και ωφθη η ξηρα'}
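The keys appear to encode book, chapter, and verse as fixed-width digit groups (2 + 3 + 3 digits): the entry for '01001009' reads as Genesis 1:9. That layout is an inference from the sample output, not something this README states, and the helper parse_verse_key below is hypothetical.

```python
def parse_verse_key(key: str):
    """Split an 8-digit key into (book, chapter, verse), assuming a BBCCCVVV layout."""
    if len(key) != 8 or not key.isdecimal():
        raise ValueError(f"expected an 8-digit key, got {key!r}")
    return int(key[:2]), int(key[2:5]), int(key[5:])


# Assumed layout: 2-digit book, 3-digit chapter, 3-digit verse.
print(parse_verse_key("01001009"))
```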
