quranic-nlp
QuranicTools: A Python Library for Quranic NLP
Part of Speech Tagging | Dependency Parsing | Lemmatizer | Multilingual Search | Quranic Extractions | Revelation Order | Embeddings (coming soon) | Translations
Quranic NLP
Quranic NLP is a computational toolbox for conducting syntactic and semantic analyses of Quranic verses. It aims to bring together all available resources that contribute to a better understanding and analysis of the Quran, and to make them accessible to everyone.
Contents:
- Installation
- Pipeline
- Input Formats
- Verse Information
- Translations
- Similar Verses
- Multiple Matches
- Word-level Analysis
- JSON Output
- Hadiths
- Visualization
- Contributors
- Contributing
Installation
Step 1 — Install the package
```bash
pip install quranic-nlp
```
Step 2 — Download the data
The library requires data files (~97 MB), which are downloaded separately from GitHub Releases. From the command line:

```bash
quranic_data
```

Or from Python:

```python
from quranic_nlp.data_requirements import download_data

download_data()
```
Data is downloaded once and stored inside the package directory automatically.
Development Setup
To set up a local development environment:
```bash
git clone https://github.com/language-ml/hadith-quranic_nlp.git
cd hadith-quranic_nlp
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
pip install -e .
quranic_data
```
Pipeline
Available pipeline components:
| Key | Description |
|---|---|
| `dep` | Dependency parsing |
| `pos` | Part-of-speech tagging |
| `root` | Root extraction |
| `lem` | Lemmatization |

```python
from quranic_nlp import language, utils, constant

pips = 'dep,pos,root,lem'
nlp = language.Pipeline(pips, translation_lang='fa#1')
```
To see all available translation languages and translators:
```python
utils.print_all_translations()
```
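Both configuration strings follow simple conventions:

- `pips` is a comma-separated list of the component keys in the table above.
- `translation_lang` is either `'<lang>'` or `'<lang>#<index>'` (see Translations).

The sketch below validates both strings before a pipeline is built. `parse_pipeline_spec` and `parse_translation_spec` are hypothetical helpers written for illustration. They are not functions exported by quranic-nlp, and the library itself may accept inputs they reject.

```python
from typing import List, Optional, Tuple

# Component keys listed in the Pipeline table (hypothetical validation set).
KNOWN_COMPONENTS = {"dep", "pos", "root", "lem"}


def parse_pipeline_spec(pips: str) -> List[str]:
    """Split a spec like 'dep,pos,root,lem' and reject unknown keys (hypothetical helper)."""
    keys = [key.strip() for key in pips.split(",") if key.strip()]
    unknown = [key for key in keys if key not in KNOWN_COMPONENTS]
    if unknown:
        raise ValueError(f"unknown pipeline component(s): {unknown}")
    return keys


def parse_translation_spec(spec: str) -> Tuple[str, Optional[int]]:
    """Split '<lang>#<index>' into (lang, index); a bare '<lang>' gives index None."""
    lang, sep, index = spec.partition("#")
    if not lang:
        raise ValueError("missing language code")
    if not sep:
        # No '#': all translators for this language.
        return lang, None
    if not index.isdigit():
        raise ValueError(f"translator index must be a number, got {index!r}")
    return lang, int(index)


print(parse_pipeline_spec("dep,pos,root,lem"))  # ['dep', 'pos', 'root', 'lem']
print(parse_translation_spec("fa#1"))           # ('fa', 1)
print(parse_translation_spec("en"))             # ('en', None)
```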
Input Formats
Three ways to reference a verse:
```python
# 1. surah_number#ayah_number (no internet required)
doc = nlp('1#1')

# 2. surah_name#ayah_number (requires internet)
doc = nlp('حمد#1')

# 3. Free Arabic text: returns a list of all matching docs (requires internet)
docs = nlp('رب العالمین')
```
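Only the `surah_number#ayah_number` form works offline. A caller may therefore want to know which form a string takes before calling `nlp()`, for example to decide whether network access is needed.

The sketch below does this with two regular expressions. `classify_reference` is a hypothetical helper, and the library's own dispatch logic may differ.

```python
import re
from typing import Tuple

# 'surah_number#ayah_number', e.g. '1#1' (works offline)
NUMERIC_REF = re.compile(r"^\s*(\d+)\s*#\s*(\d+)\s*$")
# 'surah_name#ayah_number', e.g. 'حمد#1' (the name must not start with a digit)
NAMED_REF = re.compile(r"^\s*([^#\d\s][^#]*?)\s*#\s*(\d+)\s*$")


def classify_reference(text: str) -> Tuple[str, object]:
    """Return ('numeric', (surah, ayah)), ('named', (name, ayah)) or ('free_text', text)."""
    match = NUMERIC_REF.match(text)
    if match:
        surah, ayah = int(match.group(1)), int(match.group(2))
        # The Quran has 114 surahs; ayah counts vary per surah and are not checked here.
        if not 1 <= surah <= 114:
            raise ValueError(f"surah number out of range: {surah}")
        return "numeric", (surah, ayah)
    match = NAMED_REF.match(text)
    if match:
        return "named", (match.group(1), int(match.group(2)))
    # Anything else is treated as a free-text search query.
    return "free_text", text.strip()


print(classify_reference("1#1"))  # ('numeric', (1, 1))
```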
Verse Information
```python
doc = nlp('1#1')
print(doc)                       # بِسْمِ اللَّهِ الرَّحْمَنِ الرَّحِیمِ
print(doc._.text)                # بِسْمِ اللَّهِ الرَّحْمَـٰنِ الرَّحِيمِ (full diacritics)
print(doc._.surah)               # فاتحه
print(doc._.ayah)                # 1
print(doc._.revelation_order)    # 5
```
Translations
Pass '<lang>#<index>' for a single translator (returns a string):
```python
nlp_en = language.Pipeline(pips, 'en#16')  # Yusuf Ali
doc = nlp_en('1#1')
print(doc._.translations)
# In the name of Allah, the Beneficent, the Merciful.
```
Pass '<lang>' (no index) for all translators (returns a dict keyed by translator name):
```python
nlp_fa = language.Pipeline(pips, 'fa')
doc = nlp_fa('1#2')
print(doc._.translations)
# {
#   'ansarian': 'همه ستایش ها، ویژه خدا، مالک و مربّی جهانیان است.',
#   'ayati': 'ستايش خدا را كه پروردگار جهانيان است.',
#   'bahrampour': 'ستايش خداى را كه پروردگار جهانيان است',
#   ...  # 12 Persian translators total
# }
```
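`doc._.translations` is a string in the first case and a dict in the second. Code that must work with either pipeline configuration therefore has to branch on the type.

A minimal sketch follows, assuming only the two return shapes described above. `as_translation_dict` and its default `'selected'` key are hypothetical. A placeholder key is needed because a single-translator result does not carry the translator's name.

```python
from typing import Dict, Mapping, Union


def as_translation_dict(value: Union[str, Mapping[str, str]], label: str = "selected") -> Dict[str, str]:
    """Normalize doc._.translations to {translator: text} (hypothetical helper)."""
    if isinstance(value, str):
        # '<lang>#<index>' pipelines return a bare string, so key it under `label`.
        return {label: value}
    # '<lang>' pipelines already return {translator_name: text}; return a copy.
    return dict(value)


single = "In the name of Allah, the Beneficent, the Merciful."
for name, text in as_translation_dict(single, label="en#16").items():
    print(f"{name}: {text}")  # en#16: In the name of Allah, the Beneficent, the Merciful.
```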
Similar Verses
`doc._.sim_ayahs` returns a list of `(ref, score)` tuples, sorted by similarity score in descending order:
```python
doc = nlp('1#2')
for ref, score in doc._.sim_ayahs[:5]:
    print(f'{ref:10s} score={score:.4f}')
```

```
37#182     score=1.0000
6#45       score=0.5199
40#65      score=0.4620
10#10      score=0.3862
39#75      score=0.3793
```
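Each `ref` uses the same `surah#ayah` form as the first input format. A neighbour can therefore be split into numbers, or presumably passed back to `nlp()`. The sketch below keeps only the neighbours that reach a score cutoff instead of taking a fixed number of results.

`parse_ref` and `similar_above` are hypothetical helpers. The sample list is copied from the output above rather than produced by the library.

```python
from typing import Iterable, List, Tuple


def parse_ref(ref: str) -> Tuple[int, int]:
    """Turn a 'surah#ayah' reference such as '37#182' into (37, 182)."""
    surah, ayah = ref.split("#")
    return int(surah), int(ayah)


def similar_above(sim_ayahs: Iterable[Tuple[str, float]], threshold: float) -> List[Tuple[int, int, float]]:
    """Keep (surah, ayah, score) for every neighbour scoring at least `threshold`."""
    return [(*parse_ref(ref), score) for ref, score in sim_ayahs if score >= threshold]


# Sample input copied from the output above; in practice pass doc._.sim_ayahs.
sample = [("37#182", 1.0), ("6#45", 0.5199), ("40#65", 0.4620), ("10#10", 0.3862), ("39#75", 0.3793)]
print(similar_above(sample, 0.45))  # [(37, 182, 1.0), (6, 45, 0.5199), (40, 65, 0.462)]
```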
Multiple Matches
When free Arabic text matches multiple verses, nlp(text) returns a list of docs:
```python
docs = nlp('رب العالمین')
print(f'Found {len(docs)} matching verses')
for doc in docs[:3]:
    print(doc._.surah, doc._.ayah, '—', doc._.text)
```

```
فاتحه 2 — الْحَمْدُ لِلَّهِ رَبِّ الْعَالَمِينَ
مائده 28 — لَئِن بَسَطتَ إِلَيَّ يَدَكَ...
انعام 45 — فَقُطِعَ دَابِرُ الْقَوْمِ...
```
You can also call `search_all` explicitly with a `max_results` cap:

```python
docs = language.search_all(nlp, 'رب العالمین', max_results=5)
```
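A reference such as `'1#1'` yields a single doc, while free text yields a list. A loop over query results may therefore need to accept both shapes.

A minimal sketch follows. `as_doc_list` is a hypothetical helper. Treating `None` as "no matches" is an assumption, since this page does not say what `nlp()` returns when nothing matches.

```python
from typing import Any, List


def as_doc_list(result: Any) -> List[Any]:
    """Return `result` as a list of docs, whatever shape nlp() produced (hypothetical helper)."""
    if result is None:
        # Assumption: treat a missing result as "no matching verses".
        return []
    if isinstance(result, list):
        # Free-text queries already return a list of docs.
        return result
    # 'surah#ayah' queries return one doc; wrap it so callers can always loop.
    return [result]
```

Typical use would be `for doc in as_doc_list(nlp(query)): print(doc._.surah, doc._.ayah)`, which works the same for `'1#1'` and for `'رب العالمین'`.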
Word-level Analysis
```python
doc = nlp('1#1')
word = doc[2]  # third word: اللَّهِ

print(word)                            # اللَّهِ
print(word.pos_)                       # NOUN
print(constant.POS_UNI_FA[word.pos_])  # اسم (Persian label for "noun")
print(word.lemma_)                     # ٱللَّه
print(word._.root)                     # اله
print(word.dep_)                       # نعت (adjectival modifier)
print(word._.dep_arc)                  # LTR (Left-to-Right arc)
print(word.head)                       # رَّحِیمِ
```
Print a table of all words:
```python
print(f"{'Word':<20} {'POS':<8} {'Lemma':<15} {'Root':<10} {'Dep'}")
print('-' * 65)
for token in doc:
    print(f'{str(token):<20} {token.pos_:<8} {token.lemma_:<15} {str(token._.root):<10} {token.dep_}')
```
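The `:<20` padding above counts code points. Arabic diacritics such as fatha or shadda are separate combining code points that take no horizontal space. Fully vocalized words therefore get too little padding, and the columns drift.

The sketch below pads by visible width instead, skipping Unicode nonspacing marks (general category `Mn`). `visible_width` and `ljust_visible` are hypothetical helpers. The result is still approximate, because terminals differ in how they shape and order right-to-left text.

```python
import unicodedata


def visible_width(text: str) -> int:
    """Count code points that occupy a column, ignoring nonspacing marks (category 'Mn')."""
    return sum(1 for ch in text if unicodedata.category(ch) != "Mn")


def ljust_visible(text: str, width: int) -> str:
    """Left-justify `text` to `width` visible columns (never truncates)."""
    return text + " " * max(0, width - visible_width(text))


word = "\u0627\u0644\u0644\u0651\u064e\u0647\u0650"  # اللَّهِ: 7 code points, 3 of them diacritics
print(len(word), visible_width(word))  # 7 4
```

In the table loop, `f'{str(token):<20}'` would become `ljust_visible(str(token), 20)`.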
JSON Output
```python
import json

result = language.to_json(pips, doc)
print(json.dumps(result, ensure_ascii=False, indent=2))
```

```
[
  {"id": 1, "text": "بِ", "root": "", "lemma": "", "pos": "INTJ", "rel": "مجرور", "arc": "LTR", "head": "سْمِ"},
  {"id": 2, "text": "سْمِ", "root": "سمو", "lemma": "ٱسْم", "pos": "NOUN", "rel": "مضاف الیه", "arc": "LTR", "head": "اللَّهِ"},
  {"id": 3, "text": "اللَّهِ", "root": "اله", "lemma": "ٱللَّه", "pos": "NOUN", "rel": "نعت", "arc": "LTR", "head": "رَّحِیمِ"},
  ...
]
```
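`to_json` appears to return plain lists and dicts, since the example passes its result straight to `json.dumps`. If so, the result can be saved and summarized with the standard library alone.

A minimal sketch follows, assuming every record carries the keys shown in the sample output. The three records are copied from that sample, and the file name is arbitrary.

```python
import json
import os
import tempfile
from collections import Counter

# First three records from the sample output above; in practice use language.to_json(pips, doc).
records = [
    {"id": 1, "text": "بِ", "root": "", "lemma": "", "pos": "INTJ", "rel": "مجرور", "arc": "LTR", "head": "سْمِ"},
    {"id": 2, "text": "سْمِ", "root": "سمو", "lemma": "ٱسْم", "pos": "NOUN", "rel": "مضاف الیه", "arc": "LTR", "head": "اللَّهِ"},
    {"id": 3, "text": "اللَّهِ", "root": "اله", "lemma": "ٱللَّه", "pos": "NOUN", "rel": "نعت", "arc": "LTR", "head": "رَّحِیمِ"},
]

path = os.path.join(tempfile.gettempdir(), "verse_1_1.json")

# ensure_ascii=False writes Arabic script as-is instead of \uXXXX escapes, so the file must be UTF-8.
with open(path, "w", encoding="utf-8") as fh:
    json.dump(records, fh, ensure_ascii=False, indent=2)

with open(path, encoding="utf-8") as fh:
    loaded = json.load(fh)

# Count part-of-speech tags and map each word to its root (words without a root are skipped).
pos_counts = Counter(rec["pos"] for rec in loaded)
roots = {rec["text"]: rec["root"] for rec in loaded if rec["root"]}
print(pos_counts)  # Counter({'NOUN': 2, 'INTJ': 1})
```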
Hadiths
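Related hadiths for a verse are available through `doc._.hadiths`. The list can be empty, for example when the hadith API is unavailable, so check it before indexing:

```python
hadiths = doc._.hadiths
if hadiths:
    print(f'Found {len(hadiths)} hadith(s)')
    print(hadiths[0])
else:
    print('No hadiths found or API unavailable.')
```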
Visualization
Render the dependency parse tree using spaCy's displacy:
```python
from spacy import displacy

options = {'compact': True, 'bg': '#09a3d5', 'color': 'white', 'font': 'Arial'}
displacy.render(doc, style='dep', options=options, jupyter=True)
```
Contributors
- Seyyed Mohammad Aref Jahanmir
- Alireza Sahebi
- Doratossadat Dastgheyb
- Erfan Mohammadi
- Mahdi Ahmadi
- Ehsaneddin Asgari
📧 Contact: asgari [dot] berkeley [dot] edu
Contributing
We warmly welcome contributions from the community! Whether you are a researcher, developer, linguist, or simply passionate about the Quran and NLP, there are many ways to get involved:
| Area | How to Help |
|---|---|
| New features | New pipeline components, morphological analyses, or language support |
| Data quality | Corrections to POS tags, dependency parses, lemmas, or roots |
| Translations | Add or improve Quranic translations for underrepresented languages |
| Testing | Help increase test coverage |
| Bug reports | Open an issue if something doesn't work as expected |
| Documentation | Clearer examples, tutorials, or API docs |
To contribute, fork the repository, make your changes, and open a pull request. For larger changes, please open an issue first to discuss your idea.
We believe open collaboration leads to better tools for everyone. Every contribution, big or small, is valued and appreciated.
Download files
File details
Details for the file quranic_nlp-1.3.3.tar.gz.
File metadata
- Download URL: quranic_nlp-1.3.3.tar.gz
- Upload date:
- Size: 54.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.19
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `97d813e5972cfdcada5fc43fb6f4f6064120a0ee1300c31be3283b89b19de5d0` |
| MD5 | `ccdd6b1aa1dac782d5267a746c76dd26` |
| BLAKE2b-256 | `bd95ca80f45d2bb1830829bceec532e1b2c3462e7b85d3400cfc97a1f0593d5d` |

File details
Details for the file quranic_nlp-1.3.3-py3-none-any.whl.
File metadata
- Download URL: quranic_nlp-1.3.3-py3-none-any.whl
- Upload date:
- Size: 21.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.19
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `98b4fc2e0d719f9499e9d55e2e10429892477eda33f40d45bdc387576d0c3a28` |
| MD5 | `b44d905903b5c0eee5f5211350068b94` |
| BLAKE2b-256 | `791c5fb00af99136760f4e34d65e2788b5a7aa9be82cbe2cb927f003310afa73` |