Quran Phonetic Script with additional Quranic utils

Quran Muaalem

With the help and guidance of Allah, who has no partner, we present the intelligent Quranic teacher (Quran Muaalem), capable of detecting errors in recitation, tajweed, and letter attributes (sifat).

📖 Link to try Quran Muaalem

Click the link to try it:

Link

⚠️ Note: this link expires on 27 August 2025

Features

  • Trained on the Quran phonetic script (quran-transcript), which can capture errors in letters, tajweed, and letter attributes (sifat)
  • Reasonably sized model: 660M parameters
  • Needs only 1.5 GB of GPU memory
  • Novel architecture: multi-level CTC

Architecture

Novel architecture: multi-level CTC, in which each level is trained on a specific aspect.

Figure: multi-level CTC architecture
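As a rough illustration of the multi-level idea, the sketch below greedily decodes two independent CTC levels (one for phonemes, one for a single attribute) from per-frame scores. The level names, vocabularies, blank id, and scores are invented for illustration; they are not taken from the model, and the real decoding logic may differ.

```python
# Hypothetical sketch: greedy CTC decoding applied independently to each level.
BLANK = 0  # assumed blank id, shared by every level in this toy example


def ctc_greedy_decode(frame_scores, vocab):
    """Pick the best label per frame, merge consecutive repeats, drop blanks."""
    best = [max(range(len(row)), key=row.__getitem__) for row in frame_scores]
    decoded, prev = [], None
    for idx in best:
        if idx != prev and idx != BLANK:
            decoded.append(vocab[idx])
        prev = idx
    return decoded


def decode_levels(scores_by_level, vocab_by_level):
    """Decode every level on its own; each level has its own vocabulary."""
    return {
        level: ctc_greedy_decode(scores, vocab_by_level[level])
        for level, scores in scores_by_level.items()
    }


# Toy vocabularies and per-frame scores (4 frames each), made up for the demo.
vocabs = {
    "phonemes": ["<blank>", "b", "a"],
    "hams_or_jahr": ["<blank>", "hams", "jahr"],
}
scores = {
    "phonemes": [[0.1, 0.8, 0.1], [0.1, 0.8, 0.1], [0.9, 0.05, 0.05], [0.1, 0.1, 0.8]],
    "hams_or_jahr": [[0.1, 0.1, 0.8], [0.8, 0.1, 0.1], [0.1, 0.1, 0.8], [0.1, 0.1, 0.8]],
}
result = decode_levels(scores, vocabs)
print(result)
```

In this toy case both levels yield two items, so the attribute labels line up one-to-one with the decoded phonemes.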

Brief development steps

  • Collecting Quranic recitations from expert reciters: prepare-quran-dataset
  • Segmenting the recitations by pause (waqf) rather than by verse, using the segmenter
  • Obtaining the Quranic text from the audio segments using the Tarteel model
  • Correcting the text extracted by Tarteel using the tasmee algorithm
  • Converting the Imlaey script to the Uthmani script: quran-transcript
  • Converting the Uthmani script to the Quran phonetic script, which describes all tajweed rules except ishmam: quran-transcript
  • Training the model on the Wav2Vec2BERT architecture
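The correction step above maps a noisy transcription back onto the known reference text. The tasmee algorithm itself is not described here, so the sketch below uses a plain stand-in technique: a sliding-window fuzzy match with `difflib.SequenceMatcher` from the Python standard library. The function name, the `slack` parameter, and the English placeholder words are all hypothetical.

```python
from difflib import SequenceMatcher


def best_matching_span(reference_words, heard_words, slack=1):
    """Return (start, end, score) of the reference span closest to heard_words.

    Spans whose length differs from the heard text by up to `slack` words are
    tried, so a dropped or duplicated word can still be matched.
    """
    heard = " ".join(heard_words)
    best = (0, 0, 0.0)
    n = len(heard_words)
    for length in range(max(1, n - slack), n + slack + 1):
        for start in range(0, len(reference_words) - length + 1):
            candidate = " ".join(reference_words[start:start + length])
            score = SequenceMatcher(None, candidate, heard).ratio()
            if score > best[2]:
                best = (start, start + length, score)
    return best


# Placeholder reference text and a transcription containing a spelling error.
reference = "in the name of god the most gracious the most merciful".split()
heard = "the most gracios".split()

start, end, score = best_matching_span(reference, heard)
corrected = reference[start:end]
print(corrected, round(score, 2))
```

The corrected text is taken from the reference rather than from the transcription, which is the property that matters when the target text is known in advance.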

Using the model

Using the model through the gradio interface

Install uv

pip install uv

or

curl -LsSf https://astral.sh/uv/install.sh | sh

Next, install ffmpeg

sudo apt-get update
sudo apt-get install -y ffmpeg

or through anaconda

conda install ffmpeg

Run the gradio UI with a single command:

uvx --no-cache --from https://github.com/obadx/quran-muaalem.git[ui]  quran-muaalem-ui

or

uvx quran-muaalem[ui]  quran-muaalem-ui

Using the model through the Python API

Installation

First, install the required dependencies:

# Install system dependencies
sudo apt-get install -y ffmpeg libsndfile1 portaudio19-dev

# Install Python packages
pip install quran-muaalem librosa "numba>=0.61.2"
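Before running the example, it can help to confirm that the pieces installed above are visible to Python. The following is a minimal sketch using only the standard library; the helper name `check_prerequisites` is hypothetical, and the default module list simply mirrors the imports used in the usage example.

```python
import importlib.util
import shutil


def check_prerequisites(
    modules=("quran_muaalem", "quran_transcript", "librosa", "torch"),
    binaries=("ffmpeg",),
):
    """Report which Python modules and command-line tools can be found."""
    report = {}
    for name in modules:
        # find_spec returns None when a top-level module is not installed
        report[name] = importlib.util.find_spec(name) is not None
    for name in binaries:
        # shutil.which returns None when the executable is not on PATH
        report[name] = shutil.which(name) is not None
    return report


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```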

Basic Usage Example

"""
Basic example of using the Quran Muaalem package for phonetic analysis of Quranic recitation.
"""

from dataclasses import asdict
import json
import logging

from quran_transcript import Aya, quran_phonetizer, MoshafAttributes
import torch
from librosa.core import load

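# Import the main Muaalem class (adjust import based on your actual package structure)
from quran_muaalem import Muaalem
# explain_for_terminal is called at the end of analyze_recitation;
# this import path is an assumption, adjust it if your installed version differs
from quran_muaalem.explain import explain_for_terminal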

# Setup logging to see informative messages
logging.basicConfig(level=logging.INFO)

def analyze_recitation(audio_path):
    """
    Analyze a Quranic recitation audio file using the Muaalem model.
    
    Args:
        audio_path (str): Path to the audio file to analyze
    """
    # Configuration
    sampling_rate = 16000  # Must be 16000 Hz
    device = "cuda" if torch.cuda.is_available() else "cpu"  # Use GPU if available
    
    # Step 1: Prepare the Quranic reference text
    # Get the Uthmani script for a span of words starting in Surah 8, Aya 75 (9 words from word 17)
    uthmani_ref = Aya(8, 75).get_by_imlaey_words(17, 9).uthmani
    
    # Step 2: Configure the recitation style (Moshaf attributes)
    moshaf = MoshafAttributes(
        rewaya="hafs",        # Recitation style (Hafs is most common)
        madd_monfasel_len=2,  # Length of separated elongation
        madd_mottasel_len=4,  # Length of connected elongation
        madd_mottasel_waqf=4, # Length of connected elongation when stopping
        madd_aared_len=2,     # Length of the temporary elongation that arises when stopping (madd aared)
    )
    # see: https://github.com/obadx/prepare-quran-dataset?tab=readme-ov-file#moshaf-attributes-docs
    
    # Step 3: Convert text to phonetic representation
    # see docs for the phonetizer: https://github.com/obadx/quran-transcript
    phonetizer_out = quran_phonetizer(uthmani_ref, moshaf, remove_spaces=True)
    
    # Step 4: Initialize the Muaalem model
    muaalem = Muaalem(device=device)
    
    # Step 5: Load and prepare the audio
    wave, _ = load(audio_path, sr=sampling_rate, mono=True)
    
    # Step 6: Process the audio with the model
    # The model analyzes the phonetic properties of the recitation
    outs = muaalem(
        [wave],           # Audio data
        [phonetizer_out],          # Phonetic reference
        sampling_rate=sampling_rate
    )
    
    # Step 7: Display the results
    for out in outs:
        print("Predicted Phonemes:", out.phonemes.text)
        
        # Display detailed phonetic features for each phoneme
        for sifa in out.sifat:
            print(json.dumps(asdict(sifa), indent=2, ensure_ascii=False))
            print("*" * 30)
        print("-" * 40)

    # Explaining Results
    explain_for_terminal(
        outs[0].phonemes.text,
        phonetizer_out.phonemes,
        outs[0].sifat,
        phonetizer_out.sifat,
    )


if __name__ == "__main__":
    # Replace with the path to your audio file
    audio_path = "./assets/test.wav"
    
    try:
        analyze_recitation(audio_path)
    except Exception as e:
        logging.error(f"Error processing audio: {e}")

Output:

ءِننننَللَااهَبِكُللِشَيءِنعَلِۦۦمُ۾۾۾بَرَااااءَتُممممِنَللَااهِوَرَسُۥۥلِه
┏━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ Phonemes  Tafashie        Qalqla        Ghonna        Hams Or Jahr  Safeer     Tikraar      Tafkheem Or Taqeeq  Istitala       Shidda Or Rakhawa  Itbaq    ┃
┡━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━┩
│ ءِ         not_motafashie  not_moqalqal  not_maghnoon  jahr          no_safeer  not_mokarar  moraqaq             not_mostateel  shadeed            monfateh │
│ ننننَ      not_motafashie  not_moqalqal  maghnoon      jahr          no_safeer  not_mokarar  moraqaq             not_mostateel  between            monfateh │
│ للَ        not_motafashie  not_moqalqal  not_maghnoon  jahr          no_safeer  not_mokarar  mofakham            not_mostateel  between            monfateh │
│ اا        not_motafashie  not_moqalqal  not_maghnoon  jahr          no_safeer  not_mokarar  mofakham            not_mostateel  rikhw              monfateh │
│ هَ         not_motafashie  not_moqalqal  not_maghnoon  hams          no_safeer  not_mokarar  moraqaq             not_mostateel  rikhw              monfateh │
│ بِ         not_motafashie  not_moqalqal  not_maghnoon  jahr          no_safeer  not_mokarar  moraqaq             not_mostateel  shadeed            monfateh │
│ كُ         not_motafashie  not_moqalqal  not_maghnoon  hams          no_safeer  not_mokarar  moraqaq             not_mostateel  shadeed            monfateh │
│ للِ        not_motafashie  not_moqalqal  not_maghnoon  jahr          no_safeer  not_mokarar  moraqaq             not_mostateel  between            monfateh │
│ شَ         motafashie      not_moqalqal  not_maghnoon  hams          no_safeer  not_mokarar  moraqaq             not_mostateel  rikhw              monfateh │
│ ي         not_motafashie  not_moqalqal  not_maghnoon  jahr          no_safeer  not_mokarar  moraqaq             not_mostateel  rikhw              monfateh │
│ ءِ         not_motafashie  not_moqalqal  not_maghnoon  jahr          no_safeer  not_mokarar  moraqaq             not_mostateel  shadeed            monfateh │
│ ن         not_motafashie  not_moqalqal  maghnoon      jahr          no_safeer  not_mokarar  moraqaq             not_mostateel  between            monfateh │
│ عَ         not_motafashie  not_moqalqal  not_maghnoon  jahr          no_safeer  not_mokarar  moraqaq             not_mostateel  between            monfateh │
│ لِ         not_motafashie  not_moqalqal  not_maghnoon  jahr          no_safeer  not_mokarar  moraqaq             not_mostateel  between            monfateh │
│ ۦۦ        not_motafashie  not_moqalqal  not_maghnoon  jahr          no_safeer  not_mokarar  moraqaq             not_mostateel  rikhw              monfateh │
│ مُ         not_motafashie  not_moqalqal  maghnoon      jahr          no_safeer  not_mokarar  moraqaq             not_mostateel  between            monfateh │
│ ۾۾۾       not_motafashie  not_moqalqal  maghnoon      jahr          no_safeer  not_mokarar  moraqaq             not_mostateel  rikhw              monfateh │
│ بَ         not_motafashie  not_moqalqal  not_maghnoon  jahr          no_safeer  not_mokarar  moraqaq             not_mostateel  shadeed            monfateh │
│ رَ         not_motafashie  not_moqalqal  not_maghnoon  jahr          no_safeer  mokarar      mofakham            not_mostateel  between            monfateh │
│ اااا      not_motafashie  not_moqalqal  not_maghnoon  jahr          no_safeer  not_mokarar  mofakham            not_mostateel  rikhw              monfateh │
│ ءَ         not_motafashie  not_moqalqal  not_maghnoon  jahr          no_safeer  not_mokarar  moraqaq             not_mostateel  shadeed            monfateh │
│ تُ         not_motafashie  not_moqalqal  not_maghnoon  hams          no_safeer  not_mokarar  moraqaq             not_mostateel  shadeed            monfateh │
│ ممممِ      not_motafashie  not_moqalqal  maghnoon      jahr          no_safeer  not_mokarar  moraqaq             not_mostateel  between            monfateh │
│ نَ         not_motafashie  not_moqalqal  maghnoon      jahr          no_safeer  not_mokarar  moraqaq             not_mostateel  between            monfateh │
│ للَ        not_motafashie  not_moqalqal  not_maghnoon  jahr          no_safeer  not_mokarar  mofakham            not_mostateel  between            monfateh │
│ اا        not_motafashie  not_moqalqal  not_maghnoon  jahr          no_safeer  not_mokarar  mofakham            not_mostateel  rikhw              monfateh │
│ هِ         not_motafashie  not_moqalqal  not_maghnoon  hams          no_safeer  not_mokarar  moraqaq             not_mostateel  rikhw              monfateh │
│ وَ         not_motafashie  not_moqalqal  not_maghnoon  jahr          no_safeer  not_mokarar  moraqaq             not_mostateel  rikhw              monfateh │
│ رَ         not_motafashie  not_moqalqal  not_maghnoon  jahr          no_safeer  mokarar      mofakham            not_mostateel  between            monfateh │
│ سُ         not_motafashie  not_moqalqal  not_maghnoon  hams          safeer     not_mokarar  moraqaq             not_mostateel  rikhw              monfateh │
│ ۥۥ        not_motafashie  not_moqalqal  not_maghnoon  jahr          no_safeer  not_mokarar  moraqaq             not_mostateel  rikhw              monfateh │
│ لِ         not_motafashie  not_moqalqal  not_maghnoon  jahr          no_safeer  not_mokarar  moraqaq             not_mostateel  between            monfateh │
│ ه         not_motafashie  not_moqalqal  not_maghnoon  hams          no_safeer  not_mokarar  moraqaq             not_mostateel  rikhw              monfateh │
└──────────┴────────────────┴──────────────┴──────────────┴──────────────┴───────────┴─────────────┴────────────────────┴───────────────┴───────────────────┴──────────┘
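To locate where a predicted phoneme string departs from the reference, a character-level diff is one simple option. The sketch below uses `difflib.SequenceMatcher` from the standard library; it is not how `explain_for_terminal` is implemented, and the Latin strings are placeholders standing in for phonetic-script text.

```python
from difflib import SequenceMatcher


def phoneme_diff(reference, predicted):
    """List (operation, reference_part, predicted_part) for every mismatch."""
    matcher = SequenceMatcher(None, reference, predicted, autojunk=False)
    return [
        (tag, reference[i1:i2], predicted[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Placeholder strings: the reciter produced "k" where "q" was expected.
reference = "qalb"
predicted = "kalb"
diff = phoneme_diff(reference, predicted)
print(diff)
```

An empty list means the two strings are identical; each tuple otherwise names the operation (`replace`, `delete`, or `insert`) and the affected characters on both sides.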

API Docs

class Muaalem:
    def __init__(
        self,
        model_name_or_path: str = "obadx/muaalem-model-v3_2",
        device: str = "cpu",
        dtype=torch.bfloat16,
    ):
        """
        Initialize the Muaalem model.

        Args:
            model_name_or_path: the Hugging Face model name or a local path
            device: the device to run the model on
            dtype: the torch dtype. Default is `torch.bfloat16`, the dtype the model was trained with
        """

    @torch.no_grad()
    def __call__(
        self,
        waves: list[list[float] | torch.FloatTensor | NDArray],
        ref_quran_phonetic_script_list: list[QuranPhoneticScriptOutput],
        sampling_rate: int,
    ) -> list[MuaalemOutput]:
        """Inference function for the Quran Muaalem project.

        Args:
            waves: batch of input waveforms (batch, seq_len), in any of the formats listed in the type hint
            ref_quran_phonetic_script_list (list[QuranPhoneticScriptOutput]): list of the
                phonetized output of `quran_transcript.quran_phonetizer` with `remove_spaces=True`
            sampling_rate (int): has to be 16000

        Returns:
            list[MuaalemOutput]:
                A list of output objects, each containing phoneme predictions and their
                phonetic features (sifat) for a processed input.

            Each MuaalemOutput contains:
                phonemes (Unit):
                    A dataclass representing the predicted phoneme sequence with:
                        text (str): Concatenated string of all phonemes.
                        probs (Union[torch.FloatTensor, list[float]]):
                            Confidence probabilities for each predicted phoneme.
                        ids (Union[torch.LongTensor, list[int]]):
                            Token IDs corresponding to each phoneme.

                sifat (list[Sifa]):
                    A list of phonetic feature dataclasses (one per phoneme) with the
                    following optional properties (each is a SingleUnit or None):
                        - phonemes_group (str): the phonemes associated with the `sifa`
                        - hams_or_jahr (SingleUnit): either `hams` or `jahr`
                        - shidda_or_rakhawa (SingleUnit): either `shadeed`, `between`, or `rikhw`
                        - tafkheem_or_taqeeq (SingleUnit): either `mofakham`, `moraqaq`, or `low_mofakham`
                        - itbaq (SingleUnit): either `monfateh`, or `motbaq`
                        - safeer (SingleUnit): either `safeer`, or `no_safeer`
                        - qalqla (SingleUnit): either `moqalqal`, or `not_moqalqal`
                        - tikraar (SingleUnit): either `mokarar` or `not_mokarar`
                        - tafashie (SingleUnit): either `motafashie`, or `not_motafashie`
                        - istitala (SingleUnit): either `mostateel`, or `not_mostateel`
                        - ghonna (SingleUnit): either `maghnoon`, or `not_maghnoon`

            Each SingleUnit in Sifa properties contains:
                text (str): The feature's categorical label (e.g., "hams", "shadeed").
                prob (float): Confidence probability for this feature.
                idx (int): Identifier for the feature class.
        """

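The documented output structure lends itself to a simple post-processing step: comparing each predicted attribute against an expected label. The sketch below is an assumption-laden illustration, not part of the package. `SingleUnit` and `Sifa` are local stand-ins that mirror only some of the documented fields, the expected labels are passed as plain dicts (an assumed format), and `find_mismatches` and `min_prob` are hypothetical names.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class SingleUnit:  # local stand-in mirroring the documented fields
    text: str
    prob: float
    idx: int


@dataclass
class Sifa:  # reduced stand-in: only two of the documented attributes
    phonemes_group: str
    hams_or_jahr: Optional[SingleUnit] = None
    qalqla: Optional[SingleUnit] = None


def find_mismatches(predicted, expected, min_prob=0.5):
    """Collect attributes whose predicted label differs from the expected one.

    expected: list of dicts mapping attribute name -> expected label.
    Predictions below min_prob are ignored as too uncertain to report.
    """
    issues = []
    for position, (sifa, labels) in enumerate(zip(predicted, expected)):
        for field in fields(sifa):
            unit = getattr(sifa, field.name)
            if not isinstance(unit, SingleUnit):
                continue  # skips phonemes_group and attributes left as None
            want = labels.get(field.name)
            if want is not None and unit.text != want and unit.prob >= min_prob:
                issues.append(
                    (position, sifa.phonemes_group, field.name, want, unit.text, unit.prob)
                )
    return issues


# Toy data: the idx values and probabilities are arbitrary.
predicted = [
    Sifa("b", SingleUnit("jahr", 0.99, 1), SingleUnit("not_moqalqal", 0.91, 0)),
    Sifa("a", SingleUnit("jahr", 0.97, 1), SingleUnit("not_moqalqal", 0.95, 0)),
]
expected = [
    {"hams_or_jahr": "jahr", "qalqla": "moqalqal"},
    {"hams_or_jahr": "jahr", "qalqla": "not_moqalqal"},
]
issues = find_mismatches(predicted, expected)
print(issues)
```

Filtering on `prob` is a design choice for the sketch: it trades missed errors for fewer false alarms, and the threshold would need tuning on real recitations.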