Habibi: Laying the Open-Source Foundation of Unified-Dialectal Arabic Speech Synthesis

Habibi-TTS

Official code for "Habibi: Laying the Open-Source Foundation of Unified-Dialectal Arabic Speech Synthesis"


Quick Start

# Install
pip install habibi-tts

# Launch the GUI TTS interface
habibi-tts_infer-gradio

[!IMPORTANT]
Read the F5-TTS documentation for (1) detailed installation guidance and (2) best practices for inference.

CLI Usage

# Default: use the Unified model (recommended)
habibi-tts_infer-cli \
--ref_audio "assets/MSA.mp3" \
--ref_text "كان اللعيب حاضرًا في العديد من الأنشطة والفعاليات المرتبطة بكأس العالم، مما سمح للجماهير بالتفاعل معه والتقاط الصور التذكارية." \
--gen_text "أهلًا، يبدو أن هناك بعض التعقيدات، لكن لا تقلق، سأرشدك بطريقة سلسة وواضحة خطوة بخطوة."

# Assign the dialect ID explicitly rather than inferring it from the reference prompt (UNK by default)
# (works best when the reference content matches the dialect ID: MSA, SAU, UAE, ALG, IRQ, EGY, MAR, OMN, TUN, LEV, SDN, LBY)
habibi-tts_infer-cli --dialect MSA

# Alternatively, use a `.toml` file for configuration; see `src/habibi_tts/infer/example.toml`
habibi-tts_infer-cli -c YOUR_CUSTOM.toml

# Check more CLI features with
habibi-tts_infer-cli --help

[!NOTE]
Some dialectal audio samples are provided under src/habibi_tts/assets, see the relevant README.md for usage and more details.

Training & Finetuning

See https://github.com/SWivid/Habibi-TTS/issues/2.

Benchmarking

0. Benchmark setup

# Example template for benchmark use:
python src/habibi_tts/eval/0_benchmark.py -d MSA
# --dialect DIALECT (MSA | SAU | UAE | ALG | IRQ | EGY | MAR)

1. Generate benchmark samples with Habibi or 11Labs

# Zero-shot TTS performance evaluation:
accelerate launch src/habibi_tts/eval/1_infer_habibi.py -m Unified -d MAR
# --model MODEL (Unified | Specialized)
# --dialect DIALECT (MSA | SAU | UAE | ALG | IRQ | EGY | MAR)

# Use a single prompt to compare with the 11Labs model:
accelerate launch src/habibi_tts/eval/1_infer_habibi.py -m Specialized -d IRQ -s
# --single (<- add this flag)

# Use a single prompt and call the ElevenLabs Eleven v3 (alpha) API:
pip install elevenlabs
python src/habibi_tts/eval/1_infer_11labs.py -a YOUR_API_KEY -d MSA
# --api-key API_KEY (your 11labs account API key)
# --dialect DIALECT (MSA | SAU | UAE | ALG | IRQ | EGY | MAR)

2. Transcribe samples with ASR models and calculate WER

# Evaluate WER-O with Meta Omnilingual-ASR-LLM-7B v1:
pip install omnilingual-asr
python src/habibi_tts/eval/2_cal_wer-o.py -w results/Habibi/IRQ_Specialized_single -d IRQ
# --wav-dir WAV_DIR (the folder of generated samples)
# --dialect DIALECT (MSA | SAU | UAE | ALG | IRQ | EGY | MAR)
# --batch-size BATCH_SIZE (set smaller if OOM, default 64)

# Evaluate WER-S with dialect-specific ASR models:
python src/habibi_tts/eval/2_cal_wer-s.py -w results/Habibi/MAR_Unified -d MAR
# --wav-dir WAV_DIR (the folder of generated samples)
# --dialect DIALECT (EGY | MAR)
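WER here is the standard word-level edit distance between the ASR transcript and the ground-truth text, normalized by the reference length. A self-contained sketch of that metric (not the project's evaluation code, which lives in `2_cal_wer-o.py` / `2_cal_wer-s.py`):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1  # substitution cost
            curr.append(min(prev[j] + 1,        # deletion
                            curr[j - 1] + 1,    # insertion
                            prev[j - 1] + cost  # substitution / match
                            ))
        prev = curr
    return prev[-1] / max(len(ref), 1)

print(wer("the cat sat", "the cat sat"))  # 0.0
print(wer("the cat sat", "the cat"))      # one deletion over three words
```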

3. Calculate speaker similarity (SIM) between generated samples and the prompt

Download the WavLM model from Google Drive, then

python src/habibi_tts/eval/3_cal_spksim.py -w results/Habibi/MAR_Unified -d MAR -c YOUR_WAVLM_PATH
# --wav-dir WAV_DIR (the folder of generated samples)
# --dialect DIALECT (MSA | SAU | UAE | ALG | IRQ | EGY | MAR)
# --ckpt CKPT (the path of the downloaded WavLM model)

python src/habibi_tts/eval/3_cal_spksim.py -w results/Habibi/IRQ_Specialized_single -d IRQ -c YOUR_WAVLM_PATH -s
# --single (when evaluating single-prompt or 11Labs results)
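Speaker similarity of this kind is typically the cosine similarity between speaker embeddings (here, WavLM-derived) of the prompt and the generated audio. A minimal sketch of the comparison step, using toy embedding vectors rather than real WavLM outputs:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy stand-ins for speaker embeddings of the prompt and a generated sample
prompt_emb = [0.1, 0.9, 0.4]
generated_emb = [0.12, 0.85, 0.5]
print(round(cosine_similarity(prompt_emb, generated_emb), 3))
```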

4. Calculate UTMOS of generated samples

python src/habibi_tts/eval/4_cal_utmos.py -w results/11Labs_3a/MSA
# --wav-dir WAV_DIR (the folder of generated samples)

[!NOTE]
If dependency conflicts arise after installing omnilingual-asr (e.g. with flash-attn), try reinstalling:
pip uninstall -y flash-attn && pip install flash-attn --no-build-isolation

License

All code is released under the MIT License.
The Unified, SAU, and UAE models are licensed under CC-BY-NC-SA-4.0, due to restrictions inherited from the SADA and Mixat datasets.
The remaining specialized models (ALG, EGY, IRQ, MAR, MSA) are released under the Apache 2.0 license.

Download files

Download the file for your platform.

Source Distribution

habibi_tts-0.1.1.tar.gz (2.5 MB)

Built Distribution

habibi_tts-0.1.1-py3-none-any.whl (1.6 MB)

File details

Details for the file habibi_tts-0.1.1.tar.gz.

File metadata

  • Download URL: habibi_tts-0.1.1.tar.gz
  • Size: 2.5 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for habibi_tts-0.1.1.tar.gz

  • SHA256: 06062b9c47deda9f2aea349b287a9492a0bfb739e9ea1f3b3cb2a05780bac9d6
  • MD5: 3b1c6774bd023a2ffa861d247a8e50bd
  • BLAKE2b-256: d854b96401042ef691102391a6bb8bacc7e8140608f9af8279e2041ad87efc68

Provenance

The following attestation bundles were made for habibi_tts-0.1.1.tar.gz:

Publisher: publish-pypi.yaml on SWivid/Habibi-TTS

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file habibi_tts-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: habibi_tts-0.1.1-py3-none-any.whl
  • Size: 1.6 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for habibi_tts-0.1.1-py3-none-any.whl

  • SHA256: df2d9ead85307c6c7cb2b14f3b730c35357f2a1578180098cf97d4c588228a5e
  • MD5: 724a38230fec773d57352db2bb2bec66
  • BLAKE2b-256: e542e0fe1acc78fdb09ff0206570c4885546829dbb3d9e9831a7050619d7e10a

Provenance

The following attestation bundles were made for habibi_tts-0.1.1-py3-none-any.whl:

Publisher: publish-pypi.yaml on SWivid/Habibi-TTS

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
