Habibi: Laying the Open-Source Foundation of Unified-Dialectal Arabic Speech Synthesis
Habibi-TTS
Official code for "Habibi: Laying the Open-Source Foundation of Unified-Dialectal Arabic Speech Synthesis"
Quick Start
# Install
pip install habibi-tts
# Launch the GUI TTS interface
habibi-tts_infer-gradio
[!IMPORTANT]
Read the F5-TTS documentation for (1) detailed installation guidance, (2) best practices for inference, and more.
CLI Usage
# Default: use the Unified model (recommended)
habibi-tts_infer-cli \
--ref_audio "assets/MSA.mp3" \
--ref_text "كان اللعيب حاضرًا في العديد من الأنشطة والفعاليات المرتبطة بكأس العالم، مما سمح للجماهير بالتفاعل معه والتقاط الصور التذكارية." \
--gen_text "أهلًا، يبدو أن هناك بعض التعقيدات، لكن لا تقلق، سأرشدك بطريقة سلسة وواضحة خطوة بخطوة."
# Assign the dialect ID explicitly instead of inferring it from the reference prompt (default: UNK)
# (best results with text in the matching dialect; IDs: MSA, SAU, UAE, ALG, IRQ, EGY, MAR, OMN, TUN, LEV, SDN, LBY)
habibi-tts_infer-cli --dialect MSA
# Alternatively, configure via a `.toml` file; see `src/habibi_tts/infer/example.toml`
habibi-tts_infer-cli -c YOUR_CUSTOM.toml
# See all CLI options with
habibi-tts_infer-cli --help
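For scripted use, the same call can be assembled in Python and handed to `subprocess.run`. This is a minimal sketch and is not part of habibi-tts. `build_infer_cmd` and `DIALECT_IDS` are hypothetical names. The helper only uses the flags shown above (`--ref_audio`, `--ref_text`, `--gen_text`, `--dialect`) and assumes `--dialect` can be combined with the reference-prompt flags. It builds the argument list and executes nothing.

```python
import shlex

# Dialect IDs listed in the CLI comments above. Omitting --dialect keeps the
# CLI default (UNK, i.e. inferred from the reference prompt).
DIALECT_IDS = {
    "MSA", "SAU", "UAE", "ALG", "IRQ", "EGY",
    "MAR", "OMN", "TUN", "LEV", "SDN", "LBY",
}


def build_infer_cmd(ref_audio, ref_text, gen_text, dialect=None):
    """Hypothetical helper: return an argv list for habibi-tts_infer-cli.

    Assumption: --dialect may be combined with the reference-prompt flags.
    """
    cmd = [
        "habibi-tts_infer-cli",
        "--ref_audio", ref_audio,
        "--ref_text", ref_text,
        "--gen_text", gen_text,
    ]
    if dialect is not None:
        dialect = dialect.upper()
        if dialect not in DIALECT_IDS:
            raise ValueError(f"unknown dialect ID: {dialect!r}")
        cmd += ["--dialect", dialect]
    return cmd


if __name__ == "__main__":
    cmd = build_infer_cmd(
        "assets/MSA.mp3",
        "reference transcript",
        "text to synthesize",
        dialect="msa",
    )
    # Print a shell-quoted form for inspection; with habibi-tts installed,
    # subprocess.run(cmd, check=True) would actually run it.
    print(shlex.join(cmd))
```

Pass a list rather than a single shell string to `subprocess.run`. This avoids quoting problems with Arabic prompts that contain spaces and punctuation.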
[!NOTE]
Some dialectal audio samples are provided under `src/habibi_tts/assets`; see the README.md there for usage and more details.
Training & Finetuning
See https://github.com/SWivid/Habibi-TTS/issues/2.
Benchmarking
0. Benchmark setup
# Example template for benchmark use:
python src/habibi_tts/eval/0_benchmark.py -d MSA
# --dialect DIALECT (MSA | SAU | UAE | ALG | IRQ | EGY | MAR)
1. Generate benchmark samples with Habibi or 11Labs
# Zero-shot TTS performance evaluation:
accelerate launch src/habibi_tts/eval/1_infer_habibi.py -m Unified -d MAR
# --model MODEL (Unified | Specialized)
# --dialect DIALECT (MSA | SAU | UAE | ALG | IRQ | EGY | MAR)
# Use a single prompt, for comparison with the 11Labs model:
accelerate launch src/habibi_tts/eval/1_infer_habibi.py -m Specialized -d IRQ -s
# --single (<- add this flag)
# Use a single prompt and call the ElevenLabs Eleven v3 (alpha) API:
pip install elevenlabs
python src/habibi_tts/eval/1_infer_11labs.py -a YOUR_API_KEY -d MSA
# --api-key API_KEY (your 11labs account API key)
# --dialect DIALECT (MSA | SAU | UAE | ALG | IRQ | EGY | MAR)
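To sweep the benchmark, the step-1 Habibi commands for each model and dialect can be generated in a loop. This is a minimal sketch. `plan_habibi_runs` is a hypothetical helper that reproduces only the `-m`, `-d`, and `-s` flags shown above. It returns argument lists and launches nothing.

```python
import itertools

# Benchmark dialects and models listed in the step-1 comments above.
BENCH_DIALECTS = ("MSA", "SAU", "UAE", "ALG", "IRQ", "EGY", "MAR")
MODELS = ("Unified", "Specialized")
SCRIPT = "src/habibi_tts/eval/1_infer_habibi.py"


def plan_habibi_runs(dialects=BENCH_DIALECTS, models=MODELS, single=False):
    """Hypothetical helper: one `accelerate launch` argv list per (model, dialect)."""
    runs = []
    for model, dialect in itertools.product(models, dialects):
        if dialect not in BENCH_DIALECTS:
            raise ValueError(f"not a benchmark dialect: {dialect!r}")
        cmd = ["accelerate", "launch", SCRIPT, "-m", model, "-d", dialect]
        if single:
            cmd.append("-s")  # single-prompt mode, for comparison with 11Labs
        runs.append(cmd)
    return runs


if __name__ == "__main__":
    for cmd in plan_habibi_runs(dialects=("IRQ", "MAR"), single=True):
        print(" ".join(cmd))
```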
2. Transcribe samples with ASR models and calculate WER
# Evaluate WER-O with Meta Omnilingual-ASR-LLM-7B v1:
pip install omnilingual-asr
python src/habibi_tts/eval/2_cal_wer-o.py -w results/Habibi/IRQ_Specialized_single -d IRQ
# --wav-dir WAV_DIR (the folder of generated samples)
# --dialect DIALECT (MSA | SAU | UAE | ALG | IRQ | EGY | MAR)
# --batch-size BATCH_SIZE (set smaller if OOM, default 64)
# Evaluate WER-S with dialect-specific ASR models:
python src/habibi_tts/eval/2_cal_wer-s.py -w results/Habibi/MAR_Unified -d MAR
# --wav-dir WAV_DIR (the folder of generated samples)
# --dialect DIALECT (EGY | MAR)
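WER-O and WER-S differ in the ASR model used for transcription. Both are word error rates: WER = (S + D + I) / N, where S, D, and I are word substitutions, deletions, and insertions against a reference of N words. The sketch below computes this with a word-level Levenshtein distance. It illustrates the metric only. The repository's scripts may tokenize or normalize Arabic text (e.g., diacritics, punctuation) differently and may aggregate over the whole corpus rather than per utterance.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words.

    Whitespace tokenization only; no text normalization is applied.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # `prev` holds the previous DP row: edit distances from ref[:i-1] to each hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # substitution, or match if cost == 0
            )
        prev = curr
    return prev[len(hyp)] / len(ref)
```

For example, `word_error_rate("a b c d", "a x c")` counts one substitution and one deletion over four reference words, giving 0.5. WER can exceed 1.0 when the hypothesis contains many insertions.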
3. Calculate speaker similarity (SIM) between generated samples and prompts
Download the WavLM model from Google Drive, then run:
python src/habibi_tts/eval/3_cal_spksim.py -w results/Habibi/MAR_Unified -d MAR -c YOUR_WAVLM_PATH
# --wav-dir WAV_DIR (the folder of generated samples)
# --dialect DIALECT (MSA | SAU | UAE | ALG | IRQ | EGY | MAR)
# --ckpt CKPT (path to the downloaded WavLM model)
python src/habibi_tts/eval/3_cal_spksim.py -w results/Habibi/IRQ_Specialized_single -d IRQ -c YOUR_WAVLM_PATH -s
# --single (when evaluating single-prompt or 11Labs results)
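SIM is commonly computed as the cosine similarity between the speaker embeddings of each generated sample and its prompt. Here the embeddings would come from the downloaded WavLM checkpoint. This is a minimal sketch of the scoring step only. It assumes the embeddings have already been extracted as plain float vectors; the extraction is model-specific and omitted. `mean_speaker_similarity` is a hypothetical helper.

```python
import math
from statistics import mean


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors, in [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("zero-norm embedding")
    return dot / (norm_a * norm_b)


def mean_speaker_similarity(pairs):
    """Hypothetical helper: average cosine similarity over (generated, prompt) embedding pairs.

    Assumes embeddings were already extracted (e.g. with the WavLM model).
    """
    return mean(cosine_similarity(gen, prompt) for gen, prompt in pairs)
```

In single-prompt mode (`-s`), every generated sample would presumably be paired with the same prompt embedding.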
4. Calculate UTMOS of generated samples
python src/habibi_tts/eval/4_cal_utmos.py -w results/11Labs_3a/MSA
# --wav-dir WAV_DIR (the folder of generated samples)
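Per-utterance scores such as UTMOS (or SIM) are usually reported as an average over the folder of generated samples. This is a minimal sketch of that aggregation. It assumes the per-file scores have already been computed. `summarize_scores` is hypothetical, and the evaluation script's own reporting format may differ.

```python
from statistics import mean, stdev


def summarize_scores(scores):
    """Hypothetical helper: return (mean, sample standard deviation) of per-utterance scores."""
    values = [float(s) for s in scores]
    if not values:
        raise ValueError("no scores to summarize")
    # statistics.stdev needs at least two values; report zero spread for one utterance.
    spread = stdev(values) if len(values) > 1 else 0.0
    return mean(values), spread
```

For example, `summarize_scores([3.0, 4.0, 5.0])` returns `(4.0, 1.0)`.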
[!NOTE]
If dependency conflicts arise after installing omnilingual-asr (e.g., with flash-attn), try reinstalling flash-attn:
pip uninstall -y flash-attn && pip install flash-attn --no-build-isolation
License
All code is released under the MIT License.
The Unified, SAU, and UAE models are licensed under CC-BY-NC-SA-4.0, due to the license terms of the SADA and Mixat datasets.
The remaining specialized models (ALG, EGY, IRQ, MAR, MSA) are released under the Apache 2.0 license.
Download files
- Source distribution: habibi_tts-0.1.1.tar.gz (2.5 MB)
- Built distribution: habibi_tts-0.1.1-py3-none-any.whl (1.6 MB)
File details
Details for the file habibi_tts-0.1.1.tar.gz.
File metadata
- Download URL: habibi_tts-0.1.1.tar.gz
- Size: 2.5 MB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 06062b9c47deda9f2aea349b287a9492a0bfb739e9ea1f3b3cb2a05780bac9d6 |
| MD5 | 3b1c6774bd023a2ffa861d247a8e50bd |
| BLAKE2b-256 | d854b96401042ef691102391a6bb8bacc7e8140608f9af8279e2041ad87efc68 |
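To check a downloaded file against the published SHA256 digest listed above, hash it locally and compare. This is a minimal sketch using only the standard library. `sha256_of_file` and `verify_download` are illustrative names, not part of habibi-tts. pip can also enforce digests at install time through hash-checking mode (`--require-hashes`, with `--hash=sha256:<digest>` entries in a requirements file).

```python
import hashlib

# Published SHA256 digest of habibi_tts-0.1.1.tar.gz (from the file hashes above).
EXPECTED_SHA256 = "06062b9c47deda9f2aea349b287a9492a0bfb739e9ea1f3b3cb2a05780bac9d6"


def sha256_of_file(path, chunk_size=1 << 20):
    """Hex SHA256 of a file, read in chunks so large archives are not loaded at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_SHA256):
    """True if the file's SHA256 matches the expected hex digest (case-insensitive)."""
    return sha256_of_file(path) == expected.lower()
```

For example, `verify_download("habibi_tts-0.1.1.tar.gz")` should return `True` for an intact sdist. For the wheel, pass its own SHA256 digest as `expected`.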
Provenance
The following attestation bundles were made for habibi_tts-0.1.1.tar.gz:
- Publisher: publish-pypi.yaml on SWivid/Habibi-TTS
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: habibi_tts-0.1.1.tar.gz
- Subject digest: 06062b9c47deda9f2aea349b287a9492a0bfb739e9ea1f3b3cb2a05780bac9d6
- Sigstore transparency entry: 1056703458
- Permalink: SWivid/Habibi-TTS@30d8d8a09fbcf61b4e6d7e081b5f1a73c592cf51
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/SWivid
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-pypi.yaml@30d8d8a09fbcf61b4e6d7e081b5f1a73c592cf51
- Trigger Event: release
File details
Details for the file habibi_tts-0.1.1-py3-none-any.whl.
File metadata
- Download URL: habibi_tts-0.1.1-py3-none-any.whl
- Size: 1.6 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | df2d9ead85307c6c7cb2b14f3b730c35357f2a1578180098cf97d4c588228a5e |
| MD5 | 724a38230fec773d57352db2bb2bec66 |
| BLAKE2b-256 | e542e0fe1acc78fdb09ff0206570c4885546829dbb3d9e9831a7050619d7e10a |
Provenance
The following attestation bundles were made for habibi_tts-0.1.1-py3-none-any.whl:
- Publisher: publish-pypi.yaml on SWivid/Habibi-TTS
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: habibi_tts-0.1.1-py3-none-any.whl
- Subject digest: df2d9ead85307c6c7cb2b14f3b730c35357f2a1578180098cf97d4c588228a5e
- Sigstore transparency entry: 1056703474
- Permalink: SWivid/Habibi-TTS@30d8d8a09fbcf61b4e6d7e081b5f1a73c592cf51
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/SWivid
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-pypi.yaml@30d8d8a09fbcf61b4e6d7e081b5f1a73c592cf51
- Trigger Event: release