
SILMA TTS: A Lightweight Open Bilingual Text to Speech Model

SILMA TTS v1 is a high-performance, 150M-parameter bilingual (Arabic/English) TTS model developed by SILMA AI. Built on the cutting-edge F5-TTS diffusion architecture, the model was pretrained from scratch using tens of thousands of hours of high-quality public and proprietary data. To give back to the community, SILMA TTS is released under a highly permissive license, making state-of-the-art speech synthesis accessible for both research and commercial use.

Features

  1. High-Fidelity Audio: Superior speech synthesis with high-quality output
  2. Lightweight 150M Parameter Model: Works well in low-resource environments
  3. Instant Voice Cloning
  4. Ultra-Low Latency: Optimized for real-time applications, with a real-time factor (RTF) of around 0.12 on an RTX 4090 GPU
  5. Bilingual Arabic & English Support: Native-level fluency across both languages
  6. Advanced Arabic Diacritization: Full support for Tashkeel to ensure precise pronunciation and context
  7. Text Normalization: Uses NeMo Text Processing
  8. Commercial-Friendly Licensing: Open source, with code under the MIT License and model weights under the Apache 2.0 License
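
The RTF (real-time factor) quoted in the list above is conventionally the synthesis time divided by the duration of the generated audio, so values below 1 mean faster than real time. The following is a minimal sketch of that arithmetic. The function name `real_time_factor` and the sample numbers are illustrative assumptions; they are not part of the silma-tts API and are not measured results.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """Return synthesis time divided by generated audio duration."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


# Illustrative numbers only: 1.2 s of compute for 10 s of generated audio.
rtf = real_time_factor(1.2, 10.0)
print(f"RTF = {rtf:.2f}")  # prints "RTF = 0.12"
```

At an RTF of 0.12, ten seconds of speech would take roughly 1.2 seconds to synthesize on comparable hardware.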

Installation

Using pip

# make sure ffmpeg is installed
apt-get update && apt-get install -y ffmpeg

# create and activate the environment
python -m venv silma-tts-env
source silma-tts-env/bin/activate

# install silma-tts library
pip install silma-tts

From source

# make sure ffmpeg is installed
apt-get update && apt-get install -y ffmpeg

# create and activate the environment
python -m venv silma-tts-env
source silma-tts-env/bin/activate

# clone repo and install
git clone https://github.com/SILMA-AI/silma-tts.git
cd silma-tts
pip install -e .

Usage & Inference

Using the Gradio app

# Run the following command
silma-tts-app

Then open http://127.0.0.1:7860/ in your browser.

Inference using Python

import time
from silma_tts.api import SilmaTTS

silma_tts = SilmaTTS()

## the voice/style you want to clone
reference_audio_file = "/root/silma-tts/src/silma_tts/infer/ref_audio_samples/ar.ref.24k.wav"
## the transcription of the reference_audio_file
reference_audio_text = "ويدقق النظر في القرآن الكريم وسائر الكتب السماوية ويتبع مسالك الرسل العظام عليهم الصلاة والسلام."

time_start = time.time()

wav, sr, spec = silma_tts.infer(
    ref_file=reference_audio_file,
    ref_text=reference_audio_text,  # may be None, in which case the reference audio is transcribed on the fly
    gen_text="""
    أنا نموذج جديد من سلمى لتحويل النص إلى كلام، يمكنني التحدث باللغة العربية مع أو بدون علامات التشكيل.
    I am the new SILMA model for converting text to speech, I can speak Arabic with or without diacritics.
    """.strip(),
    file_wave="generated_audio.wav",
    seed=None,
    speed=1
)

time_end = time.time()
print(f"Time elapsed: {(time_end - time_start):.2f} seconds")

## Note 1: the generated audio file (generated_audio.wav) is saved in the current directory
## Note 2: you can also use the "wav" variable (raw waveform) to play the audio or return it via an API

You can also run the example above directly using the following command, but only if you installed from source:

python src/silma_tts/infer/example.py
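
Note 2 in the Python example mentions returning the raw waveform through an API. One possible approach is to encode it as WAV bytes in memory and send those bytes in the response. The sketch below assumes `wav` is a one-dimensional sequence of float samples in the range [-1.0, 1.0] and `sr` is the sample rate in Hz; this page does not confirm either detail, so check the actual return values first. The helper name `waveform_to_wav_bytes` is hypothetical and uses only the Python standard library.

```python
import io
import struct
import wave


def waveform_to_wav_bytes(samples, sample_rate: int) -> bytes:
    """Encode mono float samples (assumed in [-1.0, 1.0]) as 16-bit PCM WAV bytes."""
    # Clip each sample to the valid range, then scale to signed 16-bit integers.
    pcm = [int(max(-1.0, min(1.0, float(s))) * 32767) for s in samples]
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes per sample = 16-bit
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))
    return buffer.getvalue()


# Stand-in data; in practice pass the `wav` and `sr` values returned by infer().
payload = waveform_to_wav_bytes([0.0, 0.5, -0.5, 1.0], 24000)
```

The resulting `payload` can be returned from a web handler with an `audio/wav` content type, without writing a file to disk.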

Training

Our model is fully compatible with F5-TTS v1.1.7. This means you can make use of all the great resources, tools, and community experience in the F5-TTS project.

Steps

## clone F5-TTS v1.1.7
cd /root
git clone --depth 1 --branch 1.1.7 https://github.com/SWivid/F5-TTS.git
cd F5-TTS
pip install -e .

## download silma-tts model weights, vocab.txt and config.yaml
hf download silma-ai/silma-tts --local-dir /root/silma-tts-v1-weights

## create the project, then replace the default configuration, vocabulary, and fine-tuning Python file. Note that the patched finetune_cli.py overrides the F5TTS_v1_Base config with the silma-tts model config.
mkdir -p /root/F5-TTS/data/finetuning_project_char
cp /root/silma-tts-v1-weights/vocab.txt /root/F5-TTS/data/finetuning_project_char/
cp /root/silma-tts-v1-weights/finetune_cli.py /root/F5-TTS/src/f5_tts/train/finetune_cli.py
cp /root/silma-tts-v1-weights/config.yaml /root/F5-TTS/src/f5_tts/configs/F5TTS_v1_Base.yaml



## open F5-TTS UI training pipeline 
f5-tts_finetune-gradio --port 7860 --host 0.0.0.0

## After preparing your data, go to the "Train Model" tab. In "Path to the Pretrained Checkpoint", enter the path to the silma-tts model weights file (/root/silma-tts-v1-weights/model.pt) and leave "Tokenizer File" empty.

## For more information please follow the F5-TTS training guide below
## https://github.com/SWivid/F5-TTS/tree/main/src/f5_tts/train

Summary: use the F5-TTS v1.1.7 training code together with our config file, vocabulary, patched training script, and pretrained weights.
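
The summary above names four artifacts that must come from the silma-tts download. A quick check before launching training can catch a missing file early. This is a minimal sketch: the helper name `missing_training_files` is hypothetical, the file names are taken from the steps above, and it assumes all four files sit directly in the download directory.

```python
from pathlib import Path

# File names taken from the steps above; assumed to sit directly in the download directory.
REQUIRED_FILES = ("config.yaml", "vocab.txt", "finetune_cli.py", "model.pt")


def missing_training_files(weights_dir: str) -> list[str]:
    """Return the required file names that are absent from weights_dir."""
    root = Path(weights_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_training_files("/root/silma-tts-v1-weights")
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All required files are present.")
```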

Acknowledgements

This repo builds directly upon the excellent foundation laid by the F5-TTS project. The core architecture and the majority of the code are derived from their work. Our work introduces new pretrained weights and significant optimizations to the inference code.

Support

Unfortunately, we don't have the capacity to actively support this repo. We also believe it is best to consolidate resources and knowledge in a single location. This is why we encourage you to visit the official F5-TTS repository, which is highly active and supported by a fantastic community.

  • Please check if your question is already answered in the F5-TTS Issues or open a new issue
  • We monitor the F5-TTS Issues and will engage there if needed
  • If you have a question that you think only we can answer, please open a community discussion on our HuggingFace repo

Citation

@article{silma-tts-v1,
      title={SILMA TTS: A Lightweight Open Bilingual Text to Speech Model}, 
      author={SILMA AI},
      year={2026},
}

License

  1. Code: MIT License
  2. Model Weights: Apache-2.0 License
