9jaLingo TTS-2: Text-to-Speech for Nigerian Languages (Nigerian-Accented English, Hausa, Igbo, Yoruba, Pidgin) with Voice Cloning
9jaLingo TTS-2 is a neural text-to-speech engine built for Nigerian languages. It uses a causal language model with learnable RoPE, frame-level position encoding, and speaker embeddings to generate natural-sounding speech with voice cloning.
Supported Languages
| Language | Tag |
|---|---|
| 🇳🇬 Nigerian Accented English | en_NG |
| 🇳🇬 Hausa | ha |
| 🇳🇬 Igbo | ig |
| 🇳🇬 Yoruba | yo |
| 🇳🇬 Pidgin | pcm |
Features
- 5 Nigerian Languages: English (Nigerian Accent), Hausa, Igbo, Yoruba, and Pidgin
- Voice Cloning: Clone any voice from a short reference audio sample
- Speaker Embeddings: True voice control through learned speaker representations
- Learnable RoPE Theta: Per-layer frequency scaling for better position encoding
- Frame-Level Position Encoding: Precise temporal control with configurable audio frame positioning
- Language Tag Support: Multi-language support through language identifiers
- Extended Generation: Up to 40 seconds of continuous high-quality audio
- Flexible Sampling: Temperature, top-p, and repetition penalty at generation time
Installation
pip install naijalingo-tts-2
pip install -U "transformers==4.56.0"
Quick Start
from naijalingo_tts_2 import NaijaLingoTTS
# Initialize model
model = NaijaLingoTTS('9jalingo/your-model-name')
# Generate speech
audio, text = model("Bawo ni, kilode?", language_tag="yo")
# Save to file
model.save_audio(audio, "output.wav")
Three lines for high-quality Nigerian language TTS!
Voice Cloning
9jaLingo TTS-2 supports voice cloning: extract a speaker's voice from a reference audio clip and generate speech in that voice.
from naijalingo_tts_2 import NaijaLingoTTS, SpeakerEmbedder
# Initialize
model = NaijaLingoTTS('9jalingo/your-model-name')
embedder = SpeakerEmbedder()
# Extract speaker embedding from reference audio
speaker_embedding = embedder.embed_audio_file("reference_voice.wav") # [1, 128]
# Generate speech with that voice
audio, text = model(
    "Na so the matter be, my broda.",
    language_tag="pcm",
    speaker_emb=speaker_embedding,
)
model.save_audio(audio, "cloned_voice.wav")
How Speaker Embeddings Work
The speaker embedder uses a WavLM-based model trained to extract speaker characteristics:
- Input: Audio at any sample rate (3-30 seconds recommended, automatically resampled to 16kHz)
- Processing: MVN normalization → WavLM encoder → Stats pooling → Projection → L2 normalization
- Output: 128-dim L2-normalized speaker embedding ready for TTS
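The non-learned steps of that pipeline can be illustrated with a small pure-Python sketch. It is a toy under stated assumptions, not the package's code: the WavLM encoder and the projection layer are trained components and are skipped here, the function names `mvn_normalize`, `stats_pool`, and `l2_normalize` are hypothetical, and stats pooling is assumed to mean concatenating the per-dimension mean and standard deviation over time.

```python
import math

def mvn_normalize(samples):
    """Mean-variance normalization: shift to zero mean, scale to unit variance."""
    mean = sum(samples) / len(samples)
    var = sum((x - mean) ** 2 for x in samples) / len(samples)
    std = math.sqrt(var) or 1.0  # constant (silent) input would give std == 0
    return [(x - mean) / std for x in samples]

def stats_pool(frames):
    """Collapse a [time, dim] sequence into [mean..., std...] of length 2 * dim."""
    n, dims = len(frames), len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dims)]
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n)
        for d in range(dims)
    ]
    return means + stds

def l2_normalize(vec):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

# Toy data: a 4-sample "waveform" and two 2-dim frames standing in for encoder output.
wave_in = mvn_normalize([0.1, -0.2, 0.4, 0.0])
frames = [[1.0, 2.0], [3.0, 4.0]]
pooled = stats_pool(frames)       # [2.0, 3.0, 1.0, 1.0]
embedding = l2_normalize(pooled)  # unit-length vector
```

Because the final vector has unit length, comparing two speaker embeddings by dot product is the same as comparing them by cosine similarity.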
from naijalingo_tts_2 import SpeakerEmbedder
import torch
embedder = SpeakerEmbedder()
# From audio file
embedding = embedder.embed_audio_file("voice.wav")
# From numpy array
import numpy as np
audio_array = np.random.randn(16000 * 5)
embedding = embedder.embed_audio(audio_array, sample_rate=16000)
# Save for later
torch.save(embedding, "my_voice.pt")
# Use saved embedding (assumes `model` is an initialized NaijaLingoTTS instance)
audio, text = model("Hello!", speaker_emb="my_voice.pt")
Pro tip: Use 10-20 seconds of clean reference audio for best results. Audio at any sample rate is supported (automatic resampling).
Language Tag Support
from naijalingo_tts_2 import NaijaLingoTTS
model = NaijaLingoTTS('9jalingo/your-multilingual-model')
# Check available tags
print(f"Status: {model.status}")
model.show_language_tags()
# Generate with specific language
audio, text = model("Sannu da zuwa!", language_tag="ha") # Hausa
audio, text = model("Kedu ka imere?", language_tag="ig") # Igbo
audio, text = model("Bawo ni o?", language_tag="yo") # Yoruba
audio, text = model("How far, my guy?", language_tag="pcm") # Pidgin
audio, text = model("Good morning everyone.", language_tag="en_NG") # Nigerian English
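When tags come from user input or configuration, it can help to validate them before calling the model. The helper below is a hypothetical sketch, not part of the package: `SUPPORTED_TAGS` is copied from the Supported Languages table, `resolve_language_tag` is an invented name, and the package may already perform its own validation.

```python
# Tags copied from the Supported Languages table in this README.
SUPPORTED_TAGS = {
    "en_NG": "Nigerian Accented English",
    "ha": "Hausa",
    "ig": "Igbo",
    "yo": "Yoruba",
    "pcm": "Pidgin",
}

def resolve_language_tag(tag):
    """Return the canonical tag (e.g. 'en_NG'), or raise ValueError if unknown."""
    lookup = {t.lower(): t for t in SUPPORTED_TAGS}
    # Tolerate surrounding whitespace, case differences, and 'en-NG' style hyphens.
    key = tag.strip().lower().replace("-", "_")
    if key not in lookup:
        raise ValueError(
            f"Unknown language tag {tag!r}; expected one of {sorted(SUPPORTED_TAGS)}"
        )
    return lookup[key]
```

A call such as `model(text, language_tag=resolve_language_tag(user_tag))` then fails early with a readable message instead of passing an unsupported tag to the model.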
Controlling Generation
model = NaijaLingoTTS(
    '9jalingo/your-model-name',
    max_new_tokens=3000,
    suppress_logs=True,
    show_info=True,
)
audio, text = model(
    "Your text here",
    temperature=0.7,          # Lower = more deterministic
    top_p=0.9,                # Nucleus sampling threshold
    repetition_penalty=1.2,   # Penalize repetition
    speaker_emb=speaker_emb,  # Optional: voice cloning
    language_tag="yo",        # Optional: language tag
)
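For intuition about what these three knobs do, here is a generic sketch of temperature scaling, nucleus (top-p) filtering, and a repetition penalty applied to a toy logit vector. It shows how these techniques are commonly implemented in language-model sampling and is an assumption about the general approach, not the package's internal code; all function names are hypothetical.

```python
import math

def apply_temperature(logits, temperature):
    """Divide logits by the temperature; values below 1 sharpen the distribution."""
    return [x / temperature for x in logits]

def softmax(logits):
    """Convert logits to probabilities (shifted by the max for numerical stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_filter(probs, top_p):
    """Keep the smallest set of tokens whose cumulative probability reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}  # renormalized over kept tokens

def penalize_repeats(logits, seen_ids, penalty):
    """Make already-generated tokens less likely: shrink positive logits, grow negative ones."""
    out = list(logits)
    for i in set(seen_ids):
        out[i] = out[i] / penalty if out[i] > 0 else out[i] * penalty
    return out

logits = penalize_repeats([2.0, 1.0, 0.0], seen_ids=[1], penalty=1.2)
probs = softmax(apply_temperature(logits, temperature=0.7))
candidates = top_p_filter(probs, top_p=0.9)  # token id -> renormalized probability
```

Lower `temperature` and `top_p` values narrow the candidate set and tend to give more stable output; higher values allow more variation between runs.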
Model Info Banner
When initialized, the model displays helpful information:
╔════════════════════════════════════════════════════════════╗
║                                                            ║
║                  9 j a L i n g o   TTS-2                   ║
║                                                            ║
╚════════════════════════════════════════════════════════════╝
🗣️ Nigerian Language Text-to-Speech Engine
──────────────────────────────────────────────────────────────
Model: 9jalingo/your-model-name
Device: GPU (CUDA)
Mode: Available language tags (5 language tags)
Tags: en_NG, ha, ig, yo, pcm
Configuration:
• Sample Rate: 22050 Hz
• Max Tokens: 3000
• Speaker Embedding Dim: 128
• Learnable RoPE: Enabled
──────────────────────────────────────────────────────────────
Supported: English (Nigerian) | Hausa | Igbo | Yoruba | Pidgin
Voice Cloning: Enabled 🎙️
Ready to generate speech! 🎵
Playing Audio in Jupyter
from naijalingo_tts_2 import NaijaLingoTTS
from IPython.display import Audio as aplay
model = NaijaLingoTTS('9jalingo/your-model-name')
audio, text = model("E kaabo!", language_tag="yo")
aplay(audio, rate=model.sample_rate)
API Reference
NaijaLingoTTS(model_name, **kwargs)
Main TTS interface.
Parameters:
- `model_name` (str): HuggingFace model ID or local path
- `max_new_tokens` (int): Max generation length (default: 3000)
- `device_map` (str): Device mapping (default: "auto")
- `suppress_logs` (bool): Suppress library logs (default: True)
- `show_info` (bool): Display model info banner (default: True)
Methods:
- `model(text, language_tag=None, speaker_emb=None, temperature=1.0, top_p=0.95, repetition_penalty=1.1)` → `(audio, text)`
- `model.generate(...)`: Same as `__call__`
- `model.save_audio(audio, path)`: Save audio to file
- `model.show_model_info()`: Display model banner
- `model.show_language_tags()`: Display available language tags
- `model.load_speaker_embedding(path)`: Load speaker embedding from `.pt` file
SpeakerEmbedder(model_name, device, max_duration_sec)
Extract speaker embeddings from audio.
Parameters:
- `model_name` (str): HuggingFace model ID (default: `"nineninesix/speaker-emb-tbr"`)
- `device` (str): `"cuda"` or `"cpu"` (default: auto-detect)
- `max_duration_sec` (float): Max audio length in seconds (default: 30.0)
Methods:
- `embedder.embed_audio(audio, sample_rate=16000)` → `[1, 128]` tensor
- `embedder.embed_audio_file(path)` → `[1, 128]` tensor
Convenience Function
from naijalingo_tts_2 import compute_speaker_embedding
embedding = compute_speaker_embedding("speaker.wav")
Complete Example: Voice Cloning Pipeline
from naijalingo_tts_2 import NaijaLingoTTS, SpeakerEmbedder
import torch
# 1. Initialize
tts = NaijaLingoTTS('9jalingo/your-model-name')
embedder = SpeakerEmbedder()
# 2. Extract speaker embedding
speaker_emb = embedder.embed_audio_file("reference_speaker.wav")
torch.save(speaker_emb, "my_voice.pt")
# 3. Generate in multiple languages with cloned voice
languages = {
    "en_NG": "Good morning, how are you doing today?",
    "ha": "Ina kwana, yaya dai?",
    "ig": "Ụtụtụ ọma, kedu ka ị mere?",
    "yo": "E kaaro, bawo ni o se wa?",
    "pcm": "Good morning o, how body?",
}
for lang, text in languages.items():
    audio, _ = tts(text, language_tag=lang, speaker_emb=speaker_emb)
    tts.save_audio(audio, f"output_{lang}.wav")
    print(f"Generated {lang}: {text}")
Voice Cloning Best Practices
Reference Audio Quality:
- Clean recordings without background noise
- Proper audio levels (not too quiet, not clipping)
- 10-20 seconds of clear speech
- Any sample rate (automatic resampling to 16 kHz)
- Avoid noisy, compressed, or low-quality recordings
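Some of these checks can be automated before extracting an embedding. The sketch below uses only the Python standard library and handles 16-bit PCM WAV files only; `check_reference_wav` is a hypothetical helper, and the 0.1% clipping threshold is an arbitrary assumption rather than a value from the package.

```python
import struct
import wave

def check_reference_wav(path, min_sec=10.0, max_sec=20.0, max_clipped=0.001):
    """Return (duration_sec, clipped_fraction, ok) for a 16-bit PCM WAV file."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM WAV files")
        n_frames = wf.getnframes()
        rate = wf.getframerate()
        raw = wf.readframes(n_frames)
    # Little-endian signed 16-bit samples (all channels interleaved).
    samples = struct.unpack(f"<{len(raw) // 2}h", raw)
    duration = n_frames / rate
    # Samples at or beyond full scale are treated as clipped.
    clipped = sum(1 for s in samples if abs(s) >= 32767) / max(len(samples), 1)
    ok = min_sec <= duration <= max_sec and clipped <= max_clipped
    return duration, clipped, ok
```

A clip that fails the check can still be passed to the embedder; the result is only a hint that a cleaner or longer recording may give a better voice match.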
Better Speaker Representation:
# Average multiple samples for more robust embedding
embeddings = [embedder.embed_audio_file(f) for f in sample_files]
averaged_embedding = torch.stack(embeddings).mean(dim=0)
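One detail worth noting: the mean of several unit-length vectors is generally shorter than unit length, so an averaged embedding is no longer L2-normalized. Whether the model needs the average re-normalized is not stated here, so treat the following as an optional step under that assumption. The pure-Python sketch shows the arithmetic on toy 2-dim vectors; `average_and_renormalize` is a hypothetical name.

```python
import math

def average_and_renormalize(embeddings):
    """Average equal-length vectors, then rescale the mean to unit L2 norm."""
    count, dims = len(embeddings), len(embeddings[0])
    mean = [sum(e[d] for e in embeddings) / count for d in range(dims)]
    norm = math.sqrt(sum(v * v for v in mean))
    if norm == 0.0:
        raise ValueError("embeddings cancel out; cannot normalize a zero vector")
    return [v / norm for v in mean]

# Two unit vectors pointing in different directions.
a, b = [1.0, 0.0], [0.0, 1.0]
plain_mean = [(x + y) / 2 for x, y in zip(a, b)]  # [0.5, 0.5], norm is about 0.707
unit_mean = average_and_renormalize([a, b])        # norm is 1.0 again
```

With PyTorch tensors, the equivalent of the rescaling step is `torch.nn.functional.normalize(averaged_embedding, dim=-1)`.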
Architecture
Two-Stage Pipeline:
- Text → Audio Tokens: Modified LFM2 causal LM generates discrete audio tokens
- Audio Tokens → Waveform: NVIDIA NeMo NanoCodec decodes tokens to 22 kHz audio
Key Innovations:
- Learnable RoPE: Per-layer frequency scaling for better positional encoding
- Frame-Level Positions: Audio tokens grouped in frames of 4 with shared positions
- Speaker Embeddings: 128-dim continuous representations for zero-shot voice cloning
- Language Tags: Accent and language control via prefix identifiers
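Two of these ideas can be made concrete with a short sketch. It rests on assumptions, since the model's actual layout is not documented here: text tokens are assumed to take consecutive positions, audio positions are assumed to continue directly after the text, and "learnable RoPE" is modeled as each layer holding its own base `theta` in the standard RoPE frequency formula. The function names and the example theta values are hypothetical.

```python
def frame_level_positions(num_text_tokens, num_audio_tokens, frame_size=4):
    """Position ids where every `frame_size` consecutive audio tokens share one position."""
    text_positions = list(range(num_text_tokens))
    audio_positions = [
        num_text_tokens + i // frame_size for i in range(num_audio_tokens)
    ]
    return text_positions + audio_positions

def rope_inverse_frequencies(head_dim, theta):
    """Standard RoPE frequencies: inv_freq[i] = theta ** (-2i / head_dim)."""
    return [theta ** (-(2 * i) / head_dim) for i in range(head_dim // 2)]

# 3 text tokens followed by 2 audio frames of 4 tokens each.
positions = frame_level_positions(3, 8)
# positions == [0, 1, 2, 3, 3, 3, 3, 4, 4, 4, 4]

# Hypothetical per-layer theta values; in a trained model these would be learned.
layer_thetas = [10000.0, 12000.0]
per_layer_freqs = [rope_inverse_frequencies(8, t) for t in layer_thetas]
```

Sharing a position within a frame means position indices advance once per audio frame rather than once per token, which ties them to elapsed time in the audio.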
Requirements
- Python 3.10+
- CUDA-capable GPU (recommended)
- PyTorch 2.0+
- Transformers 4.56.0+
- NeMo Toolkit
Performance
| Setup | Time to generate ~10 s of audio |
|---|---|
| GPU (CUDA) | 2-5 seconds |
| CPU | 20-60 seconds |

VRAM usage: 4-8 GB (bfloat16)
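These timings are easier to compare as a real-time factor (RTF), the generation time divided by the duration of the audio produced, where values below 1 mean faster than real time. The snippet below only restates the figures above in that form; `real_time_factor` is a hypothetical helper.

```python
def real_time_factor(generation_sec, audio_sec):
    """Generation time divided by audio duration; below 1.0 is faster than real time."""
    return generation_sec / audio_sec

# (fastest, slowest) generation times in seconds for about 10 s of audio.
timings = {"GPU (CUDA)": (2, 5), "CPU": (20, 60)}
for setup, (fastest, slowest) in timings.items():
    best = real_time_factor(fastest, 10)
    worst = real_time_factor(slowest, 10)
    print(f"{setup}: RTF {best:.1f}-{worst:.1f}")
# prints:
# GPU (CUDA): RTF 0.2-0.5
# CPU: RTF 2.0-6.0
```

By this measure the GPU figures are faster than real time, while CPU generation takes two to six times the length of the audio.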
Responsible Use
Prohibited activities include:
- Generating false or misleading information
- Impersonating individuals without consent
- Hate speech, harassment, or incitement of violence
- Malicious activities such as spamming, phishing, or fraud
By using this package, you agree to comply with all applicable laws.
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
Citation
@software{naijalingo_tts_2,
  author = {9jaLingo},
  title = {9jaLingo TTS-2: Text-to-Speech for Nigerian Languages},
  year = {2026},
  publisher = {PyPI},
  howpublished = {\url{https://pypi.org/project/naijalingo-tts-2/}},
  note = {Supports English (Nigerian), Hausa, Igbo, Yoruba, and Pidgin with voice cloning}
}
Made with ❤️ by 9jaLingo for Nigeria 🇳🇬