QwenCleo-ASR — Egyptian Arabic & code-switching speech recognition, built on Qwen3-ASR.
🎙️ QwenCleo-ASR
The best open-source model for Egyptian Arabic & code-switching speech recognition
Built on Qwen3-ASR-1.7B, fine-tuned for Egyptian dialect and Arabic ↔ English code-switching.
QwenCleo — the name carries three meanings: Qwen, the powerful base model it is built on; Queen, signalling a model that reigns over its domain; and Cleo, for Cleopatra, the queen of Egypt — because this model is tailored for Egyptian Arabic. 👑🏺
QwenCleo-ASR is, to our knowledge, the best open-source ASR model for Egyptian Arabic and Arabic/English code-switching. It roughly halves the error rate of the strong Qwen3-ASR base (overall WER 41.51% → 19.85%, CER 20.86% → 10.64%) and keeps embedded English tech and loan words (engineering, download, React, at least) in Latin script instead of mangling them into broken Arabic.
- 🎯 Egyptian dialect: tuned on hundreds of hours of Egyptian podcast speech.
- 🔀 Code-switching: keeps English terms in Latin script and Arabic in Arabic script.
- 🥇 State-of-the-art (open): beats Qwen3-ASR base, NVIDIA Nemotron, Cohere, and every Whisper variant on our Egyptian + CS test set.
- 📦 `pip install qwencleo-asr`: inference and chunked long-audio transcription in three lines.
- ⚡ Real streaming: token-by-token via vLLM (`asr.stream(...)`), plus a mic web demo.
- 🚀 Serving: FastAPI server, Gradio demo, OpenAI-compatible vLLM endpoint.
📊 Results
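WER / CER (%) on an Egyptian-Arabic + code-switching test set (3,699 utterances). Lower is better. "AR" columns cover the Arabic-only utterances, "CS" the code-switched ones, and "(FT)" marks a fine-tuned Whisper checkpoint. All models were scored with the same Egyptian-aware normalization.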
| Model | Params | WER all | CER all | WER · AR | CER · AR | WER · CS | CER · CS |
|---|---|---|---|---|---|---|---|
| 🏆 QwenCleo-ASR | 1.7B | 19.85 | 10.64 | 19.08 | 10.43 | 20.29 | 10.92 |
| NVIDIA Nemotron-3.5 | 0.6B | 38.88 | 20.58 | 37.14 | 17.40 | 42.15 | 26.30 |
| Qwen3-ASR-1.7B (base) | 1.7B | 41.51 | 20.86 | 40.59 | 18.52 | 43.20 | 25.04 |
| Whisper Large-v3 Turbo (FT) | 0.81B | 50.83 | 22.86 | 48.37 | 18.42 | 55.08 | 37.84 |
| Cohere Transcribe 03-2026 | 2.0B | 53.78 | 39.63 | 48.57 | 34.12 | 63.76 | 49.66 |
| Whisper Large-v3 | 1.54B | 63.94 | 39.76 | 49.25 | 22.76 | 59.32 | 31.52 |
| Whisper Large-v2 | 1.54B | 72.34 | 48.73 | 60.75 | 33.21 | 66.85 | 40.75 |
| Whisper Large-v3 Turbo | 0.81B | 73.83 | 46.86 | 59.37 | 29.42 | 66.08 | 37.84 |
| Whisper Medium | 0.76B | 80.46 | 53.19 | 74.77 | 41.76 | 74.15 | 44.90 |
| Whisper Small | 0.24B | 89.99 | 60.34 | 77.42 | 42.53 | 87.09 | 55.22 |
| Whisper Tiny | 0.04B | 124.68 | 89.42 | 116.02 | 77.74 | 110.67 | 74.57 |
🗣️ Sample outputs
Real transcriptions from the test set. Ground truth first; each model's output below it. Notice how QwenCleo keeps English terms in Latin script and Egyptian dialect intact, while the other models transliterate English into broken Arabic or drop words entirely.
🔀 Code-switching
✅ Ground truth طب وانتوا يعني كengineering المفروض ان بيكون مثلا الstaff engineer بيقعد مع الengineering managers
| Model | Output |
|---|---|
| 🏆 QwenCleo | طب وانتوا يعني كengineering المفروض ان بيكون مثلا الstaff engineer بيقعد مع الengineering managers ✅ |
| Qwen3-ASR (base) | وأنتوا يعني كإنجنييرينج المفروض إنه بيكون مثلاً الأستاف إنجنيير بيعد مع ال engineering managers ❌ |
| Cohere Transcribe | وانتم كانجينيري المفروض ان يكون مثلا الاستفاده من الهدف ❌ (truncated) |
| Nemotron 3.5 ASR | وانتو يعني كإنجينير المفروض ان بيكون مثلاً الإستف إنجنير بيقعد مع الإنجنير مانيجرز ❌ |
✅ Ground truth يعني شوية حاجات كده across كل الdomains او at least يعني مع 4 5 squads فالموضوع صعب
| Model | Output |
|---|---|
| 🏆 QwenCleo | يعني شوية حاجات كده across كل الdomains او at least يعني مع 4 5 squads فالموضوع صعب ✅ |
| Qwen3-ASR (base) | يعني شوية حاجات كده أكرس كل الدومينز وأتلست يعني مع أربعة خمسة سكوات ❌ |
| Cohere Transcribe | يعني شويه حاجات كده اكروس كل الدومينز او اتليست مع اربع خمسه سكوات ❌ |
| Nemotron 3.5 ASR | يعني آه شوائد حاجات كده أكرس كل الدومين أو أتليست يعني مع أربع خمسة سكوات ❌ |
✅ Ground truth يقعد معاك حد مثلا من اللي هما ال C level او مثلا engineer manager او كده حسب الposition بتاعه
| Model | Output |
|---|---|
| 🏆 QwenCleo | يقعد معاك حد مثلا من هما ال C level او مثلا engineer manager او كده حسب الposition بتاعك ✅ |
| Qwen3-ASR (base) | بيعرض معك حد مثلاً من هم C level أو مثلاً إنجنير مانAGER أو كذا حسب البوسيشن ❌ |
| Cohere Transcribe | يقعد معاك حد مثلا اللي هم السي لافل او انجنير مانجر او كده حسب الموضوع ❌ |
| Nemotron 3.5 ASR | بيقعد معاك حد مثلاً إن هم السي لفل أو مثلاً إنجينير مانجر أو كده حسب البوزيشن ❌ |
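Whether English terms survive in Latin script can be checked mechanically. A minimal sketch, assuming that "kept in Latin script" simply means each English token of the reference also appears as ASCII letters in the hypothesis: a regular expression extracts Latin-script runs (so an Arabic article fused to an English word, as in الstaff, still yields `staff`), and the helper counts how many reference terms are recovered. `latin_terms` and `latin_term_recall` are hypothetical helpers, not part of qwencleo-asr.

```python
import re
from collections import Counter

# ASCII-letter runs, optionally joined by "-" or "'". Arabic prefixes fused to an
# English word (e.g. "ال" in "الstaff") are outside this class, so only the
# English part is captured.
LATIN_WORD = re.compile(r"[A-Za-z]+(?:[-'][A-Za-z]+)*")


def latin_terms(text: str) -> list[str]:
    """Latin-script tokens of a mixed Arabic/English transcript, lower-cased."""
    return [m.lower() for m in LATIN_WORD.findall(text)]


def latin_term_recall(reference: str, hypothesis: str) -> float:
    """Share of the reference's Latin-script tokens that the hypothesis also writes
    in Latin script (each hypothesis token can be matched at most once)."""
    ref_terms = latin_terms(reference)
    if not ref_terms:
        return 1.0  # nothing to keep
    available = Counter(latin_terms(hypothesis))
    hits = 0
    for term in ref_terms:
        if available[term] > 0:
            available[term] -= 1
            hits += 1
    return hits / len(ref_terms)
```

On the first code-switching example, QwenCleo's output recovers all 5 Latin-script terms (1.0), the Qwen3-ASR base output recovers 2 of 5 (`engineering`, `managers`, so 0.4), and Nemotron's fully Arabized output recovers none.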
🇪🇬 Egyptian Arabic
✅ Ground truth طب دي كانت مثلا تاخد 84% 88%
| Model | Output |
|---|---|
| 🏆 QwenCleo | طب دي كانت مثلا تاخد 84% 88% ✅ |
| Qwen3-ASR (base) | طب دي كانت مثلاً تأخذ أربعة وثمانين في المية، ثمانية وثمانين في المية ❌ |
| Cohere Transcribe | طيب دي كانت مثلا تاخد اربعه وثمانين في الميه ثمانيه وثمانين في الميه ❌ |
| Nemotron 3.5 ASR | طيب دي كانت مثلاً تاخد أربعة وثمانين في المئة ثمانية وثمانين في المئة ❌ |
✅ Ground truth خد ال 4 في 4 او 4 ونص طلع دور 9
| Model | Output |
|---|---|
| 🏆 QwenCleo | خد ال 4 في 4 او 4 ونص طلع دور 9 ✅ |
| Qwen3-ASR (base) | خادل الأربعة فاربعة واربعة ونص تطلع دور تاسع ❌ |
| Cohere Transcribe | خد الاربعه فاربعه واربعه ونص طلع دور تسعه ❌ |
| Nemotron 3.5 ASR | خد الأربعة في أربعة وأربعة ونص طلع دور تسعة ❌ |
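The exact rules of the Egyptian-aware normalization (and of `normalize=True`) are not spelled out here. For illustration only, this is a hypothetical orthographic normalizer of the kind commonly applied before scoring Arabic ASR: it strips diacritics and tatweel and folds hamza-carrying alefs, taa marbuta and alef maqsura, so spelling variants seen in the examples, such as مثلاً / مثلا and المية / الميه, compare equal. `normalize_arabic` and its rule set are assumptions, not the package's implementation.

```python
import re

# Arabic diacritics U+064B..U+0652 (tanween, harakat, shadda, sukun) plus tatweel U+0640.
_DIACRITICS = re.compile("[\u064B-\u0652\u0640]")

# Fold common spelling variants to a single form (a hypothetical rule set).
_FOLD = str.maketrans({
    "\u0623": "\u0627",  # alef with hamza above -> bare alef
    "\u0625": "\u0627",  # alef with hamza below -> bare alef
    "\u0622": "\u0627",  # alef with madda -> bare alef
    "\u0629": "\u0647",  # taa marbuta -> haa
    "\u0649": "\u064A",  # alef maqsura -> yaa
})


def normalize_arabic(text: str) -> str:
    """Strip diacritics, fold orthographic variants, and collapse whitespace."""
    text = _DIACRITICS.sub("", text)
    text = text.translate(_FOLD)
    return " ".join(text.split())
```

A normalizer like this cannot reconcile spelled-out numbers with digits (أربعة وثمانين versus 84), so under it those differences in the examples would still count as errors.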
📦 Installation
Install the right torch first. A plain `pip install` pulls the newest torch (built for the latest CUDA), which fails on older drivers with "NVIDIA driver too old". Install a torch build matching your driver before the package, then add QwenCleo with `--no-deps` so torch is never reinstalled.
Pick the wheel index for your CUDA driver: `cu121` (driver ≥ 12.1, e.g. CUDA 12.2), `cu118` (driver ≥ 11.8), or `cpu`. Check yours with `nvidia-smi`.
For inference & chunked transcription (PyPI)
conda create -n qwencleo-asr python=3.12 -y
conda activate qwencleo-asr
# 1) torch matching your driver (cu121 shown — change the index for yours)
pip install torch==2.5.1 torchaudio==2.5.1 \
--index-url https://download.pytorch.org/whl/cu121
# 2) QwenCleo without touching torch, then its remaining deps
pip install qwencleo-asr --no-deps
pip install "qwen-asr>=0.0.6" numpy soundfile huggingface_hub
That's all you need for the Python API and the qwencleo CLI.
For serving / Gradio / vLLM (clone the repo)
conda create -n qwencleo-asr python=3.12 -y
conda activate qwencleo-asr
# 1) torch matching your driver, first
pip install torch==2.5.1 torchaudio==2.5.1 \
--index-url https://download.pytorch.org/whl/cu121
# 2) the repo (without re-resolving torch) + serving deps
git clone https://github.com/MohammedAly22/qwencleo-asr.git
cd qwencleo-asr
pip install -e . --no-deps
pip install "qwen-asr>=0.0.6" numpy soundfile huggingface_hub
pip install -r requirements-serving.txt
Verify torch sees the GPU before running:
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
# -> 2.5.1+cu121 True
🚀 Usage
Python — basic transcription
from qwencleo_asr import QwenCleoASR
asr = QwenCleoASR() # loads mohammedaly22/QwenCleo-ASR
result = asr.transcribe("clip.wav") # language defaults to "Arabic"
print(result.text)
Batch, auto-detect language, and Egyptian normalization:
results = asr.transcribe(["a.wav", "b.wav"], language=None) # auto-detect
clean = asr.transcribe("clip.wav", normalize=True) # normalized text
Python — chunked transcription of long audio / mic
from qwencleo_asr import QwenCleoASR, stream_file
asr = QwenCleoASR()
for chunk in stream_file(asr, "long_podcast.wav", chunk_s=20, overlap_s=2):
print(f"[{chunk.start:.0f}-{chunk.end:.0f}s] {chunk.text}")
ℹ️ This is chunked transcription, not true streaming. It splits long/live audio into overlapping windows and transcribes each — convenient for captioning without a server, but latency is per-window. For genuine token-by-token streaming, use the vLLM path below.
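The window layout behind chunked transcription can be sketched as follows. This is an assumption about how windows of `chunk_s` seconds overlapping by `overlap_s` seconds are placed, not `stream_file`'s actual code; how the package merges text from the overlapping regions is not documented here. `Window` and `chunk_windows` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    start: float  # seconds
    end: float    # seconds


def chunk_windows(duration_s: float, chunk_s: float = 20.0, overlap_s: float = 2.0) -> list[Window]:
    """Cover [0, duration_s] with windows of chunk_s seconds, each starting
    chunk_s - overlap_s seconds after the previous one (the last may be shorter)."""
    if chunk_s <= 0 or not 0 <= overlap_s < chunk_s:
        raise ValueError("need chunk_s > 0 and 0 <= overlap_s < chunk_s")
    step = chunk_s - overlap_s
    windows: list[Window] = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        windows.append(Window(start, end))
        if end >= duration_s:  # this window already reaches the end of the audio
            break
        start += step
    return windows
```

For a 50-second clip with `chunk_s=20, overlap_s=2`, this yields windows 0-20 s, 18-38 s and 36-50 s: each window starts 18 s after the previous one, so every boundary region is transcribed twice, and no text for a window can appear before that whole window has been heard.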
Python — true streaming (vLLM)
QwenCleo inherits Qwen3-ASR's real token-by-token streaming via vLLM. Start a
server (see server/vllm_serve.md):
pip install "qwencleo-asr[vllm]" # vLLM nightly recommended — see docs
vllm serve mohammedaly22/QwenCleo-ASR
Then stream straight off the model object — deltas arrive as they're generated:
from qwencleo_asr import QwenCleoASR
asr = QwenCleoASR()
for delta in asr.stream("clip.wav"): # talks to the vLLM server
print(delta, end="", flush=True)
Or use the helpers directly:
from qwencleo_asr import stream_vllm, transcribe_vllm, VLLMOffline
for delta in stream_vllm("clip.wav", language="Arabic"):
print(delta, end="", flush=True)
print(transcribe_vllm("clip.wav")) # one-shot via the server
print(VLLMOffline().transcribe("clip.wav")) # in-process, no server
CLI
qwencleo transcribe clip.wav
qwencleo transcribe a.wav b.wav --language None --normalize
qwencleo stream long_podcast.wav --chunk-s 20 --overlap-s 2
🌐 Serving
FastAPI server
QWENCLEO_MODEL=mohammedaly22/QwenCleo-ASR \
uvicorn server.app:app --host 0.0.0.0 --port 8000
curl -X POST http://localhost:8000/v1/transcribe -F file=@clip.wav -F language=Arabic
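The same request can be sent from Python with only the standard library. A sketch, assuming the `file` and `language` form fields from the curl call above and a JSON response body (the response schema is not documented here); `encode_multipart` and `transcribe_http` are hypothetical helpers.

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path


def encode_multipart(fields: dict[str, str],
                     files: dict[str, tuple[str, bytes]]) -> tuple[bytes, str]:
    """Build a multipart/form-data body. files maps field -> (filename, content)."""
    boundary = uuid.uuid4().hex
    parts: list[bytes] = []
    for name, value in fields.items():
        parts += [f"--{boundary}".encode(),
                  f'Content-Disposition: form-data; name="{name}"'.encode(),
                  b"", value.encode()]
    for name, (filename, content) in files.items():
        ctype = mimetypes.guess_type(filename)[0] or "application/octet-stream"
        parts += [f"--{boundary}".encode(),
                  f'Content-Disposition: form-data; name="{name}"; filename="{filename}"'.encode(),
                  f"Content-Type: {ctype}".encode(),
                  b"", content]
    parts += [f"--{boundary}--".encode(), b""]  # closing delimiter + final CRLF
    return b"\r\n".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe_http(audio_path: str, language: str = "Arabic",
                    url: str = "http://localhost:8000/v1/transcribe") -> dict:
    """POST an audio file to the FastAPI server; assumes the reply is JSON."""
    path = Path(audio_path)
    body, content_type = encode_multipart({"language": language},
                                          {"file": (path.name, path.read_bytes())})
    request = urllib.request.Request(url, data=body, method="POST",
                                     headers={"Content-Type": content_type})
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())
```

With the server from the command above running, `print(transcribe_http("clip.wav"))` mirrors the curl call.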
Gradio demo
python app/gradio_app.py # http://localhost:7860 (mic + file upload)
vLLM — serving, streaming & OpenAI-compatible API
Full guide in server/vllm_serve.md. In short:
pip install "qwencleo-asr[vllm]" # vLLM nightly recommended (see docs)
vllm serve mohammedaly22/QwenCleo-ASR
OpenAI-compatible transcription:
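from openai import OpenAI
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
# Pass the open file rather than raw bytes so its filename (and extension) is sent; `with` closes it.
with open("clip.wav", "rb") as f:
    result = client.audio.transcriptions.create(model="mohammedaly22/QwenCleo-ASR", file=f)
print(result.text)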
Streaming mic web demo
Live browser-mic transcription via the upstream Flask demo:
qwen-asr-demo-streaming \
--asr-model-path mohammedaly22/QwenCleo-ASR \
--host 0.0.0.0 --port 8000 --gpu-memory-utilization 0.9
# open http://<your-ip>:8000
🔗 Links
- 🤗 Model card: mohammedaly22/QwenCleo-ASR
- 📦 PyPI: qwencleo-asr
- 🧱 Base model: Qwen/Qwen3-ASR-1.7B · Qwen3-ASR repo
- Languages: Egyptian Arabic, Modern Standard Arabic, Arabic↔English code-switching
- Recommended `language` hint: `"Arabic"` (or `None` to auto-detect)
📜 License & citation
Apache-2.0, inheriting the Qwen3-ASR license terms.
@misc{qwencleo_asr_2026,
title = {QwenCleo-ASR: The Best Open-Source Egyptian Arabic and Code-Switching Speech Recognition Model},
author = {Mohammed Aly},
year = {2026},
howpublished = {\url{https://huggingface.co/mohammedaly22/QwenCleo-ASR}},
note = {Fine-tuned from Qwen3-ASR-1.7B}
}
File details
Details for the file qwencleo_asr-0.1.0.tar.gz.
File metadata
- Download URL: qwencleo_asr-0.1.0.tar.gz
- Upload date:
- Size: 22.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ab13f7319296bbd0e1b002fdade4bad17a35d3113f36c5c623b1ac51a344eb3e |
| MD5 | 6eefd0035b9beb74737c94447f554d29 |
| BLAKE2b-256 | cb91a8bbb07a6e3aeaa720798240dc95014e99143f9674a72066c6971b7c2a1b |
File details
Details for the file qwencleo_asr-0.1.0-py3-none-any.whl.
File metadata
- Download URL: qwencleo_asr-0.1.0-py3-none-any.whl
- Upload date:
- Size: 19.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1a5e94136d55312a204f8ee6dcd3e891f4fbb247eb88ef51f521c4b91d8ca6df |
| MD5 | 6a8066cc686506431a9ac938ee8220f2 |
| BLAKE2b-256 | 2e8605e94d3701a96d1b71a1bd2e760ac018b99135d88fe7ebdc274f58bfecb7 |