QwenCleo-ASR — Egyptian Arabic & code-switching speech recognition, built on Qwen3-ASR.

🎙️ QwenCleo-ASR

The best open-source model for Egyptian Arabic & code-switching speech recognition

Built on Qwen3-ASR-1.7B, fine-tuned for Egyptian dialect and Arabic ↔ English code-switching.

QwenCleo — the name carries three meanings: Qwen, the powerful base model it is built on; Queen, signalling a model that reigns over its domain; and Cleo, for Cleopatra, the queen of Egypt — because this model is tailored for Egyptian Arabic. 👑🏺

QwenCleo-ASR is, to our knowledge, the best open-source ASR model for Egyptian Arabic and Arabic/English code-switching. On our test set it roughly halves the word error rate of the already strong Qwen3-ASR base (41.51% → 19.85% WER), and it keeps embedded English tech terms and loanwords in Latin script (engineering, download, React, at least) instead of mangling them into broken Arabic.

  • 🎯 Egyptian dialect — tuned on hundreds of hours of Egyptian podcast speech.
  • 🔀 Code-switching — keeps English terms in Latin script, Arabic in Arabic.
  • 🥇 State-of-the-art (open) — beats Qwen3-ASR base, NVIDIA Nemotron, Cohere, and every Whisper variant on our Egyptian + CS test set.
  • 📦 pip install qwencleo-asr — inference & chunked long-audio transcription in three lines.
  • Real streaming — token-by-token via vLLM (asr.stream(...)), plus a mic web demo.
  • 🚀 Serving — FastAPI server, Gradio demo, OpenAI-compatible vLLM endpoint.

📊 Results

WER / CER (%) on an Egyptian-Arabic + code-switching test set (3,699 utterances). Lower is better. Columns report all utterances, the Arabic (AR) subset, and the code-switching (CS) subset. All models were scored with the same Egyptian-aware normalization.

| Model | Params | WER all | CER all | WER · AR | CER · AR | WER · CS | CER · CS |
|---|---|---|---|---|---|---|---|
| 🏆 QwenCleo-ASR | 1.7B | 19.85 | 10.64 | 19.08 | 10.43 | 20.29 | 10.92 |
| NVIDIA Nemotron-3.5 | 0.6B | 38.88 | 20.58 | 37.14 | 17.40 | 42.15 | 26.30 |
| Qwen3-ASR-1.7B (base) | 1.7B | 41.51 | 20.86 | 40.59 | 18.52 | 43.20 | 25.04 |
| Whisper Large-v3 Turbo (FT) | 0.81B | 50.83 | 22.86 | 48.37 | 18.42 | 55.08 | 37.84 |
| Cohere Transcribe 03-2026 | 2.0B | 53.78 | 39.63 | 48.57 | 34.12 | 63.76 | 49.66 |
| Whisper Large-v3 | 1.54B | 63.94 | 39.76 | 49.25 | 22.76 | 59.32 | 31.52 |
| Whisper Large-v2 | 1.54B | 72.34 | 48.73 | 60.75 | 33.21 | 66.85 | 40.75 |
| Whisper Large-v3 Turbo | 0.81B | 73.83 | 46.86 | 59.37 | 29.42 | 66.08 | 37.84 |
| Whisper Medium | 0.76B | 80.46 | 53.19 | 74.77 | 41.76 | 74.15 | 44.90 |
| Whisper Small | 0.24B | 89.99 | 60.34 | 77.42 | 42.53 | 87.09 | 55.22 |
| Whisper Tiny | 0.04B | 124.68 | 89.42 | 116.02 | 77.74 | 110.67 | 74.57 |
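
For readers unfamiliar with the metrics, here is a minimal sketch of how WER and CER are conventionally computed from a Levenshtein edit distance. It is not the project's scoring script: the function names are illustrative, and the Egyptian-aware normalization mentioned above is not reproduced here (both strings are assumed to be normalized already).

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate in percent, relative to the reference length."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate in percent, ignoring spaces (an assumption of this sketch)."""
    ref_chars = reference.replace(" ", "")
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    return 100.0 * edit_distance(ref_chars, hypothesis.replace(" ", "")) / len(ref_chars)


print(wer("a b c d", "a x c d"))         # 25.0  (one substitution in four words)
print(wer("hello", "hello there you"))   # 200.0 (two insertions against one reference word)
```

Because errors are divided by the reference length, a hypothesis with many inserted words can score above 100%, which is how a figure like Whisper Tiny's 124.68 can arise.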

🗣️ Sample outputs

Real transcriptions from the test set. Ground truth first; each model's output below it. Notice how QwenCleo keeps English terms in Latin script and Egyptian dialect intact, while the other models transliterate English into broken Arabic or drop words entirely.

🔀 Code-switching

✅ Ground truth طب وانتوا يعني كengineering المفروض ان بيكون مثلا الstaff engineer بيقعد مع الengineering managers

Model Output
🏆 QwenCleo طب وانتوا يعني كengineering المفروض ان بيكون مثلا الstaff engineer بيقعد مع الengineering managers ✅
Qwen3-ASR (base) وأنتوا يعني كإنجنييرينج المفروض إنه بيكون مثلاً الأستاف إنجنيير بيعد مع ال engineering managers ❌
Cohere Transcribe وانتم كانجينيري المفروض ان يكون مثلا الاستفاده من الهدف ❌ (truncated)
Nemotron 3.5 ASR وانتو يعني كإنجينير المفروض ان بيكون مثلاً الإستف إنجنير بيقعد مع الإنجنير مانيجرز ❌

✅ Ground truth يعني شوية حاجات كده across كل الdomains او at least يعني مع 4 5 squads فالموضوع صعب

Model Output
🏆 QwenCleo يعني شوية حاجات كده across كل الdomains او at least يعني مع 4 5 squads فالموضوع صعب ✅
Qwen3-ASR (base) يعني شوية حاجات كده أكرس كل الدومينز وأتلست يعني مع أربعة خمسة سكوات ❌
Cohere Transcribe يعني شويه حاجات كده اكروس كل الدومينز او اتليست مع اربع خمسه سكوات ❌
Nemotron 3.5 ASR يعني آه شوائد حاجات كده أكرس كل الدومين أو أتليست يعني مع أربع خمسة سكوات ❌

✅ Ground truth يقعد معاك حد مثلا من اللي هما ال C level او مثلا engineer manager او كده حسب الposition بتاعه

Model Output
🏆 QwenCleo يقعد معاك حد مثلا من هما ال C level او مثلا engineer manager او كده حسب الposition بتاعك ✅
Qwen3-ASR (base) بيعرض معك حد مثلاً من هم C level أو مثلاً إنجنير مانAGER أو كذا حسب البوسيشن ❌
Cohere Transcribe يقعد معاك حد مثلا اللي هم السي لافل او انجنير مانجر او كده حسب الموضوع ❌
Nemotron 3.5 ASR بيقعد معاك حد مثلاً إن هم السي لفل أو مثلاً إنجينير مانجر أو كده حسب البوزيشن ❌
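
One rough way to quantify the code-switching behaviour in these samples is to count the Latin-script words that survive in each output. The sketch below is only an illustration (the helper name `latin_terms` is hypothetical and not part of the package); it uses two outputs copied from the first sample above.

```python
import re
from typing import List

# Runs of basic Latin letters. An Arabic prefix glued to a word
# (for example the article in front of "staff") is left out of the match.
LATIN_RUN = re.compile(r"[A-Za-z]+")


def latin_terms(text: str) -> List[str]:
    """Return the Latin-script words found in a transcript, in order."""
    return LATIN_RUN.findall(text)


qwencleo = "طب وانتوا يعني كengineering المفروض ان بيكون مثلا الstaff engineer بيقعد مع الengineering managers"
nemotron = "وانتو يعني كإنجينير المفروض ان بيكون مثلاً الإستف إنجنير بيقعد مع الإنجنير مانيجرز"

print(latin_terms(qwencleo))  # ['engineering', 'staff', 'engineer', 'engineering', 'managers']
print(latin_terms(nemotron))  # []
```

A count like this says nothing about whether the Arabic portion is right, so it complements WER/CER rather than replacing them.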

🇪🇬 Egyptian Arabic

✅ Ground truth طب دي كانت مثلا تاخد 84% 88%

Model Output
🏆 QwenCleo طب دي كانت مثلا تاخد 84% 88% ✅
Qwen3-ASR (base) طب دي كانت مثلاً تأخذ أربعة وثمانين في المية، ثمانية وثمانين في المية ❌
Cohere Transcribe طيب دي كانت مثلا تاخد اربعه وثمانين في الميه ثمانيه وثمانين في الميه ❌
Nemotron 3.5 ASR طيب دي كانت مثلاً تاخد أربعة وثمانين في المئة ثمانية وثمانين في المئة ❌

✅ Ground truth خد ال 4 في 4 او 4 ونص طلع دور 9

Model Output
🏆 QwenCleo خد ال 4 في 4 او 4 ونص طلع دور 9 ✅
Qwen3-ASR (base) خادل الأربعة فاربعة واربعة ونص تطلع دور تاسع ❌
Cohere Transcribe خد الاربعه فاربعه واربعه ونص طلع دور تسعه ❌
Nemotron 3.5 ASR خد الأربعة في أربعة وأربعة ونص طلع دور تسعة ❌

📦 Installation

Install the right torch first. A plain pip install pulls the newest torch (built for the latest CUDA), which fails on older drivers with "NVIDIA driver too old". Install a torch build matching your driver before the package, then add QwenCleo with --no-deps so torch is never reinstalled.

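Pick the wheel index that matches the CUDA version your driver supports: cu121 (CUDA ≥ 12.1, e.g. a driver reporting CUDA 12.2), cu118 (CUDA ≥ 11.8), or cpu. Check yours with nvidia-smi, which prints the supported "CUDA Version" in its header.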
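
The selection rule above can be written down as a small helper. This is a sketch with a hypothetical function name, not something shipped with the package; it takes the "CUDA Version" string printed by nvidia-smi and returns the matching PyTorch wheel index. Falling back to the cpu index for drivers older than CUDA 11.8 is an assumption of this sketch.

```python
from typing import Optional

BASE = "https://download.pytorch.org/whl/"


def torch_index_url(cuda_version: Optional[str]) -> str:
    """Map the driver's CUDA version (e.g. "12.2") to a wheel index URL."""
    if not cuda_version:
        return BASE + "cpu"            # no NVIDIA driver found
    parts = cuda_version.split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    if (major, minor) >= (12, 1):
        return BASE + "cu121"
    if (major, minor) >= (11, 8):
        return BASE + "cu118"
    return BASE + "cpu"                # assumption: too old for either CUDA build


print(torch_index_url("12.2"))  # https://download.pytorch.org/whl/cu121
print(torch_index_url("12.0"))  # https://download.pytorch.org/whl/cu118
print(torch_index_url(None))    # https://download.pytorch.org/whl/cpu
```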

For inference & chunked transcription (PyPI)

conda create -n qwencleo-asr python=3.12 -y
conda activate qwencleo-asr

# 1) torch matching your driver (cu121 shown — change the index for yours)
pip install torch==2.5.1 torchaudio==2.5.1 \
  --index-url https://download.pytorch.org/whl/cu121

# 2) QwenCleo without touching torch, then its remaining deps
pip install qwencleo-asr --no-deps
pip install "qwen-asr>=0.0.6" numpy soundfile huggingface_hub

That's all you need for the Python API and the qwencleo CLI.

For serving / Gradio / vLLM (clone the repo)

conda create -n qwencleo-asr python=3.12 -y
conda activate qwencleo-asr

# 1) torch matching your driver, first
pip install torch==2.5.1 torchaudio==2.5.1 \
  --index-url https://download.pytorch.org/whl/cu121

# 2) the repo (without re-resolving torch) + serving deps
git clone https://github.com/MohammedAly22/qwencleo-asr.git
cd qwencleo-asr
pip install -e . --no-deps
pip install "qwen-asr>=0.0.6" numpy soundfile huggingface_hub
pip install -r requirements-serving.txt

Verify torch sees the GPU before running:

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
# -> 2.5.1+cu121 True

🚀 Usage

Python — basic transcription

from qwencleo_asr import QwenCleoASR

asr = QwenCleoASR()                       # loads mohammedaly22/QwenCleo-ASR
result = asr.transcribe("clip.wav")       # language defaults to "Arabic"
print(result.text)

Batch, auto-detect language, and Egyptian normalization:

results = asr.transcribe(["a.wav", "b.wav"], language=None)   # auto-detect
clean   = asr.transcribe("clip.wav", normalize=True)          # normalized text

Python — chunked transcription of long audio / mic

from qwencleo_asr import QwenCleoASR, stream_file

asr = QwenCleoASR()
for chunk in stream_file(asr, "long_podcast.wav", chunk_s=20, overlap_s=2):
    print(f"[{chunk.start:.0f}-{chunk.end:.0f}s] {chunk.text}")

ℹ️ This is chunked transcription, not true streaming. It splits long/live audio into overlapping windows and transcribes each — convenient for captioning without a server, but latency is per-window. For genuine token-by-token streaming, use the vLLM path below.
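
To make the windowing concrete, here is a minimal sketch of how overlapping windows could be laid out for a given chunk_s and overlap_s. It illustrates the general technique only; the function `chunk_windows` is hypothetical and stream_file may compute its windows differently.

```python
from typing import List, Tuple


def chunk_windows(duration_s: float, chunk_s: float = 20.0,
                  overlap_s: float = 2.0) -> List[Tuple[float, float]]:
    """Return (start, end) times of overlapping windows covering the audio."""
    if chunk_s <= 0 or not 0 <= overlap_s < chunk_s:
        raise ValueError("need chunk_s > 0 and 0 <= overlap_s < chunk_s")
    step = chunk_s - overlap_s        # each window starts this far after the previous one
    windows = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        windows.append((start, end))
        if end >= duration_s:         # last window reached the end of the audio
            break
        start += step
    return windows


print(chunk_windows(50))  # [(0.0, 20.0), (18.0, 38.0), (36.0, 50)]
```

With 20 s windows and 2 s of overlap, each window repeats the last 2 s of the previous one, so a word cut at a boundary appears whole in at least one window. The earliest a caption can appear is still after a full window has been recorded and decoded, which is the per-window latency noted above.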

Python — true streaming (vLLM)

QwenCleo inherits Qwen3-ASR's real token-by-token streaming via vLLM. Start a server (see server/vllm_serve.md):

pip install "qwencleo-asr[vllm]"          # vLLM nightly recommended — see docs
vllm serve mohammedaly22/QwenCleo-ASR

Then stream straight off the model object — deltas arrive as they're generated:

from qwencleo_asr import QwenCleoASR

asr = QwenCleoASR()
for delta in asr.stream("clip.wav"):       # talks to the vLLM server
    print(delta, end="", flush=True)

Or use the helpers directly:

from qwencleo_asr import stream_vllm, transcribe_vllm, VLLMOffline

for delta in stream_vllm("clip.wav", language="Arabic"):
    print(delta, end="", flush=True)

print(transcribe_vllm("clip.wav"))         # one-shot via the server
print(VLLMOffline().transcribe("clip.wav"))  # in-process, no server
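
If a UI needs the transcript so far rather than individual deltas (for example to redraw a caption line), the deltas can be accumulated as they arrive. This is a sketch: `running_transcript` is a hypothetical helper, and it assumes each delta is a plain text fragment, as the print-based examples above suggest. A fixed list stands in for the stream so the sketch runs without a server.

```python
from typing import Iterable, Iterator


def running_transcript(deltas: Iterable[str]) -> Iterator[str]:
    """Yield the transcript-so-far after each incoming delta."""
    text = ""
    for delta in deltas:
        text += delta
        yield text


# Stand-in for asr.stream("clip.wav"); real deltas come from the vLLM server.
fake_deltas = ["across ", "كل ", "الdomains"]
for partial in running_transcript(fake_deltas):
    print(partial)
```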

CLI

qwencleo transcribe clip.wav
qwencleo transcribe a.wav b.wav --language None --normalize
qwencleo stream long_podcast.wav --chunk-s 20 --overlap-s 2

🌐 Serving

FastAPI server

QWENCLEO_MODEL=mohammedaly22/QwenCleo-ASR \
uvicorn server.app:app --host 0.0.0.0 --port 8000

curl -X POST http://localhost:8000/v1/transcribe -F file=@clip.wav -F language=Arabic

Gradio demo

python app/gradio_app.py        # http://localhost:7860  (mic + file upload)

vLLM — serving, streaming & OpenAI-compatible API

Full guide in server/vllm_serve.md. In short:

pip install "qwencleo-asr[vllm]"           # vLLM nightly recommended (see docs)
vllm serve mohammedaly22/QwenCleo-ASR

OpenAI-compatible transcription:

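from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
# Pass the open file handle rather than raw bytes so the upload keeps its filename.
with open("clip.wav", "rb") as audio:
    result = client.audio.transcriptions.create(
        model="mohammedaly22/QwenCleo-ASR", file=audio)
print(result.text)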

Streaming mic web demo

Live browser-mic transcription via the upstream Flask demo:

qwen-asr-demo-streaming \
  --asr-model-path mohammedaly22/QwenCleo-ASR \
  --host 0.0.0.0 --port 8000 --gpu-memory-utilization 0.9
# open http://<your-ip>:8000

📜 License & citation

Apache-2.0, inheriting the Qwen3-ASR license terms.

@misc{qwencleo_asr_2026,
  title  = {QwenCleo-ASR: The Best Open-Source Egyptian Arabic and Code-Switching Speech Recognition Model},
  author = {Mohammed Aly},
  year   = {2026},
  howpublished = {\url{https://huggingface.co/mohammedaly22/QwenCleo-ASR}},
  note   = {Fine-tuned from Qwen3-ASR-1.7B}
}

