QwenCleo-ASR — Egyptian Arabic & code-switching speech recognition, built on Qwen3-ASR.

🎙️ QwenCleo-ASR

The best open-source model for Egyptian Arabic & code-switching speech recognition

Built on Qwen3-ASR-1.7B, fine-tuned for Egyptian dialect and Arabic ↔ English code-switching.

QwenCleo — the name carries three meanings: Qwen, the powerful base model it is built on; Queen, signalling a model that reigns over its domain; and Cleo, for Cleopatra, the queen of Egypt — because this model is tailored for Egyptian Arabic. 👑🏺

QwenCleo-ASR is, to our knowledge, the best open-source ASR model for Egyptian Arabic and Arabic/English code-switching. On our test set it cuts the word error rate of the strong Qwen3-ASR base roughly in half (41.51% → 19.85% WER), and it keeps embedded English tech and loan words in Latin script (engineering, download, React, at least) instead of transliterating them into broken Arabic.

🎯 Egyptian dialect — tuned on hundreds of hours of Egyptian podcast speech.
🔀 Code-switching — keeps English terms in Latin script, Arabic in Arabic script.
🥇 State-of-the-art (open) — beats Qwen3-ASR base, NVIDIA Nemotron, Cohere, and every Whisper variant on our Egyptian + CS test set.
📦 pip install qwencleo-asr — inference & chunked long-audio transcription in three lines.
⚡ Real streaming — token-by-token via vLLM (asr.stream(...)), plus a mic web demo.
🚀 Serving — FastAPI server, Gradio demo, OpenAI-compatible vLLM endpoint.

📊 Results

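Word error rate (WER) and character error rate (CER), in %, on an Egyptian-Arabic + code-switching test set (3,699 utterances). "AR" columns cover Arabic-only utterances, "CS" columns cover code-switched ones. Lower is better. All models are scored with the same Egyptian-aware normalization.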

Benchmark overview

Model	Params	WER all	CER all	WER · AR	CER · AR	WER · CS	CER · CS
🏆 QwenCleo-ASR	1.7B	19.85	10.64	19.08	10.43	20.29	10.92
NVIDIA Nemotron-3.5	0.6B	38.88	20.58	37.14	17.40	42.15	26.30
Qwen3-ASR-1.7B (base)	1.7B	41.51	20.86	40.59	18.52	43.20	25.04
Whisper Large-v3 Turbo (FT)	0.81B	50.83	22.86	48.37	18.42	55.08	37.84
Cohere Transcribe 03-2026	2.0B	53.78	39.63	48.57	34.12	63.76	49.66
Whisper Large-v3	1.54B	63.94	39.76	49.25	22.76	59.32	31.52
Whisper Large-v2	1.54B	72.34	48.73	60.75	33.21	66.85	40.75
Whisper Large-v3 Turbo	0.81B	73.83	46.86	59.37	29.42	66.08	37.84
Whisper Medium	0.76B	80.46	53.19	74.77	41.76	74.15	44.90
Whisper Small	0.24B	89.99	60.34	77.42	42.53	87.09	55.22
Whisper Tiny	0.04B	124.68	89.42	116.02	77.74	110.67	74.57
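
For readers new to these metrics, here is a minimal sketch of how WER and CER are commonly computed, assuming the standard definition (edit distance divided by reference length). It is not the project's scoring script: the function names are illustrative, and the real evaluation also applies the Egyptian-aware normalization before scoring. Because insertions count as errors, a score can exceed 100%, which is how a model that hallucinates extra words ends up above 100 in the table.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,              # deletion
                cur[j - 1] + 1,           # insertion
                prev[j - 1] + (r != h),   # substitution (or match, cost 0)
            ))
        prev = cur
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate in percent; words are split on whitespace."""
    ref = reference.split()
    return 100.0 * edit_distance(ref, hypothesis.split()) / len(ref)


def cer(reference, hypothesis):
    """Character error rate in percent (assumption: spaces count as characters)."""
    return 100.0 * edit_distance(reference, hypothesis) / len(reference)


def relative_reduction(baseline, improved):
    """Relative error reduction in percent."""
    return 100.0 * (baseline - improved) / baseline


print(wer("a", "x y z"))                            # 300.0
print(round(relative_reduction(41.51, 19.85), 1))   # 52.2
```

The last line uses the "WER all" values of the base model and QwenCleo-ASR from the table above.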

🗣️ Sample outputs

Real transcriptions from the test set. Ground truth first; each model's output below it. Notice how QwenCleo keeps English terms in Latin script and Egyptian dialect intact, while the other models transliterate English into broken Arabic or drop words entirely.

🔀 Code-switching

✅ Ground truth طب وانتوا يعني كengineering المفروض ان بيكون مثلا الstaff engineer بيقعد مع الengineering managers

Model	Output
🏆 QwenCleo	طب وانتوا يعني ك`engineering` المفروض ان بيكون مثلا ال`staff engineer` بيقعد مع ال`engineering managers` ✅
Qwen3-ASR (base)	وأنتوا يعني كإنجنييرينج المفروض إنه بيكون مثلاً الأستاف إنجنيير بيعد مع ال engineering managers ❌
Cohere Transcribe	وانتم كانجينيري المفروض ان يكون مثلا الاستفاده من الهدف ❌ (truncated)
Nemotron 3.5 ASR	وانتو يعني كإنجينير المفروض ان بيكون مثلاً الإستف إنجنير بيقعد مع الإنجنير مانيجرز ❌

✅ Ground truth يعني شوية حاجات كده across كل الdomains او at least يعني مع 4 5 squads فالموضوع صعب

Model	Output
🏆 QwenCleo	يعني شوية حاجات كده `across` كل ال`domains` او `at least` يعني مع 4 5 `squads` فالموضوع صعب ✅
Qwen3-ASR (base)	يعني شوية حاجات كده أكرس كل الدومينز وأتلست يعني مع أربعة خمسة سكوات ❌
Cohere Transcribe	يعني شويه حاجات كده اكروس كل الدومينز او اتليست مع اربع خمسه سكوات ❌
Nemotron 3.5 ASR	يعني آه شوائد حاجات كده أكرس كل الدومين أو أتليست يعني مع أربع خمسة سكوات ❌

✅ Ground truth يقعد معاك حد مثلا من اللي هما ال C level او مثلا engineer manager او كده حسب الposition بتاعه

Model	Output
🏆 QwenCleo	يقعد معاك حد مثلا من هما ال `C level` او مثلا `engineer manager` او كده حسب ال`position` بتاعك ✅
Qwen3-ASR (base)	بيعرض معك حد مثلاً من هم C level أو مثلاً إنجنير مانAGER أو كذا حسب البوسيشن ❌
Cohere Transcribe	يقعد معاك حد مثلا اللي هم السي لافل او انجنير مانجر او كده حسب الموضوع ❌
Nemotron 3.5 ASR	بيقعد معاك حد مثلاً إن هم السي لفل أو مثلاً إنجينير مانجر أو كده حسب البوزيشن ❌
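
One quick way to check the script behaviour shown above on your own transcripts is to extract the Latin-script spans from the reference and see which of them survive in a model's output. The sketch below is an illustrative helper and not part of the package; `latin_spans` and `kept_latin_terms` are hypothetical names, and the check only handles ASCII letters.

```python
import re

# A run of ASCII letters, optionally continued by more words joined
# with a space, apostrophe, or hyphen (e.g. "at least", "staff engineer").
LATIN_RUN = re.compile(r"[A-Za-z]+(?:[ '-][A-Za-z]+)*")


def latin_spans(text):
    """Return the Latin-script spans embedded in a mixed-script transcript."""
    return LATIN_RUN.findall(text)


def kept_latin_terms(reference, hypothesis):
    """Reference spans that also appear as Latin-script spans in the hypothesis."""
    found = set(latin_spans(hypothesis))
    return [term for term in latin_spans(reference) if term in found]


reference = "يعني شوية حاجات كده across كل الdomains او at least يعني مع 4 5 squads فالموضوع صعب"
print(latin_spans(reference))
# ['across', 'domains', 'at least', 'squads']
```

A hypothesis that transliterates every English word into Arabic script yields an empty list from `kept_latin_terms`.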

🇪🇬 Egyptian Arabic

✅ Ground truth طب دي كانت مثلا تاخد 84% 88%

Model	Output
🏆 QwenCleo	طب دي كانت مثلا تاخد 84% 88% ✅
Qwen3-ASR (base)	طب دي كانت مثلاً تأخذ أربعة وثمانين في المية، ثمانية وثمانين في المية ❌
Cohere Transcribe	طيب دي كانت مثلا تاخد اربعه وثمانين في الميه ثمانيه وثمانين في الميه ❌
Nemotron 3.5 ASR	طيب دي كانت مثلاً تاخد أربعة وثمانين في المئة ثمانية وثمانين في المئة ❌

✅ Ground truth خد ال 4 في 4 او 4 ونص طلع دور 9

Model	Output
🏆 QwenCleo	خد ال 4 في 4 او 4 ونص طلع دور 9 ✅
Qwen3-ASR (base)	خادل الأربعة فاربعة واربعة ونص تطلع دور تاسع ❌
Cohere Transcribe	خد الاربعه فاربعه واربعه ونص طلع دور تسعه ❌
Nemotron 3.5 ASR	خد الأربعة في أربعة وأربعة ونص طلع دور تسعة ❌

📦 Installation

Install the right torch first. A plain pip install pulls the newest torch (built for the latest CUDA), which fails on older drivers with "NVIDIA driver too old". Install a torch build matching your driver before the package, then add QwenCleo with --no-deps so torch is never reinstalled.

Pick the wheel index that matches your CUDA driver: cu121 (driver supports CUDA ≥ 12.1, e.g. CUDA 12.2), cu118 (driver supports CUDA ≥ 11.8), or cpu. Check the CUDA version your driver supports with nvidia-smi.
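
The selection rule can be written down as a small helper. This is only a sketch of the rule stated above: `wheel_index` is a hypothetical function, not something the package ships, and falling back to the CPU wheels for drivers older than CUDA 11.8 is an assumption.

```python
PYTORCH_WHEELS = "https://download.pytorch.org/whl"


def wheel_index(cuda_version):
    """Map the CUDA version reported by nvidia-smi to a PyTorch wheel index URL.

    cuda_version is a (major, minor) tuple, or None when there is no NVIDIA GPU.
    """
    if cuda_version is not None and cuda_version >= (12, 1):
        tag = "cu121"
    elif cuda_version is not None and cuda_version >= (11, 8):
        tag = "cu118"
    else:
        tag = "cpu"  # assumption: no GPU, or a driver older than CUDA 11.8
    return f"{PYTORCH_WHEELS}/{tag}"


print(wheel_index((12, 2)))  # https://download.pytorch.org/whl/cu121
```

Pass the resulting URL to `pip install ... --index-url` in the commands below.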

For inference & chunked transcription (PyPI)

conda create -n qwencleo-asr python=3.12 -y
conda activate qwencleo-asr

# 1) torch matching your driver (cu121 shown — change the index for yours)
pip install torch==2.5.1 torchaudio==2.5.1 \
  --index-url https://download.pytorch.org/whl/cu121

# 2) QwenCleo without touching torch, then its remaining deps
pip install qwencleo-asr --no-deps
pip install "qwen-asr>=0.0.6" numpy soundfile huggingface_hub

That's all you need for the Python API and the qwencleo CLI.

For serving / Gradio / vLLM (clone the repo)

conda create -n qwencleo-asr python=3.12 -y
conda activate qwencleo-asr

# 1) torch matching your driver, first
pip install torch==2.5.1 torchaudio==2.5.1 \
  --index-url https://download.pytorch.org/whl/cu121

# 2) the repo (without re-resolving torch) + serving deps
git clone https://github.com/MohammedAly22/qwencleo-asr.git
cd qwencleo-asr
pip install -e . --no-deps
pip install "qwen-asr>=0.0.6" numpy soundfile huggingface_hub
pip install -r requirements-serving.txt

Verify torch sees the GPU before running:

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
# -> 2.5.1+cu121 True

🚀 Usage

Python — basic transcription

from qwencleo_asr import QwenCleoASR

asr = QwenCleoASR()                       # loads mohammedaly22/QwenCleo-ASR
result = asr.transcribe("clip.wav")       # language defaults to "Arabic"
print(result.text)

Batch, auto-detect language, and Egyptian normalization:

results = asr.transcribe(["a.wav", "b.wav"], language=None)   # auto-detect
clean   = asr.transcribe("clip.wav", normalize=True)          # normalized text
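
The exact rules behind normalize=True are not listed here. To give an idea of what Arabic ASR normalization typically involves, the sketch below strips diacritics and tatweel and unifies the hamza forms of alef, so that spelling variants such as "مثلاً" and "مثلا" compare as equal. It is an assumption about common practice, not the package's implementation, and `normalize_arabic` is a hypothetical name.

```python
import re

# Arabic diacritics (fathatan through sukun) and tatweel.
DIACRITICS = re.compile("[\u064B-\u0652\u0640]")

# Alef with hamza above, hamza below, and madda all map to bare alef.
ALEF_VARIANTS = str.maketrans({
    "\u0623": "\u0627",
    "\u0625": "\u0627",
    "\u0622": "\u0627",
})


def normalize_arabic(text):
    """Illustrative normalization: drop diacritics, unify alef, squeeze spaces."""
    text = DIACRITICS.sub("", text)
    text = text.translate(ALEF_VARIANTS)
    return " ".join(text.split())


print(normalize_arabic("مثلاً") == normalize_arabic("مثلا"))
```

Latin-script words and digits pass through unchanged, so code-switched terms are not affected by this step.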

Python — chunked transcription of long audio / mic

from qwencleo_asr import QwenCleoASR, stream_file

asr = QwenCleoASR()
for chunk in stream_file(asr, "long_podcast.wav", chunk_s=20, overlap_s=2):
    print(f"[{chunk.start:.0f}-{chunk.end:.0f}s] {chunk.text}")

ℹ️ This is chunked transcription, not true streaming. It splits long/live audio into overlapping windows and transcribes each — convenient for captioning without a server, but latency is per-window. For genuine token-by-token streaming, use the vLLM path below.
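
To make the windowing concrete, here is a minimal sketch of how overlapping windows could be laid out for given chunk_s and overlap_s values. How stream_file actually places its windows is not documented here, so treat `chunk_windows` as a hypothetical illustration of the idea rather than the library's algorithm.

```python
def chunk_windows(total_s, chunk_s=20.0, overlap_s=2.0):
    """Return (start, end) times in seconds covering total_s with overlap."""
    if not 0 <= overlap_s < chunk_s:
        raise ValueError("overlap_s must be non-negative and smaller than chunk_s")
    step = chunk_s - overlap_s   # each window starts this far after the previous one
    windows = []
    start = 0.0
    while start < total_s:
        end = min(start + chunk_s, total_s)
        windows.append((start, end))
        if end >= total_s:       # the last window reached the end of the audio
            break
        start += step
    return windows


print(chunk_windows(50.0))
# [(0.0, 20.0), (18.0, 38.0), (36.0, 50.0)]
```

Each window has to be transcribed in full before its text is available, which is why latency is per window rather than per token.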

Python — true streaming (vLLM)

QwenCleo inherits Qwen3-ASR's real token-by-token streaming via vLLM. Start a server (see server/vllm_serve.md):

pip install "qwencleo-asr[vllm]"          # vLLM nightly recommended — see docs
vllm serve mohammedaly22/QwenCleo-ASR

Then stream straight off the model object — deltas arrive as they're generated:

from qwencleo_asr import QwenCleoASR

asr = QwenCleoASR()
for delta in asr.stream("clip.wav"):       # talks to the vLLM server
    print(delta, end="", flush=True)

Or use the helpers directly:

from qwencleo_asr import stream_vllm, transcribe_vllm, VLLMOffline

for delta in stream_vllm("clip.wav", language="Arabic"):
    print(delta, end="", flush=True)

print(transcribe_vllm("clip.wav"))         # one-shot via the server
print(VLLMOffline().transcribe("clip.wav"))  # in-process, no server

CLI

qwencleo transcribe clip.wav
qwencleo transcribe a.wav b.wav --language None --normalize
qwencleo stream long_podcast.wav --chunk-s 20 --overlap-s 2

🌐 Serving

FastAPI server

QWENCLEO_MODEL=mohammedaly22/QwenCleo-ASR \
uvicorn server.app:app --host 0.0.0.0 --port 8000

curl -X POST http://localhost:8000/v1/transcribe -F file=@clip.wav -F language=Arabic
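
The same request can be sent from Python with only the standard library. The sketch below mirrors the curl call above: the `file` and `language` form fields and the /v1/transcribe path come from that command, while the helper names are hypothetical and the response format is not documented here, so the usage line simply prints the raw body.

```python
import os
import urllib.request
import uuid


def build_multipart(fields, file_field, filename, payload):
    """Encode text fields plus one file as a multipart/form-data body."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append((
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        ).encode())
    parts.append((
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode())
    parts.append(payload)
    parts.append(f"\r\n--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe_request(url, wav_path, language="Arabic"):
    """Build (but do not send) a POST request for the transcription endpoint."""
    with open(wav_path, "rb") as fh:
        body, content_type = build_multipart(
            {"language": language}, "file", os.path.basename(wav_path), fh.read())
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST")
```

With the server running, `urllib.request.urlopen(transcribe_request("http://localhost:8000/v1/transcribe", "clip.wav")).read()` returns the raw response bytes.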

Gradio demo

python app/gradio_app.py        # http://localhost:7860  (mic + file upload)

vLLM — serving, streaming & OpenAI-compatible API

Full guide in server/vllm_serve.md. In short:

pip install "qwencleo-asr[vllm]"           # vLLM nightly recommended (see docs)
vllm serve mohammedaly22/QwenCleo-ASR

OpenAI-compatible transcription:

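from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
# Pass the open file object so the filename is sent and the handle is closed afterwards.
with open("clip.wav", "rb") as audio:
    result = client.audio.transcriptions.create(
        model="mohammedaly22/QwenCleo-ASR", file=audio)
print(result.text)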

Streaming mic web demo

Live browser-mic transcription via the upstream Flask demo:

qwen-asr-demo-streaming \
  --asr-model-path mohammedaly22/QwenCleo-ASR \
  --host 0.0.0.0 --port 8000 --gpu-memory-utilization 0.9
# open http://<your-ip>:8000

🔗 Links

🤗 Model card: mohammedaly22/QwenCleo-ASR
📦 PyPI: qwencleo-asr
🧱 Base model: Qwen/Qwen3-ASR-1.7B · Qwen3-ASR repo
Languages: Egyptian Arabic, Modern Standard Arabic, Arabic↔English code-switching
Recommended language hint: "Arabic" (or None to auto-detect)

📜 License & citation

Apache-2.0, inheriting the Qwen3-ASR license terms.

@misc{qwencleo_asr_2026,
  title  = {QwenCleo-ASR: The Best Open-Source Egyptian Arabic and Code-Switching Speech Recognition Model},
  author = {Mohammed Aly},
  year   = {2026},
  howpublished = {\url{https://huggingface.co/mohammedaly22/QwenCleo-ASR}},
  note   = {Fine-tuned from Qwen3-ASR-1.7B}
}

Version 0.1.0, released Jun 14, 2026

Download files

Source Distribution

qwencleo_asr-0.1.0.tar.gz (22.5 kB)

Uploaded Jun 14, 2026 Source

Built Distribution

qwencleo_asr-0.1.0-py3-none-any.whl (19.7 kB)

Uploaded Jun 14, 2026 Python 3

File details

Details for the file qwencleo_asr-0.1.0.tar.gz.

File metadata

Download URL: qwencleo_asr-0.1.0.tar.gz
Upload date: Jun 14, 2026
Size: 22.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for qwencleo_asr-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`ab13f7319296bbd0e1b002fdade4bad17a35d3113f36c5c623b1ac51a344eb3e`
MD5	`6eefd0035b9beb74737c94447f554d29`
BLAKE2b-256	`cb91a8bbb07a6e3aeaa720798240dc95014e99143f9674a72066c6971b7c2a1b`

File details

Details for the file qwencleo_asr-0.1.0-py3-none-any.whl.

File metadata

Download URL: qwencleo_asr-0.1.0-py3-none-any.whl
Upload date: Jun 14, 2026
Size: 19.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for qwencleo_asr-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`1a5e94136d55312a204f8ee6dcd3e891f4fbb247eb88ef51f521c4b91d8ca6df`
MD5	`6a8066cc686506431a9ac938ee8220f2`
BLAKE2b-256	`2e8605e94d3701a96d1b71a1bd2e760ac018b99135d88fe7ebdc274f58bfecb7`
