Industrial-grade speech recognition: 170x realtime, 50+ languages, speaker diarization, emotion detection.

FunASR

Industrial speech recognition. 170x realtime, 13x faster than Whisper-large-v3. 50+ languages.
Speaker diarization · Emotion detection · Streaming · One API call


Quick Start

No local setup? Open the Colab quickstart to transcribe a public sample or upload your own audio in a browser.

pip install torch torchaudio
pip install funasr

from funasr import AutoModel

model = AutoModel(model="iic/SenseVoiceSmall", vad_model="fsmn-vad", spk_model="cam++", device="cuda")
result = model.generate(input="meeting.wav")

Output — structured text with speaker labels, timestamps, and punctuation:

[00:00.4 → 00:03.8] Speaker 0: Let's discuss the Q3 plan.
[00:04.2 → 00:07.1] Speaker 1: Sounds good. I have three points.
[00:07.5 → 00:12.3] Speaker 0: Go ahead. We have 30 minutes.

That's it. One AutoModel object, one call: VAD segmentation, speech recognition, punctuation, and speaker diarization all happen automatically.
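As a rough sketch of how a listing like the one above might be produced from the return value: the snippet below assumes `model.generate` returns a list whose first element carries a `sentence_info` list, with `start` and `end` in milliseconds, a `spk` index, and `text` per segment. Those key names and the helper functions are assumptions for illustration, so inspect the actual result for your model before relying on them.

```python
def format_timestamp(ms: int) -> str:
    """Render milliseconds as MM:SS.t (tenths of a second)."""
    tenths = (ms + 50) // 100          # round to the nearest tenth of a second
    minutes, rem = divmod(tenths, 600)
    return f"{minutes:02d}:{rem // 10:02d}.{rem % 10}"


def format_segments(result: list) -> list:
    """Turn a generate() result into '[start → end] Speaker N: text' lines.

    Assumes result[0]["sentence_info"] exists with the keys used below.
    """
    lines = []
    for seg in result[0].get("sentence_info", []):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        lines.append(f"[{start} → {end}] Speaker {seg['spk']}: {seg['text']}")
    return lines


# Hand-written stand-in for a real result, shaped per the assumptions above.
sample_result = [{"sentence_info": [
    {"start": 400, "end": 3800, "spk": 0, "text": "Let's discuss the Q3 plan."},
    {"start": 4200, "end": 7100, "spk": 1, "text": "Sounds good. I have three points."},
]}]
for line in format_segments(sample_result):
    print(line)
```

With a real model, `format_segments(result)` would be called on the value returned by `model.generate(input="meeting.wav")`.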

Deploy as API server: funasr-server --device cuda → OpenAI-compatible endpoint at localhost:8000

Use with AI agents: MCP Server for Claude/Cursor · OpenAI API for LangChain/Dify/AutoGen

Why FunASR?

| | FunASR | Whisper | Cloud APIs |
|---|---|---|---|
| Speed | 170x realtime | 13x realtime | ~1x realtime |
| Speaker ID | ✅ Built-in | ❌ Needs pyannote | ✅ Extra cost |
| Emotion | ✅ Happy/Sad/Angry | | |
| Languages | 50+ | 57 | Varies |
| Streaming | ✅ WebSocket | | |
| vLLM Acceleration | ✅ 2-3x faster | | N/A |
| Self-hosted | ✅ MIT license | ✅ MIT license | ❌ Cloud only |
| Cost | Free | Free | $0.006/min+ |
| CPU viable | ✅ 17x realtime | ❌ Too slow | N/A |

Trying FunASR for the first time? Use the Colab quickstart before setting up a local environment. Choosing a first model? Start with the model selection guide. Planning a switch from Whisper or a cloud ASR provider? Use the migration guide and benchmark example to test representative audio, map features, and roll out safely.


Benchmark

184 long-form audio files (192 min). Full report →

| Model | GPU Speed | CPU Speed | vs Whisper-large-v3 |
|---|---|---|---|
| SenseVoice-Small | 170x realtime | 17x realtime | 🚀 13x faster |
| Paraformer-Large | 120x realtime | 15x realtime | 🚀 9x faster |
| Whisper-large-v3-turbo | 46x realtime | | 3.4x faster |
| Fun-ASR-Nano | 17x realtime | 3.6x realtime | 1.3x faster |
| Whisper-large-v3 | 13x realtime | | baseline |

Key takeaway: SenseVoice-Small and Paraformer-Large run faster on CPU (17x and 15x realtime) than Whisper-large-v3 runs on GPU (13x realtime).
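To make the "Nx realtime" figures concrete, here is a small worked calculation. It assumes "Nx realtime" means N seconds of audio are transcribed per second of wall-clock time, and it uses only the GPU speeds and the 192-minute total quoted above; the function names are illustrative.

```python
# GPU speeds (multiples of realtime) copied from the benchmark figures above.
GPU_SPEED = {
    "SenseVoice-Small": 170,
    "Paraformer-Large": 120,
    "Whisper-large-v3-turbo": 46,
    "Fun-ASR-Nano": 17,
    "Whisper-large-v3": 13,
}

AUDIO_MINUTES = 192  # total duration of the benchmark set


def processing_seconds(audio_minutes: float, realtime_factor: float) -> float:
    """Wall-clock seconds needed to transcribe the audio at N x realtime."""
    return audio_minutes * 60 / realtime_factor


def speedup_vs(baseline: str, model: str) -> float:
    """Ratio of two realtime factors, i.e. how many times faster `model` is."""
    return GPU_SPEED[model] / GPU_SPEED[baseline]


for name, speed in GPU_SPEED.items():
    secs = processing_seconds(AUDIO_MINUTES, speed)
    ratio = speedup_vs("Whisper-large-v3", name)
    print(f"{name:24s} {secs:7.1f} s  {ratio:4.1f}x vs Whisper-large-v3")
```

Under that reading, the full set takes roughly 68 seconds at 170x realtime and roughly 15 minutes at 13x. Because the published speeds are rounded, ratios computed this way can differ slightly from the reported speedup column.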


What's new

  • 2026/05/24: vLLM Inference Engine — 2-3x faster LLM decoding for Fun-ASR-Nano. Streaming WebSocket service with VAD + Speaker Diarization. Guide →
  • 2026/05/24: Dynamic VAD — adaptive silence threshold (default on). Short sentences stay intact, long segments get auto-split. Details →
  • 2026/05/24: v1.3.3 released with the funasr-server CLI, an OpenAI-compatible API, and an MCP Server for AI agents. Upgrade with pip install --upgrade funasr
  • 2026/05/20: Added Qwen3-ASR (0.6B/1.7B) — 52 languages, auto detection. usage
  • 2026/05/20: Added GLM-ASR-Nano (1.5B) — 17 languages, dialect support. usage
  • 2026/05/19: Fun-ASR-Nano and SenseVoice now support speaker diarization.
  • 2025/12/15: Fun-ASR-Nano-2512 — 31 languages, tens of millions of hours training.
Older
  • 2024/10/10: Whisper-large-v3-turbo support added.
  • 2024/07/04: SenseVoice — ASR + emotion + audio events.
  • 2024/01/30: FunASR 1.0 released.

Installation

pip install funasr
From source / Requirements
git clone https://github.com/modelscope/FunASR.git && cd FunASR
pip install -e ./

Requirements: Python ≥ 3.8. Install PyTorch + torchaudio first (pytorch.org), then pip install funasr.
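A quick way to confirm these requirements before loading a model is a small environment check. This is only a sketch: the `check_environment` helper is hypothetical and not part of FunASR, and it only tests whether the packages can be found, not whether their versions are compatible.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 8)  # minimum version from the requirements above
REQUIRED = ("torch", "torchaudio", "funasr")


def check_environment() -> dict:
    """Report whether Python is new enough and which packages are importable."""
    report = {"python_ok": sys.version_info >= MIN_PYTHON}
    for name in REQUIRED:
        # find_spec returns None when a top-level package is not installed
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for item, ok in check_environment().items():
        print(f"{item:12s} {'ok' if ok else 'MISSING'}")
```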


Model Zoo

| Model | Task | Languages | Params | Links |
|---|---|---|---|---|
| Fun-ASR-Nano | ASR + timestamps | 31 languages | 800M | 🤗 |
| SenseVoiceSmall | ASR + emotion + events | zh/en/ja/ko/yue | 234M | 🤗 |
| Paraformer-zh | ASR + timestamps | zh/en | 220M | 🤗 |
| Paraformer-zh-streaming | Streaming ASR | zh/en | 220M | 🤗 |
| Qwen3-ASR | ASR, 52 languages | multilingual | 1.7B | usage |
| GLM-ASR-Nano | ASR, 17 languages | multilingual | 1.5B | usage |
| Whisper-large-v3 | ASR + translation | multilingual | 1550M | usage |
| Whisper-large-v3-turbo | ASR + translation | multilingual | 809M | usage |
| ct-punc | Punctuation | zh/en | 290M | 🤗 |
| fsmn-vad | VAD | zh/en | 0.4M | 🤗 |
| cam++ | Speaker diarization | | 7.2M | 🤗 |
| emotion2vec+large | Emotion recognition | | 300M | 🤗 |

Usage

Full examples with parameter docs: Tutorial →

from funasr import AutoModel

# Chinese production (VAD + ASR + punctuation + speaker)
model = AutoModel(model="paraformer-zh", vad_model="fsmn-vad", punc_model="ct-punc", spk_model="cam++", device="cuda")
result = model.generate(input="meeting.wav", hotword="关键词 20")

# 31 languages with timestamps
model = AutoModel(model="FunAudioLLM/Fun-ASR-Nano-2512", hub="hf", trust_remote_code=True,
                  vad_model="fsmn-vad", vad_kwargs={"max_single_segment_time": 30000}, device="cuda")
result = model.generate(input="audio.wav", batch_size=1)

# Streaming real-time
model = AutoModel(model="paraformer-zh-streaming", device="cuda")
result = model.generate(input="chunk.wav", cache={}, chunk_size=[0, 10, 5])

# Emotion recognition
model = AutoModel(model="emotion2vec_plus_large", device="cuda")
result = model.generate(input="audio.wav", granularity="utterance")
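The streaming call above handles a single chunk. Below is a minimal sketch of a full streaming loop over one file. It rests on several assumptions: 16 kHz mono input, each unit of `chunk_size[1]` covering 60 ms (so `[0, 10, 5]` gives 600 ms chunks), `generate` accepting an `is_final` flag to flush the last chunk, and each result exposing a `text` key. The helper names are hypothetical, and `soundfile` is an extra dependency.

```python
SAMPLE_RATE = 16000   # assumed input sample rate (Hz)
FRAME_MS = 60         # assumed duration of one chunk_size unit


def chunk_stride(chunk_size, sample_rate=SAMPLE_RATE):
    """Samples per streaming chunk, e.g. [0, 10, 5] -> 10 * 60 ms = 9600 samples."""
    return chunk_size[1] * FRAME_MS * sample_rate // 1000


def split_chunks(samples, stride):
    """Yield (chunk, is_final) pairs covering `samples` in order."""
    total = max(1, -(-len(samples) // stride))  # ceiling division, at least one chunk
    for i in range(total):
        yield samples[i * stride:(i + 1) * stride], i == total - 1


def stream_file(path, chunk_size=(0, 10, 5)):
    """Feed a WAV file to the streaming model chunk by chunk."""
    import soundfile                  # imported lazily: only needed for real audio
    from funasr import AutoModel

    model = AutoModel(model="paraformer-zh-streaming")
    speech, _ = soundfile.read(path)
    cache = {}  # shared across calls so the model keeps its streaming state
    texts = []
    for chunk, is_final in split_chunks(speech, chunk_stride(chunk_size)):
        res = model.generate(input=chunk, cache=cache, is_final=is_final,
                             chunk_size=list(chunk_size))
        texts.append(res[0]["text"])  # assumed result shape
    return "".join(texts)
```

The design point is that one `cache` dict is reused for every chunk of a stream; creating a fresh dict per call would discard the context carried over from earlier chunks.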

Deploy

# OpenAI-compatible API (recommended)
pip install torch torchaudio
pip install funasr vllm fastapi uvicorn python-multipart
funasr-server --device cuda
# → POST /v1/audio/transcriptions at localhost:8000

Verify it with a public sample:

curl -L https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/BAC009S0764W0121.wav -o sample.wav
curl http://localhost:8000/v1/audio/transcriptions \
  -F file=@sample.wav \
  -F model=sensevoice \
  -F response_format=verbose_json
# Docker streaming service
docker pull registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-online-cpu-0.1.12
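The curl verification request above can also be sent from Python. The sketch below uses only the standard library and mirrors the same form fields (`file`, `model=sensevoice`, `response_format=verbose_json`). The `build_multipart` and `transcribe` helpers are hypothetical names, and the shape of the JSON response is an assumption, so the function simply returns whatever the server sends back.

```python
import json
import os
import urllib.request
import uuid


def build_multipart(fields: dict, file_name: str, file_bytes: bytes):
    """Encode form fields plus one file as a multipart/form-data body."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            (f"--{boundary}\r\n"
             f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
             f"{value}\r\n").encode()
        )
    parts.append(
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="file"; filename="{file_name}"\r\n'
         "Content-Type: application/octet-stream\r\n\r\n").encode()
        + file_bytes + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(path: str, base_url: str = "http://localhost:8000") -> dict:
    """POST an audio file to /v1/audio/transcriptions and return the parsed JSON."""
    with open(path, "rb") as fh:
        body, content_type = build_multipart(
            {"model": "sensevoice", "response_format": "verbose_json"},
            os.path.basename(path), fh.read())
    request = urllib.request.Request(
        f"{base_url}/v1/audio/transcriptions", data=body,
        headers={"Content-Type": content_type}, method="POST")
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With the server running, `transcribe("sample.wav")` returns the decoded response. If the server follows OpenAI's response format, the transcript should be under a `text` key, but check the actual payload first.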

OpenAI API example → · Gradio demo → · Client recipes → · JavaScript/TypeScript recipes → · Kubernetes template → · Workflow recipes → · Postman collection → · OpenAPI spec → · Security guide → · Deployment matrix → · Deployment docs → · Agent integration →


Community

📖 Documentation 🐛 Issues
💬 Discussions 🤗 HuggingFace
🤝 Contributing 📈 20k growth plan

License

MIT License

Citations

@inproceedings{gao2023funasr,
  author={Zhifu Gao and others},
  title={FunASR: A Fundamental End-to-End Speech Recognition Toolkit},
  booktitle={INTERSPEECH},
  year={2023}
}
