funasr

Industrial-grade speech recognition: 170x realtime, 50+ languages, speaker diarization, emotion detection.

Industrial speech recognition. 170x realtime, 13x faster than Whisper-large-v3. 50+ languages.
Speaker diarization · Emotion detection · Streaming · One API call

Quick Start · Colab · Benchmark · Model selection · Migration guide · Use cases · Deployment matrix · Models · Agent Integration · Docs · Contribute

Quick Start

No local setup? Open the Colab quickstart to transcribe a public sample or upload your own audio in a browser.

pip install torch torchaudio
pip install funasr

Flagship model — Fun-ASR-Nano (LLM-ASR, 31 languages; the default recommendation, needs a GPU):

from funasr import AutoModel

model = AutoModel(model="FunAudioLLM/Fun-ASR-Nano-2512", device="cuda")
result = model.generate(input="https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_zh.wav")
print(result[0]["text"])
# 欢迎大家来体验达摩院推出的语音识别模型。

On CPU, or for multilingual recognition plus emotion in one pass, use SenseVoice. Combined with the VAD and speaker models loaded below, it also returns speaker diarization and timestamps:

from funasr import AutoModel
from funasr.utils.postprocess_utils import rich_transcription_postprocess

model = AutoModel(model="iic/SenseVoiceSmall", vad_model="fsmn-vad", spk_model="cam++", device="cuda")  # use device="cpu" if you don't have a GPU
result = model.generate(
    input="https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_zh.wav",
    batch_size_s=300,
)

# One call returns VAD segments with speaker id + timestamps — render them however you like:
for seg in result[0]["sentence_info"]:
    print(f"[{seg['start']/1000:.1f}s] Speaker {seg['spk']}: {rich_transcription_postprocess(seg['sentence'])}")

Output — structured text with speaker labels, timestamps, and punctuation:

[0.6s] Speaker 0: 欢迎大家来体验达摩院推出的语音识别模型

That's it. One AutoModel pipeline, one call: VAD segmentation, speech recognition, punctuation, and speaker diarization all happen automatically.
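Because each entry in sentence_info is one VAD segment, a single speaker's turn can span several entries. The sketch below merges consecutive segments from the same speaker into one turn. It assumes only the keys used in the snippet above ('start' in milliseconds, 'spk', 'sentence'); the helper name merge_speaker_turns and the sample data are illustrative and not part of FunASR.

```python
def merge_speaker_turns(sentence_info):
    """Collapse consecutive segments from the same speaker into one turn.

    Each item is assumed to carry 'start' (ms), 'spk' and 'sentence',
    the same keys the Quick Start loop reads.
    """
    turns = []
    for seg in sentence_info:
        if turns and turns[-1]["spk"] == seg["spk"]:
            # Same speaker keeps talking: extend the open turn.
            turns[-1]["text"] += seg["sentence"]
        else:
            # Speaker changed (or first segment): open a new turn.
            turns.append({
                "spk": seg["spk"],
                "start_s": seg["start"] / 1000,
                "text": seg["sentence"],
            })
    return turns


# Hand-written sample data, not real model output.
sample = [
    {"start": 600, "spk": 0, "sentence": "Hello. "},
    {"start": 2100, "spk": 0, "sentence": "How are you?"},
    {"start": 4000, "spk": 1, "sentence": "Fine, thanks."},
]

for turn in merge_speaker_turns(sample):
    print(f"[{turn['start_s']:.1f}s] Speaker {turn['spk']}: {turn['text']}")
```

With real output, pass result[0]["sentence_info"] instead of the sample list.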

Scale & deploy the flagship

At scale, accelerate Fun-ASR-Nano with vLLM (batch processing):

from funasr.auto.auto_model_vllm import AutoModelVLLM

model = AutoModelVLLM(model="FunAudioLLM/Fun-ASR-Nano-2512", tensor_parallel_size=1)
results = model.generate(["audio1.wav", "audio2.wav"], language="auto")
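For large file lists it can help to feed the engine fixed-size batches rather than one huge list. The following is a minimal sketch of that pattern. The helpers batched and transcribe_in_batches are hypothetical, and it assumes generate returns one result per input path in input order, which the source does not state. A stand-in function replaces the model so the sketch runs without a GPU or vLLM.

```python
from typing import Callable, Iterator, List, Sequence


def batched(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `batch_size` entries."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def transcribe_in_batches(paths: Sequence[str],
                          generate: Callable[[List[str]], list],
                          batch_size: int = 8) -> list:
    """Call `generate` once per batch and concatenate the results.

    Assumption: `generate` returns one result per path, in order.
    """
    results = []
    for batch in batched(paths, batch_size):
        results.extend(generate(batch))
    return results


# Stand-in for model.generate, so the example needs no model download.
def fake_generate(batch):
    return [{"file": path, "text": ""} for path in batch]


out = transcribe_in_batches([f"audio{i}.wav" for i in range(5)],
                            fake_generate, batch_size=2)
print(len(out), "results")
```

With the real engine, the callable could be `lambda batch: model.generate(batch, language="auto")`, reusing the call shown above.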

Deploy as API server: funasr-server --device cuda → OpenAI-compatible endpoint at localhost:8000

Use with AI agents: MCP Server for Claude/Cursor · OpenAI API for LangChain/Dify/AutoGen

Why FunASR?

Feature	FunASR	Whisper	Cloud APIs
Speed	170x realtime	13x realtime	~1x realtime
Speaker ID	✅ Built-in	❌ Needs pyannote	✅ Extra cost
Emotion	✅ Happy/Sad/Angry	❌	❌
Languages	50+	57	Varies
Streaming	✅ WebSocket	❌	✅
vLLM Acceleration	✅ up to 16x faster	❌	N/A
Self-hosted	✅ MIT license	✅ MIT license	❌ Cloud only
Cost	Free	Free	$0.006/min+
CPU viable	✅ 17x realtime	❌ Too slow	N/A

Trying FunASR for the first time? Use the Colab quickstart before setting up a local environment. Choosing a first model? Start with the model selection guide. Planning a switch from Whisper or a cloud ASR provider? Use the migration guide and benchmark example to test representative audio, map features, and roll out safely.

Benchmark

184 long-form audio files (192 min). Full report →

Model	Chinese CER ↓	GPU Speed	CPU Speed	vs Whisper-large-v3
Fun-ASR-Nano (vLLM)	8.20%	340x realtime	—	🚀 26x faster
SenseVoice-Small	7.81%	170x realtime	17x realtime	🚀 13x faster
Paraformer-Large	10.18%	120x realtime	15x realtime	🚀 9x faster
Whisper-large-v3-turbo	21.71%	46x realtime	❌	3.4x faster
Whisper-large-v3	20.02%	13x realtime	❌	baseline

Key takeaway: on CPU, SenseVoice-Small (17x realtime) and Paraformer-Large (15x realtime) run faster than Whisper-large-v3 does on GPU (13x realtime).
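To reproduce numbers like these on your own audio, the two metrics in the table can be computed as sketched below. This uses the standard definitions (character error rate as edit distance divided by reference length, realtime factor as audio duration divided by processing time). The benchmark's exact text normalization and timing method are not given here, so treat this as an approximation rather than the scoring script behind the report.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance: insertions, deletions, substitutions cost 1 each."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # delete r from the reference
                curr[j - 1] + 1,         # insert h into the reference
                prev[j - 1] + (r != h),  # substitute, or free match
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate of hypothesis `hyp` against reference `ref`."""
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def realtime_factor(audio_seconds: float, wall_seconds: float) -> float:
    """How many seconds of audio are processed per second of wall time."""
    return audio_seconds / wall_seconds


print(f"CER: {cer('abcd', 'abxd'):.2%}")
print(f"Speed: {realtime_factor(600, 60):.0f}x realtime")
```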

What's new

2026/06/21: v1.3.12 on PyPI with rolling fixes (qwen3-asr language codes, glm_asr, vLLM repetition_penalty). pip install --upgrade funasr
2026/06/20: llama.cpp / GGUF runtime. Run SenseVoice / Paraformer / Fun-ASR-Nano on CPU & edge as a single self-contained binary (a whisper.cpp-style alternative), with built-in FSMN-VAD and no Python at runtime. Prebuilt binaries for Linux / macOS / Windows + q8 quantized models (~half the size, same accuracy). runtime/llama.cpp/ · Releases
2026/05/24: vLLM Inference Engine — 2-3x faster LLM decoding for Fun-ASR-Nano. Streaming WebSocket service with VAD + Speaker Diarization. Guide →
2026/05/24: Dynamic VAD — adaptive silence threshold (default on). Short sentences stay intact, long segments get auto-split. Details →
2026/05/24: v1.3.3 — funasr-server CLI, OpenAI-compatible API, MCP Server for AI agents. pip install --upgrade funasr
2026/05/20: Added Qwen3-ASR (0.6B/1.7B) — 52 languages, auto detection. usage
2026/05/20: Added GLM-ASR-Nano (1.5B) — 17 languages, dialect support. usage
2026/05/19: Fun-ASR-Nano and SenseVoice now support speaker diarization.
2025/12/15: Fun-ASR-Nano-2512 — 31 languages, tens of millions of hours training.

Older

2024/10/10: Whisper-large-v3-turbo support added.
2024/07/04: SenseVoice — ASR + emotion + audio events.
2024/01/30: FunASR 1.0 released.

Installation

pip install funasr

From source / Requirements

git clone https://github.com/modelscope/FunASR.git && cd FunASR
pip install -e ./

Requirements: Python ≥ 3.8. Install PyTorch + torchaudio first (pytorch.org), then pip install funasr.
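Before installing, a quick check can confirm that the interpreter meets the Python ≥ 3.8 requirement and show which of the packages named above are already importable. This is a small sketch using only the standard library; the function name check_environment is illustrative.

```python
import importlib.util
import sys


def check_environment(min_python=(3, 8),
                      packages=("torch", "torchaudio", "funasr")):
    """Report whether Python is new enough and which packages are importable."""
    report = {"python_ok": sys.version_info >= min_python}
    for name in packages:
        # find_spec returns None when a top-level package is not installed.
        report[name] = importlib.util.find_spec(name) is not None
    return report


for key, ok in check_environment().items():
    print(f"{key}: {'ok' if ok else 'missing'}")
```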

Model Zoo

Model	Task	Languages	Params	Links
Fun-ASR-Nano	ASR + timestamps	31 languages	800M	⭐ 🤗
SenseVoiceSmall	ASR + emotion + events	zh/en/ja/ko/yue	234M	⭐ 🤗
Paraformer-zh	ASR + timestamps	zh/en	220M	⭐ 🤗
Paraformer-zh-streaming	Streaming ASR	zh/en	220M	⭐ 🤗
Qwen3-ASR	ASR	52 languages	0.6B / 1.7B	usage
GLM-ASR-Nano	ASR	17 languages	1.5B	usage
Whisper-large-v3	ASR + translation	multilingual	1550M	usage
Whisper-large-v3-turbo	ASR + translation	multilingual	809M	usage
ct-punc	Punctuation	zh/en	290M	⭐ 🤗
fsmn-vad	VAD	zh/en	0.4M	⭐ 🤗
cam++	Speaker diarization	—	7.2M	⭐ 🤗
emotion2vec+large	Emotion recognition	—	300M	⭐ 🤗
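When narrowing down candidates, it can be handy to treat the table as data and filter by task and size. The sketch below transcribes the names, tasks, and parameter counts from the Model Zoo table (in millions, with the 1.7B Qwen3-ASR variant listed). The helper models_for is hypothetical and is not a substitute for the model selection guide.

```python
# (name, task, parameters in millions), transcribed from the Model Zoo table.
MODEL_ZOO = [
    ("Fun-ASR-Nano", "ASR + timestamps", 800),
    ("SenseVoiceSmall", "ASR + emotion + events", 234),
    ("Paraformer-zh", "ASR + timestamps", 220),
    ("Paraformer-zh-streaming", "Streaming ASR", 220),
    ("Qwen3-ASR", "ASR", 1700),
    ("GLM-ASR-Nano", "ASR", 1500),
    ("Whisper-large-v3", "ASR + translation", 1550),
    ("Whisper-large-v3-turbo", "ASR + translation", 809),
    ("ct-punc", "Punctuation", 290),
    ("fsmn-vad", "VAD", 0.4),
    ("cam++", "Speaker diarization", 7.2),
    ("emotion2vec+large", "Emotion recognition", 300),
]


def models_for(task_keyword, max_params_m=None):
    """Return model names whose task mentions the keyword (case-insensitive),
    optionally limited to models with at most `max_params_m` million parameters."""
    keyword = task_keyword.lower()
    return [
        name
        for name, task, params_m in MODEL_ZOO
        if keyword in task.lower()
        and (max_params_m is None or params_m <= max_params_m)
    ]


print(models_for("emotion"))
print(models_for("asr", max_params_m=250))
```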

Usage

Full examples with parameter docs: Tutorial →

from funasr import AutoModel

# Chinese production (VAD + ASR + punctuation + speaker)
model = AutoModel(model="paraformer-zh", vad_model="fsmn-vad", punc_model="ct-punc", spk_model="cam++", device="cuda")
result = model.generate(input="https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_zh.wav", hotword="关键词 20")

# 31 languages with timestamps
model = AutoModel(model="FunAudioLLM/Fun-ASR-Nano-2512",
                  vad_model="fsmn-vad", vad_kwargs={"max_single_segment_time": 30000}, device="cuda")
result = model.generate(input="audio.wav", batch_size=1)

# Streaming real-time (feed audio chunk by chunk)
import soundfile as sf
model = AutoModel(model="paraformer-zh-streaming", device="cuda")
audio, sr = sf.read("speech.wav", dtype="float32")   # 16 kHz mono
chunk_size = [0, 10, 5]                               # 600 ms chunks
chunk_stride = chunk_size[1] * 960
cache = {}
n_chunks = (len(audio) - 1) // chunk_stride + 1
for i in range(n_chunks):
    chunk = audio[i * chunk_stride : (i + 1) * chunk_stride]
    res = model.generate(input=chunk, cache=cache, is_final=(i == n_chunks - 1),
                         chunk_size=chunk_size, encoder_chunk_look_back=4, decoder_chunk_look_back=1)
    if res[0]["text"]:
        print(res[0]["text"], end="", flush=True)

# Emotion recognition
model = AutoModel(model="emotion2vec_plus_large", device="cuda")
result = model.generate(input="audio.wav", granularity="utterance")
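The streaming loop above slices the waveform by chunk_stride, and the arithmetic behind it is easy to get wrong when changing chunk_size. The sketch below isolates it: with 16 kHz audio, 960 samples correspond to 60 ms, so chunk_size[1] = 10 gives 9600 samples, or 600 ms, per chunk. The helper chunk_plan is illustrative and needs no model; it only reproduces the slice boundaries the loop would use.

```python
SAMPLE_RATE = 16_000   # Hz, matching the "16 kHz mono" comment above
FRAME_SAMPLES = 960    # 60 ms at 16 kHz (16000 * 0.06)


def chunk_plan(num_samples, chunk_size=(0, 10, 5)):
    """Return (stride, [(start, end), ...]) for the streaming loop.

    The stride and chunk count use the same formulas as the loop above;
    the final chunk is simply shorter when the audio does not divide evenly.
    """
    stride = chunk_size[1] * FRAME_SAMPLES
    n_chunks = (num_samples - 1) // stride + 1
    bounds = [(i * stride, min((i + 1) * stride, num_samples))
              for i in range(n_chunks)]
    return stride, bounds


stride, bounds = chunk_plan(2 * SAMPLE_RATE)  # two seconds of audio
print(f"{stride} samples per chunk = {stride / SAMPLE_RATE * 1000:.0f} ms")
for start, end in bounds:
    print(f"chunk [{start}:{end}] is_final={end == 2 * SAMPLE_RATE}")
```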

CLI (Agent-Friendly)

# Transcribe audio (simplest)
funasr audio.wav

# JSON output (for AI agents)
funasr audio.wav --output-format json

# SRT subtitles
funasr audio.wav --output-format srt --output-dir ./subs

# Speaker diarization + timestamps
funasr audio.wav --spk --timestamps --output-format json

# Choose model and language
funasr audio.wav --model paraformer --language zh

# Batch transcribe
funasr *.wav --output-format srt --output-dir ./output

Available models: sensevoice (default), paraformer, paraformer-en, fun-asr-nano
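The --output-format srt option writes subtitle files from the command line. When working from the Python API instead, a similar layout can be produced from the sentence_info segments. The sketch below assumes each segment has 'start' and 'end' in milliseconds plus 'sentence'; the 'end' key is an assumption, since the Quick Start snippet only reads 'start'. The function names are illustrative.

```python
def srt_timestamp(ms):
    """Format milliseconds as the HH:MM:SS,mmm form used by SRT files."""
    hours, rem = divmod(int(ms), 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def to_srt(segments):
    """Render segments as numbered SRT cues separated by blank lines.

    Assumed keys per segment: 'start' (ms), 'end' (ms), 'sentence'.
    """
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['sentence']}\n"
        )
    return "\n".join(cues)


# Hand-written sample data, not real model output.
print(to_srt([
    {"start": 600, "end": 2100, "sentence": "Hello."},
    {"start": 4000, "end": 5250, "sentence": "Fine, thanks."},
]))
```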

Deploy

# OpenAI-compatible API (recommended)
pip install torch torchaudio
pip install funasr vllm fastapi uvicorn python-multipart
funasr-server --device cuda
# → POST /v1/audio/transcriptions at localhost:8000

Verify it with a public sample:

curl -L https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/BAC009S0764W0121.wav -o sample.wav
curl http://localhost:8000/v1/audio/transcriptions \
  -F file=@sample.wav \
  -F model=sensevoice \
  -F response_format=verbose_json
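The same request can be sent from Python without extra dependencies. The sketch below mirrors the curl call above (form fields file, model, and response_format, posted to /v1/audio/transcriptions on localhost:8000) using only the standard library. The helper names are illustrative, and the shape of the JSON response is not documented here, so the function simply returns the parsed body.

```python
import json
import os
import urllib.request
import uuid


def build_multipart(fields, file_field, filename, file_bytes):
    """Encode text fields plus one file as a multipart/form-data body.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(f"--{boundary}\r\n".encode())
        parts.append(
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'.encode())
        parts.append(f"{value}\r\n".encode())
    parts.append(f"--{boundary}\r\n".encode())
    parts.append(
        f'Content-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"\r\n'.encode())
    parts.append(b"Content-Type: application/octet-stream\r\n\r\n")
    parts.append(file_bytes)
    parts.append(f"\r\n--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(path,
               url="http://localhost:8000/v1/audio/transcriptions",
               model="sensevoice"):
    """POST an audio file to a running funasr-server and return parsed JSON."""
    with open(path, "rb") as audio:
        body, content_type = build_multipart(
            {"model": model, "response_format": "verbose_json"},
            "file", os.path.basename(path), audio.read())
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type})
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With the server from the previous step running, `transcribe("sample.wav")` sends the downloaded sample.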

# Docker streaming service
docker pull registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-online-cpu-0.1.12

CPU / Edge — llama.cpp / GGUF (no GPU, no Python)

Run SenseVoice / Paraformer / Fun-ASR-Nano as a single self-contained binary on CPU and edge devices — this is to FunASR what whisper.cpp is to Whisper, but with ~3× lower CER than whisper.cpp on Chinese. Built-in FSMN-VAD, no Python at runtime.

# 1) Grab a prebuilt binary from Releases (Linux / macOS / Windows), then:
bash download-funasr-model.sh sensevoice ./gguf        # or: paraformer | nano
llama-funasr-sensevoice -m ./gguf/SenseVoiceSmall-f16.gguf --vad ./gguf/fsmn-vad.gguf -a audio.wav
# → 欢迎大家来体验达摩院推出的语音识别模型

Prebuilt binaries: Releases · Download & quickstart: funasr.com/llama-cpp · GGUF models: Hugging Face · Docs & benchmarks: runtime/llama.cpp/

OpenAI API example → · Gradio demo → · Client recipes → · JavaScript/TypeScript recipes → · Kubernetes template → · Workflow recipes → · Postman collection → · OpenAPI spec → · Security guide → · Deployment matrix → · Deployment docs → · Agent integration →

Community


📖 Documentation	🐛 Issues
💬 Discussions	🤗 HuggingFace
🤝 Contributing	🌐 funasr.com

License

MIT License

Citations

@inproceedings{gao2023funasr,
  author={Zhifu Gao and others},
  title={FunASR: A Fundamental End-to-End Speech Recognition Toolkit},
  booktitle={INTERSPEECH},
  year={2023}
}

Release history

Latest release: 1.3.14 (Jun 23, 2026)
