zipformer: A faster and better encoder for ASR
Overview
zipformer is a speech encoder built for automatic speech recognition (ASR) that combines high accuracy with low compute cost. The authors report that it outperforms Google's Conformer under fair comparison.
Features
- Efficient model architecture: UNet-style multi-scale encoder with module innovations (BiasNorm, Swoosh, Balancer, Whitener).
- New optimizer: ScaledAdam.
- State-of-the-art performance with 50% fewer FLOPs than Conformer.
- Supports CTC, Transducer, and AED modeling.
- CR-CTC: Consistency regularization for stronger CTC models.
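To make two of the module names above more concrete, here is a minimal pure-Python sketch of the Swoosh activations and BiasNorm, following the formulas as given in the Zipformer paper cited below. It is an illustration only, not the package's implementation: the function names `softplus`, `swoosh_r`, `swoosh_l`, and `bias_norm` are hypothetical, and the `eps` guard is an assumption added here to avoid division by zero.

```python
import math
from typing import List, Sequence


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def swoosh_r(x: float) -> float:
    """SwooshR(x) = log(1 + exp(x - 1)) - 0.08 * x - 0.313261687.

    The constant offset makes the function pass through the origin.
    """
    return softplus(x - 1.0) - 0.08 * x - 0.313261687


def swoosh_l(x: float) -> float:
    """SwooshL(x) = log(1 + exp(x - 4)) - 0.08 * x - 0.035."""
    return softplus(x - 4.0) - 0.08 * x - 0.035


def bias_norm(
    x: Sequence[float],
    bias: Sequence[float],
    log_scale: float = 0.0,
    eps: float = 1e-8,  # assumption: small guard, not part of the paper's formula
) -> List[float]:
    """BiasNorm(x) = x / RMS(x - bias) * exp(log_scale), for one feature vector."""
    rms = math.sqrt(sum((xi - bi) ** 2 for xi, bi in zip(x, bias)) / len(x))
    scale = math.exp(log_scale) / (rms + eps)
    return [xi * scale for xi in x]


if __name__ == "__main__":
    print(swoosh_r(0.0), swoosh_l(0.0))
    print(bias_norm([3.0, 4.0], [0.0, 0.0]))
```

Unlike LayerNorm, BiasNorm as sketched here does not subtract the mean from the output; the learnable bias only enters the RMS computation, and `log_scale` is a single learnable scalar.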
Models
zipformer ASR models are available in xlarge, large, medium, and small variants, with both streaming and non-streaming versions. The table below provides download links. For more details, please refer to the documentation.
| Model | Parameters | ModelScope | Hugging Face | Languages | Architectures |
|---|---|---|---|---|---|
| zipformer-xlarge | 300M | link | link | Chinese, English | CTC |
| zipformer-large | 150M | link | link | Chinese, English | CTC, Transducer |
| zipformer-large-streaming | 150M | link | link | Chinese, English | CTC, Transducer |
| zipformer-medium | 65M | link | link | Chinese, English | CTC, Transducer |
| zipformer-medium-streaming | 65M | link | link | Chinese, English | CTC, Transducer |
| zipformer-small | 25M | link | link | Chinese, English | CTC, Transducer |
| zipformer-small-streaming | 25M | link | link | Chinese, English | CTC, Transducer |
News
2026/06/22: Created standalone zipformer repository from icefall, and released xlarge, large, medium, and small Chinese/English models.
Installation
pip install zipformer
Usage
Tip: The examples below use the non-streaming medium model. For more models, please refer to the documentation.
Command Line
# Use jit scripted model
# Transducer
zipformer inference --hf-model pkufool/zipformer-medium --model-type jit --ctc 0 en.wav zh.wav
# CTC
zipformer inference --hf-model pkufool/zipformer-medium --model-type jit --ctc 1 en.wav zh.wav
# Use onnx model
# Transducer
zipformer inference --hf-model pkufool/zipformer-medium --model-type onnx --ctc 0 en.wav zh.wav
# CTC
zipformer inference --hf-model pkufool/zipformer-medium --model-type onnx --ctc 1 en.wav zh.wav
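The four command lines above share one pattern and differ only in `--model-type` and `--ctc`. As a rough sketch for scripting them from Python, the helper below assembles the argument list using only the flags shown above. The names `build_inference_command` and `run_inference` are hypothetical, and the sketch assumes the `zipformer` executable is on `PATH` after installation.

```python
import shlex
import subprocess
from typing import List, Sequence


def build_inference_command(
    wavs: Sequence[str],
    hf_model: str = "pkufool/zipformer-medium",
    model_type: str = "jit",  # "jit" or "onnx", as in the examples above
    ctc: bool = False,        # False -> Transducer decoding, True -> CTC decoding
) -> List[str]:
    """Build the argv list for `zipformer inference` from the documented flags."""
    return [
        "zipformer", "inference",
        "--hf-model", hf_model,
        "--model-type", model_type,
        "--ctc", "1" if ctc else "0",
        *wavs,
    ]


def run_inference(wavs: Sequence[str], **kwargs) -> subprocess.CompletedProcess:
    """Run the CLI; raises CalledProcessError if the command fails."""
    return subprocess.run(build_inference_command(wavs, **kwargs), check=True)


if __name__ == "__main__":
    cmd = build_inference_command(["en.wav", "zh.wav"], model_type="onnx", ctc=True)
    # Print the shell-quoted command instead of running it.
    print(shlex.join(cmd))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with file names that contain spaces.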
Python API
from zipformer import inference

# Audio files to transcribe (paths must be strings)
wavs = ['en.wav', 'zh.wav']

# jit scripted model: Transducer (ctc=False), then CTC (ctc=True)
result = inference(wavs, hf_model='pkufool/zipformer-medium', model_type='jit', ctc=False)
result = inference(wavs, hf_model='pkufool/zipformer-medium', model_type='jit', ctc=True)
# onnx model
result = inference(wavs, hf_model='pkufool/zipformer-medium', model_type='onnx', ctc=False)
result = inference(wavs, hf_model='pkufool/zipformer-medium', model_type='onnx', ctc=True)
# fp16 onnx model
result = inference(wavs, hf_model='pkufool/zipformer-medium', model_type='onnx', ctc=False, dtype='fp16')
result = inference(wavs, hf_model='pkufool/zipformer-medium', model_type='onnx', ctc=True, dtype='fp16')
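When comparing backends and decoding heads on the same audio, the calls above can be driven from a loop. The sketch below is one possible way to do that; `compare_configs` is a hypothetical helper, and it assumes only the keyword arguments shown above (`hf_model`, `model_type`, `ctc`). The return type of `inference` is not documented here, so results are stored as-is. The `run` parameter exists so the loop can be exercised with a stub when the package or models are not available.

```python
from itertools import product
from typing import Any, Callable, Dict, Optional, Sequence, Tuple


def compare_configs(
    wavs: Sequence[str],
    hf_model: str = "pkufool/zipformer-medium",
    run: Optional[Callable[..., Any]] = None,
) -> Dict[Tuple[str, bool], Any]:
    """Run every (model_type, ctc) combination and collect the raw results."""
    if run is None:
        # Deferred import so this module loads even without zipformer installed.
        from zipformer import inference as run

    results: Dict[Tuple[str, bool], Any] = {}
    for model_type, ctc in product(("jit", "onnx"), (False, True)):
        results[(model_type, ctc)] = run(
            list(wavs), hf_model=hf_model, model_type=model_type, ctc=ctc
        )
    return results


if __name__ == "__main__":
    # Stub standing in for zipformer.inference, for a dry run only.
    def fake_inference(wavs, **kwargs):
        return [f"{w}:{kwargs['model_type']}:{kwargs['ctc']}" for w in wavs]

    for key, value in compare_configs(["en.wav", "zh.wav"], run=fake_inference).items():
        print(key, value)
```

Calling `compare_configs(['en.wav', 'zh.wav'])` without `run` would use the real `inference` function and download the model on first use.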
Documentation
For more information about model training, evaluation, and deployment, please refer to the documentation.
Discussion & Contact
For questions or problems with a specific task, please open an issue on GitHub Issues.
You can also scan the QR codes for the developer group admin and the WeChat official account to join our developer WeChat group or follow our official account.
Citation
@inproceedings{yao2024zipformer,
title={Zipformer: A faster and better encoder for automatic speech recognition},
author={Yao, Zengwei and Guo, Liyong and Yang, Xiaoyu and Kang, Wei and Kuang, Fangjun and Yang, Yifan and Jin, Zengrui and Lin, Long and Povey, Daniel},
booktitle={International Conference on Learning Representations},
volume={2024},
pages={44440--44455},
year={2024}
}
@inproceedings{yao2025cr,
title={{CR-CTC}: Consistency regularization on {CTC} for improved speech recognition},
author={Yao, Zengwei and Kang, Wei and Yang, Xiaoyu and Kuang, Fangjun and Guo, Liyong and Zhu, Han and Jin, Zengrui and Li, Zhaoqing and Lin, Long and Povey, Daniel},
booktitle={International Conference on Learning Representations},
volume={2025},
pages={26850--26868},
year={2025}
}