zipformer: A faster and better encoder for ASR

Chinese version

zipformer

A faster and better encoder for automatic speech recognition

Overview

zipformer is a speech encoder designed to be both accurate and efficient. It is optimized specifically for speech recognition, and the authors report that it is the only model to outperform Google's Conformer under a fair comparison.

Features

  • Efficient model architecture: UNet-style multi-scale encoder with module innovations (BiasNorm, Swoosh, Balancer, Whitener).
  • New optimizer: ScaledAdam.
  • State-of-the-art performance with 50% fewer FLOPs than Conformer.
  • Supports CTC, Transducer, and AED modeling.
  • CR-CTC: Consistency regularization for stronger CTC models.
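
Two of the modules named above can be illustrated in a few lines. The following is a minimal pure-Python sketch of the Swoosh activations and BiasNorm, written from the formulas in the Zipformer paper listed under Citation; it is not the code shipped in this package. The function names (`softplus`, `swoosh_r`, `swoosh_l`, `bias_norm`) are chosen here for illustration, and the real modules operate on tensors with learnable parameters.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def swoosh_r(x: float) -> float:
    """SwooshR(x) = log(1 + exp(x - 1)) - 0.08x - 0.313261687 (passes through the origin)."""
    return softplus(x - 1.0) - 0.08 * x - 0.313261687


def swoosh_l(x: float) -> float:
    """SwooshL(x) = log(1 + exp(x - 4)) - 0.08x - 0.035."""
    return softplus(x - 4.0) - 0.08 * x - 0.035


def bias_norm(x: list[float], bias: list[float], log_scale: float) -> list[float]:
    """BiasNorm(x) = x / RMS[x - b] * exp(gamma).

    `bias` plays the role of the learnable per-channel bias b and
    `log_scale` the role of the scalar gamma. No epsilon is added here,
    so the sketch assumes x differs from bias in at least one channel.
    """
    rms = math.sqrt(sum((xi - bi) ** 2 for xi, bi in zip(x, bias)) / len(x))
    scale = math.exp(log_scale) / rms
    return [xi * scale for xi in x]


# With zero bias and gamma = 0, the output has unit RMS.
print(bias_norm([3.0, 4.0], [0.0, 0.0], 0.0))
print(swoosh_r(0.0), swoosh_l(0.0))
```

Both activations have a slope of about -0.08 for very negative inputs. SwooshL is shifted further right than SwooshR, so its output is slightly negative at zero.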

Models

zipformer ASR models are available in xlarge, large, medium, and small variants, with both streaming and non-streaming versions. The table below provides download links. For more details, please refer to the documentation.

Model                        Parameters  ModelScope  Hugging Face  Languages         Architectures
zipformer-xlarge             300M        link        link          Chinese, English  CTC
zipformer-large              150M        link        link          Chinese, English  CTC, Transducer
zipformer-large-streaming    150M        link        link          Chinese, English  CTC, Transducer
zipformer-medium             65M         link        link          Chinese, English  CTC, Transducer
zipformer-medium-streaming   65M         link        link          Chinese, English  CTC, Transducer
zipformer-small              25M         link        link          Chinese, English  CTC, Transducer
zipformer-small-streaming    25M         link        link          Chinese, English  CTC, Transducer

News

2026/06/22: Created standalone zipformer repository from icefall, and released xlarge, large, medium, and small Chinese/English models.

Installation

pip install zipformer

Usage

Tip: The examples below use the non-streaming medium model. For more models, please refer to the documentation.

Command Line

# Use jit scripted model
# Transducer
zipformer inference --hf-model pkufool/zipformer-medium --model-type jit --ctc 0 en.wav zh.wav

# CTC
zipformer inference --hf-model pkufool/zipformer-medium --model-type jit --ctc 1 en.wav zh.wav

# Use onnx model
# Transducer
zipformer inference --hf-model pkufool/zipformer-medium --model-type onnx --ctc 0 en.wav zh.wav

# CTC
zipformer inference --hf-model pkufool/zipformer-medium --model-type onnx --ctc 1 en.wav zh.wav
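
To script the commands above from Python, one option is to assemble the argument list programmatically. The sketch below uses only the flags shown in this section (`--hf-model`, `--model-type`, `--ctc`). The helper name `build_inference_command` is hypothetical and not part of the package, and the sketch only builds and prints the command rather than running it.

```python
import shlex


def build_inference_command(wavs, hf_model="pkufool/zipformer-medium",
                            model_type="jit", ctc=False):
    """Return the argv list for `zipformer inference` (hypothetical helper).

    Only the flags that appear in the examples above are used;
    `ctc` is mapped to the 0/1 values those examples show.
    """
    return [
        "zipformer", "inference",
        "--hf-model", hf_model,
        "--model-type", model_type,
        "--ctc", "1" if ctc else "0",
        *wavs,
    ]


cmd = build_inference_command(["en.wav", "zh.wav"], model_type="onnx", ctc=True)
print(shlex.join(cmd))
# prints: zipformer inference --hf-model pkufool/zipformer-medium --model-type onnx --ctc 1 en.wav zh.wav
```

With the package installed, the list can be passed to `subprocess.run(cmd, check=True)` to execute it.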

Python API

from zipformer import inference

# jit scripted model
# Transducer
result = inference(['en.wav', 'zh.wav'], hf_model='pkufool/zipformer-medium', model_type='jit', ctc=False)

# CTC
result = inference(['en.wav', 'zh.wav'], hf_model='pkufool/zipformer-medium', model_type='jit', ctc=True)

# onnx model
# Transducer
result = inference(['en.wav', 'zh.wav'], hf_model='pkufool/zipformer-medium', model_type='onnx', ctc=False)

# CTC
result = inference(['en.wav', 'zh.wav'], hf_model='pkufool/zipformer-medium', model_type='onnx', ctc=True)

# fp16 model
# Transducer
result = inference(['en.wav', 'zh.wav'], hf_model='pkufool/zipformer-medium', model_type='onnx', ctc=False, dtype='fp16')

# CTC
result = inference(['en.wav', 'zh.wav'], hf_model='pkufool/zipformer-medium', model_type='onnx', ctc=True, dtype='fp16')
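
The calls above differ only in `model_type` and `ctc`, so they can be generated rather than written out. This is a rough sketch under stated assumptions: the helpers `inference_configs` and `run_all` are hypothetical names introduced here. The structure of the value returned by `inference` is not documented in this README, so the results are stored as returned. The `zipformer` import is deferred so the configuration helper can be used without the package installed.

```python
from itertools import product


def inference_configs(model_types=("jit", "onnx"), ctc_options=(False, True)):
    """Enumerate the keyword combinations used in the examples above."""
    return [{"model_type": m, "ctc": c} for m, c in product(model_types, ctc_options)]


def run_all(wavs, hf_model="pkufool/zipformer-medium"):
    """Run every configuration and key the results by (model_type, ctc).

    Hypothetical convenience wrapper; it requires `pip install zipformer`.
    """
    from zipformer import inference  # imported lazily on purpose

    return {
        (cfg["model_type"], cfg["ctc"]): inference(wavs, hf_model=hf_model, **cfg)
        for cfg in inference_configs()
    }


# Example (needs the package and the audio files):
# results = run_all(["en.wav", "zh.wav"])
```

Keying the results by `(model_type, ctc)` makes it straightforward to compare Transducer and CTC output for the same audio.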

Documentation

For more information about model training, evaluation, and deployment, please refer to the documentation.

Discussion & Contact

For usage questions or bug reports, please open an issue on GitHub.

You can also scan the QR code below to join our developer WeChat group or follow our WeChat official account.

[Images: QR codes for the developer group admin and the WeChat official account]

Citation

@inproceedings{yao2024zipformer,
  title={Zipformer: A faster and better encoder for automatic speech recognition},
  author={Yao, Zengwei and Guo, Liyong and Yang, Xiaoyu and Kang, Wei and Kuang, Fangjun and Yang, Yifan and Jin, Zengrui and Lin, Long and Povey, Daniel},
  booktitle={International Conference on Learning Representations},
  volume={2024},
  pages={44440--44455},
  year={2024}
}

@inproceedings{yao2025cr,
  title={{CR-CTC}: Consistency regularization on {CTC} for improved speech recognition},
  author={Yao, Zengwei and Kang, Wei and Yang, Xiaoyu and Kuang, Fangjun and Guo, Liyong and Zhu, Han and Jin, Zengrui and Li, Zhaoqing and Lin, Long and Povey, Daniel},
  booktitle={International Conference on Learning Representations},
  volume={2025},
  pages={26850--26868},
  year={2025}
}
