zipformer: A faster and better encoder for ASR

zipformer

A faster and better encoder for automatic speech recognition

Overview

zipformer is a speech encoder designed to combine high accuracy with efficiency. It is optimized for speech recognition, and its authors describe it as the only model that outperforms Google's Conformer under a fair comparison.

Features

Efficient model architecture: UNet-style multi-scale encoder with module innovations (BiasNorm, Swoosh, Balancer, Whitener).
New optimizer: ScaledAdam.
State-of-the-art performance with 50% fewer FLOPs than Conformer.
Supports CTC, Transducer, and AED modeling.
CR-CTC: Consistency regularization for stronger CTC models.
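
To make two of the module names above concrete, here is a minimal pure-Python sketch of the Swoosh activations and BiasNorm, following the formulas given in the Zipformer paper cited below. It is an illustration only, not the package's implementation: the function names (`softplus`, `swoosh_r`, `swoosh_l`, `bias_norm`) are hypothetical, and the real modules operate on tensors with learned parameters.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def swoosh_r(x: float) -> float:
    # SwooshR(x) = log(1 + exp(x - 1)) - 0.08 * x - 0.313261687
    # The constant offset makes the function pass through the origin.
    return softplus(x - 1.0) - 0.08 * x - 0.313261687


def swoosh_l(x: float) -> float:
    # SwooshL(x) = log(1 + exp(x - 4)) - 0.08 * x - 0.035
    return softplus(x - 4.0) - 0.08 * x - 0.035


def bias_norm(x, bias, log_scale=0.0):
    # BiasNorm(x) = x / RMS[x - b] * exp(gamma)
    # `bias` (b) is per-channel and `log_scale` (gamma) is a scalar;
    # both would be learned in a real model. Assumes x != bias.
    rms = math.sqrt(sum((xi - bi) ** 2 for xi, bi in zip(x, bias)) / len(x))
    scale = math.exp(log_scale) / rms
    return [xi * scale for xi in x]
```

With a zero bias and `log_scale=0.0`, `bias_norm` reduces to plain RMS normalization, so the output has unit root-mean-square.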

Models

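zipformer ASR models are available in xlarge, large, medium, and small variants. Per the table below, the large, medium, and small variants come in both streaming and non-streaming versions, while xlarge is listed as non-streaming and CTC only. The table also provides download links. For more details, please refer to the documentation.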

Model	Parameters	ModelScope	Huggingface	Languages	Architectures
zipformer-xlarge	300M	link	link	Chinese, English	CTC
zipformer-large	150M	link	link	Chinese, English	CTC, Transducer
zipformer-large-streaming	150M	link	link	Chinese, English	CTC, Transducer
zipformer-medium	65M	link	link	Chinese, English	CTC, Transducer
zipformer-medium-streaming	65M	link	link	Chinese, English	CTC, Transducer
zipformer-small	25M	link	link	Chinese, English	CTC, Transducer
zipformer-small-streaming	25M	link	link	Chinese, English	CTC, Transducer
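
The naming scheme in the table can be captured in a small helper. This is a sketch under assumptions: `model_name` and the two lookup tables are hypothetical and not part of the zipformer package; they only mirror the rows listed above.

```python
# Parameter counts per size, copied from the table above.
PARAMETERS = {"xlarge": "300M", "large": "150M", "medium": "65M", "small": "25M"}

# Sizes that the table lists with a "-streaming" variant.
STREAMING_SIZES = {"large", "medium", "small"}


def model_name(size: str, streaming: bool = False) -> str:
    """Return the table's model name for a size and streaming flag."""
    if size not in PARAMETERS:
        raise ValueError(f"unknown size: {size!r}")
    if streaming and size not in STREAMING_SIZES:
        raise ValueError(f"no streaming variant listed for {size!r}")
    return f"zipformer-{size}" + ("-streaming" if streaming else "")
```

For example, `model_name("large", streaming=True)` gives `zipformer-large-streaming`, while asking for a streaming xlarge model raises `ValueError` because the table lists none.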

News

2026/06/22: Created standalone zipformer repository from icefall, and released xlarge, large, medium, and small Chinese/English models.

Installation

pip install zipformer
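
After installing, one way to confirm which version is present is to query the package metadata with the standard library. A minimal sketch; the helper name `installed_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str = "zipformer"):
    """Return the installed version string, or None if not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


print(installed_version())
```

This only reads installed distribution metadata, so it does not import zipformer or load any model.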

Usage

Tip: The examples below use the non-streaming medium model. For more models, please refer to the documentation.

Command Line

# Use jit scripted model
# Transducer
zipformer inference --hf-model pkufool/zipformer-medium --model-type jit --ctc 0 en.wav zh.wav

# CTC
zipformer inference --hf-model pkufool/zipformer-medium --model-type jit --ctc 1 en.wav zh.wav

# Use onnx model
# Transducer
zipformer inference --hf-model pkufool/zipformer-medium --model-type onnx --ctc 0 en.wav zh.wav

# CTC
zipformer inference --hf-model pkufool/zipformer-medium --model-type onnx --ctc 1 en.wav zh.wav
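
The four commands above differ only in `--model-type` and `--ctc`, so a script can assemble them from arguments. The sketch below uses only the flags shown above; the helper `build_command` is hypothetical, and it only builds the argument list without running anything.

```python
import shlex


def build_command(wavs, hf_model="pkufool/zipformer-medium",
                  model_type="jit", ctc=False):
    """Build the argv list for `zipformer inference` as shown above."""
    if model_type not in ("jit", "onnx"):
        raise ValueError(f"unsupported model type: {model_type!r}")
    return [
        "zipformer", "inference",
        "--hf-model", hf_model,
        "--model-type", model_type,
        "--ctc", "1" if ctc else "0",  # 1 selects CTC, 0 selects Transducer
        *wavs,
    ]


cmd = build_command(["en.wav", "zh.wav"], model_type="onnx", ctc=True)
print(shlex.join(cmd))
# zipformer inference --hf-model pkufool/zipformer-medium --model-type onnx --ctc 1 en.wav zh.wav
```

To execute the command, pass the list to `subprocess.run(cmd, check=True)`; a list avoids shell quoting problems with file names that contain spaces.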

Python API

from zipformer import inference

# Audio files are passed as a list of path strings.

# jit scripted model
# Transducer
result = inference(['en.wav', 'zh.wav'], hf_model='pkufool/zipformer-medium', model_type='jit', ctc=False)

# CTC
result = inference(['en.wav', 'zh.wav'], hf_model='pkufool/zipformer-medium', model_type='jit', ctc=True)

# onnx model
# Transducer
result = inference(['en.wav', 'zh.wav'], hf_model='pkufool/zipformer-medium', model_type='onnx', ctc=False)

# CTC
result = inference(['en.wav', 'zh.wav'], hf_model='pkufool/zipformer-medium', model_type='onnx', ctc=True)

# fp16 onnx model
# Transducer
result = inference(['en.wav', 'zh.wav'], hf_model='pkufool/zipformer-medium', model_type='onnx', ctc=False, dtype='fp16')

# CTC
result = inference(['en.wav', 'zh.wav'], hf_model='pkufool/zipformer-medium', model_type='onnx', ctc=True, dtype='fp16')
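
To compare the jit and onnx models with both decoding modes in one go, the calls above can be wrapped in a loop. A minimal sketch, assuming only the keyword arguments shown above: `run_all_variants` is a hypothetical helper, the inference function is passed in as a parameter so the helper does not depend on the package being installed, and the structure of each result is not documented here, so results are stored as returned.

```python
from itertools import product


def run_all_variants(inference_fn, wavs, hf_model="pkufool/zipformer-medium"):
    """Call inference_fn once per (model_type, ctc) combination.

    Returns a dict keyed by (model_type, ctc) holding whatever
    inference_fn returned for that combination.
    """
    results = {}
    for model_type, ctc in product(("jit", "onnx"), (False, True)):
        results[(model_type, ctc)] = inference_fn(
            wavs, hf_model=hf_model, model_type=model_type, ctc=ctc
        )
    return results
```

Typical use would be `run_all_variants(inference, ['en.wav', 'zh.wav'])` after `from zipformer import inference`; each combination may download and load its own model, so expect this to be slow on the first run.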

Documentation

For more information about model training, evaluation, and deployment, please refer to the documentation.

Discussion & Contact

For questions or problems, please open an issue on GitHub.

You can also scan the QR code below to join our developer WeChat group or follow our WeChat official account.

[QR codes: Developer Group Admin, WeChat Official Account]

Citation

@inproceedings{yao2024zipformer,
  title={Zipformer: A faster and better encoder for automatic speech recognition},
  author={Yao, Zengwei and Guo, Liyong and Yang, Xiaoyu and Kang, Wei and Kuang, Fangjun and Yang, Yifan and Jin, Zengrui and Lin, Long and Povey, Daniel},
  booktitle={International Conference on Learning Representations},
  volume={2024},
  pages={44440--44455},
  year={2024}
}

@inproceedings{yao2025cr,
  title={{CR-CTC}: Consistency regularization on {CTC} for improved speech recognition},
  author={Yao, Zengwei and Kang, Wei and Yang, Xiaoyu and Kuang, Fangjun and Guo, Liyong and Zhu, Han and Jin, Zengrui and Li, Zhaoqing and Lin, Long and Povey, Daniel},
  booktitle={International Conference on Learning Representations},
  volume={2025},
  pages={26850--26868},
  year={2025}
}
