A unified, extensible, and modern Python toolkit for LLM-based Automatic Speech Recognition (ASR).

Modern ASR

A unified, extensible, future-proof toolkit for locally running state-of-the-art LLM-based ASR models.

Python 3.10+ Apache 2.0 PyPI

Simplified Chinese · Features · Installation · Models · Quick Start · Architecture


✨ Features

  • 🧩 19 Models — Whisper, SenseVoice, Qwen, MiMo, FireRedASR, GLM-ASR, and more.
  • 🔌 Zero-Config Plugin — Add new models via @register_model decorator.
  • 🚀 Runtime Hot-Swap — Switch models without restarting the process.
  • 🌍 Multi-Language — 52 languages, 22 Chinese dialects.
  • 🎯 Multi-Task — Transcription, translation, diarization, emotion, events.
  • 💻 Local-First — All inference on-device. No API keys. No data leaves your machine.
  • 🍎 Apple Silicon — Native MPS (Metal Performance Shaders) support.
  • 📦 Auto-Install — Dependencies, git repos, and HF weights are installed automatically on first use.
  • 🐍 Modern Python — Pydantic configs, rich CLI, ISO-timestamped logging.

📦 Installation

pip install modern-asr

Dependencies and model weights are installed automatically the first time you use a model — just type its name:

from modern_asr import ASRPipeline

pipe = ASRPipeline("sensevoice-small")
pipe = ASRPipeline("mimo-asr-v2.5")
pipe = ASRPipeline("whisper-small")

For offline/air-gapped environments, pre-install everything:

pip install "modern-asr[all-models]"

Available extras: transformers, vllm, onnx, firered-asr, sensevoice, fun-asr, qwen-asr, mimo-asr, glm-asr, whisper, moonshine, all-models, all-backends, all.
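Several extras can be combined in one requirement by separating them with commas, which is standard pip syntax. The sketch below builds such a requirement string from the extras listed above. The helper `requirement_with_extras` is hypothetical and not part of modern_asr; it only validates names against the list in this README.

```python
# Extras copied from the list above; the helper itself is a hypothetical
# convenience, not something modern_asr provides.
KNOWN_EXTRAS = {
    "transformers", "vllm", "onnx", "firered-asr", "sensevoice", "fun-asr",
    "qwen-asr", "mimo-asr", "glm-asr", "whisper", "moonshine",
    "all-models", "all-backends", "all",
}


def requirement_with_extras(*extras: str) -> str:
    """Return a pip requirement such as 'modern-asr[whisper,sensevoice]'."""
    unknown = sorted(set(extras) - KNOWN_EXTRAS)
    if unknown:
        raise ValueError(f"unknown extras: {', '.join(unknown)}")
    if not extras:
        return "modern-asr"
    # dict.fromkeys drops duplicates while keeping the caller's order.
    unique = list(dict.fromkeys(extras))
    return f"modern-asr[{','.join(unique)}]"


print(requirement_with_extras("whisper", "sensevoice"))
# prints: modern-asr[whisper,sensevoice]
```

Quote the resulting string when passing it to pip in a shell, since some shells (zsh, for example) treat square brackets as glob patterns.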

Requirements: Python ≥ 3.10.


🧩 Supported Models

| Series | Model ID | Params | Languages | Extra |
|---|---|---|---|---|
| Whisper (OpenAI) | whisper-tiny | 39M | 99+ | whisper |
| | whisper-base | 74M | 99+ | whisper |
| | whisper-small | 244M | 99+ | whisper |
| | whisper-medium | 769M | 99+ | whisper |
| | whisper-large-v3 | 1.5B | 99+ | whisper |
| | whisper-large-v3-turbo | 809M | 99+ | whisper |
| SenseVoice (Alibaba) | sensevoice-small | 234M | zh/en/ja/ko/yue | sensevoice |
| Qwen3-ASR (Alibaba) | qwen3-asr-0.6b | 0.6B | 22 dialects | qwen-asr |
| | qwen3-asr-1.7b | 1.7B | 22 dialects | qwen-asr |
| FunASR / Paraformer (Alibaba) | funasr-nano | 0.8B | zh/en | fun-asr |
| | paraformer-zh | 0.2B | zh | fun-asr |
| | paraformer-large | 0.7B | zh | fun-asr |
| FireRedASR (Xiaohongshu) | fireredasr-aed | 1.1B | zh | firered-asr |
| | fireredasr-llm | 8.3B | zh | firered-asr |
| MiMo-ASR (Xiaomi) | mimo-asr-v2.5 | 8B | zh/dialects | mimo-asr |
| MiDasheng (Xiaomi) | midashenglm-7b | 7B | audio understanding | mimo-asr |
| GLM-ASR (Zhipu AI) | glm-asr-nano-2512 | 1.5B | zh/en/yue | glm-asr |
| Granite Speech (IBM) | granite-speech-3.3-8b | 8B | en | transformers |
| Moonshine (Useful Sensors) | moonshine-tiny | 27M | en | moonshine |

# List all available models
python -m modern_asr list
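When choosing a model for constrained hardware, it can help to filter the list by parameter count. The following is a minimal sketch that uses a hand-copied version of the model table in this README rather than any modern_asr API; `CATALOG`, `parse_params`, and `models_under` are hypothetical names introduced for illustration.

```python
# (model_id, params, extra) rows copied by hand from the Supported Models
# table. This is illustrative data, not queried from the library's registry.
CATALOG = [
    ("whisper-tiny", "39M", "whisper"),
    ("whisper-base", "74M", "whisper"),
    ("whisper-small", "244M", "whisper"),
    ("whisper-medium", "769M", "whisper"),
    ("whisper-large-v3", "1.5B", "whisper"),
    ("whisper-large-v3-turbo", "809M", "whisper"),
    ("sensevoice-small", "234M", "sensevoice"),
    ("qwen3-asr-0.6b", "0.6B", "qwen-asr"),
    ("qwen3-asr-1.7b", "1.7B", "qwen-asr"),
    ("funasr-nano", "0.8B", "fun-asr"),
    ("paraformer-zh", "0.2B", "fun-asr"),
    ("paraformer-large", "0.7B", "fun-asr"),
    ("fireredasr-aed", "1.1B", "firered-asr"),
    ("fireredasr-llm", "8.3B", "firered-asr"),
    ("mimo-asr-v2.5", "8B", "mimo-asr"),
    ("midashenglm-7b", "7B", "mimo-asr"),
    ("glm-asr-nano-2512", "1.5B", "glm-asr"),
    ("granite-speech-3.3-8b", "8B", "transformers"),
    ("moonshine-tiny", "27M", "moonshine"),
]


def parse_params(text: str) -> float:
    """Convert a size such as '39M' or '1.5B' into a parameter count."""
    units = {"M": 1e6, "B": 1e9}
    return float(text[:-1]) * units[text[-1]]


def models_under(max_params: float) -> list[str]:
    """Return model IDs whose parameter count does not exceed max_params."""
    return [mid for mid, params, _ in CATALOG if parse_params(params) <= max_params]


print(models_under(100e6))
# prints: ['whisper-tiny', 'whisper-base', 'moonshine-tiny']
```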

🚀 Quick Start

from modern_asr import ASRPipeline

# Chinese with SenseVoice
pipe = ASRPipeline("sensevoice-small")
result = pipe("audio.wav", language="zh")
print(result.text)

# Switch to Qwen3-ASR for dialects
pipe.switch_model("qwen3-asr-0.6b")
result = pipe("audio.wav", language="zh")
print(result.text)

# English with Whisper
pipe.switch_model("whisper-small")
result = pipe("audio.wav", language="en")
print(result.text)
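For more than one file, the same call pattern can be wrapped in a small loop. The sketch below relies only on the call shape shown in the Quick Start: a callable that takes a path plus `language=` and returns an object with a `.text` attribute. `transcribe_many`, `FakeResult`, and `fake_pipe` are hypothetical names; the stub stands in for a real pipeline so the example runs without downloading any model.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class FakeResult:
    """Stand-in for the pipeline's result object; only .text is assumed."""
    text: str


def transcribe_many(pipe: Callable, paths: Iterable[str], language: str) -> dict[str, str]:
    """Run the pipeline over each path and collect the transcripts by path."""
    return {path: pipe(path, language=language).text for path in paths}


def fake_pipe(path: str, language: str) -> FakeResult:
    """Stub that mimics the Quick Start call shape without doing inference."""
    return FakeResult(text=f"[{language}] {path}")


print(transcribe_many(fake_pipe, ["a.wav", "b.wav"], language="zh"))
# prints: {'a.wav': '[zh] a.wav', 'b.wav': '[zh] b.wav'}
```

Assuming `ASRPipeline` behaves as in the Quick Start, an instance of it could be passed in place of `fake_pipe`.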

🏗️ Architecture

Modern ASR is built on three layers:

  1. ASRPipeline — Unified user API. Input normalization, task dispatch, model lifecycle.
  2. ASRModel / AudioLLMModel — Adapter layer. New models often need only 8 lines of config.
  3. Backends — Transformers, vLLM, ONNX Runtime.
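The layering above can be pictured with a toy model: a pipeline delegates to the active adapter, and the adapter delegates to a backend. This is an illustrative sketch of the general pattern only. `EchoBackend`, `ToyAdapter`, and `ToyPipeline` are hypothetical classes and do not reflect modern_asr's actual internals.

```python
from typing import Protocol


class Backend(Protocol):
    """Bottom layer: whatever actually runs inference."""
    def run(self, audio: bytes) -> str: ...


class EchoBackend:
    """Stand-in backend that reports the input size instead of decoding."""
    def run(self, audio: bytes) -> str:
        return f"{len(audio)} bytes decoded"


class ToyAdapter:
    """Middle layer: binds a model name to a backend."""
    def __init__(self, name: str, backend: Backend) -> None:
        self.name = name
        self.backend = backend

    def transcribe(self, audio: bytes) -> str:
        return f"{self.name}: {self.backend.run(audio)}"


class ToyPipeline:
    """Top layer: one user-facing object that can swap adapters in place."""
    def __init__(self, adapters: dict[str, ToyAdapter], model: str) -> None:
        self._adapters = adapters
        self._active = adapters[model]

    def switch_model(self, model: str) -> None:
        self._active = self._adapters[model]

    def __call__(self, audio: bytes) -> str:
        return self._active.transcribe(audio)


backend = EchoBackend()
adapters = {name: ToyAdapter(name, backend) for name in ("model-a", "model-b")}
pipe = ToyPipeline(adapters, "model-a")
print(pipe(b"\x00" * 4))  # prints: model-a: 4 bytes decoded
pipe.switch_model("model-b")
print(pipe(b"\x00" * 4))  # prints: model-b: 4 bytes decoded
```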

Adding a New Model

from modern_asr.core.audio_llm import AudioLLMModel
from modern_asr.core.registry import register_model

@register_model("my-model-1b")
class MyModel1B(AudioLLMModel):
    HF_PATH = "org/MyModel-1B"
    SUPPORTED_LANGUAGES = {"zh", "en"}
    CHUNK_DURATION = 30.0

    @property
    def model_id(self) -> str:
        return "my-model-1b"

The registry auto-discovers it at runtime. That's it.


📚 Documentation

Full documentation is built with Material for MkDocs. To preview it locally:

mkdocs serve

🤝 Contributing

See the Contributing Guide for development setup, code style, and the PR checklist.


📄 License

Apache-2.0

