
A unified, extensible, and modern Python toolkit for LLM-based Automatic Speech Recognition (ASR).


Modern ASR

A unified, extensible, future-proof toolkit for locally running state-of-the-art LLM-based ASR models.

Python 3.10+ Apache 2.0 PyPI

Simplified Chinese · Features · Installation · Models · Quick Start · Architecture


✨ Features

  • 🧩 19 Models — Whisper, SenseVoice, Qwen, MiMo, FireRedASR, GLM-ASR, and more.
  • 🔌 Zero-Config Plugin — Add new models via @register_model decorator.
  • 🚀 Runtime Hot-Swap — Switch models without restarting the process.
  • 🌍 Multi-Language — 52 languages, 22 Chinese dialects.
  • 🎯 Multi-Task — Transcription, translation, diarization, emotion, events.
  • 💻 Local-First — All inference on-device. No API keys. No data leaves your machine.
  • 🍎 Apple Silicon — Native MPS (Metal Performance Shaders) support.
  • 📦 Auto-Install — Dependencies, git repos, and HF weights are installed automatically on first use.
  • 🐍 Modern Python — Pydantic configs, rich CLI, ISO-timestamped logging.

📦 Installation

pip install modern-asr

Dependencies and model weights are installed automatically the first time you use a model. Just pass its model ID to ASRPipeline:

from modern_asr import ASRPipeline

pipe = ASRPipeline("sensevoice-small")
pipe = ASRPipeline("mimo-asr-v2.5")
pipe = ASRPipeline("whisper-small")

For offline/air-gapped environments, pre-install everything:

pip install modern-asr[all-models]

Available extras: transformers, vllm, onnx, firered-asr, sensevoice, fun-asr, qwen-asr, mimo-asr, glm-asr, whisper, moonshine, all-models, all-backends, all.
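
Before moving to an offline machine, it can help to confirm that the modules your chosen models import are actually present. The sketch below is not part of Modern ASR; `missing_modules` and the example module names are hypothetical, and it only uses the standard library's `importlib.util.find_spec` on top-level module names.

```python
import importlib.util


def missing_modules(module_names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


# Hypothetical check list: replace with the import names your chosen models need.
required = ["json", "package_that_does_not_exist_123"]
print(missing_modules(required))
```

An empty list means every listed module is importable. Model weights are a separate concern and are not covered by this check.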

Requirements: Python ≥ 3.10.


🧩 Supported Models

| Series | Model ID | Params | Languages | Extra |
| --- | --- | --- | --- | --- |
| Whisper (OpenAI) | whisper-tiny | 39M | 99+ | whisper |
| | whisper-base | 74M | 99+ | whisper |
| | whisper-small | 244M | 99+ | whisper |
| | whisper-medium | 769M | 99+ | whisper |
| | whisper-large-v3 | 1.5B | 99+ | whisper |
| | whisper-large-v3-turbo | 809M | 99+ | whisper |
| SenseVoice (Alibaba) | sensevoice-small | 234M | zh/en/ja/ko/yue | sensevoice |
| Qwen3-ASR (Alibaba) | qwen3-asr-0.6b | 0.6B | 22 dialects | qwen-asr |
| | qwen3-asr-1.7b | 1.7B | 22 dialects | qwen-asr |
| FunASR / Paraformer (Alibaba) | funasr-nano | 0.8B | zh/en | fun-asr |
| | paraformer-zh | 0.2B | zh | fun-asr |
| | paraformer-large | 0.7B | zh | fun-asr |
| FireRedASR (Xiaohongshu) | fireredasr-aed | 1.1B | zh | firered-asr |
| | fireredasr-llm | 8.3B | zh | firered-asr |
| MiMo-ASR (Xiaomi) | mimo-asr-v2.5 | 8B | zh/dialects | mimo-asr |
| MiDasheng (Xiaomi) | midashenglm-7b | 7B | audio understanding | mimo-asr |
| GLM-ASR (Zhipu AI) | glm-asr-nano-2512 | 1.5B | zh/en/yue | glm-asr |
| Granite Speech (IBM) | granite-speech-3.3-8b | 8B | en | transformers |
| Moonshine (Useful Sensors) | moonshine-tiny | 27M | en | moonshine |

# List all available models
python -m modern_asr list

🚀 Quick Start

from modern_asr import ASRPipeline

# Chinese with SenseVoice
pipe = ASRPipeline("sensevoice-small")
result = pipe("audio.wav", language="zh")
print(result.text)

# Switch to Qwen3-ASR for dialects
pipe.switch_model("qwen3-asr-0.6b")
result = pipe("audio.wav", language="zh")
print(result.text)

# English with Whisper
pipe.switch_model("whisper-small")
result = pipe("audio.wav", language="en")
print(result.text)
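
The hot-swap calls above can be wrapped in a small helper that runs one file through several models. This is a sketch, not part of the library: `transcribe_with_models` is a hypothetical name, and it relies only on the calls shown in the Quick Start (`switch_model`, calling the pipeline with `language=`, and `result.text`). It also assumes that switching to the model that is already loaded is acceptable.

```python
def transcribe_with_models(pipe, audio_path, model_ids, language):
    """Run one audio file through several models, reusing a single pipeline."""
    texts = {}
    for model_id in model_ids:
        # Swap the active model in place instead of building a new pipeline.
        pipe.switch_model(model_id)
        result = pipe(audio_path, language=language)
        texts[model_id] = result.text
    return texts


# Usage with a real pipeline (requires modern-asr and the model weights):
# from modern_asr import ASRPipeline
# pipe = ASRPipeline("sensevoice-small")
# texts = transcribe_with_models(
#     pipe, "audio.wav", ["sensevoice-small", "qwen3-asr-0.6b"], "zh"
# )
```

The helper takes the pipeline as an argument, so it works with any object that exposes the same three calls.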

🏗️ Architecture

Modern ASR is built on three layers:

  1. ASRPipeline — Unified user API. Input normalization, task dispatch, model lifecycle.
  2. ASRModel / AudioLLMModel — Adapter layer. New models often need only 8 lines of config.
  3. Backends — Transformers, vLLM, ONNX Runtime.
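
To make the layering concrete, here is a toy sketch of how three such layers could fit together. None of these classes come from Modern ASR: `EchoBackend`, `ToyModel`, and `ToyPipeline` are hypothetical stand-ins, and the "backend" only formats a string instead of running inference.

```python
from dataclasses import dataclass


class EchoBackend:
    """Stand-in for the backend layer (Transformers, vLLM, ONNX Runtime)."""

    def run(self, model_name, audio_path):
        return f"[{model_name}] transcript of {audio_path}"


@dataclass
class ToyModel:
    """Stand-in for the adapter layer: binds a model name to a backend."""

    name: str
    backend: EchoBackend

    def transcribe(self, audio_path):
        return self.backend.run(self.name, audio_path)


class ToyPipeline:
    """Stand-in for the user-facing layer: owns the active model."""

    def __init__(self, model_name):
        self._backend = EchoBackend()
        self._model = ToyModel(model_name, self._backend)

    def switch_model(self, model_name):
        # Replace the active adapter; the pipeline object itself stays alive.
        self._model = ToyModel(model_name, self._backend)

    def __call__(self, audio_path):
        return self._model.transcribe(audio_path)


pipe = ToyPipeline("model-a")
print(pipe("audio.wav"))  # [model-a] transcript of audio.wav
pipe.switch_model("model-b")
print(pipe("audio.wav"))  # [model-b] transcript of audio.wav
```

Keeping the active model behind a single attribute is what lets the caller swap models without recreating the pipeline.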

Adding a New Model

from modern_asr.core.audio_llm import AudioLLMModel
from modern_asr.core.registry import register_model

@register_model("my-model-1b")
class MyModel1B(AudioLLMModel):
    HF_PATH = "org/MyModel-1B"
    SUPPORTED_LANGUAGES = {"zh", "en"}
    CHUNK_DURATION = 30.0

    @property
    def model_id(self) -> str:
        return "my-model-1b"

The registry auto-discovers it at runtime. That's it.


📚 Documentation

Full documentation is built with Material for MkDocs. To preview it locally:

mkdocs serve

🤝 Contributing

See the Contributing Guide for development setup, code style, and PR checklist.


📄 License

Apache-2.0

