
A unified, extensible, and modern Python toolkit for LLM-based Automatic Speech Recognition (ASR).


Modern ASR

A unified, extensible, future-proof toolkit for locally running state-of-the-art LLM-based ASR models.

Python 3.10+ · Apache 2.0 · PyPI



✨ Features

  • 🧩 19 Models — Whisper, SenseVoice, Qwen, MiMo, FireRedASR, GLM-ASR, and more.
  • 🔌 Zero-Config Plugin — Add new models via @register_model decorator.
  • 🚀 Runtime Hot-Swap — Switch models without restarting the process.
  • 🌍 Multi-Language — 52 languages, 22 Chinese dialects.
  • 🎯 Multi-Task — Transcription, translation, diarization, emotion, events.
  • 💻 Local-First — All inference on-device. No API keys. No data leaves your machine.
  • 🍎 Apple Silicon — Native MPS (Metal Performance Shaders) support.
  • 📦 Auto-Install — Dependencies, git repos, and HF weights are installed automatically on first use.
  • 🐍 Modern Python — Pydantic configs, rich CLI, ISO-timestamped logging.

📦 Installation

pip install modern-asr

Dependencies and model weights are installed automatically the first time you use a model. Just pass its model ID to ASRPipeline:

from modern_asr import ASRPipeline

pipe = ASRPipeline("sensevoice-small")
pipe = ASRPipeline("mimo-asr-v2.5")
pipe = ASRPipeline("whisper-small")

For offline/air-gapped environments, pre-install everything:

pip install "modern-asr[all-models]"

Available extras: transformers, vllm, onnx, firered-asr, sensevoice, fun-asr, qwen-asr, mimo-asr, glm-asr, whisper, moonshine, all-models, all-backends, all.

Requirements: Python ≥ 3.10.
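
Before loading a model, it can help to confirm that the interpreter meets the stated requirement and that the package is actually installed. The following is a minimal sketch using only the standard library; `check_environment` is a hypothetical helper name, not part of modern_asr, and the distribution name `modern-asr` is taken from the pip command above.

```python
import sys
from importlib import metadata

MIN_PYTHON = (3, 10)  # from the stated requirement: Python >= 3.10


def check_environment(dist_name: str = "modern-asr") -> dict:
    """Report whether Python is new enough and which version of the package is installed."""
    try:
        installed_version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        installed_version = None  # distribution is not installed in this environment
    return {
        "python_ok": sys.version_info[:2] >= MIN_PYTHON,
        "installed_version": installed_version,
    }


if __name__ == "__main__":
    print(check_environment())
```

This only inspects installed distribution metadata; it does not verify that any optional extras or model weights are present.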


🧩 Supported Models

| Series | Model ID | Params | Languages | Extra |
| --- | --- | --- | --- | --- |
| Whisper (OpenAI) | whisper-tiny | 39M | 99+ | whisper |
| | whisper-base | 74M | 99+ | whisper |
| | whisper-small | 244M | 99+ | whisper |
| | whisper-medium | 769M | 99+ | whisper |
| | whisper-large-v3 | 1.5B | 99+ | whisper |
| | whisper-large-v3-turbo | 809M | 99+ | whisper |
| SenseVoice (Alibaba) | sensevoice-small | 234M | zh/en/ja/ko/yue | sensevoice |
| Qwen3-ASR (Alibaba) | qwen3-asr-0.6b | 0.6B | 22 dialects | qwen-asr |
| | qwen3-asr-1.7b | 1.7B | 22 dialects | qwen-asr |
| FunASR / Paraformer (Alibaba) | funasr-nano | 0.8B | zh/en | fun-asr |
| | paraformer-zh | 0.2B | zh | fun-asr |
| | paraformer-large | 0.7B | zh | fun-asr |
| FireRedASR (Xiaohongshu) | fireredasr-aed | 1.1B | zh | firered-asr |
| | fireredasr-llm | 8.3B | zh | firered-asr |
| MiMo-ASR (Xiaomi) | mimo-asr-v2.5 | 8B | zh/dialects | mimo-asr |
| MiDasheng (Xiaomi) | midashenglm-7b | 7B | audio understanding | mimo-asr |
| GLM-ASR (Zhipu AI) | glm-asr-nano-2512 | 1.5B | zh/en/yue | glm-asr |
| Granite Speech (IBM) | granite-speech-3.3-8b | 8B | en | transformers |
| Moonshine (Useful Sensors) | moonshine-tiny | 27M | en | moonshine |

# List all available models
python -m modern_asr list
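
When a script needs to pick a model ID by language, the table can be mirrored as plain data. The sketch below copies a handful of rows with explicit language codes; `ModelInfo`, `MODELS`, and `models_for` are illustrative names and are not part of the modern_asr API.

```python
from typing import NamedTuple


class ModelInfo(NamedTuple):
    model_id: str
    params: str
    languages: frozenset
    extra: str


# A few rows copied from the Supported Models table (hypothetical local mirror,
# not an API of the package). Only rows with explicit language codes are included.
MODELS = [
    ModelInfo("sensevoice-small", "234M", frozenset({"zh", "en", "ja", "ko", "yue"}), "sensevoice"),
    ModelInfo("funasr-nano", "0.8B", frozenset({"zh", "en"}), "fun-asr"),
    ModelInfo("paraformer-zh", "0.2B", frozenset({"zh"}), "fun-asr"),
    ModelInfo("glm-asr-nano-2512", "1.5B", frozenset({"zh", "en", "yue"}), "glm-asr"),
    ModelInfo("moonshine-tiny", "27M", frozenset({"en"}), "moonshine"),
]


def models_for(language: str) -> list[str]:
    """Return the model IDs in MODELS whose language set contains `language`."""
    return [m.model_id for m in MODELS if language in m.languages]
```

For example, `models_for("yue")` selects the two Cantonese-capable rows in this subset. The `list` command remains the authoritative source for what is installed and available.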

🚀 Quick Start

from modern_asr import ASRPipeline

# Chinese with SenseVoice
pipe = ASRPipeline("sensevoice-small")
result = pipe("audio.wav", language="zh")
print(result.text)

# Switch to Qwen3-ASR for dialects
pipe.switch_model("qwen3-asr-0.6b")
result = pipe("audio.wav", language="zh")
print(result.text)

# English with Whisper
pipe.switch_model("whisper-small")
result = pipe("audio.wav", language="en")
print(result.text)
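
The same call pattern extends naturally to several files. The helper below is a sketch that relies only on what the Quick Start shows: the pipeline is callable as `pipe(path, language=...)` and the result has a `.text` attribute. `transcribe_many` is a hypothetical name, not something the package provides.

```python
from pathlib import Path
from typing import Callable, Iterable


def transcribe_many(pipe: Callable, paths: Iterable[str], language: str) -> dict[str, str]:
    """Run `pipe` on each path and map the file name to the transcribed text."""
    transcripts: dict[str, str] = {}
    for path in paths:
        # Assumes the call signature and `.text` attribute shown in the Quick Start.
        result = pipe(str(path), language=language)
        transcripts[Path(path).name] = result.text
    return transcripts


# Intended usage (requires modern_asr and real audio files):
#   from modern_asr import ASRPipeline
#   texts = transcribe_many(ASRPipeline("sensevoice-small"), ["a.wav", "b.wav"], language="zh")
```

Results are keyed by file name, so two inputs with the same name in different directories would overwrite each other; key by full path if that matters.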

🏗️ Architecture

Modern ASR is built on three layers:

  1. ASRPipeline — Unified user API. Input normalization, task dispatch, model lifecycle.
  2. ASRModel / AudioLLMModel — Adapter layer. New models often need only 8 lines of config.
  3. Backends — Transformers, vLLM, ONNX Runtime.
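
To make the layering concrete, here is a toy model of the three layers. It is an illustration of the pattern only: `EchoBackend`, `ToyAdapter`, and `ToyPipeline` are invented for this sketch and do not reflect how modern_asr is actually implemented.

```python
from typing import Protocol


class Backend(Protocol):
    def run(self, audio_path: str, language: str) -> str: ...


class EchoBackend:
    """Stand-in backend; a real one would wrap Transformers, vLLM, or ONNX Runtime."""

    def __init__(self, name: str) -> None:
        self.name = name

    def run(self, audio_path: str, language: str) -> str:
        return f"[{self.name}/{language}] {audio_path}"


class ToyAdapter:
    """Adapter layer: binds a model ID to a backend."""

    def __init__(self, model_id: str) -> None:
        self.model_id = model_id
        self.backend: Backend = EchoBackend(model_id)

    def transcribe(self, audio_path: str, language: str) -> str:
        return self.backend.run(audio_path, language)


class ToyPipeline:
    """User-facing layer: owns the current adapter and swaps it on request."""

    def __init__(self, model_id: str) -> None:
        self._model = ToyAdapter(model_id)

    def switch_model(self, model_id: str) -> None:
        # Dropping the old reference lets Python reclaim the previous adapter.
        self._model = ToyAdapter(model_id)

    def __call__(self, audio_path: str, language: str) -> str:
        return self._model.transcribe(audio_path, language)
```

Because callers only ever hold the pipeline object, replacing the adapter behind it is invisible to them, which is the idea behind switching models without restarting the process.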

Adding a New Model

from modern_asr.core.audio_llm import AudioLLMModel
from modern_asr.core.registry import register_model

@register_model("my-model-1b")
class MyModel1B(AudioLLMModel):
    HF_PATH = "org/MyModel-1B"
    SUPPORTED_LANGUAGES = {"zh", "en"}
    CHUNK_DURATION = 30.0

    @property
    def model_id(self) -> str:
        return "my-model-1b"

The registry auto-discovers it at runtime. That's it.


📚 Documentation

The full documentation is built with Material for MkDocs. To preview it locally:

mkdocs serve

🤝 Contributing

See the Contributing Guide for development setup, code style, and the PR checklist.


📄 License

Apache-2.0

