
A unified, extensible, and modern Python toolkit for LLM-based Automatic Speech Recognition (ASR).


Modern ASR

A unified, extensible, future-proof toolkit for locally running state-of-the-art LLM-based ASR models.

Python 3.10+ Apache 2.0 PyPI



✨ Features

  • 🧩 19 Models — Whisper, SenseVoice, Qwen, MiMo, FireRedASR, GLM-ASR, and more.
  • 🔌 Zero-Config Plugin — Add new models via @register_model decorator.
  • 🚀 Runtime Hot-Swap — Switch models without restarting the process.
  • 🌍 Multi-Language — 52 languages, 22 Chinese dialects.
  • 🎯 Multi-Task — Transcription, translation, diarization, emotion, events.
  • 💻 Local-First — All inference on-device. No API keys. No data leaves your machine.
  • 🍎 Apple Silicon — Native MPS (Metal Performance Shaders) support.
  • 📦 Auto-Install — Dependencies, git repos, and HF weights are installed automatically on first use.
  • 🐍 Modern Python — Pydantic configs, rich CLI, ISO-timestamped logging.

📦 Installation

```bash
pip install modern-asr
```

Dependencies and model weights are installed automatically the first time you use a model — just type its name:

```python
from modern_asr import ASRPipeline

# Each model ID triggers installation of its dependencies and weights on first use.
pipe = ASRPipeline("sensevoice-small")
pipe = ASRPipeline("mimo-asr-v2.5")
pipe = ASRPipeline("whisper-small")
```
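Install-on-first-use generally begins by checking whether a model's Python dependency can be imported before the model is loaded. Below is a minimal standard-library sketch of that check. It assumes each model needs one importable module and one pip extra. `missing_requirements`, `pip_command`, and the pairing of the `fun-asr` extra with a `funasr` module are illustrative assumptions, not Modern ASR's actual implementation. The sketch builds the pip command but does not run it.

```python
import importlib.util
import sys


def missing_requirements(modules: list[str]) -> list[str]:
    """Return the top-level modules from `modules` that cannot be imported here."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


def pip_command(extra: str) -> list[str]:
    """Build (but do not run) the pip command that installs one modern-asr extra."""
    return [sys.executable, "-m", "pip", "install", f"modern-asr[{extra}]"]


# Hypothetical pairing: the fun-asr extra is assumed to provide the `funasr` module.
needed = missing_requirements(["funasr"])
if needed:
    print("Missing:", needed)
    print("Would run:", " ".join(pip_command("fun-asr")))
```

A real implementation would then run the command, for example with `subprocess.run(..., check=True)`, and retry the import.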

For offline/air-gapped environments, pre-install everything:

```bash
# Quotes keep shells such as zsh from expanding the square brackets.
pip install "modern-asr[all-models]"
```

Available extras: transformers, vllm, onnx, firered-asr, sensevoice, fun-asr, qwen-asr, mimo-asr, glm-asr, whisper, moonshine, all-models, all-backends, all.

Requirements: Python ≥ 3.10.


🧩 Supported Models

| Series | Model ID | Params | Languages | Extra |
|---|---|---|---|---|
| Whisper (OpenAI) | whisper-tiny | 39M | 99+ | whisper |
| | whisper-base | 74M | 99+ | whisper |
| | whisper-small | 244M | 99+ | whisper |
| | whisper-medium | 769M | 99+ | whisper |
| | whisper-large-v3 | 1.5B | 99+ | whisper |
| | whisper-large-v3-turbo | 809M | 99+ | whisper |
| SenseVoice (Alibaba) | sensevoice-small | 234M | zh/en/ja/ko/yue | sensevoice |
| Qwen3-ASR (Alibaba) | qwen3-asr-0.6b | 0.6B | 22 dialects | qwen-asr |
| | qwen3-asr-1.7b | 1.7B | 22 dialects | qwen-asr |
| FunASR / Paraformer (Alibaba) | funasr-nano | 0.8B | zh/en | fun-asr |
| | paraformer-zh | 0.2B | zh | fun-asr |
| | paraformer-large | 0.7B | zh | fun-asr |
| FireRedASR (Xiaohongshu) | fireredasr-aed | 1.1B | zh | firered-asr |
| | fireredasr-llm | 8.3B | zh | firered-asr |
| MiMo-ASR (Xiaomi) | mimo-asr-v2.5 | 8B | zh/dialects | mimo-asr |
| MiDasheng (Xiaomi) | midashenglm-7b | 7B | audio understanding | mimo-asr |
| GLM-ASR (Zhipu AI) | glm-asr-nano-2512 | 1.5B | zh/en/yue | glm-asr |
| Granite Speech (IBM) | granite-speech-3.3-8b | 8B | en | transformers |
| Moonshine (Useful Sensors) | moonshine-tiny | 27M | en | moonshine |
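If an air-gapped machine needs only a few models, you can install only their extras instead of `all-models`. The sketch below builds a single pip command from the Extra column of the model table. `MODEL_EXTRAS` is a hand-copied subset of that table, and `install_command` is a hypothetical helper, not an API exposed by the package.

```python
# Hand-copied subset of the "Extra" column in the model table (illustrative only).
MODEL_EXTRAS = {
    "whisper-small": "whisper",
    "whisper-large-v3-turbo": "whisper",
    "sensevoice-small": "sensevoice",
    "qwen3-asr-0.6b": "qwen-asr",
    "paraformer-zh": "fun-asr",
    "fireredasr-aed": "firered-asr",
    "granite-speech-3.3-8b": "transformers",
}


def install_command(model_ids: list[str]) -> str:
    """Return one pip command that installs the extras for all requested models."""
    if not model_ids:
        raise ValueError("at least one model ID is required")
    unknown = [m for m in model_ids if m not in MODEL_EXTRAS]
    if unknown:
        raise KeyError(f"no extra recorded for: {', '.join(unknown)}")
    # Deduplicate and sort so the command does not depend on input order.
    spec = ",".join(sorted({MODEL_EXTRAS[m] for m in model_ids}))
    # Quote the requirement so shells like zsh do not expand the square brackets.
    return f'pip install "modern-asr[{spec}]"'


print(install_command(["whisper-small", "sensevoice-small", "whisper-large-v3-turbo"]))
```

Both Whisper models share the `whisper` extra, so the command lists it only once.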

```bash
# List all available models
python -m modern_asr list
```

🚀 Quick Start

```python
from modern_asr import ASRPipeline

# Chinese with SenseVoice
pipe = ASRPipeline("sensevoice-small")
result = pipe("audio.wav", language="zh")
print(result.text)

# Switch to Qwen3-ASR for dialects
pipe.switch_model("qwen3-asr-0.6b")
result = pipe("audio.wav", language="zh")
print(result.text)

# English with Whisper
pipe.switch_model("whisper-small")
result = pipe("audio.wav", language="en")
print(result.text)
```
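With `switch_model`, one pipeline object can serve several models in turn. The sketch below shows one way such a hot-swap can work: the pipeline keeps a single active model, releases it, and loads the replacement in place. `HotSwapPipeline`, `DummyModel`, `load`, and `unload` are hypothetical names for illustration. The real `ASRPipeline` may manage memory and devices differently.

```python
from dataclasses import dataclass, field


@dataclass
class DummyModel:
    """Stand-in for a loaded ASR model; `loaded` mimics weights held in memory."""
    model_id: str
    loaded: bool = False

    def load(self) -> None:
        self.loaded = True

    def unload(self) -> None:
        self.loaded = False

    def transcribe(self, audio_path: str, language: str) -> str:
        return f"[{self.model_id}/{language}] transcript of {audio_path}"


@dataclass
class HotSwapPipeline:
    """Keeps exactly one active model and swaps it without recreating the pipeline."""
    model: DummyModel
    history: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        self.model.load()
        self.history.append(self.model.model_id)

    def switch_model(self, model_id: str) -> None:
        # Release the current model first so two large models never coexist in memory.
        self.model.unload()
        self.model = DummyModel(model_id)
        self.model.load()
        self.history.append(model_id)

    def __call__(self, audio_path: str, language: str) -> str:
        return self.model.transcribe(audio_path, language)


pipe = HotSwapPipeline(DummyModel("sensevoice-small"))
first = pipe.model
pipe.switch_model("whisper-small")
print(pipe("audio.wav", language="en"))
```

Unloading before loading keeps peak memory to one model at a time, at the cost of a brief window with no model ready.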

🏗️ Architecture

Modern ASR is built on three layers:

  1. ASRPipeline — Unified user API. Input normalization, task dispatch, model lifecycle.
  2. ASRModel / AudioLLMModel — Adapter layer. New models often need only 8 lines of config.
  3. Backends — Transformers, vLLM, ONNX Runtime.
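One way to picture the three layers is as the pipeline calling an adapter, which in turn calls a backend. The sketch below is a simplified, self-contained model of that split. It assumes a backend exposes a single `generate(audio, prompt)` call and that each adapter maps a task name to a prompt. `Backend`, `EchoBackend`, `ToyAdapter`, `ToyPipeline`, and the prompt strings are all hypothetical, not Modern ASR's real classes.

```python
from typing import Protocol, Sequence, Union


class Backend(Protocol):
    """Layer 3: runs inference (Transformers, vLLM, ONNX Runtime in the real toolkit)."""
    def generate(self, audio: list[float], prompt: str) -> str: ...


class EchoBackend:
    """Fake backend that reports what it received instead of running a model."""
    def generate(self, audio: list[float], prompt: str) -> str:
        return f"{prompt} [samples={len(audio)}]"


class ToyAdapter:
    """Layer 2: translates a pipeline task into a backend-specific request."""
    PROMPTS = {
        "transcribe": "Transcribe the audio.",
        "translate": "Translate the audio to English.",
    }

    def __init__(self, backend: Backend) -> None:
        self.backend = backend

    def run(self, task: str, audio: list[float]) -> str:
        if task not in self.PROMPTS:
            raise ValueError(f"unsupported task: {task}")
        return self.backend.generate(audio, self.PROMPTS[task])


class ToyPipeline:
    """Layer 1: normalizes input, then dispatches the task to the adapter."""
    def __init__(self, adapter: ToyAdapter) -> None:
        self.adapter = adapter

    @staticmethod
    def normalize(audio: Sequence[Union[float, tuple[float, float]]]) -> list[float]:
        # Down-mix stereo (left, right) pairs to mono; pass mono samples through.
        return [sum(s) / 2 if isinstance(s, tuple) else float(s) for s in audio]

    def __call__(self, audio: Sequence[Union[float, tuple[float, float]]], task: str = "transcribe") -> str:
        return self.adapter.run(task, self.normalize(audio))


pipe = ToyPipeline(ToyAdapter(EchoBackend()))
print(pipe([(0.2, 0.4), (0.0, -0.2)], task="translate"))
```

Because the pipeline only sees the adapter interface, you can swap a backend without touching user-facing code.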

Adding a New Model

```python
from modern_asr.core.audio_llm import AudioLLMModel
from modern_asr.core.registry import register_model

@register_model("my-model-1b")
class MyModel1B(AudioLLMModel):
    HF_PATH = "org/MyModel-1B"          # Hugging Face repository to load weights from
    SUPPORTED_LANGUAGES = {"zh", "en"}  # language codes the model accepts
    CHUNK_DURATION = 30.0               # chunk length for long audio, presumably in seconds

    @property
    def model_id(self) -> str:
        return "my-model-1b"
```

The registry auto-discovers it at runtime. That's it.


📚 Documentation

The full documentation is built with Material for MkDocs. To preview it locally, run:

```bash
mkdocs serve
```

🤝 Contributing

See the Contributing Guide for development setup, code style, and the PR checklist.


📄 License

Apache-2.0




Download files


Source Distribution

modern_asr-0.2.12.tar.gz (388.7 kB)

Built Distribution


modern_asr-0.2.12-py3-none-any.whl (62.0 kB)

File details

Details for the file modern_asr-0.2.12.tar.gz.

File metadata

  • Download URL: modern_asr-0.2.12.tar.gz
  • Size: 388.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.11.8 (uv publish on Ubuntu 24.04)

File hashes

Hashes for modern_asr-0.2.12.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | fa55c73b5ae6acf90ffe1dd184e91e88afe2069bf4d39d97faefc1f24ac5b937 |
| MD5 | 5686c60d614eaa525ed3b6d523650cc2 |
| BLAKE2b-256 | bf5406e4127e50a36de64a98e9e012242c2d49ec5a28b6ec0fa331d592960878 |


File details

Details for the file modern_asr-0.2.12-py3-none-any.whl.

File metadata

  • Download URL: modern_asr-0.2.12-py3-none-any.whl
  • Size: 62.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.11.8 (uv publish on Ubuntu 24.04)

File hashes

Hashes for modern_asr-0.2.12-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | fa638279f283e2ab911f8cac07010121275ac87c0c14048ab03eba83b7af3292 |
| MD5 | 08941fcf854ca7ac21ff38ba467c1165 |
| BLAKE2b-256 | ba032ba868995c4fb18c7126146184218ded89962e3d447bc1c9daee94ce88ec |

