
A unified, extensible, and modern Python toolkit for LLM-based Automatic Speech Recognition (ASR).


Modern ASR

A unified, extensible, future-proof toolkit for locally running state-of-the-art LLM-based ASR models.

Python 3.10+ Apache 2.0 PyPI



✨ Features

  • 🧩 19 Models — Whisper, SenseVoice, Qwen, MiMo, FireRedASR, GLM-ASR, and more.
  • 🔌 Zero-Config Plugin — Add new models via @register_model decorator.
  • 🚀 Runtime Hot-Swap — Switch models without restarting the process.
  • 🌍 Multi-Language — 52 languages, 22 Chinese dialects.
  • 🎯 Multi-Task — Transcription, translation, diarization, emotion, events.
  • 💻 Local-First — All inference on-device. No API keys. No data leaves your machine.
  • 🍎 Apple Silicon — Native MPS (Metal Performance Shaders) support.
  • 📦 Auto-Install — Dependencies, git repos, and HF weights are installed automatically on first use.
  • 🐍 Modern Python — Pydantic configs, rich CLI, ISO-timestamped logging.

📦 Installation

```bash
pip install modern-asr
```

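Dependencies and model weights are installed automatically the first time you use a model. Just pass its model ID:

```python
from modern_asr import ASRPipeline

# Pick any model ID from the Supported Models table; these lines are alternatives.
# Missing dependencies and weights are fetched the first time each model is used.
pipe = ASRPipeline("sensevoice-small")
pipe = ASRPipeline("mimo-asr-v2.5")
pipe = ASRPipeline("whisper-small")
```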
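The install-on-first-use mechanism itself is not documented here. As a rough illustration of the general pattern, a loader can check which required modules are importable before loading a model and install only the missing ones. This is a standard-library sketch under stated assumptions: the `REQUIRED_MODULES` mapping, the function names, and shelling out to pip are hypothetical, not modern-asr's API.

```python
import importlib.util
import subprocess
import sys

# Hypothetical mapping: model ID -> (import name, pip requirement) pairs.
# modern-asr keeps its own (likely different) dependency metadata.
REQUIRED_MODULES: dict[str, list[tuple[str, str]]] = {
    "whisper-small": [("transformers", "transformers"), ("torch", "torch")],
    "sensevoice-small": [("funasr", "funasr")],
}


def missing_requirements(pairs: list[tuple[str, str]]) -> list[str]:
    """Return the pip requirements whose import name cannot be found."""
    return [req for module, req in pairs if importlib.util.find_spec(module) is None]


def ensure_installed(model_id: str, dry_run: bool = True) -> list[str]:
    """Install missing dependencies for model_id; return what was (or would be) installed.

    dry_run defaults to True so calling this never modifies the environment by accident.
    """
    missing = missing_requirements(REQUIRED_MODULES.get(model_id, []))
    if missing and not dry_run:
        # Use the running interpreter's pip so packages land in the active environment.
        subprocess.check_call([sys.executable, "-m", "pip", "install", *missing])
    return missing
```

Calling `ensure_installed("sensevoice-small")` reports what would be installed; pass `dry_run=False` to actually install. Checking `find_spec` first keeps repeat runs fast, since nothing is invoked once every module is importable.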

For offline/air-gapped environments, pre-install everything:

```bash
# Quote the requirement so shells such as zsh do not expand the brackets.
pip install "modern-asr[all-models]"
```

Available extras: transformers, vllm, onnx, firered-asr, sensevoice, fun-asr, qwen-asr, mimo-asr, glm-asr, whisper, moonshine, all-models, all-backends, all.

Requirements: Python ≥ 3.10.


🧩 Supported Models

| Series | Model ID | Params | Languages | Extra |
|---|---|---|---|---|
| Whisper (OpenAI) | whisper-tiny | 39M | 99+ | whisper |
| | whisper-base | 74M | 99+ | whisper |
| | whisper-small | 244M | 99+ | whisper |
| | whisper-medium | 769M | 99+ | whisper |
| | whisper-large-v3 | 1.5B | 99+ | whisper |
| | whisper-large-v3-turbo | 809M | 99+ | whisper |
| SenseVoice (Alibaba) | sensevoice-small | 234M | zh/en/ja/ko/yue | sensevoice |
| Qwen3-ASR (Alibaba) | qwen3-asr-0.6b | 0.6B | 22 dialects | qwen-asr |
| | qwen3-asr-1.7b | 1.7B | 22 dialects | qwen-asr |
| FunASR / Paraformer (Alibaba) | funasr-nano | 0.8B | zh/en | fun-asr |
| | paraformer-zh | 0.2B | zh | fun-asr |
| | paraformer-large | 0.7B | zh | fun-asr |
| FireRedASR (Xiaohongshu) | fireredasr-aed | 1.1B | zh | firered-asr |
| | fireredasr-llm | 8.3B | zh | firered-asr |
| MiMo-ASR (Xiaomi) | mimo-asr-v2.5 | 8B | zh/dialects | mimo-asr |
| MiDasheng (Xiaomi) | midashenglm-7b | 7B | audio understanding | mimo-asr |
| GLM-ASR (Zhipu AI) | glm-asr-nano-2512 | 1.5B | zh/en/yue | glm-asr |
| Granite Speech (IBM) | granite-speech-3.3-8b | 8B | en | transformers |
| Moonshine (Useful Sensors) | moonshine-tiny | 27M | en | moonshine |
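When preparing an offline machine, you only need the extras for the models you actually plan to run rather than `all-models`. The sketch below maps model IDs to extras by copying the table above and builds a single install command. The helper name is hypothetical, and the mapping is only as current as this table.

```python
# Model ID -> pip extra, copied from the "Extra" column of the Supported Models table.
MODEL_EXTRAS: dict[str, str] = {
    "whisper-tiny": "whisper",
    "whisper-base": "whisper",
    "whisper-small": "whisper",
    "whisper-medium": "whisper",
    "whisper-large-v3": "whisper",
    "whisper-large-v3-turbo": "whisper",
    "sensevoice-small": "sensevoice",
    "qwen3-asr-0.6b": "qwen-asr",
    "qwen3-asr-1.7b": "qwen-asr",
    "funasr-nano": "fun-asr",
    "paraformer-zh": "fun-asr",
    "paraformer-large": "fun-asr",
    "fireredasr-aed": "firered-asr",
    "fireredasr-llm": "firered-asr",
    "mimo-asr-v2.5": "mimo-asr",
    "midashenglm-7b": "mimo-asr",
    "glm-asr-nano-2512": "glm-asr",
    "granite-speech-3.3-8b": "transformers",
    "moonshine-tiny": "moonshine",
}


def pip_install_command(model_ids: list[str]) -> str:
    """Build one pip command that pre-installs the extras for the given models."""
    unknown = [m for m in model_ids if m not in MODEL_EXTRAS]
    if unknown:
        raise KeyError(f"unknown model IDs: {unknown}")
    # Deduplicate and sort so the command is stable regardless of input order.
    extras = ",".join(sorted({MODEL_EXTRAS[m] for m in model_ids}))
    # Quote the requirement so shells like zsh do not treat [...] as a glob.
    return f'pip install "modern-asr[{extras}]"'
```

For example, `pip_install_command(["whisper-small", "sensevoice-small", "whisper-tiny"])` returns `pip install "modern-asr[sensevoice,whisper]"`. Pip accepts several comma-separated extras inside one pair of brackets.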

```bash
# List all available models
python -m modern_asr list
```

🚀 Quick Start

```python
from modern_asr import ASRPipeline

# Chinese with SenseVoice
pipe = ASRPipeline("sensevoice-small")
result = pipe("audio.wav", language="zh")
print(result.text)

# Switch to Qwen3-ASR for dialects
pipe.switch_model("qwen3-asr-0.6b")
result = pipe("audio.wav", language="zh")
print(result.text)

# English with Whisper
pipe.switch_model("whisper-small")
result = pipe("audio.wav", language="en")
print(result.text)
```
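This README does not describe how `switch_model` manages memory internally. One common way to implement runtime hot-swapping is to keep a single model resident and release it before loading the next, so two large checkpoints never occupy memory at the same time. Here is a minimal sketch under that assumption. `HotSwapPipeline` and the `loader` callable are hypothetical stand-ins, not `ASRPipeline`'s implementation.

```python
import gc
from typing import Any, Callable


class HotSwapPipeline:
    """Hypothetical sketch: keep exactly one model loaded and swap it in place."""

    def __init__(self, model_id: str, loader: Callable[[str], Any]) -> None:
        self._loader = loader  # maps a model ID to a callable model
        self.model_id: str | None = model_id
        self.model: Any = loader(model_id)

    def switch_model(self, model_id: str) -> None:
        if model_id == self.model_id:
            return  # already loaded; nothing to do
        # Drop the old model first so two large models are never resident at once.
        self.model = None
        self.model_id = None
        gc.collect()
        # With PyTorch on CUDA, torch.cuda.empty_cache() would typically be called here too.
        self.model = self._loader(model_id)
        self.model_id = model_id  # only set once loading has succeeded

    def __call__(self, audio: str, **kwargs: Any) -> Any:
        if self.model is None:
            raise RuntimeError("no model loaded; call switch_model() first")
        return self.model(audio, **kwargs)
```

The trade-off is that a failed load leaves the pipeline empty instead of falling back to the previous model. Keeping the old model until the new one has loaded avoids that, at the cost of higher peak memory.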

🏗️ Architecture

Modern ASR is built on three layers:

  1. ASRPipeline — Unified user API. Input normalization, task dispatch, model lifecycle.
  2. ASRModel / AudioLLMModel — Adapter layer. New models often need only 8 lines of config.
  3. Backends — Transformers, vLLM, ONNX Runtime.
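The adapter example under "Adding a New Model" sets `CHUNK_DURATION = 30.0`, which suggests that adapters split long recordings into fixed-length windows before inference. The following is a minimal sketch of computing those windows, assuming 16 kHz mono audio. `chunk_bounds` and its `overlap` parameter are hypothetical, not part of modern-asr.

```python
def chunk_bounds(
    num_samples: int,
    sample_rate: int = 16_000,   # assumed input rate (mono)
    chunk_duration: float = 30.0,  # window length in seconds
    overlap: float = 0.0,          # seconds shared by consecutive windows
) -> list[tuple[int, int]]:
    """Return (start, end) sample indices covering num_samples in fixed-length windows."""
    if chunk_duration <= 0 or not 0 <= overlap < chunk_duration:
        raise ValueError("need chunk_duration > 0 and 0 <= overlap < chunk_duration")
    size = int(round(chunk_duration * sample_rate))
    step = int(round((chunk_duration - overlap) * sample_rate))
    bounds: list[tuple[int, int]] = []
    start = 0
    while start < num_samples:
        end = min(start + size, num_samples)  # last window may be shorter
        bounds.append((start, end))
        if end == num_samples:
            break
        start += step
    return bounds
```

For a 70-second clip at 16 kHz with the default 30-second window, this yields three windows, the last one 10 seconds long. A non-zero overlap lets a word cut at a boundary appear whole in at least one window, at the cost of some duplicated computation.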

Adding a New Model

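```python
from modern_asr.core.audio_llm import AudioLLMModel
from modern_asr.core.registry import register_model


@register_model("my-model-1b")
class MyModel1B(AudioLLMModel):
    HF_PATH = "org/MyModel-1B"          # Hugging Face Hub repository of the weights
    SUPPORTED_LANGUAGES = {"zh", "en"}  # language codes this model accepts
    CHUNK_DURATION = 30.0               # audio window length, in seconds

    @property
    def model_id(self) -> str:
        # Same ID as the one passed to @register_model
        return "my-model-1b"
```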

The registry auto-discovers it at runtime. That's it.


📚 Documentation

The full documentation is built with Material for MkDocs. To preview it locally, run this from the directory containing mkdocs.yml:

```bash
mkdocs serve
```

🤝 Contributing

See the Contributing Guide for development setup, code style, and the PR checklist.


📄 License

Apache-2.0




Download files


Source Distribution

modern_asr-0.2.14.tar.gz (389.6 kB)

Uploaded Source

Built Distribution


modern_asr-0.2.14-py3-none-any.whl (63.0 kB)

Uploaded Python 3

File details

Details for the file modern_asr-0.2.14.tar.gz.

File metadata

  • Download URL: modern_asr-0.2.14.tar.gz
  • Upload date:
  • Size: 389.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.11.8 (uv publish on Ubuntu 24.04)

File hashes

Hashes for modern_asr-0.2.14.tar.gz

| Algorithm | Hash digest |
|---|---|
| SHA256 | fabc621213db97ccb46aa5c77ced52f485bc7b0b371fd70ec459e508632f763d |
| MD5 | 04e81380a7a0ebe44f169bbced59be74 |
| BLAKE2b-256 | 21dd98a4e7ed01d690a86c24ee4812b3c987d7eaa399dc80f538ff7b2fd4eb39 |


File details

Details for the file modern_asr-0.2.14-py3-none-any.whl.

File metadata

  • Download URL: modern_asr-0.2.14-py3-none-any.whl
  • Upload date:
  • Size: 63.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.11.8 (uv publish on Ubuntu 24.04)

File hashes

Hashes for modern_asr-0.2.14-py3-none-any.whl

| Algorithm | Hash digest |
|---|---|
| SHA256 | 27f3f935918ee33f83a7a75d7a160abfbdd691c1a5b7c15518a328375280eac4 |
| MD5 | b9bca43e2e3e9fe98884addc469ac8d1 |
| BLAKE2b-256 | d92c5398f85a8453aa633cf788fb21f8434b9d879ab4944af57127761c1c0baa |

