mstt: A command-line tool and Python library for running speech-to-text against multiple models.

Project description

Modern Speech-to-Text (MSTT)

Modern Speech-to-Text (MSTT) is a Python library and command-line tool that provides a unified, extensible interface to multiple Speech-to-Text (STT) models. It simplifies integrating different STT services and models into your applications by offering one consistent API regardless of the underlying STT engine.

Features

Unified API: Interact with multiple STT models through a single, consistent interface.
Extensible Design: Easily add support for new STT models and services.
Local and Cloud Support: Seamlessly switch between local models and cloud-based STT APIs.
Plugin System: Integrate custom STT models or enhance existing functionalities via a flexible plugin system.

Installation

To install MSTT, you can use pip:

pip install mstt

If you want to install with specific STT model backends, you can specify them as extras. For example, to install with funasr support:

pip install "mstt[funasr]"

Usage

Basic Transcription

Here's a basic example of how to use MSTT to transcribe an audio file:

from mstt import MSTT

# Initialize MSTT with a specific model (e.g., 'funasr')
# Ensure the 'mstt-funasr' package is installed if you use 'funasr'
mstt = MSTT(model_id="funasr")

# Transcribe an audio file
audio_file_path = "path/to/your/audio.wav"
result = mstt.transcribe(audio_file_path)

print(f"Transcription: {result.text}")
print(f"Segments: {result.segments}")
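The segments can also be rendered as timestamped lines. The following is a minimal sketch, assuming each segment is a dict with start, end, and text keys (the shape used in the custom plugin example in this document). The helper names format_timestamp and format_segments are illustrative and not part of MSTT.

```python
def format_timestamp(seconds: float) -> str:
    """Render a time offset in seconds as MM:SS.mmm."""
    total_ms = round(seconds * 1000)
    minutes, rem_ms = divmod(total_ms, 60_000)
    secs, ms = divmod(rem_ms, 1000)
    return f"{minutes:02d}:{secs:02d}.{ms:03d}"


def format_segments(segments: list) -> list:
    """Turn segment dicts into '[start - end] text' lines.

    Assumption: every segment has 'start', 'end' (seconds) and 'text' keys.
    """
    return [
        f"[{format_timestamp(seg['start'])} - {format_timestamp(seg['end'])}] {seg['text']}"
        for seg in segments
    ]


# Usage with hand-written segments; with MSTT you would pass result.segments,
# provided they have the assumed shape.
example_segments = [
    {"start": 0.0, "end": 2.0, "text": "This is a custom"},
    {"start": 2.1, "end": 4.0, "text": "transcription result."},
]
for line in format_segments(example_segments):
    print(line)
```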

Available Models

MSTT supports various models through its plugin system. You can list available models:

from mstt import MSTT

available_models = MSTT.list_models()
print("Available STT Models:")
for model_id, description in available_models.items():
    print(f"- {model_id}: {description}")
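The model listing can be used to choose a backend with a fallback order before constructing MSTT. This sketch assumes that MSTT.list_models() returns a mapping of model ID to description, as the loop above suggests. The helper pick_model is hypothetical and operates on any such mapping.

```python
def pick_model(available: dict, preferred: list) -> str:
    """Return the first model ID from `preferred` that appears in `available`.

    `available` is assumed to map model IDs to descriptions.
    """
    for model_id in preferred:
        if model_id in available:
            return model_id
    raise LookupError(
        f"None of {preferred} are available; found: {sorted(available)}"
    )


# Usage with a hand-written mapping. With MSTT installed, one could instead
# call pick_model(MSTT.list_models(), ["funasr", "mycustom"]).
demo_available = {"mycustom": "Hypothetical custom model"}
print(pick_model(demo_available, ["funasr", "mycustom"]))
```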

Command Line Interface (CLI)

MSTT also provides a command-line interface for quick transcriptions:

mstt transcribe --model funasr --audio path/to/your/audio.wav

Run mstt --help for more CLI options.
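To transcribe several files, the CLI can be driven from a short Python script. This is a sketch that uses only the transcribe subcommand and the --model and --audio flags shown above. The helper names build_command and transcribe_all and the *.wav pattern are assumptions for illustration, and nothing is assumed about what the command prints.

```python
import shlex
import subprocess
from pathlib import Path


def build_command(model: str, audio) -> list:
    """Build the argument list for one `mstt transcribe` invocation."""
    return ["mstt", "transcribe", "--model", model, "--audio", str(audio)]


def transcribe_all(model: str, folder: str, pattern: str = "*.wav") -> None:
    """Run the CLI once per matching file, stopping at the first failure."""
    for audio in sorted(Path(folder).glob(pattern)):
        cmd = build_command(model, audio)
        print("Running:", shlex.join(cmd))
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(cmd, check=True)


# Example call (requires the mstt CLI on PATH):
# transcribe_all("funasr", "path/to/your")
```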

Creating Custom Plugins

MSTT is designed to be extensible through a plugin system. You can create your own STT model plugins and register them with MSTT.

Plugin Structure

A plugin typically consists of:

A Model Implementation: A Python class that inherits from mstt.models.STTModel and implements the transcribe method.
A Registration Module: A Python module that registers your model with MSTT using the mstt.register_model decorator.

Example: A Simple Custom Plugin

Let's say you want to create a plugin for a hypothetical MyCustomSTT model. You would create a Python package (e.g., mstt_mycustom):

mstt_mycustom/
├── pyproject.toml
├── src/
│   └── mstt_mycustom/
│       ├── __init__.py
│       ├── models.py
│       └── register.py

src/mstt_mycustom/models.py:

from mstt.models import STTModel, TranscriptionResult

class MyCustomSTTModel(STTModel):
    def __init__(self, model_id: str, device: str = "cpu"):
        super().__init__(model_id, device)
        # Initialize your custom model here
        print(f"Initializing MyCustomSTTModel with ID: {model_id} on device: {device}")

    def transcribe(self, audio_file_path: str) -> TranscriptionResult:
        # Implement your transcription logic here
        # This is a placeholder for demonstration
        print(f"Transcribing {audio_file_path} using MyCustomSTTModel")
        dummy_text = "This is a custom transcription result."
        dummy_segments = [
            {"start": 0.0, "end": 2.0, "text": "This is a custom"},
            {"start": 2.1, "end": 4.0, "text": "transcription result."}
        ]
        return TranscriptionResult(text=dummy_text, segments=dummy_segments)
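When developing a plugin, it can help to sanity-check the segments before returning them. The sketch below assumes segments are dicts with start, end, and text keys as in the placeholder above, and additionally assumes they should be chronological and non-overlapping, which this document does not state. The function validate_segments is hypothetical and not part of MSTT.

```python
def validate_segments(segments: list) -> list:
    """Return a list of human-readable problems; an empty list means none found.

    Assumed rules (not stated by MSTT): times are non-negative, each segment
    ends at or after its start, and segments do not overlap.
    """
    problems = []
    previous_end = 0.0
    for index, seg in enumerate(segments):
        start, end = seg["start"], seg["end"]
        if start < 0 or end < start:
            problems.append(f"segment {index}: invalid time range {start}-{end}")
        if start < previous_end:
            problems.append(f"segment {index}: overlaps the previous segment")
        previous_end = max(previous_end, end)
    return problems


dummy_segments = [
    {"start": 0.0, "end": 2.0, "text": "This is a custom"},
    {"start": 2.1, "end": 4.0, "text": "transcription result."},
]
print(validate_segments(dummy_segments))
```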

src/mstt_mycustom/register.py:

from mstt import register_model
from .models import MyCustomSTTModel

@register_model("mycustom")
def register_mycustom_model():
    return MyCustomSTTModel

pyproject.toml (important for plugin discovery):

[project.entry-points.mstt]
mycustom = "mstt_mycustom.register"
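Entry points declared this way become visible through the standard library's importlib.metadata once the package is installed. The sketch below lists whatever is registered under the mstt group. That MSTT itself performs discovery this way is an assumption, and the helper discover_plugins is hypothetical.

```python
from importlib.metadata import entry_points


def discover_plugins(group: str = "mstt") -> dict:
    """Map entry-point names to their target strings for one group."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+: entry points are filtered with select().
        selected = eps.select(group=group)
    else:
        # Python 3.8/3.9: entry_points() returns a dict keyed by group.
        selected = eps.get(group, [])
    return {ep.name: ep.value for ep in selected}


# With the example plugin installed, the result would be expected to include
# the name "mycustom" pointing at "mstt_mycustom.register".
print(discover_plugins())
```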

Installing Your Plugin

After setting up your plugin package, you can install it in editable mode for development:

pip install -e /path/to/your/mstt_mycustom

Or, if you package it, install it like any other Python package:

pip install mstt-mycustom

Once installed, MSTT will automatically discover and load your mycustom model, and you can use it like any other built-in model:

from mstt import MSTT

mstt = MSTT(model_id="mycustom")
result = mstt.transcribe("path/to/your/audio.wav")
print(result.text)

Contributing

We welcome contributions to MSTT! If you'd like to contribute, please follow these steps:

1. Fork the repository.
2. Create a new branch (git checkout -b feature/your-feature-name).
3. Make your changes.
4. Write and run tests (pytest).
5. Commit your changes (git commit -am 'Add new feature').
6. Push to the branch (git push origin feature/your-feature-name).
7. Create a new Pull Request.

License

This project is licensed under the Apache License, Version 2.0 - see the LICENSE file for details.

Release history

0.2.1 (Jul 13, 2025)
0.2.0 (Jul 13, 2025), this version

Download files

Source Distribution

mstt-0.2.0.tar.gz (6.8 kB)

Uploaded Jul 13, 2025 Source

Built Distribution

mstt-0.2.0-py3-none-any.whl (8.8 kB)

Uploaded Jul 13, 2025 Python 3

File details

Details for the file mstt-0.2.0.tar.gz.

File metadata

Download URL: mstt-0.2.0.tar.gz
Upload date: Jul 13, 2025
Size: 6.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.7.13

File hashes

Hashes for mstt-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`ebb97872dff1a0a78675a9141721c1d665a50739705ca6e5c52cabe90a0cbd46`
MD5	`4c9259721d07f2d45be546da65794e9e`
BLAKE2b-256	`a1b77851305458e1407c41a4dd37c3c3783ce7388a41a3e2a88e01a501b9107e`

File details

Details for the file mstt-0.2.0-py3-none-any.whl.

File metadata

Download URL: mstt-0.2.0-py3-none-any.whl
Upload date: Jul 13, 2025
Size: 8.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.7.13

File hashes

Hashes for mstt-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`95b8dc2284e79334cfdce471144cb0e1f06f2370320c93d108e789a427b7d2ee`
MD5	`6c348044c7c2a630b2f5ecf9437372ae`
BLAKE2b-256	`2d5321eafe544bac82236d40215ca7b13a00eb85d6a3d5efb2c269a9b92e865c`
