mstt: A command-line tool and Python library for running speech-to-text against multiple models.

Modern Speech-to-Text (MSTT)

Modern Speech-to-Text (MSTT) is a Python library designed to provide a unified and extensible interface for various Speech-to-Text (STT) models. It aims to simplify the process of integrating different STT services and models into your applications, offering a consistent API regardless of the underlying STT engine.

Features

  • Unified API: Interact with multiple STT models through a single, consistent interface.
  • Extensible Design: Easily add support for new STT models and services.
  • Local and Cloud Support: Seamlessly switch between local models and cloud-based STT APIs.
  • Plugin System: Integrate custom STT models or enhance existing functionalities via a flexible plugin system.

Installation

To install MSTT, you can use pip:

pip install mstt

If you want to install with specific STT model backends, you can specify them as extras. For example, to install with funasr support:

pip install "mstt[funasr]"

Usage

Basic Transcription

Here's a basic example of how to use MSTT to transcribe an audio file:

from mstt import get_model

asr_model = get_model("openai/whisper-tiny")
result = asr_model.transcribe("examples/test_audio_zh.wav")

print(f"Transcription: {result.text}")
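
To transcribe several files with one loaded model, a loop along these lines may be enough. This is a minimal sketch: transcribe_many is a hypothetical helper, not part of MSTT, and it assumes only what the example above shows, namely that a model exposes transcribe(path) and that the result has a .text attribute.

```python
from typing import Dict, Iterable


def transcribe_many(model, audio_paths: Iterable[str]) -> Dict[str, str]:
    """Transcribe each path with an already-loaded model.

    Hypothetical helper (not part of MSTT). `model` is any object with a
    `transcribe(path)` method whose result has a `.text` attribute.
    """
    transcripts = {}
    for path in audio_paths:
        # Reuse the same model instance so it is only loaded once.
        result = model.transcribe(path)
        transcripts[path] = result.text
    return transcripts
```

With MSTT installed, this could be called as transcribe_many(get_model("openai/whisper-tiny"), ["a.wav", "b.wav"]).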

Available Models

MSTT supports various models through its plugin system. You can list available models:

mstt models

Command Line Interface (CLI)

MSTT also provides a command-line interface for quick transcriptions:

# ASR models hosted on Hugging Face work out of the box
mstt transcribe --model openai/whisper-tiny path/to/your/audio.wav

Run mstt --help for more CLI options.
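
If you need to call the CLI from a script, the command above can be wrapped with the standard library. The sketch below uses only the transcribe subcommand and the --model option shown in this section. The function names are hypothetical, and it assumes the transcription is written to standard output, which this README does not state.

```python
import subprocess
from typing import List


def build_transcribe_command(model_id: str, audio_path: str) -> List[str]:
    """Build the argument list for `mstt transcribe` as shown in the README."""
    return ["mstt", "transcribe", "--model", model_id, audio_path]


def run_transcribe(model_id: str, audio_path: str) -> str:
    """Run the CLI and return whatever it wrote to standard output.

    Assumption: the transcription is printed to stdout.
    """
    completed = subprocess.run(
        build_transcribe_command(model_id, audio_path),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return completed.stdout
```

Passing the arguments as a list avoids shell quoting problems with file paths that contain spaces.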

Creating Custom Plugins

MSTT is designed to be extensible through a plugin system. You can create your own STT model plugins and register them with MSTT.

Plugin Structure

A plugin typically consists of:

  1. A Model Implementation: A Python class that inherits from mstt.models.AsrModel and implements the transcribe method.
  2. A Registration Module: A Python module that defines a register_models(registry) function, marked with the mstt.plugin.hookimpl decorator, which registers your model with MSTT.
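
The two pieces above follow a common registry pattern: a name is mapped to a model class, and the class is instantiated on lookup. The toy sketch below illustrates that pattern in isolation. ToyRegistry and EchoModel are stand-ins invented for this illustration; they are not MSTT's actual classes, and MSTT's internals may work differently.

```python
from typing import Callable, Dict


class ToyRegistry:
    """Stand-in for illustration only; not MSTT's actual registry."""

    def __init__(self) -> None:
        self._model_classes: Dict[str, Callable] = {}

    def register(self, name: str, model_cls: Callable) -> None:
        # Map a model name (or alias) to the class that implements it.
        self._model_classes[name] = model_cls

    def create(self, name: str, **kwargs):
        # Look up the class by name and instantiate it.
        if name not in self._model_classes:
            raise KeyError(f"No model registered under {name!r}")
        return self._model_classes[name](name, **kwargs)


class EchoModel:
    """Toy model: returns the file path instead of a real transcription."""

    def __init__(self, model_id: str, device: str = "cpu"):
        self.model_id = model_id
        self.device = device

    def transcribe(self, audio_file_path: str) -> str:
        return f"[{self.model_id}] {audio_file_path}"


registry = ToyRegistry()
registry.register("echo", EchoModel)
model = registry.create("echo")
print(model.transcribe("audio.wav"))  # prints "[echo] audio.wav"
```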

Example: A Simple Custom Plugin

Let's say you want to create a plugin for a hypothetical MyCustomSTT model. You would create a Python package (e.g., mstt_mycustom):

mstt_mycustom/
├── pyproject.toml
├── src/
│   └── mstt_mycustom/
│       ├── __init__.py
│       ├── models.py
│       └── register.py

src/mstt_mycustom/models.py:

from mstt.models import AsrModel
from mstt.types import TranscriptionResult, Segment

class MyCustomSTTModel(AsrModel):
    def __init__(self, model_id: str, device: str = "cpu"):
        super().__init__(model_id, device)
        # Initialize your custom model here
        print(f"Initializing MyCustomSTTModel with ID: {model_id} on device: {device}")

    def transcribe(self, audio_file_path: str) -> TranscriptionResult:
        # Implement your transcription logic here
        # This is a placeholder for demonstration
        print(f"Transcribing {audio_file_path} using MyCustomSTTModel")
        dummy_text = "This is a custom transcription result."
        dummy_segments = [
            {"start": 0.0, "end": 2.0, "text": "This is a custom"},
            {"start": 2.1, "end": 4.0, "text": "transcription result."}
        ]
        return TranscriptionResult(text=dummy_text, segments=dummy_segments)
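
The placeholder segments above carry start and end times in seconds. A small formatter such as the following can make them easier to read when debugging a plugin. It is a sketch that assumes segments are plain dictionaries with start, end, and text keys, as in the placeholder; the real Segment type in mstt.types may be shaped differently. Both function names are hypothetical.

```python
from typing import Dict, List


def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as MM:SS.mmm."""
    total_ms = round(seconds * 1000)
    minutes, remainder_ms = divmod(total_ms, 60_000)
    secs, ms = divmod(remainder_ms, 1000)
    return f"{minutes:02d}:{secs:02d}.{ms:03d}"


def format_segments(segments: List[Dict]) -> List[str]:
    """Render each segment as '[start -> end] text'.

    Assumption: each segment is a dict with 'start', 'end', and 'text' keys.
    """
    return [
        f"[{format_timestamp(s['start'])} -> {format_timestamp(s['end'])}] {s['text']}"
        for s in segments
    ]


segments = [
    {"start": 0.0, "end": 2.0, "text": "This is a custom"},
    {"start": 2.1, "end": 4.0, "text": "transcription result."},
]
for line in format_segments(segments):
    print(line)
```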

src/mstt_mycustom/register.py:

from mstt.plugin import hookimpl
from .models import MyCustomSTTModel

@hookimpl
def register_models(registry):
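    """Register custom models with the central registry."""

    # Register the name that users pass to get_model(). The same class can be
    # registered under additional names to provide aliases.
    registry.register("mycustom", MyCustomSTTModel)
    registry.register("mycustom-alias", MyCustomSTTModel)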

pyproject.toml (important for plugin discovery):

[project.entry-points.mstt]
mycustom = "mstt_mycustom.register"
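
The [project.entry-points.mstt] table is standard Python packaging metadata. How MSTT consumes it internally is not shown in this README; the sketch below only uses the standard library to list what is registered under the mstt group, which may help confirm that a plugin was installed correctly. find_mstt_plugins is a hypothetical helper name.

```python
from importlib.metadata import entry_points
from typing import List, Tuple


def find_mstt_plugins() -> List[Tuple[str, str]]:
    """Return (name, target) pairs for entry points in the 'mstt' group."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+: entry points support select(group=...).
        selected = eps.select(group="mstt")
    else:
        # Older versions return a dict keyed by group name.
        selected = eps.get("mstt", [])
    return [(ep.name, ep.value) for ep in selected]


for name, target in find_mstt_plugins():
    print(f"{name} -> {target}")
```

After installing the example plugin, the output would be expected to include an entry for mycustom pointing at mstt_mycustom.register. An empty result suggests the package metadata was not picked up, for example because the package was not reinstalled after editing pyproject.toml.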

Installing Your Plugin

After setting up your plugin package, you can install it in editable mode for development:

pip install -e /path/to/your/mstt_mycustom

Or, if you package it, install it like any other Python package:

pip install mstt-mycustom

Once installed, MSTT will automatically discover and load your mycustom model, and you can use it like any other built-in model:

from mstt import get_model

asr_model = get_model("mycustom")
result = asr_model.transcribe("path/to/your/audio.wav")
print(result.text)

Contributing

We welcome contributions to MSTT! If you'd like to contribute, please follow these steps:

  1. Fork the repository.
  2. Create a new branch (git checkout -b feature/your-feature-name).
  3. Make your changes.
  4. Write and run tests (pytest).
  5. Commit your changes (git commit -am 'Add new feature').
  6. Push to the branch (git push origin feature/your-feature-name).
  7. Create a new Pull Request.

License

This project is licensed under the Apache License, Version 2.0 - see the LICENSE file for details.
